I'm currently working as a network engineer for Nebula, a medium-sized Internet service provider in Finland. The interesting 24 hours I'm referring to started yesterday at 16:10 Finnish time, when one of the core routers on our network started acting up. Due to confidentiality issues I can't go into technical details on what happened, but this is the official announcement we gave out to customers during the incident. It was updated this morning and we might provide some further details later on. The translation from Finnish to English is my own.
Problems with network connectivity
Due to a software related issue on a core router, some network connections have experienced problems and interruptions since 16:10 local time. The issue is still under repair.
Update: the fault has been limited and connectivity restored at 20:30. The fault had been resolved already before 17:00, but the effects of it resulted in new issues in connectivity after 18:00.
The overload caused by the fault resulted in connectivity issues between the routers of the internal core network at Nebula. When the original issue had been located and resolved, the level of service started to normalize in stages. Latency in certain connections may have been higher than usual during recovery.
We are sorry for the inconvenience and will be performing upgrades to minimize similar incidents in the future.
Kind of a domino effect really. Although I can't tell much about the events of last night, I can tell about the effects they had on my own connectivity. I guess it's not surprising that an employee of an ISP uses the services of that company, so my home DSL line saw what any other line in that area would have seen. I'm monitoring the status of my home router from a server located at the hosting facility at the office. It uses SNMP polling at 5-minute intervals. Altogether, connectivity between the router and the server was down at 17:00-17:15, 17:55-18:20, 19:10-20:30 and 21:15-21:25. That's 2 hours 10 minutes.
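For anyone who wants to check the arithmetic, here is a minimal Python sketch that adds up the four outage windows listed above. The names `OUTAGES` and `total_downtime` are made up for this illustration and have nothing to do with the actual monitoring setup.

```python
from datetime import datetime, timedelta

# Outage windows as listed in the post (local time, all on the same day).
OUTAGES = [
    ("17:00", "17:15"),
    ("17:55", "18:20"),
    ("19:10", "20:30"),
    ("21:15", "21:25"),
]


def total_downtime(windows):
    """Sum the length of each (start, end) window given as HH:MM strings."""
    total = timedelta()
    for start, end in windows:
        begin = datetime.strptime(start, "%H:%M")
        finish = datetime.strptime(end, "%H:%M")
        total += finish - begin
    return total


downtime = total_downtime(OUTAGES)
print(downtime)  # 2:10:00
```

Since the polling interval is 5 minutes, each start and end time is only accurate to within about one polling cycle, so the total is an approximation rather than an exact figure.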
Once everything was running smoothly (or pretty close to that), we did some initial maintenance operations to make sure it doesn't happen again. I left the office at 1:10 at night and got to bed around 2 o'clock.
On that same night another Finnish telecom company, Elisa, had their regular maintenance window. They apparently upgraded the software on quite a few pieces of their networking equipment, which caused short (less than 5 minutes) local breaks in connectivity around Southern Finland while the boxes were restarting. A large-scale upgrade rarely goes exactly as planned, so they managed to break some connections for a few hours, but nothing major really. It's more like a nice coincidence in timing.
In the morning yet another data communications service broke down. The mobile telecom operator Saunalahti had some issues with their mobile data service. Their announcement was released at 8:48, but the problems had already started before 8 o'clock in the morning. I was still sound asleep then, but my smartphone hadn't synced the emails as it was supposed to at 8:00. Saunalahti resolved this issue at 12:22.
Another scheduled maintenance operation broke my own IRC connections when I restarted my Linux shell server at 15:00 this afternoon. I had scheduled this kernel upgrade last week and decided to go ahead with it regardless of the events of last night. One more loss of connectivity didn't really matter that much at that point :)
The final event, just before the 24 hours were up, was a brief loss of connectivity from irc.tdc.fi. This occurred at around 15:45 today and lasted for about 5 minutes. I haven't bothered to ask what caused it, but it looked like a restart of the server.
In the end, these were all most likely unrelated incidents. They just happened within 24 hours and managed to change the flow of my day quite a few times. Hopefully things in my personal networking world aren't as interesting any time soon as they were just now.