At 11:55pm tonight, our connection to Level3 appeared to drop. The symptoms are identical to the previous occasions when the Level3 router rebooted, so we assume this to be the case again tonight.
The connection is slowly coming back online, which further supports this suspicion.
As Level3 appear to be unconcerned by a router that reboots itself, we have already made plans to migrate to an alternative transit supplier. This is due to take place early February 2017.
Once this replacement is online, we will likely retire our Level3 connectivity entirely.
Some routes may have experienced a brief period of packet loss or increased latency during this transit fault whilst routers around the internet changed paths to our alternate transit links. However, the majority of destinations will have been entirely unaffected.
As this is caused by routers outside of our control, we have no way to prevent this brief packet loss to some destinations.
Any paths via our peering links or other transit connectivity are unaffected.
Yesterday, at around 11:50am, we had some reports of unusual activity on our ring network.
We traced this down to elevated levels of CPU usage on our switches. Further investigation pointed to a high level of IGMP traffic being received.
We further traced this to our LINX Extreme peering port in London and shut the port down. This resolved the issue instantly.
We contacted our Connexions provider, who supply our LINX ports. They investigated to ensure the traffic was not originating internally and then passed the query on to LINX.
LINX confirmed this morning that a member port was injecting IGMP traffic into the peering LAN yesterday and that this has now been resolved.
We have re-enabled our LINX Extreme port and confirmed the issue no longer exists.
We will re-enable our peering connections via this LAN shortly.
From just before 8am UK local time this morning, we have seen reports of intermittent connectivity issues with both BT and Plusnet.
We believe this is due to a power outage in one of the Telehouse datacentres in London.
(This is in addition to a reported power outage in Telecity Harbour Exchange Square in London yesterday which also affected BT)
This outage does not directly affect Netnorth; however, it appears to be causing congestion for BT and Plusnet, which means you may have trouble reaching some destinations if you use one of these providers for your internet connection.
This may also affect Virgin Media; however, we have a direct link to VM in Manchester which bypasses most congestion on their network.
As the fault lies external to our network, we are unable to take any remedial action from our side. The BT issue lies inside BT’s network at this time.
We have currently lost a resilient path between two of our Manchester datacentre locations. Our building interconnection provider is currently performing maintenance on their equipment; however, this outage was unexpected.
We have reported it to the vendor for investigation.
In the meantime, our TCW site is running at reduced resilience.
Our other sites are still operating with full resilience.
At 17:28 (UK Local Time), we noticed a drop in our Level3 connectivity.
Level3 remains unavailable to us at this time. We have opened a ticket with them to investigate.
Our network has reconverged to use our alternate providers. This would have caused some instability for any routes via Level3 (but not via our other paths) during the convergence.
This morning at approx. 8:55am, our BOL23 site saw a momentary loss of mains power from the electricity grid.
As a result, our standby generators started up in preparation to take the power load from the UPS supplies. The grid power came back online after a few seconds so no transfer to generated power occurred.
The generators continued to run for several minutes in case of a further loss of mains power, but were not needed.
They then automatically shut down and returned to standby status.
No loss of power to Netnorth or customer equipment occurred and all services remained online for the duration of the grid power failure.
We have received reports of some unusual packet loss to certain destinations since 2:10am this morning.
We have conducted testing within our network to confirm that the issue lies outside of our network infrastructure.
As a result, we have raised the issue with our global connectivity providers to investigate further.
Our monitoring systems alerted us to IPv4 connectivity issues at 6:32pm this evening. This was traced to our connection with Level3, Inc.
It appears to have stopped passing IPv4 traffic due to a routing loop within Level3’s network.
We have shut down our connectivity with them and raised a trouble ticket.
All routes will have reconverged via alternative paths.
We experienced another power outage of phase L3 in our BOL2 datacentre this morning at 3:09am.
Electricity Northwest are currently on-site tracing an underground cable fault.
It is hoped that the fault lies within a section that can be isolated, allowing power to the building to be restored.
The datacentre is currently running from generated power, and no outages are expected.
Staff are onsite monitoring the generators to ensure continued service.
Further to yesterday’s outage, we were alerted to another identical outage this evening at 11:26pm.
Once again, our automated failover systems migrated the site to locally generated power.
We are awaiting an update from ENWL, our regional energy supplier, regarding a time to repair.