At approximately 8:25pm on 30th May 2024 we identified disruption on our core network.
After investigation, we discovered that we had lost power to a rack in Manchester that houses some connectivity and also one of our core routers.
Normally this would cause only a couple of minutes of disruption while BGP reconverged onto alternative paths.
Unfortunately, when power was restored a couple of minutes later, the core router in the rack failed to re-establish its aggregated interface to our core network. This caused a split-brain condition in which some direct routes via the Equinix IX Manchester exchange terminated on a router that had no connection to the core.
As a result, traffic arriving via that exchange could not complete its journey, leading to partial loss of connectivity. Connectivity via other paths was unaffected once BGP had reconverged.
Additionally, one of our metro ethernet fibre circuits provided by Openreach failed to recover once power was restored.
We attended site in Manchester to diagnose and rectify the router issue. Once the aggregated link was brought back online, paths via Equinix IX Manchester could complete their journey again.
We also tested our equipment connected to the Openreach circuit and determined that the fault lay with the Openreach termination equipment in the rack. We opened a ticket with Openreach and they dispatched engineers to Manchester to replace the faulty module in their termination equipment.
The metro ethernet circuit was restored at approx 3am UK local time.
The power failure to the rack occurred during Equinix maintenance. We had incorrectly believed that the equipment in the rack was dual-fed from resilient power feeds. In fact, both feeds are served from the same breaker, so rather than the expected reduction in resilience we experienced a full loss of power. No further power outages are expected, as the Equinix maintenance is now complete. We will investigate installing a feed from a separate, resilient breaker into the rack.
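For illustration only, the kind of audit that would flag this situation (two feeds that look redundant but share one breaker) can be sketched in Python. The inventory format, device names and breaker IDs below are hypothetical assumptions for the example and do not describe our actual records or tooling.

```python
from collections import defaultdict


def shared_breaker_risks(feeds):
    """Find devices whose multiple power feeds all trace back to one breaker.

    `feeds` is an iterable of (device, feed_name, breaker_id) tuples.
    This tuple layout is a hypothetical format chosen for the sketch.
    Returns {device: breaker_id} for each device that has two or more
    feeds which all share a single breaker. Single-fed devices are not
    reported here; they are a separate kind of risk.
    """
    breakers_by_device = defaultdict(list)
    for device, _feed_name, breaker_id in feeds:
        breakers_by_device[device].append(breaker_id)

    return {
        device: breakers[0]
        for device, breakers in breakers_by_device.items()
        if len(breakers) > 1 and len(set(breakers)) == 1
    }


# Hypothetical inventory: the router's A and B feeds share a breaker,
# while the switch is fed from two different breakers.
inventory = [
    ("core-router", "A", "breaker-1"),
    ("core-router", "B", "breaker-1"),
    ("edge-switch", "A", "breaker-1"),
    ("edge-switch", "B", "breaker-2"),
]

print(shared_breaker_risks(inventory))
```

In this made-up inventory only "core-router" is flagged, because both of its feeds depend on "breaker-1". A check like this is only as reliable as the feed-to-breaker records it is given.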
There are no outstanding issues. If you have any ongoing issues with your service, please report them to us as normal so that we can investigate.