Unplanned Outage

A core switch failure in our U23 DC will have briefly affected all connectivity this morning. We are currently investigating the cause and, as a priority, are migrating a couple of significantly affected hosts as a remedial measure. A fuller RFO will follow.



  1. Dan A says:

    RFO

    One of our core data switches (DS-190), located in the BOL23 datacentre, suffered a failure of its onboard ECC memory. This caused the switch to power off and refuse to reboot. Ordinarily, this event would only have affected customers with single-homed connections to this data switch, a configuration we avoid by design.

    However, due to a misconfigured alternate path, the expected failover did not happen. This isolated an older set of distribution switches on our network and caused a far more disruptive event than we had anticipated. The failure of this core switch also affected one of our core routers.

    As a priority, we have migrated all direct connections from this switch to our new distribution network, which is much more resilient.
    The affected core router has also been migrated to the new switch network, which allows it to connect to two separate core switches simultaneously.

    Ultimately, all ports will be migrated to the new distribution network. The majority of the BOL8 datacentre had already been migrated, and the BOL23 datacentre migration has now been expedited.

    In the meantime, we will investigate how to better link the older legacy switched network to our new network to prevent future issues.
