Emergency Router Maintenance (28/Jan/2017)

Following on from the router crash experienced yesterday, we have performed an emergency upgrade of the Cisco IOS-XE software on all of our internet-facing routers.

This update should resolve the issue that was triggered yesterday.

Customers should not have experienced any outage today, although there may have been brief periods of slightly increased latency while routes changed.

Cisco Router Crash (27/Jan/2017)

At 3:52pm today, one of our Cisco ASR routers experienced a crash within its routing engine.

The router instantly stopped routing, and any destinations reached via that router experienced an outage.

Unfortunately, the crash did not sever connectivity cleanly. Instead, the router started “flapping”, where routes are repeatedly introduced and withdrawn, causing instability. Once we identified the flapping, we severed all network connectivity to the affected router.
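For readers unfamiliar with the term, flapping can be illustrated with a minimal sketch. This is not how our routers or monitoring detect it; the function name, the event format and the thresholds are all assumptions made for illustration.

```python
from collections import deque


def is_flapping(events, max_changes=4, window_seconds=60.0):
    """Return True if a route changes state more than max_changes times
    within any window of window_seconds.

    events: (timestamp_seconds, state) tuples sorted by time, where
    state is "up" or "down". Thresholds are illustrative assumptions.
    """
    recent = deque()  # timestamps of recent state changes
    last_state = None
    for timestamp, state in events:
        if last_state is not None and state != last_state:
            recent.append(timestamp)
            # Forget changes that have fallen out of the time window.
            while timestamp - recent[0] > window_seconds:
                recent.popleft()
            if len(recent) > max_changes:
                return True
        last_state = state
    return False


# Hypothetical data: a route toggling every 5 seconds, and one that
# changes only every 5 minutes.
unstable = [(0, "up"), (5, "down"), (10, "up"), (15, "down"), (20, "up"), (25, "down")]
stable = [(0, "up"), (300, "down"), (600, "up")]
```

With these sample inputs, `is_flapping(unstable)` is true and `is_flapping(stable)` is false, because only the first route changes state more than four times within a minute.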

After a few minutes, BGP failover took over and traffic was re-routed via alternative paths, as it is designed to do. This is how a normal crash would be handled.
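The idea behind the failover can be sketched roughly as follows. This is a heavily simplified illustration, not the real BGP best-path algorithm (which has many more tie-breaking steps) and not our actual configuration. The router names are hypothetical and the AS numbers come from the range reserved for documentation.

```python
def best_path(paths, failed_routers):
    """Pick the usable path with the shortest AS path, or None if no
    path remains. Simplified assumption: AS path length is the only
    criterion considered.
    """
    usable = [p for p in paths if p["via"] not in failed_routers]
    if not usable:
        return None
    return min(usable, key=lambda p: len(p["as_path"]))


# Two hypothetical paths to the same destination via different routers.
paths = [
    {"via": "router-a", "as_path": [64500]},
    {"via": "router-b", "as_path": [64501, 64500]},
]

normal = best_path(paths, failed_routers=set())
after_crash = best_path(paths, failed_routers={"router-a"})
```

Under normal conditions the shorter path via `router-a` is chosen. Once `router-a` is marked as failed, the longer path via `router-b` takes over, which corresponds to the re-routing described above.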

The router crashed in such a way that it had to be physically power cycled before we could regain control of it. We then brought its routing back online in a slow and controlled fashion to prevent any further disruption to the network.

After some research, it appears that we hit CSCus82903, a known Cisco bug in the version of the routing software we were running.

The bug was triggered this afternoon while we were bringing our new IP connectivity provider, GTT, online. This is normally a routine procedure with no impact on customer traffic.

Our routers are currently stable and operating normally. However, we need to perform emergency maintenance to upgrade the routers to a patched software version provided by Cisco.

We expect to complete this without causing any additional outages, although network routing should be considered “at risk” during the software upgrade itself.

In the meantime, our GTT connection has been kept offline to prevent the issue from recurring. We will re-establish the connection once the software upgrades are complete.

3rd Party Maintenance (23/Feb/2017)

We have received a maintenance notification from Virgin Media regarding one of our metro fibre circuits between Bolton and Manchester.

This maintenance is due on 23rd February 2017 between midnight and 7am, with an expected outage time of 20 minutes.

No outage to Netnorth customers is expected, as we have multiple metro fibre links that take diverse paths and use different fibre providers. Our network will automatically re-route traffic via our other fibres during the outage.