Network Connectivity

We’re seeing packet loss on the LINX Extreme network this morning.
We’ve also got alerts coming in for our London location, which is routed entirely separately from our Bolton/Manchester locations, so it looks like the issue is affecting the entire LINX LAN.

We’ve shunted most traffic over to the LINX Juniper network, so the packet loss should die down now. We’ve not had any official word from LINX yet, but we will continue to monitor and migrate traffic paths where possible.
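For anyone who wants to sanity-check reachability from their own side, the rough sketch below shows one way to sample packet loss towards a handful of peers with standard ping. It is purely illustrative and not our monitoring tooling; the addresses are documentation placeholders, not real LINX peer IPs.

#!/usr/bin/env python3
"""Rough packet-loss sampler (illustrative only, not production tooling).

Pings each placeholder peer address and reports the loss percentage,
which is the basic signal behind the alerts described above.
"""
import re
import subprocess

PEERS = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]  # placeholder addresses
COUNT = 20  # probes per peer


def loss_percent(host: str, count: int = COUNT) -> float:
    """Run ping and parse the '% packet loss' figure from its summary line."""
    out = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True,
    ).stdout
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", out)
    return float(match.group(1)) if match else 100.0


if __name__ == "__main__":
    for peer in PEERS:
        print(f"{peer}: {loss_percent(peer):.1f}% loss")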

Netnorth Support

Storms / Power – Bolton

There is currently a storm over the Bolton area (with very nice fork lightning!), which caused a brief power outage this evening at around 7pm UK local time.

Our UPS units continued to operate during the outage with no loss of power to the datacentres, and our generators started, ready to take over the load if required.

This was not required as the mains was restored within a few seconds.

Our generators returned to their idle state after a few minutes once mains power stability was confirmed.


Should there be any further outages, the generators will restart automatically. The process of transferring the load to the generators in a mains-failure situation is fully automated.

LINX Extreme LAN – IGMP issue

Yesterday, at around 11:50am, we had some reports of unusual activity on our ring network.

We tracked this down to elevated CPU usage on our switches. Further investigation pointed to a high level of IGMP traffic being received.

We further traced this to our LINX Extreme peering port in London and shut the port down.  This resolved the issue instantly.
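For illustration only (this is not the exact tooling we used), the level of IGMP traffic arriving on a port can be quantified by counting IGMP frames over a short window. The sketch below uses the scapy library; the interface name is a placeholder and sniffing normally requires root.

#!/usr/bin/env python3
"""Illustrative IGMP rate check (not the tooling used in this incident)."""
from scapy.all import sniff  # requires the scapy package

IFACE = "eth0"   # placeholder; use the peering-facing interface
WINDOW = 30      # seconds to sample

# The BPF filter "igmp" matches only IP protocol 2 (IGMP) frames.
packets = sniff(iface=IFACE, filter="igmp", timeout=WINDOW, store=True)

rate = len(packets) / WINDOW
print(f"{len(packets)} IGMP packets in {WINDOW}s ({rate:.1f} pkt/s) on {IFACE}")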

We contacted our Connexions provider, who provide our LINX ports. They investigated to ensure the traffic was not originating internally and then passed the query on to LINX.

LINX confirmed this morning that they had a member port injecting IGMP traffic to the peering LAN yesterday and that this has now been resolved.


We have re-enabled our LINX Extreme port and confirmed the issue no longer exists.

We will re-enable our peering connections via this LAN shortly.

Cisco switch stack issue (DS-101)

Earlier this evening we identified that one of our Cisco switch stacks was misbehaving, producing a constant stream of stack reconvergence events. These repeated reconvergences have been causing layer 2 network instability for traffic flowing via the stack of nine switches.

We have just completed a physical inspection of all stack cables, including a full reseat of the cables as per Cisco’s guidelines; however, the problem still persists.

It is possible that the stack issues are caused by a fault within the Cisco IOS software.

We are currently applying an upgrade to the switch stack and will reboot the full stack afterwards to activate the changes.

Cisco advise doing this as a full cold reboot, removing the power from the stack members, so this reload will take longer than usual.
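Once the stack reloads, the obvious sanity check is that all nine members have rejoined in the ‘Ready’ state before traffic is trusted again. As an illustrative sketch only (the hostname and credentials are placeholders, and this is not a statement of our exact procedure), a check along those lines could be scripted with the Netmiko library against the IOS ‘show switch’ output:

#!/usr/bin/env python3
"""Illustrative post-reload stack membership check (placeholder details)."""
from netmiko import ConnectHandler  # pip install netmiko

EXPECTED_MEMBERS = 9  # the stack described above has nine switches

device = {
    "device_type": "cisco_ios",
    "host": "ds-101.example.net",  # placeholder management address
    "username": "netops",          # placeholder credentials
    "password": "********",
}

with ConnectHandler(**device) as conn:
    # 'show switch' lists each stack member with its role and current state.
    output = conn.send_command("show switch")

ready = sum(1 for line in output.splitlines() if "Ready" in line)
print(output)
if ready == EXPECTED_MEMBERS:
    print(f"All {EXPECTED_MEMBERS} stack members are Ready.")
else:
    print(f"Only {ready}/{EXPECTED_MEMBERS} members Ready; investigate before restoring traffic.")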


Any customers connected to a different switch stack will only see a momentary outage during a layer 2 reconvergence.

Customers directly connected to switch stack DS-101 will see a total outage for up to 15 minutes.

Generator Tests – BOL

After a minor change in operating procedure, we briefly neglected to post the results of our weekly generator tests. For completeness, here’s a list of the intervening tests…

28-08-2016
BOL1  10:12 - 10:38  Off load  Passed
BOL2  10:56 - 11:13  Off load  Passed

05-07-2016
BOL1  10:24 - 10:42  On load   Passed
BOL2  10:53 - 11:16  On load   Passed

12-07-2016
BOL1  10:20 - 10:49  Off load  Passed
BOL2  11:02 - 11:22  Off load  Passed

19-07-2016
BOL1  10:15 - 10:25  Off load  Passed
BOL2  10:55 - 11:14  Off load  Passed

28-07-2016
BOL1  10:08 - 10:32  Off load  Passed
BOL2  10:41 - 11:03  Off load  Passed

In each test, the generator was started, ran off load or on load for the duration given above, then detected mains and shut down within the expected timeframe.
All measured values were within their normal ranges.