Continued from http://status.netnorth.net/2015/05/06/us-based-connectivity-issue/
We have just started receiving monitoring reports showing the same level of packet loss we saw earlier this morning.
We immediately removed Level3 connectivity from our network, forcing traffic to be re-routed.
We will contact Level3 for more information.
The following update has been received from Level3:
*** CASCADED EXTERNAL NOTES 06-May-2015 20:38:49 GMT From CASE: 9198136 – Event
The IP NOC detected a re-occurrence of the earlier packet loss issue. Investigations revealed that critical protocols were not functioning correctly on an edge device in Montreal. The protocols have been adjusted to restore services. Services and the health of the network have been validated as stable, and there are no further events expected.
We are not entirely happy with Level3's explanation that a single router in Canada could cause such disruption to so many US-based destinations (such as PayPal, eBay and various monitoring stations).
We have requested a further escalation of our ticket with them.
Level3 connectivity will remain offline for now.
We have received a more in-depth and useful update from Level3 as follows:
“Level 3 can confirm that the issue was with traffic routing via Montreal instead of routing via New York.
The traffic affected which would normally route from London to New York was being routed via Montreal which was over utilizing the links in Montreal and causing latency and heavy packet loss.”
This makes much more sense and explains the issue.
We have re-integrated Level3 connectivity into our network.
We have attempted to reduce our use of this link by applying various traffic-engineering policies to the circuit.
This is not guaranteed to work, because BGP gives us only limited control over how traffic is routed, but it should minimise the impact if the problem recurs.
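We haven't listed the exact policies applied to the circuit. As a rough illustration only, here is a minimal sketch of two common BGP traffic-engineering levers: lowering LOCAL_PREF on routes learned from one provider (steers our outbound traffic) and AS-path prepending on announcements to that provider (nudges inbound traffic). It assumes a heavily simplified best-path process (highest LOCAL_PREF, then shortest AS path, then lowest router ID). Every ASN except Level3's (3356), and every neighbour name and router ID, is hypothetical. The last case shows why this is "not guaranteed": a remote network's own LOCAL_PREF outranks path length, so prepending cannot force its choice.

```python
from dataclasses import dataclass

LEVEL3_ASN = 3356   # Level3's public ASN
OUR_ASN = 64510     # hypothetical: documentation-range ASN standing in for ours
OTHER_ASN = 64501   # hypothetical second transit provider
DEST_ASN = 64500    # hypothetical origin AS of a destination network


@dataclass(frozen=True)
class Route:
    """One candidate path to a prefix, as learned from a single neighbour."""
    neighbor: str
    as_path: tuple
    local_pref: int = 100  # a common default LOCAL_PREF
    router_id: str = "0.0.0.0"


def _rid_key(router_id):
    # Compare router IDs numerically, octet by octet.
    return tuple(int(octet) for octet in router_id.split("."))


def best_path(routes):
    """Heavily simplified BGP best-path selection:
    highest LOCAL_PREF, then shortest AS_PATH, then lowest router ID.
    Real routers evaluate further steps (origin, MED, eBGP vs iBGP, IGP cost...).
    """
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), _rid_key(r.router_id)))


def prepend(as_path, asn, times):
    """AS-path prepending: add extra copies of our ASN so the path looks longer."""
    return (asn,) * times + as_path


# Outbound: two paths from us to the same destination.
via_level3 = Route("Level3", (LEVEL3_ASN, DEST_ASN), router_id="192.0.2.1")
via_other = Route("TransitB", (OTHER_ASN, 64502, DEST_ASN), router_id="192.0.2.2")
print(best_path([via_level3, via_other]).neighbor)  # Level3 (shorter AS path)

# Hypothetical policy: lower LOCAL_PREF on routes learned from Level3.
demoted = Route("Level3", via_level3.as_path, local_pref=80, router_id="192.0.2.1")
print(best_path([demoted, via_other]).neighbor)  # TransitB

# Inbound: a remote network's view of our prefix. We prepend twice towards
# Level3; each transit adds its own ASN before passing the route on.
seen_via_level3 = Route("Level3", (LEVEL3_ASN,) + prepend((OUR_ASN,), OUR_ASN, 2),
                        router_id="198.51.100.1")
seen_via_other = Route("TransitB", (OTHER_ASN, OUR_ASN), router_id="198.51.100.2")
print(best_path([seen_via_level3, seen_via_other]).neighbor)  # TransitB

# ...but the remote network's own LOCAL_PREF outranks path length,
# so prepending cannot force its choice.
preferred = Route("Level3", seen_via_level3.as_path, local_pref=200,
                  router_id="198.51.100.1")
print(best_path([preferred, seen_via_other]).neighbor)  # Level3
```

LOCAL_PREF is set by whoever receives the route, which is why it works reliably for our own outbound traffic but offers no control over how other networks send traffic to us.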
A final update and analysis from Level3 has been received as follows:
“An in depth review of the issue with the backbone/edge access router in Montreal, Quebec was completed. The topology in Canada is unique in the network and lead to this issue. Specifically, policies were set up to keep traffic local to Canada. This was developed and validated ahead of time by engineering; however, due to the size and scope of the network, modeling did not review certain nuances that lead to this condition. Montreal, Quebec erroneously became the preferred route for a number of IP destinations in the network, thus saturating the aforementioned links into that market. Level 3 Management have been actively involved in the review of this case to ensure appropriate policies are in place to prevent future issues.”