We received reports this morning of network alerts from third-party providers indicating trouble reaching the Netnorth network.
After further investigation, we narrowed the issue to a part of the Level3 network located in the US (traceroutes included below). We raised this with Level3, who confirmed that they currently have a network issue ticket open for UK-US traffic with packet loss exceeding 70%, although we see 90-100% in our own tests.
We have extensive monitoring throughout the UK and Europe, but only key-point monitoring within the USA, which did not flag any issues automatically.
We will look to add further test points in the US and Asia so that we can locate these remote issues more quickly.
This issue should not have affected any UK or EU traffic, but many “site uptime” servers test from multiple locations and report an issue if any one of those locations has trouble.
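To illustrate why a fault limited to US paths can still trigger an uptime alert, here is a minimal sketch of two ways a multi-location monitor might combine its probes. The location names, the probe results and both alerting policies are hypothetical; real monitoring providers differ in how they aggregate results.

```python
from typing import Dict


def alert_any(results: Dict[str, bool]) -> bool:
    """Alert if any single test location fails to reach the site."""
    return not all(results.values())


def alert_majority(results: Dict[str, bool]) -> bool:
    """Alert only if more than half of the test locations fail."""
    failures = sum(1 for ok in results.values() if not ok)
    return failures > len(results) / 2


# Hypothetical probe results: True means the location reached the site.
results = {
    "london": True,
    "amsterdam": True,
    "frankfurt": True,
    "new_york": False,
}

print(alert_any(results))       # True
print(alert_majority(results))  # False
```

Under the "any location" policy a single failing US probe is enough to raise an alert, even though UK and EU visitors are unaffected.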
Here are a few traceroutes showing the issue lying within Level3’s network:
 1 vzd114.mediatemple.net (205.186.158.19) 0.060 ms 0.024 ms 0.022 ms
 2 e1.2.cr01.iad01.mtsvc.net (70.32.64.249) 0.286 ms 0.276 ms 0.251 ms
 3 65.97.50.1 (65.97.50.1) 6.981 ms 7.347 ms 7.680 ms
 4 br01-1-1.iad2.netdc.com (65.97.48.205) 0.468 ms 0.460 ms 0.545 ms
 5 209.48.42.149 (209.48.42.149) 0.382 ms 0.372 ms 0.420 ms
 6 206.111.0.66.ptr.us.xo.net (206.111.0.66) 0.935 ms 0.947 ms 0.957 ms
 7 * * *
 8 * * ae-14-14.bar1.Toronto1.Level3.net (4.69.200.93) 142.194 ms
 9 ae-0-11.bar2.Toronto1.Level3.net (4.69.151.242) 144.597 ms 144.601 ms *
10 * * *
11 * * *
12 ae-41-41.ebr2.London1.Level3.net (4.69.137.65) 234.130 ms * *
13 * * vlan102.ebr1.London1.Level3.net (4.69.143.89) 224.504 ms
14 ae-4-4.car1.Manchesteruk1.Level3.net (4.69.133.101) 224.407 ms * *
15 * NETNORTH-LT.car1.Manchester1.Level3.net (195.50.119.74) 219.169 ms 219.394 ms

HOST: stats.netnorth.co.uk                     Loss%   Snt   Last    Avg   Best   Wrst  StDev
  1. po1-16.router.tcw.netnorth.co.uk           0.0%    10    0.7    0.7    0.5    0.8    0.1
  2. ge-6-18.car1.Manchester1.Level3.net        0.0%    10    0.9    0.9    0.7    1.1    0.1
  3. ???                                       100.0    10    0.0    0.0    0.0    0.0    0.0
  4. AMAZON.COM.edge2.Washington1.Level3.net   90.0%    10  221.4  221.4  221.4  221.4    0.0
  5. 72.21.220.149                             80.0%    10  223.5  223.4  223.3  223.5    0.1
  6. 205.251.245.232                           70.0%    10  223.9  223.6  223.0  224.0    0.5
  7. ???                                       100.0    10    0.0    0.0    0.0    0.0    0.0

Sprint Source Region: Anaheim, CA (sl-crs3-ana)
IP Destination: 82.148.224.24
Performing: ICMP Traceroute
Wed May 6 08:11:49.079 UTC
Tracing the route to 82.148.224.24
 1 144.232.13.244 4 msec 3 msec 2 msec
 2 144.232.24.40 6 msec 6 msec 5 msec
 3 ae14.edge1.LosAngeles9.Level3.net (4.68.111.89) 4 msec 3 msec 2 msec
 4 * * *
 5 * * *
 6 * * *

core1.tyo1.he.net> traceroute 82.148.224.24
traceroute to 82.148.224.24 (82.148.224.24), 30 hops max, 60 byte packets
 1 74.82.46.5 3.918 ms 3.946 ms 4.023 ms
 2 184.105.223.105 133.830 ms 133.818 ms 133.894 ms
 3 80.239.167.189 98.187 ms 98.262 ms 98.247 ms
 4 213.155.137.58 98.174 ms 213.155.134.252 98.211 ms 213.155.130.126 98.086 ms
 5 4.68.70.129 102.706 ms 97.844 ms 102.817 ms
 6 * * *
 7 * * *
 8 * * *
 9 * * 4.69.151.242 232.329 ms
10 * * *
11 * * *
12 * * 4.69.137.77 309.750 ms
13 4.69.143.97 318.915 ms * 4.69.143.85 309.842 ms
14 4.69.133.101 319.404 ms * *
15 * 195.50.119.74 318.680 ms *
16 * * *
17 82.148.224.24 309.715 ms * 318.889 ms

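For readers who want to scan traces like these quickly, the following is a rough sketch of counting lost probes per hop. It assumes each hop is on its own line, that a lost probe is shown as `*`, and that each answered probe is followed by `ms` or `msec`; the function names are our own and are not part of any traceroute tool.

```python
from typing import List, Tuple


def hop_loss(line: str) -> Tuple[int, float]:
    """Return (hop number, fraction of probes lost) for one traceroute line."""
    tokens = line.split()
    hop = int(tokens[0])
    lost = sum(1 for t in tokens[1:] if t == "*")
    answered = sum(1 for t in tokens[1:] if t in ("ms", "msec"))
    total = lost + answered
    return hop, (lost / total if total else 0.0)


def first_lossy_hop(lines: List[str]) -> int:
    """Return the first hop number with any lost probe, or -1 if none."""
    for line in lines:
        hop, loss = hop_loss(line)
        if loss > 0:
            return hop
    return -1


# Hops 5 to 9 of the first trace above.
trace = [
    "5 209.48.42.149 (209.48.42.149) 0.382 ms 0.372 ms 0.420 ms",
    "6 206.111.0.66.ptr.us.xo.net (206.111.0.66) 0.935 ms 0.947 ms 0.957 ms",
    "7 * * *",
    "8 * * ae-14-14.bar1.Toronto1.Level3.net (4.69.200.93) 142.194 ms",
    "9 ae-0-11.bar2.Toronto1.Level3.net (4.69.151.242) 144.597 ms 144.601 ms *",
]

print(first_lossy_hop(trace))  # 7
```

A single silent hop is not proof of a fault on its own, since routers commonly rate-limit or ignore traceroute probes. Loss that continues across the following hops, as in the traces above, is the stronger signal.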
After raising this issue with Level3, we have removed the Level3 connectivity from our network while they work on the issue internally.
This should resolve most paths as traffic re-routes via our alternate providers.
NOTE: some paths may still choose to use Level3 into the United Kingdom, but these are outside of our control.
We have received an update from Level3, but it only confirms that the issue is ongoing. They provide these updates at set intervals in accordance with their SLA.
NOTE: we are attached to the main network issue ticket, so we will receive updates that include services we do not use (such as the CDN part of this ticket).
Update below:
[SUMMARY OF WORK]
Please be advised your service is currently being impacted by an event on the Level 3 network.
Investigations are on-going and this ticket has been related to the main event ticket in order for you to be kept updated with the event progress.
Please see the most recent update below.
Updates
08:58 GMT – The IP NOC responded to alarms indicating CDN services in multiple markets are being impacted by a packet loss issue. The trouble is being investigated and an estimate time of restoral cannot be provided at this time.
[PLAN OF ACTION]
Level 3 to provide further updates accordingly.
Level3 have updated the ticket as follows:
*** CASCADED EXTERNAL NOTES 06-May-2015 10:14:56 GMT From CASE: 9198136 – Event
Through investigations the IP NOC isolated the trouble to overutilization on a link from Toronto to Chicago. Traffic was rerouted to avoid the trouble and services are now restored. The IP NOC will continue monitoring services to ensure continued stability. If additional issues are experienced, please contact the Level 3 Technical Service Center.
The latency to the US appears to have been stable for the last two hours, so we have re-established our Level3 connectivity into the network.
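As an illustration of what "stable" could mean in practice, here is a minimal sketch that judges a window of round-trip samples. The thresholds and the sample values are hypothetical and are not the criteria we actually applied; `None` stands for a lost probe.

```python
from typing import List, Optional


def is_stable(
    rtts_ms: List[Optional[float]],
    max_loss: float = 0.01,
    max_spread_ms: float = 10.0,
) -> bool:
    """Return True if loss and latency spread in the window are within limits."""
    if not rtts_ms:
        return False
    answered = [r for r in rtts_ms if r is not None]
    if not answered:
        return False
    loss = 1 - len(answered) / len(rtts_ms)
    if loss > max_loss:
        return False
    # Spread is the gap between the slowest and fastest answered probe.
    return max(answered) - min(answered) <= max_spread_ms


# Hypothetical sample windows, in milliseconds.
healthy = [88.1, 88.4, 87.9, 88.0]
degraded = [221.4, None, None, 223.5]

print(is_stable(healthy))   # True
print(is_stable(degraded))  # False
```

In a real check the window would cover the full observation period (two hours in this case) rather than a handful of samples.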
A little insight into diagnostics can be found at the following post:
http://status.netnorth.net/2015/05/06/why-level3-connectivity-issues-are-difficult-to-diagnose/
Continued at:
http://status.netnorth.net/2015/05/06/more-us-level3-issues/