Pinpointing Delay and Forwarding Anomalies Using Large-Scale Traceroute Measurements
Romain Fontugne, Emile Aben, Cristel Pelsser, Randy Bush; Pinpointing Delay and Forwarding Anomalies Using Large-Scale Traceroute Measurements; IMC 2017
Understanding data plane health is essential to improving Internet reliability and usability. For instance, detecting disruptions in distant networks can identify repairable connectivity problems. Currently this task is difficult and time-consuming, as operators have poor visibility beyond their network’s border. In this paper we leverage the diversity of RIPE Atlas traceroute measurements to solve the classic problem of monitoring in-network delays and to obtain credible estimates of delay changes for monitoring network conditions in the wild. We demonstrate a set of complementary methods to detect network disruptions and report them in near real time. First, a delay method detects delay changes for intermediate links in traceroutes. Second, a packet forwarding model predicts traffic paths and identifies faulty routers and links in cases of packet loss. In addition, we define an alarm score that aggregates changes into a single value per AS in order to easily monitor its sanity, reducing the effect of uninteresting alarms. Using only existing public data, we monitor hundreds of thousands of link delays while adding no burden to the network. We present three cases demonstrating that the proposed methods detect real disruptions and provide valuable insights, as well as surprising findings on the location and impact of the identified events.
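To make the idea of detecting delay changes on an intermediate link more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method from the paper: the abstract does not describe the statistics used, so the sketch assumes that a link's delay can be approximated by the RTT difference between two adjacent hops, that medians across many traceroutes give a robust estimate, and that a fixed threshold decides what counts as a change. The function names, the `(ip, rtt_ms)` hop format, and the 5 ms threshold are all hypothetical.

```python
from statistics import median


def differential_rtts(traceroutes, near, far):
    """Collect RTT(far) - RTT(near) for each traceroute that saw both hops.

    Each traceroute is a list of (ip, rtt_ms) pairs (hypothetical format).
    """
    samples = []
    for hops in traceroutes:
        rtt_by_ip = dict(hops)
        if near in rtt_by_ip and far in rtt_by_ip:
            samples.append(rtt_by_ip[far] - rtt_by_ip[near])
    return samples


def delay_change(reference, current, threshold_ms=5.0):
    """Return the shift in median differential RTT, or 0.0 if it is small.

    The fixed threshold is an assumption made for illustration only.
    """
    if not reference or not current:
        return 0.0
    shift = median(current) - median(reference)
    return shift if abs(shift) > threshold_ms else 0.0


# Hypothetical measurements for one link between hops "a" and "b".
baseline = [
    [("a", 10.0), ("b", 12.0)],
    [("a", 11.0), ("b", 13.5)],
    [("a", 9.0), ("b", 11.0)],
]
congested = [
    [("a", 10.0), ("b", 30.0)],
    [("a", 10.0), ("b", 31.0)],
    [("a", 12.0), ("b", 33.0)],
]

reference = differential_rtts(baseline, "a", "b")
current = differential_rtts(congested, "a", "b")
print(delay_change(reference, current))
```

Using medians rather than means keeps a single noisy probe from triggering an alarm, which matters when the samples come from many vantage points with different return paths.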