10 days of emails arguing about packet loss in our customer’s network.
The evidence they sent us was proof they didn’t understand traceroute.
We run the NOC for an operator. A partner network kept insisting that traffic to a destination behind our customer’s infrastructure was dropping in transit. We reviewed errors, drops, buffers, routing, and looking glasses: all clean from our side. We asked for evidence.
They sent us one ICMP traceroute. Some transit routers in our customer’s path showed 20-30% loss. The final destination showed almost none. That output isn’t a network problem. That output is the network behaving exactly as it should.
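A rough way to encode that reading of the output, as a sketch rather than a diagnostic tool. The function names, the 1% "clean" threshold, and the sample numbers are all assumptions made for illustration; they are not taken from the actual traceroute in this story.

```python
def loss_reaches_destination(hop_loss_pct, threshold_pct=1.0):
    """Return True only if the final hop (the destination) shows loss.

    hop_loss_pct: per-hop loss percentages in path order; the last entry
    is the destination. threshold_pct is an arbitrary cutoff for "clean".
    """
    if not hop_loss_pct:
        raise ValueError("need at least one hop")
    return hop_loss_pct[-1] > threshold_pct


def rate_limited_candidates(hop_loss_pct, threshold_pct=1.0):
    """List 1-based transit hops whose loss does NOT carry to the destination."""
    if loss_reaches_destination(hop_loss_pct, threshold_pct):
        return []  # loss at the destination itself needs a real investigation
    return [hop for hop, loss in enumerate(hop_loss_pct[:-1], start=1)
            if loss > threshold_pct]


# Hypothetical numbers shaped like the output described above.
example = [0.0, 0.0, 25.0, 30.0, 20.0, 0.0]
print(loss_reaches_destination(example))   # False
print(rate_limited_candidates(example))    # [3, 4, 5]
```

If the loss were real forwarding loss at hop 3, every later hop and the destination would inherit it. A clean destination is what makes the transit numbers uninteresting.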
Virtually every modern router polices ICMP toward its control plane (the Routing Engine, in Juniper terms). CoPP on Cisco, firewall filters on Junos. Without it, an attacker with hping can overload a router’s CPU and take it down. Best practice on every well-run network.
When you fire probes at a transit hop, the line card forwards every packet you’re actually sending. The control plane answers X of your TTL=N probes per second; the rest get dropped at the policer. You see 80% “loss on hop N.” Your real traffic is fine if the destination, whose actual job is to respond, looks clean.
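The arithmetic behind that 80% can be sketched with a toy model. This is a deliberately simplified, deterministic simulation: the function name and every rate below are made-up assumptions, and a real policer is typically a token bucket shared by all sources hitting the router, not a fixed per-second budget for your probes alone.

```python
def simulate_hop(probes_per_sec, policer_pps, seconds, transit_pps):
    """Model one transit router over a number of one-second intervals.

    Returns (apparent_loss_pct, transit_loss_pct).
    """
    answered = 0
    forwarded = 0
    for _ in range(seconds):
        # Control plane: TTL-expired probes compete for the policer budget.
        answered += min(probes_per_sec, policer_pps)
        # Data plane: transit packets are forwarded in hardware and never
        # touch the policer, so in this model none of them are dropped.
        forwarded += transit_pps

    probes_sent = probes_per_sec * seconds
    transit_sent = transit_pps * seconds
    apparent_loss_pct = 100.0 * (probes_sent - answered) / probes_sent
    transit_loss_pct = 100.0 * (transit_sent - forwarded) / transit_sent
    return apparent_loss_pct, transit_loss_pct


# Assumed rates: 50 probes/s against a policer that answers 10 per second.
apparent, real = simulate_hop(probes_per_sec=50, policer_pps=10,
                              seconds=10, transit_pps=100_000)
print(f"apparent loss on hop N: {apparent:.0f}%")    # 80%
print(f"loss for transit traffic: {real:.0f}%")      # 0%
```

The two numbers come from two different code paths in the router, which is the whole point: the first measures the policer, the second measures forwarding.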
Why does this happen? Because companies don’t train their personnel well. Every NOC and Customer Support Engineer must know how traceroute actually works.
Traceroute raises the TTL by 1 for each successive hop it probes. TTL=1 expires on hop 1, which returns an ICMP Time Exceeded (RFC 792). TTL=2 expires at hop 2, and so on. The “loss” at hop N is generated by hop N’s control plane responding or refusing to respond. It has nothing to do with the data plane that forwards your real traffic.
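The TTL walk is easy to see in a small simulation. This is a minimal sketch that sends no packets: the `trace` function and the hop names are hypothetical, and the destination’s answer is kept generic because its type depends on the probe protocol.

```python
def trace(path, max_ttl=30):
    """Walk a path given as a list of hop names; the last is the destination.

    Returns one (ttl, responder, reply_type) tuple per TTL tried.
    """
    results = []
    for ttl in range(1, max_ttl + 1):
        remaining = ttl
        for index, hop in enumerate(path):
            if index == len(path) - 1:
                # The probe arrived: the destination answers for itself.
                results.append((ttl, hop, "reply from destination"))
                return results
            # Each router decrements the TTL before forwarding.
            remaining -= 1
            if remaining == 0:
                # TTL expired here: this router's control plane must
                # generate the ICMP Time Exceeded message.
                results.append((ttl, hop, "ICMP Time Exceeded"))
                break
    return results


for ttl, responder, reply in trace(["edge-1", "core-1", "core-2", "server"]):
    print(ttl, responder, reply)
```

Every line of output except the last comes from a router that had to stop forwarding and build a reply in its control plane, which is exactly the work a policer is there to limit.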
Besides that, there are a few ways to run a traceroute: ICMP, UDP, or even TCP. That matters too. When at least one of these shows no packet loss, the loss you see with the others means you are hitting CoPP, an ACL, a firewall, or MPLS hiding hops.
This is the question I’d ask any junior engineer in a NOC interview: “Explain how traceroute works and why packet loss on a transit hop usually means nothing.” If they can’t answer, they aren’t ready to read the output, let alone open a ticket on someone else’s network.







