WHY PERFECTLY REDUNDANT NETWORKS STILL FAIL

During a recent engagement, a customer confidently described their network as highly redundant. Internally, everything looked clean. Multiple exit points were in place. Upstream connectivity was protected by redundant paths and redundant BGP sessions. On paper, the design appeared solid. 

However, reality under failure conditions told a different story. 

A common architectural practice for downstream routers that cannot hold the full routing table is simple: advertise only a default route. The core or edge devices handle full table computation and best path selection, while downstream devices forward traffic using the default. 

The critical detail is not whether you advertise a default route. The real question is how you advertise it. 

  • Do you originate the default route?
  • Do you redistribute it from an upstream provider?
  • Do you preserve the provider next hop?
  • Do you advertise supporting prefixes to ensure next hop resolution? 

These decisions must be deliberate and consistent. 
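One way to make those decisions explicit is to write them down per upstream and compare them. The sketch below is only an illustration: the `DefaultRoutePolicy` class, its field names, and the provider labels are hypothetical, and it assumes that a next hop rewritten to the edge router's own address is already reachable downstream through the internal routing protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefaultRoutePolicy:
    """Hypothetical record of how one upstream's default route is advertised."""
    upstream: str
    originated_locally: bool           # True: the edge originates 0.0.0.0/0 itself
    preserves_provider_next_hop: bool  # True: downstream sees the provider's address
    advertises_next_hop_prefix: bool   # True: supporting prefix is sent downstream

    def next_hop_resolvable_downstream(self) -> bool:
        # Assumption: a rewritten next hop (the edge router itself) is reachable
        # internally, so only a preserved provider next hop needs a supporting prefix.
        return (not self.preserves_provider_next_hop) or self.advertises_next_hop_prefix


def inconsistent_fields(policies):
    """Return the policy fields whose values differ between upstreams."""
    fields = (
        "originated_locally",
        "preserves_provider_next_hop",
        "advertises_next_hop_prefix",
    )
    return [f for f in fields if len({getattr(p, f) for p in policies}) > 1]


policies = [
    DefaultRoutePolicy("provider-a", False, True, True),
    DefaultRoutePolicy("provider-b", False, True, False),
]

print(inconsistent_fields(policies))
for policy in policies:
    print(policy.upstream, policy.next_hop_resolvable_downstream())
```

Any field reported as inconsistent is a place where two upstreams are handled differently and deserves a second look before it shows up during an outage.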

In this particular case, the customer redistributed the default route from their upstream providers. With one provider, both the default route and the point-to-point link prefix were advertised downstream. Downstream routers could resolve the next hop, so everything worked as expected. 

With another provider, only the default route was advertised. 

The problem was subtle but devastating. The default route pointed to a next hop that no prefix in the downstream routers’ routing tables covered. Because the next hop could not be resolved, the route was received but never installed as a usable path. 
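The resolution check can be modeled in a few lines. This is a simplified sketch, not how any particular router implements it: the addresses come from documentation ranges, the provider names and the `next_hop_resolves` helper are hypothetical, and it assumes a default route is not allowed to resolve a next hop.

```python
import ipaddress


def next_hop_resolves(next_hop, rib_prefixes):
    """Return True if a non-default prefix in the table covers the next hop."""
    address = ipaddress.ip_address(next_hop)
    for prefix in rib_prefixes:
        network = ipaddress.ip_network(prefix)
        # Assumption: the default route itself may not be used for resolution.
        if network.prefixlen > 0 and address in network:
            return True
    return False


# Prefixes known to a downstream router (hypothetical values).
downstream_rib = [
    "10.0.0.0/24",   # internal links
    "192.0.2.0/30",  # provider A point-to-point prefix, advertised downstream
    # provider B's point-to-point prefix (198.51.100.0/30) is not advertised
]

# Next hop carried by each provider's default route (hypothetical values).
default_candidates = {
    "provider-a": "192.0.2.1",
    "provider-b": "198.51.100.1",
}

installable = {
    name: next_hop_resolves(next_hop, downstream_rib)
    for name, next_hop in default_candidates.items()
}
print(installable)  # {'provider-a': True, 'provider-b': False}
```

In this model the second default route is present as an advertisement but fails the check, which mirrors a backup path that looks fine until it is needed.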

When the primary upstream failed, the only default route left was the one that could not be installed, so downstream routers had no usable default route. Redundancy existed physically, yet traffic had nowhere to go. 

This type of failure is more common than many teams expect. Redundancy without consistent policy, proper next hop resolution, and failure testing often produces fragile behavior instead of resiliency. 
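A failure test can start as a thought experiment in code before it becomes a maintenance window. The sketch below fails each upstream in turn and lists which default routes a downstream router could still install. Everything in it is an assumption made for illustration: the `UPSTREAMS` data, the addresses, and the rule that a next hop must be covered by an advertised or internal prefix.

```python
import ipaddress

# Hypothetical model: each upstream supplies a default route next hop and,
# optionally, the point-to-point prefix that makes that next hop resolvable.
UPSTREAMS = {
    "provider-a": {"next_hop": "192.0.2.1", "advertised_prefix": "192.0.2.0/30"},
    "provider-b": {"next_hop": "198.51.100.1", "advertised_prefix": None},
}
INTERNAL_PREFIXES = ["10.0.0.0/24"]


def usable_defaults(alive):
    """Return the upstreams whose default route could be installed downstream."""
    rib = [ipaddress.ip_network(p) for p in INTERNAL_PREFIXES]
    for name in alive:
        prefix = UPSTREAMS[name]["advertised_prefix"]
        if prefix:
            rib.append(ipaddress.ip_network(prefix))
    usable = []
    for name in alive:
        next_hop = ipaddress.ip_address(UPSTREAMS[name]["next_hop"])
        if any(next_hop in network for network in rib):
            usable.append(name)
    return usable


def failure_report():
    """Fail each upstream in turn and record which defaults remain usable."""
    report = {}
    for failed in UPSTREAMS:
        alive = [name for name in UPSTREAMS if name != failed]
        report[failed] = usable_defaults(alive)
    return report


print(failure_report())  # {'provider-a': [], 'provider-b': ['provider-a']}
```

An empty list for any failed upstream means the design has no working exit in that scenario, regardless of how many links the diagram shows.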

✅ Rule of thumb 

Never assume redundancy works because diagrams look correct. Test failure scenarios regularly. The most dangerous outages are the ones hidden behind “perfect” designs. 

#NetworkDesign #BGP #IPNetworks #Redundancy #Routing #NetworkOperations #ITcare