Many networks already run on very powerful routers. Quite often, the limitation is not the platform itself but the fact that nobody has explored what it can actually do. At ITcare we have seen this several times, especially with edge routers that peer with multiple upstreams, Internet Exchanges, or private peers.
When a BGP session flaps, thousands or even millions of routes can be withdrawn and then re-advertised, and RPD (the Junos routing protocol process) has to process every one of them. This is where convergence really matters. If the control plane cannot keep up, you risk blackholing traffic even though the hardware is more than capable.
We saw exactly this in an ISP network running Juniper MX204 and MX240 routers as PEs. The ISP experienced outages whenever the BGP sessions to its route reflectors flapped. The routers were powerful enough, but convergence was not fast enough.
The fix turned out to be surprisingly simple.
We enabled BGP RIB sharding, and convergence time dropped significantly.
At a high level, RIB sharding allows Juniper Networks routers to process BGP routes in parallel by splitting the RIB into multiple shards, each handled by its own thread. Instead of a single RPD thread processing millions of routes sequentially, the workload is distributed across multiple CPU cores. This dramatically improves convergence during large-scale BGP events.
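To make the idea of sharding a bit more concrete, here is a toy Python sketch. It is not how RPD is implemented. It assumes, purely for illustration, that each prefix is assigned to a shard by a stable hash (CRC32 of the prefix string) and that each shard owns its own table, processed by its own worker. The function names (shard_for, partition, process_shard, process_in_parallel) are hypothetical. Because of Python's GIL, the threads here illustrate the partitioning rather than a real multi-core speed-up.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_for(prefix: str, num_shards: int) -> int:
    """Assumption: map a prefix to a shard with a stable hash (CRC32 here)."""
    return zlib.crc32(prefix.encode()) % num_shards


def partition(updates, num_shards):
    """Split (prefix, next_hop) updates into one list per shard, preserving order."""
    shards = [[] for _ in range(num_shards)]
    for prefix, next_hop in updates:
        shards[shard_for(prefix, num_shards)].append((prefix, next_hop))
    return shards


def process_shard(shard_updates):
    """Toy stand-in for route processing: each shard owns a private table,
    so no locking is needed. The last update for a prefix wins."""
    rib = {}
    for prefix, next_hop in shard_updates:
        rib[prefix] = next_hop
    return rib


def process_in_parallel(updates, num_shards=4):
    """Process each shard in its own worker thread, then merge the results.
    A prefix always lands in exactly one shard, so the merge has no conflicts."""
    shards = partition(updates, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        shard_ribs = list(pool.map(process_shard, shards))
    merged = {}
    for rib in shard_ribs:
        merged.update(rib)
    return merged


if __name__ == "__main__":
    # 1000 distinct illustrative /24 prefixes, all with the same next hop.
    updates = [(f"10.{i // 256}.{i % 256}.0/24", "192.0.2.1") for i in range(1000)]
    print(len(process_in_parallel(updates)))  # prints 1000
```

The key design point in this toy model is that a prefix always hashes to the same shard, so all updates for that prefix are handled in order by one worker, while unrelated prefixes are processed independently.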
Once RIB sharding was enabled across the network, convergence became at least four times faster. We replayed the same failure scenario, and this time the impact was minimal. Only a few packets were lost during reconvergence, which was fully acceptable for the customer. Most importantly, the outages they had previously experienced were gone.
There is an excellent, very detailed article, as well as a Day One book, written by Sanjay Khanna and Jaihari Loganathan. If you want to understand how RIB sharding works under the hood and when it makes sense to use it, I highly recommend reading them:
https://lnkd.in/e6fGzuQx
https://lnkd.in/e5sHssPE
The diagram in this post is taken from the Juniper Community blog article and is used here for reference.
For me, this was another reminder that performance issues are not always solved by buying bigger routers. Sometimes, it is about knowing the platform well enough to unlock what is already there.