On Cisco routers, BGP does not load-share by default, let alone do unequal-cost load sharing, across multiple links for traffic to the same eBGP destination when the AS-paths differ. Let’s see how we can change this.
We can configure the command “maximum-paths n”, but on its own it only works if the weight, local-pref and AS-path attributes are the same across the different uplinks.
So how can we do load sharing if we are multihomed to different ASes? In that case, we must use the BGP command: “bgp bestpath as-path multipath-relax”.
VIRL lab topology
Let’s lab this! Here is my VIRL topology:
- AS-4 is our multihomed company. They have two BGP upstreams: AS-2 and AS-3.
- AS-1 in blue represents the Internet; 1.1.1.1 is “the Internet” here.
- AS-5 is a remote site or a branch office that uses a different AS number for some reason. AS-4 provides Internet access to it.
- Note: in this topology, AS-2 and AS-3 send the “full” BGP table to AS-4 (even though the full table here is only 4 prefixes) and also the default route. This is to demonstrate that the command works both with the default route and with a specific prefix.
Without “multipath-relax”
First, let’s see this topology without the command bgp bestpath as-path multipath-relax.
R1 BGP configuration
router bgp 1
 bgp log-neighbor-changes
 network 1.1.1.1 mask 255.255.255.255
 neighbor 10.0.0.6 remote-as 2
 neighbor 10.0.0.6 next-hop-self
 neighbor 10.0.0.10 remote-as 3
 neighbor 10.0.0.10 next-hop-self
 maximum-paths 4
R2 / R3 BGP configuration
router bgp 2
 bgp log-neighbor-changes
 network 2.2.2.2 mask 255.255.255.255
 neighbor 10.0.0.5 remote-as 1
 neighbor 10.0.0.5 next-hop-self
 neighbor 10.0.0.14 remote-as 4
 neighbor 10.0.0.14 next-hop-self
 neighbor 10.0.0.14 default-originate

router bgp 3
 bgp log-neighbor-changes
 network 3.3.3.3 mask 255.255.255.255
 neighbor 10.0.0.9 remote-as 1
 neighbor 10.0.0.9 next-hop-self
 neighbor 10.0.0.18 remote-as 4
 neighbor 10.0.0.18 next-hop-self
 neighbor 10.0.0.18 default-originate
R4 BGP configuration
router bgp 4
 bgp log-neighbor-changes
 network 4.4.4.4 mask 255.255.255.255
 neighbor 10.0.0.13 remote-as 2
 neighbor 10.0.0.13 next-hop-self
 neighbor 10.0.0.17 remote-as 3
 neighbor 10.0.0.17 next-hop-self
 neighbor 10.0.0.22 remote-as 5
 neighbor 10.0.0.22 next-hop-self
 maximum-paths 4
R5 BGP configuration
router bgp 5
 bgp log-neighbor-changes
 network 5.5.5.5 mask 255.255.255.255
 neighbor 10.0.0.21 remote-as 4
 neighbor 10.0.0.21 next-hop-self
Very straightforward: each router announces its own loopback interface.
In that situation, AS-4 (R4) receives the default route and the prefix 1.1.1.1/32 from both AS-2 and AS-3, and chooses one of them based on the Cisco BGP best-path selection algorithm.
R4#show ip bgp 0.0.0.0
BGP routing table entry for 0.0.0.0/0, version 12
Paths: (2 available, best #2, table default)
Multipath: eBGP
  Advertised to update-groups:
     1
  Refresh Epoch 2
  3
    10.0.0.17 from 10.0.0.17 (3.3.3.3)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  2
    10.0.0.13 from 10.0.0.13 (2.2.2.2)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
R4#
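In that case, the second path (via R2) is the best. Both paths tie on every earlier step of the selection algorithm, so the decision falls to the last tie-breakers: rule #10 (when both paths are external, prefer the oldest one) and rule #11 (prefer the route that comes from the BGP router with the lowest router ID). R2 satisfies rule #11, since 2.2.2.2 is lower than 3.3.3.3.
Note: I started all the routers at the same time, so the two paths arrived almost together. As far as I know, the command “bgp bestpath compare-routerid” is disabled by default in Cisco IOS, which means rule #10 is still evaluated before rule #11 here, as the next test shows.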
Then, if I clear the BGP session with R2, this will change:
R4#clear ip bgp 10.0.0.13
R4#
*Mar 2 08:55:46.164: %BGP-3-NOTIFICATION: sent to neighbor 10.0.0.13 6/4 (Administrative Reset) 0 bytes
*Mar 2 08:55:46.169: %BGP-5-ADJCHANGE: neighbor 10.0.0.13 Down User reset
*Mar 2 08:55:46.170: %BGP_SESSION-5-ADJCHANGE: neighbor 10.0.0.13 IPv4 Unicast topology base removed from session User reset
*Mar 2 08:55:46.763: %BGP-5-ADJCHANGE: neighbor 10.0.0.13 Up
R4#show ip bgp
R4#
R4#show ip bgp 0.0.0.0
BGP routing table entry for 0.0.0.0/0, version 20
Paths: (2 available, best #2, table default)
Multipath: eBGP
  Advertised to update-groups:
     1
  Refresh Epoch 1
  2
    10.0.0.13 from 10.0.0.13 (2.2.2.2)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  3
    10.0.0.17 from 10.0.0.17 (3.3.3.3)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
R4#
Why? Because now we must refer again to the BGP best-path selection algorithm, point #10: When both paths are external, prefer the path that was received first (the oldest one).
The oldest one! This is done to minimize route flapping: we keep the oldest received path. In that situation, if AS-3 performs maintenance, the default route for AS-4 will change again.
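To make the two tie-breakers concrete, here is a small Python sketch. It is only a simplified model and not how IOS is implemented: the `ExternalPath` class, the `tie_break` function and the age values are made up for illustration. It relies on Cisco's documented behaviour that step 10 (oldest external path) is skipped when “bgp bestpath compare-routerid” is configured, and it ignores the other conditions under which that step is skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalPath:
    neighbor: str
    router_id: str
    age_seconds: int  # hypothetical: time since the path was received


def router_id_key(router_id: str) -> tuple:
    # Compare router IDs numerically, octet by octet.
    return tuple(int(octet) for octet in router_id.split("."))


def tie_break(paths, compare_routerid=False):
    """Pick one path among eBGP paths that tied on all earlier steps."""
    if not compare_routerid:
        # Step 10: prefer the path that was received first (the oldest one).
        return max(paths, key=lambda p: p.age_seconds)
    # Step 11: prefer the path from the neighbor with the lowest router ID.
    return min(paths, key=lambda p: router_id_key(p.router_id))


# After "clear ip bgp 10.0.0.13" the path via R2 is the youngest one.
via_r2 = ExternalPath("10.0.0.13", "2.2.2.2", age_seconds=5)
via_r3 = ExternalPath("10.0.0.17", "3.3.3.3", age_seconds=600)

print(tie_break([via_r2, via_r3]).neighbor)                         # 10.0.0.17
print(tie_break([via_r2, via_r3], compare_routerid=True).neighbor)  # 10.0.0.13
```

With the flag off, the older path via R3 wins, which matches the output after the session reset. With the flag on, the choice would no longer depend on which session came up first.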
With “multipath-relax”
Now let’s start load-sharing between the two upstreams. I added the command “bgp bestpath as-path multipath-relax” to R4:
router bgp 4
 bgp log-neighbor-changes
 bgp bestpath as-path multipath-relax
 network 4.4.4.4 mask 255.255.255.255
 neighbor 10.0.0.13 remote-as 2
 neighbor 10.0.0.13 next-hop-self
 neighbor 10.0.0.17 remote-as 3
 neighbor 10.0.0.17 next-hop-self
 neighbor 10.0.0.22 remote-as 5
 neighbor 10.0.0.22 next-hop-self
 maximum-paths 4
R4#
Please note that the “maximum-paths n” command is mandatory and the value must be at least 2 in this case; otherwise BGP will install only one path.
Now we can see the little “m” for multipath in the BGP table:
R4#show ip bgp
(...)
     Network          Next Hop            Metric LocPrf Weight Path
 *>  0.0.0.0          10.0.0.13                              0 2 i
 *m                   10.0.0.17                              0 3 i
 *>  1.1.1.1/32       10.0.0.13                              0 2 1 i
 *m                   10.0.0.17                              0 3 1 i
 *>  2.2.2.2/32       10.0.0.13                0             0 2 i
 *                    10.0.0.17                              0 3 1 2 i
 *   3.3.3.3/32       10.0.0.13                              0 2 1 3 i
 *>                   10.0.0.17                0             0 3 i
 *>  4.4.4.4/32       0.0.0.0                  0         32768 i
 *>  5.5.5.5/32       10.0.0.22                0             0 5 i
R4#
R4#
You can see for the default and 1.1.1.1/32 prefixes we have multipath.
You can also see that AS-1 provides transit between AS-2 and AS-3 in both directions (prefixes 2.2.2.2/32 and 3.3.3.3/32). In that case the shortest AS-path is still the best path, without multipath.
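The behaviour seen in this table can be summarised with a small Python sketch. It is a simplified model under stated assumptions, not the IOS algorithm: `Path` and `multipath_set` are hypothetical names, and the model assumes that weight, local-pref, origin and MED already tie, so only the AS-path matters.

```python
from typing import NamedTuple


class Path(NamedTuple):
    next_hop: str
    as_path: tuple


def multipath_set(best, others, max_paths, relax=False):
    """Return the paths installed together with the best path."""
    selected = [best]
    for path in others:
        if len(selected) >= max_paths:
            break  # "maximum-paths n" caps the number of installed paths
        identical = path.as_path == best.as_path
        same_length = len(path.as_path) == len(best.as_path)
        # Without multipath-relax the AS-paths must match exactly;
        # with it, the same AS-path length is enough.
        if identical or (relax and same_length):
            selected.append(path)
    return selected


# AS-paths taken from the R4 BGP table above.
default_best = Path("10.0.0.13", (2,))
default_other = Path("10.0.0.17", (3,))
lo2_best = Path("10.0.0.13", (2,))
lo2_other = Path("10.0.0.17", (3, 1, 2))

print(len(multipath_set(default_best, [default_other], max_paths=4)))              # 1
print(len(multipath_set(default_best, [default_other], max_paths=4, relax=True)))  # 2
print(len(multipath_set(lo2_best, [lo2_other], max_paths=4, relax=True)))          # 1
print(len(multipath_set(default_best, [default_other], max_paths=1, relax=True)))  # 1
```

The third call shows why 2.2.2.2/32 stays single-path: the AS-path lengths are 1 and 3, so the relaxed check still fails.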
In detail for 0.0.0.0:
R4#show ip bgp 0.0.0.0
BGP routing table entry for 0.0.0.0/0, version 2
Paths: (2 available, best #1, table default)
Multipath: eBGP
  Advertised to update-groups:
     3
  Refresh Epoch 1
  2
    10.0.0.13 from 10.0.0.13 (2.2.2.2)
      Origin IGP, localpref 100, valid, external, multipath, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  3
    10.0.0.17 from 10.0.0.17 (3.3.3.3)
      Origin IGP, localpref 100, valid, external, multipath(oldest)
      rx pathid: 0, tx pathid: 0
R4#
In the routing table of R4, we can see a real load-sharing for 0.0.0.0/0:
R4#show ip route 0.0.0.0
Routing entry for 0.0.0.0/0, supernet
  Known via "bgp 4", distance 20, metric 0, candidate default path
  Tag 2, type external
  Last update from 10.0.0.17 00:10:36 ago
  Routing Descriptor Blocks:
    10.0.0.17, from 10.0.0.17, 00:10:36 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 2
      MPLS label: none
  * 10.0.0.13, from 10.0.0.13, 00:10:36 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 2
      MPLS label: none
bgp bestpath as-path ignore
You cannot get the same result with the command “bgp bestpath as-path ignore”. That command does skip the AS-path length comparison, but the prefix is still received from two different ASes, so the paths are not eligible for multipath.
Furthermore, if you try to enable “bgp bestpath as-path ignore” together with “bgp bestpath as-path multipath-relax”, you get this nice message from Cisco IOS:
R4(config)#router bgp 4
R4(config-router)# bgp bestpath as-path ignore
% Cannot be used in conjunction with 'bgp bestpath multipath-relax as-path'
R4(config-router)#
In conclusion, BGP multipath with multipath-relax works only when the candidate paths have exactly the same AS-path length, which is the case for the default route received from your upstreams and for prefixes such as 1.1.1.1/32 here.
BGP unequal cost load sharing
Now let’s go a little further and do unequal-cost load sharing for outgoing traffic from AS-4. I will use the same topology as before but with different uplink bandwidths:
Now we have 20 Mbps between AS-4 and AS-2 and 40 Mbps between AS-4 and AS-3.
For this, on R4 I first have to configure the real bandwidth on the physical interfaces to R2 and R3:
!
interface GigabitEthernet0/1
 description to R2
 bandwidth 20000
 ip address 10.0.0.14 255.255.255.252
 duplex full
 speed auto
 media-type rj45
!
interface GigabitEthernet0/3
 description to R3
 bandwidth 40000
 ip address 10.0.0.18 255.255.255.252
 duplex full
 speed auto
 media-type rj45
!
And then, I need to configure BGP dmzlink-bw globally and for each uplink:
router bgp 4
 bgp log-neighbor-changes
 bgp bestpath as-path multipath-relax
 bgp dmzlink-bw
 network 4.4.4.4 mask 255.255.255.255
 neighbor 10.0.0.13 remote-as 2
 neighbor 10.0.0.13 next-hop-self
 neighbor 10.0.0.13 dmzlink-bw
 neighbor 10.0.0.17 remote-as 3
 neighbor 10.0.0.17 next-hop-self
 neighbor 10.0.0.17 dmzlink-bw
 neighbor 10.0.0.22 remote-as 5
 neighbor 10.0.0.22 next-hop-self
 maximum-paths 4
R4#
Now, let’s look at the routing table for 0.0.0.0/0:
R4#show ip route 0.0.0.0
Routing entry for 0.0.0.0/0, supernet
  Known via "bgp 4", distance 20, metric 0, candidate default path
  Tag 2, type external
  Last update from 10.0.0.17 00:03:14 ago
  Routing Descriptor Blocks:
    10.0.0.17, from 10.0.0.17, 00:03:14 ago
      Route metric is 0, traffic share count is 2
      AS Hops 1
      Route tag 2
      MPLS label: none
  * 10.0.0.13, from 10.0.0.13, 00:03:14 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 2
      MPLS label: none
R4#
Now you can see we have a 2:1 traffic share count between the two interfaces, based on the configured bandwidth: the 40 Mbps link to R3 gets a share of 2 and the 20 Mbps link to R2 a share of 1.
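The ratio itself is simple arithmetic, and the Python sketch below reproduces it. This is only an illustration of the proportion: `traffic_share` is a hypothetical helper, and I am assuming that reducing the configured bandwidths by their greatest common divisor gives the same counts as the router. IOS may normalise the values differently in other cases.

```python
from functools import reduce
from math import gcd


def traffic_share(bandwidths_kbps):
    """Reduce per-next-hop bandwidths (in kbps) to the smallest integer ratio."""
    divisor = reduce(gcd, bandwidths_kbps.values())
    return {next_hop: bw // divisor for next_hop, bw in bandwidths_kbps.items()}


# Values from the "bandwidth" statements on the R4 uplinks.
shares = traffic_share({"10.0.0.13": 20000, "10.0.0.17": 40000})
print(shares)  # {'10.0.0.13': 1, '10.0.0.17': 2}
```

The result matches the “traffic share count” values shown by “show ip route 0.0.0.0”.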
Hi, nice article. What about inbound traffic load-balancing? How do you achieve this? I am personally no fan of doing outbound load-balancing at all. The main problem is that traffic is very likely to be asymmetric, and this is always something I would try to avoid, because when there are issues, troubleshooting becomes a bigger challenge. CU
Excellent one, so many features I didn't know about BGP