BGP load sharing and unequal cost load sharing

On Cisco routers, BGP installs only one best path per prefix by default. So it does no load sharing (let alone unequal cost load sharing) across multiple links, especially when the paths to the same eBGP destination have different AS-paths. Let’s see how we can change this.

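We can configure the command “maximum-paths n”, but eBGP multipath only uses paths that match the best path on weight, local preference, origin, MED and the AS-path itself (not just its length). In practice, this means that all the uplinks must lead to the same neighboring AS.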

So how can we load-share if we are multihomed to different ASes? In that case, we must use the BGP command “bgp bestpath as-path multipath-relax”, which relaxes the requirement of an identical AS-path to an AS-path of the same length.
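To make the rule concrete, here is a minimal Python sketch of the eBGP multipath eligibility check, assuming the attribute list from Cisco’s best-path documentation: a candidate must match the best path on weight, local preference, AS-path length, origin and MED and, without multipath-relax, on the AS-path itself. The `Path` class and `multipath_eligible` function are hypothetical illustrations, not an IOS API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Path:
    """Hypothetical, simplified view of one BGP path (not an IOS data structure)."""
    next_hop: str
    as_path: Tuple[int, ...]
    weight: int = 0
    local_pref: int = 100
    origin: str = "i"  # i = IGP, e = EGP, ? = incomplete
    med: int = 0


def multipath_eligible(best: Path, candidate: Path, relax: bool) -> bool:
    """Return True if candidate may be installed next to best (simplified model)."""
    same_attributes = (
        candidate.weight == best.weight
        and candidate.local_pref == best.local_pref
        and len(candidate.as_path) == len(best.as_path)
        and candidate.origin == best.origin
        and candidate.med == best.med
    )
    if not same_attributes:
        return False
    # Without multipath-relax the AS-path must be identical;
    # with it, the equal length checked above is enough.
    return relax or candidate.as_path == best.as_path


# The default route as R4 receives it from AS-2 and AS-3.
via_r2 = Path("10.0.0.13", (2,))
via_r3 = Path("10.0.0.17", (3,))
print(multipath_eligible(via_r2, via_r3, relax=False))  # False
print(multipath_eligible(via_r2, via_r3, relax=True))   # True
```

The sketch ignores the IGP metric to the next hop and other details of the real implementation; it only shows why two paths from different neighboring ASes never qualify unless the AS-path check is relaxed.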

VIRL lab topology

Let’s lab this! Here is my VIRL topology:

  • AS-4 is our multihomed company. They have two BGP upstreams: AS-2 and AS-3.
  • AS-1 in blue represents the Internet; 1.1.1.1 here is “the Internet”.
  • AS-5 is a remote site or branch office that uses a different AS number for some reason. AS-4 provides Internet access to it.
  • Note: in this topology, AS-2 and AS-3 send AS-4 both the “full” BGP table (which here is only 4 prefixes) and a default route. This demonstrates that the command works for the default route as well as for a specific prefix.

Without “multipath-relax”

First, let’s see this topology without the command bgp bestpath as-path multipath-relax.

R1 BGP configuration

router bgp 1
 bgp log-neighbor-changes
 network 1.1.1.1 mask 255.255.255.255
 neighbor 10.0.0.6 remote-as 2
 neighbor 10.0.0.6 next-hop-self
 neighbor 10.0.0.10 remote-as 3
 neighbor 10.0.0.10 next-hop-self
 maximum-paths 4

R2 / R3 BGP configuration

router bgp 2
 bgp log-neighbor-changes
 network 2.2.2.2 mask 255.255.255.255
 neighbor 10.0.0.5 remote-as 1
 neighbor 10.0.0.5 next-hop-self
 neighbor 10.0.0.14 remote-as 4
 neighbor 10.0.0.14 next-hop-self
 neighbor 10.0.0.14 default-originate

router bgp 3
 bgp log-neighbor-changes
 network 3.3.3.3 mask 255.255.255.255
 neighbor 10.0.0.9 remote-as 1
 neighbor 10.0.0.9 next-hop-self
 neighbor 10.0.0.18 remote-as 4
 neighbor 10.0.0.18 next-hop-self
 neighbor 10.0.0.18 default-originate

R4 BGP configuration

router bgp 4
 bgp log-neighbor-changes
 network 4.4.4.4 mask 255.255.255.255
 neighbor 10.0.0.13 remote-as 2
 neighbor 10.0.0.13 next-hop-self
 neighbor 10.0.0.17 remote-as 3
 neighbor 10.0.0.17 next-hop-self
 neighbor 10.0.0.22 remote-as 5
 neighbor 10.0.0.22 next-hop-self
 maximum-paths 4

R5 BGP configuration

router bgp 5
 bgp log-neighbor-changes
 network 5.5.5.5 mask 255.255.255.255
 neighbor 10.0.0.21 remote-as 4
 neighbor 10.0.0.21 next-hop-self

Very straightforward: each router announces its own loopback interface, and R2 and R3 also originate a default route toward R4 (default-originate).
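Because the peering addresses only appear in the neighbor statements, here is a small Python sketch that cross-checks them: every neighbor statement should have a matching statement on the peer, in the expected AS, within the same point-to-point subnet. The /30 link size is an assumption consistent with R4’s interface masks (255.255.255.252); the data is transcribed by hand from the configs above, and the helper names are hypothetical.

```python
import ipaddress

# (router, local AS, neighbor address, remote-as), copied from the configs above
NEIGHBORS = [
    ("R1", 1, "10.0.0.6", 2),
    ("R1", 1, "10.0.0.10", 3),
    ("R2", 2, "10.0.0.5", 1),
    ("R2", 2, "10.0.0.14", 4),
    ("R3", 3, "10.0.0.9", 1),
    ("R3", 3, "10.0.0.18", 4),
    ("R4", 4, "10.0.0.13", 2),
    ("R4", 4, "10.0.0.17", 3),
    ("R4", 4, "10.0.0.22", 5),
    ("R5", 5, "10.0.0.21", 4),
]


def same_p2p_link(a: str, b: str, prefixlen: int = 30) -> bool:
    """True if two distinct addresses sit in the same subnet (assumed /30 links)."""
    net = ipaddress.ip_interface(f"{a}/{prefixlen}").network
    return a != b and ipaddress.ip_address(b) in net


def unmatched_sessions():
    """Return neighbor statements with no matching statement on the peer side."""
    missing = []
    for router, local_as, peer_ip, remote_as in NEIGHBORS:
        # The peer must be in remote_as, point back at our AS, and share the /30.
        matched = any(
            other_as == remote_as
            and other_remote == local_as
            and same_p2p_link(peer_ip, other_peer)
            for _, other_as, other_peer, other_remote in NEIGHBORS
        )
        if not matched:
            missing.append((router, peer_ip))
    return missing


print(unmatched_sessions())  # []
```

An empty list means every session in the lab has a consistent counterpart, for example R4’s neighbor 10.0.0.13 (AS 2) pairs with R2’s neighbor 10.0.0.14 (AS 4) in 10.0.0.12/30.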

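In this situation, AS-4 (R4) receives the default route and the prefix 1.1.1.1/32 from both AS-2 and AS-3, and chooses one of them based on the Cisco BGP best-path selection algorithm. Even though “maximum-paths 4” is configured, only one path is used, because the AS-paths (2 and 3) differ.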

R4#show ip bgp 0.0.0.0 
BGP routing table entry for 0.0.0.0/0, version 12
Paths: (2 available, best #2, table default)
Multipath: eBGP
  Advertised to update-groups:
     1
  Refresh Epoch 2
  3
    10.0.0.17 from 10.0.0.17 (3.3.3.3)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  2
    10.0.0.13 from 10.0.0.13 (2.2.2.2)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
R4#

In that case, the second path is the best because of step #11 of the selection algorithm: prefer the route that comes from the BGP router with the lowest router ID (2.2.2.2 rather than 3.3.3.3).

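Note: I started all the routers at the same time, so R4 had no current best path when both paths arrived. In that case step #10 (prefer the oldest path) is skipped and step #11 decides. The command “bgp bestpath compare-routerid”, which is disabled by default, would make R4 always skip step #10.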

Then, if I clear the BGP session with R2, the best path changes:

R4#clear ip bgp 10.0.0.13
R4#
*Mar 2 08:55:46.164: %BGP-3-NOTIFICATION: sent to neighbor 10.0.0.13 6/4 (Administrative Reset) 0 bytes 
*Mar 2 08:55:46.169: %BGP-5-ADJCHANGE: neighbor 10.0.0.13 Down User reset
*Mar 2 08:55:46.170: %BGP_SESSION-5-ADJCHANGE: neighbor 10.0.0.13 IPv4 Unicast topology base removed from session User reset
*Mar 2 08:55:46.763: %BGP-5-ADJCHANGE: neighbor 10.0.0.13 Up 
R4#show ip bgp 
R4#
R4#show ip bgp 0.0.0.0
BGP routing table entry for 0.0.0.0/0, version 20
Paths: (2 available, best #2, table default)
Multipath: eBGP
  Advertised to update-groups:
     1
  Refresh Epoch 1
  2
    10.0.0.13 from 10.0.0.13 (2.2.2.2)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  3
    10.0.0.17 from 10.0.0.17 (3.3.3.3)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
R4#

Why? Because now step #10 of the BGP best-path selection algorithm applies: when both paths are external, prefer the path that was received first (the oldest one). R4 already had a best path via R3 when the path via R2 came back, so the older path via R3 wins.

The oldest one! This rule minimizes route flapping: a newer path does not displace an older one, even if it would win on router ID. In this situation, if AS-3 performs maintenance and its session goes down, the default route for AS-4 will change again.
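The two tie-breakers seen above can be modeled in a few lines of Python. This is a sketch, not IOS code: it assumes the paths already tied on steps 1 to 9, and it only implements two of the documented reasons for skipping step #10 (“bgp bestpath compare-routerid” configured, or no current best path). The class, function names and timestamps are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExternalPath:
    """Hypothetical eBGP path that already tied with the others on steps 1 to 9."""
    neighbor: str
    router_id: str
    received_at: float  # arbitrary timestamp in seconds; lower means older


def router_id_key(path: ExternalPath) -> tuple:
    # Compare router IDs numerically, octet by octet (2.2.2.2 < 3.3.3.3).
    return tuple(int(octet) for octet in path.router_id.split("."))


def pick_best(paths: List[ExternalPath], has_current_best: bool,
              compare_routerid: bool = False) -> ExternalPath:
    """Simplified model of step 10 (oldest path) and step 11 (lowest router ID)."""
    if has_current_best and not compare_routerid:
        # Step 10: keep the path received first, to limit route flapping.
        return min(paths, key=lambda p: p.received_at)
    # Step 10 skipped: step 11, the lowest router ID wins.
    return min(paths, key=router_id_key)


# Lab start-up: both paths arrive while R4 has no current best path.
r2 = ExternalPath("10.0.0.13", "2.2.2.2", received_at=0.0)
r3 = ExternalPath("10.0.0.17", "3.3.3.3", received_at=0.0)
print(pick_best([r2, r3], has_current_best=False).neighbor)  # 10.0.0.13

# After "clear ip bgp 10.0.0.13", the path via R2 is relearned later.
r2_new = ExternalPath("10.0.0.13", "2.2.2.2", received_at=60.0)
print(pick_best([r2_new, r3], has_current_best=True).neighbor)  # 10.0.0.17
```

Passing `compare_routerid=True` in the second case would select 10.0.0.13 again, which is exactly the flap that step #10 is meant to avoid.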

With “multipath-relax”

Now let’s load-share between the two upstreams. I added the command “bgp bestpath as-path multipath-relax” to R4:

router bgp 4
 bgp log-neighbor-changes
 bgp bestpath as-path multipath-relax
 network 4.4.4.4 mask 255.255.255.255
 neighbor 10.0.0.13 remote-as 2
 neighbor 10.0.0.13 next-hop-self
 neighbor 10.0.0.17 remote-as 3
 neighbor 10.0.0.17 next-hop-self
 neighbor 10.0.0.22 remote-as 5
 neighbor 10.0.0.22 next-hop-self
 maximum-paths 4
R4#

Please note that the “maximum-paths n” command is still mandatory: n must be at least 2 here, otherwise BGP installs only one path.

Now we can see the little “m” for multipath in the BGP table:

R4#show ip bgp
(...)
     Network          Next Hop            Metric LocPrf Weight Path
 *>  0.0.0.0          10.0.0.13                              0 2 i
 *m                   10.0.0.17                              0 3 i
 *>  1.1.1.1/32       10.0.0.13                              0 2 1 i
 *m                   10.0.0.17                              0 3 1 i
 *>  2.2.2.2/32       10.0.0.13                0             0 2 i
 *                    10.0.0.17                              0 3 1 2 i
 *   3.3.3.3/32       10.0.0.13                              0 2 1 3 i
 *>                   10.0.0.17                0             0 3 i
 *>  4.4.4.4/32       0.0.0.0                  0         32768 i
 *>  5.5.5.5/32       10.0.0.22                0             0 5 i
R4#

For the default route and the 1.1.1.1/32 prefix, we now have multipath.

You can also see that AS-1 provides transit between AS-2 and AS-3 in both directions (prefixes 2.2.2.2/32 and 3.3.3.3/32). In that case the shortest AS-path is still the best path, without multipath: multipath-relax only removes the requirement for identical AS-paths, not for AS-paths of equal length.
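Applied to R4’s table, the equal-length requirement explains which prefixes get the “m” flag. The Python sketch below takes the next hops and AS-paths from the “show ip bgp” output above and assumes that all other attributes are equal; it is an illustration, not how IOS computes multipath. The name `multipath_next_hops` is hypothetical, and the `max_paths` cap only mimics “maximum-paths” without modeling which path IOS would keep.

```python
from collections import defaultdict

# Received eBGP paths from "show ip bgp" above: (prefix, next hop, AS-path).
RECEIVED = [
    ("0.0.0.0/0", "10.0.0.13", (2,)),
    ("0.0.0.0/0", "10.0.0.17", (3,)),
    ("1.1.1.1/32", "10.0.0.13", (2, 1)),
    ("1.1.1.1/32", "10.0.0.17", (3, 1)),
    ("2.2.2.2/32", "10.0.0.13", (2,)),
    ("2.2.2.2/32", "10.0.0.17", (3, 1, 2)),
    ("3.3.3.3/32", "10.0.0.13", (2, 1, 3)),
    ("3.3.3.3/32", "10.0.0.17", (3,)),
]


def multipath_next_hops(relax: bool, max_paths: int = 4):
    """Next hops usable per prefix, assuming all other attributes are equal."""
    by_prefix = defaultdict(list)
    for prefix, next_hop, as_path in RECEIVED:
        by_prefix[prefix].append((next_hop, as_path))
    result = {}
    for prefix, paths in by_prefix.items():
        # Only the shortest AS-paths stay in the race.
        shortest = min(len(as_path) for _, as_path in paths)
        finalists = [(nh, ap) for nh, ap in paths if len(ap) == shortest]
        if not relax:
            # Without multipath-relax the AS-path must also be identical;
            # the first finalist stands in for the best path here.
            first = finalists[0][1]
            finalists = [(nh, ap) for nh, ap in finalists if ap == first]
        result[prefix] = sorted(nh for nh, _ in finalists)[:max_paths]
    return result


for prefix, hops in multipath_next_hops(relax=True).items():
    print(prefix, hops)
# 0.0.0.0/0 ['10.0.0.13', '10.0.0.17']
# 1.1.1.1/32 ['10.0.0.13', '10.0.0.17']
# 2.2.2.2/32 ['10.0.0.13']
# 3.3.3.3/32 ['10.0.0.17']
```

With `relax=False`, or with `max_paths=1`, every prefix ends up with a single next hop, matching the behavior before the command was added.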

In detail, for 0.0.0.0/0:

R4#show ip bgp 0.0.0.0
BGP routing table entry for 0.0.0.0/0, version 2
Paths: (2 available, best #1, table default)
Multipath: eBGP
  Advertised to update-groups:
     3         
  Refresh Epoch 1
  2
    10.0.0.13 from 10.0.0.13 (2.2.2.2)
      Origin IGP, localpref 100, valid, external, multipath, best
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  3
    10.0.0.17 from 10.0.0.17 (3.3.3.3)
      Origin IGP, localpref 100, valid, external, multipath(oldest)
      rx pathid: 0, tx pathid: 0
R4#

In the routing table of R4, we can see real load sharing for 0.0.0.0/0, with both next hops installed and a traffic share count of 1 each:

R4#show ip route 0.0.0.0
Routing entry for 0.0.0.0/0, supernet
  Known via "bgp 4", distance 20, metric 0, candidate default path
  Tag 2, type external
  Last update from 10.0.0.17 00:10:36 ago
  Routing Descriptor Blocks:
    10.0.0.17, from 10.0.0.17, 00:10:36 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 2
      MPLS label: none
  * 10.0.0.13, from 10.0.0.13, 00:10:36 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 2
      MPLS label: none

bgp bestpath as-path ignore

You cannot get the same result with the command “bgp bestpath as-path ignore”. This command does skip the AS-path length comparison during best-path selection, but the prefix is still received from two different ASes, so the AS-paths differ and the paths are not eligible for multipath.

Furthermore, if you try to enable “bgp bestpath as-path ignore” together with “bgp bestpath as-path multipath-relax”, you get this nice message from Cisco IOS:

R4(config)#router bgp 4
R4(config-router)# bgp bestpath as-path ignore
% Cannot be used in conjunction with 'bgp bestpath multipath-relax as-path'
R4(config-router)#

In conclusion, with multipath-relax, BGP multipath works for the default route received from your upstreams and for any prefix whose AS-paths have exactly the same length.

BGP unequal cost load sharing

Now let’s go a little further and do unequal cost load sharing for outgoing traffic from AS-4. I will use the same topology as before, but with different uplink bandwidths:

Now we have 20 Mbps between AS-4 and AS-2, and 40 Mbps between AS-4 and AS-3.

For this, I first have to configure on R4 the real bandwidth of the physical interfaces to R2 and R3 (the “bandwidth” value is in kbps):

!
interface GigabitEthernet0/1
 description to R2
 bandwidth 20000
 ip address 10.0.0.14 255.255.255.252
 duplex full
 speed auto
 media-type rj45
!
interface GigabitEthernet0/3
 description to R3
 bandwidth 40000
 ip address 10.0.0.18 255.255.255.252
 duplex full
 speed auto
 media-type rj45
!

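Then I need to configure BGP dmzlink-bw, globally and for each uplink. “bgp dmzlink-bw” tells BGP to distribute traffic in proportion to the bandwidth of the external links, and “neighbor dmzlink-bw” makes BGP record the bandwidth of the link to that neighbor (as the link bandwidth extended community) on the routes learned from it: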

router bgp 4
 bgp log-neighbor-changes
 bgp bestpath as-path multipath-relax
 bgp dmzlink-bw
 network 4.4.4.4 mask 255.255.255.255
 neighbor 10.0.0.13 remote-as 2
 neighbor 10.0.0.13 next-hop-self
 neighbor 10.0.0.13 dmzlink-bw
 neighbor 10.0.0.17 remote-as 3
 neighbor 10.0.0.17 next-hop-self
 neighbor 10.0.0.17 dmzlink-bw
 neighbor 10.0.0.22 remote-as 5
 neighbor 10.0.0.22 next-hop-self
 maximum-paths 4
R4#

Now, let’s look at the routing table for 0.0.0.0/0:

R4#show ip route 0.0.0.0
Routing entry for 0.0.0.0/0, supernet
  Known via "bgp 4", distance 20, metric 0, candidate default path
  Tag 2, type external
  Last update from 10.0.0.17 00:03:14 ago
  Routing Descriptor Blocks:
    10.0.0.17, from 10.0.0.17, 00:03:14 ago
      Route metric is 0, traffic share count is 2
      AS Hops 1
      Route tag 2
      MPLS label: none
  * 10.0.0.13, from 10.0.0.13, 00:03:14 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 2
      MPLS label: none
R4#

Now we have a traffic share count of 2:1 between the two next hops (10.0.0.17 over the 40 Mbps link, 10.0.0.13 over the 20 Mbps link), proportional to the configured bandwidths.
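The 2:1 ratio is simply the two bandwidths reduced to their smallest integer ratio. The Python sketch below reproduces the counts seen in the output; it is a simplified model that assumes share counts are the link bandwidths divided by their greatest common divisor, and IOS may compute or round them differently for other bandwidth combinations. `traffic_share_counts` is a hypothetical helper.

```python
from functools import reduce
from math import gcd

# Bandwidths configured on R4's uplinks, in kbps ("bandwidth" interface command).
UPLINK_BW_KBPS = {
    "10.0.0.13": 20000,  # Gi0/1 to R2 (AS-2)
    "10.0.0.17": 40000,  # Gi0/3 to R3 (AS-3)
}


def traffic_share_counts(bandwidths):
    """Reduce link bandwidths to the smallest integer ratio (simplified model)."""
    divisor = reduce(gcd, bandwidths.values())
    return {next_hop: bw // divisor for next_hop, bw in bandwidths.items()}


print(traffic_share_counts(UPLINK_BW_KBPS))  # {'10.0.0.13': 1, '10.0.0.17': 2}
```

With equal bandwidths the model gives 1:1, which matches the equal cost case shown earlier.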
