If you operate a data-center network with Cisco Nexus switches, you have probably already faced the problem of how to perform maintenance on one of the two switches of a vPC pair with minimal impact and risk for the production network. Cisco NX-OS includes a feature called “Graceful Insertion and Removal” (GIR) to help with that. Here is how it works.
Scenario
Let’s take the example below:
We have two Nexus switches (in NX-OS mode) in a vPC pair, doing layer-2 aggregation and layer-3 routing. They use OSPF as the IGP and BGP towards the upstream provider(s). Here, I put only one external router for simplicity. To simulate the Internet, this router announces the prefix 8.8.8.8/32 to our vPC AS.
On these vPC “core” routers, we have the layer-3 SVIs for the data-center networks, represented here with a single dual-homed Nexus access/leaf switch, on the network 10.10.10.0/24.
Despite everything we can read about ACI or VXLAN, this is still a very common architecture in the real world today, at least for small and mid-size data-centers.
Now, let’s say we have to perform hardware maintenance on the switch core-sw-02, and we would like to do this with minimal impact on the production network, or even no impact at all if possible. In the end, this is one of the main reasons why we have two physically separate switches, right?
The manual solution
Let’s see how we can do this task with manual configuration:
- First, we have to make sure Internet traffic goes through core-sw-01 and no longer through core-sw-02. The best way is to adjust BGP attributes to give preference to the core-sw-01 path. For incoming traffic, one option is to add AS-path prepending to the announcements core-sw-02 sends to the external router. We could also adjust MED values if we have a single upstream provider, or tag our prefixes with special communities if the upstream provider supports them. Then, for outgoing traffic, we have to assign a lower local-pref value to the routes received on core-sw-02. Or, as we use Cisco devices here, we can also use the weight attribute.
If we simply shut down the BGP session on core-sw-02, there is a risk that the router on the other side does not correctly interpret the BGP notification with the “cease” error code (normally sent when we manually shut down a session), or runs into a similar problem. So there is a risk of losing packets, at least during the BGP hold time (180 seconds by default on Cisco).
I would avoid shutting down the session or the physical interface, because there are too many unknown parameters on the provider side.
- Then, we have to make sure internal traffic bypasses core-sw-02. This involves knowing how vPC and the port-channels of the access/leaf switches forward traffic. To minimize the impact, we have to do a vPC shutdown on core-sw-02. We could also change the load-sharing method on the access/leaf switches, but the side effects would certainly be worse than the impact we are trying to avoid. So we will only do a vPC shutdown.
- The third problem depends on the IGP, for example if we have another layer-3 device dual-homed to the core switches. This is not the case in my example above, but it is possible. Here, we have to make sure all layer-3 devices on the network have a better metric through core-sw-01 than through core-sw-02. This can be done in different ways, depending on the IGP used.
- Finally, we can perform the maintenance.
- Last but not least, we have to do the same operations in reverse order to put the switch back into service.
Graceful Insertion and Removal (GIR) overview
To automate this process, Cisco NX-OS introduced the Graceful Insertion and Removal (GIR) function, available since software release 7.0(3)I2(1) on the Nexus 9K, 7K and 3K platforms. It lets us run the entire process above with a single command, and customize it with our own commands if necessary.
When we enter the command system mode maintenance in configuration mode, NX-OS puts the switch into maintenance mode by applying what is called the maintenance-mode configuration profile. Then, when we enter no system mode maintenance, NX-OS puts the switch back in service by applying the normal-mode configuration profile.
GIR profiles
There are two types of maintenance configuration profiles:
- The maintenance-mode profile: containing all the commands that will be executed during the GIR activation or graceful removal, when the switch enters maintenance mode.
- The normal-mode profile: containing all the commands that will be executed during the GIR deactivation or graceful insertion, when the switch returns to normal mode.
We can use the default profiles, or modify them.
The default profiles are generated by the switch, which parses the configuration when we type the command “system mode maintenance” for the first time. Their content depends on which routing protocols are in use. The system generates both a default maintenance-mode profile and a default normal-mode profile.
We can see the maintenance profiles with the command: show maintenance profile
Below, I executed this command on core-sw-01, where system mode maintenance was never used, so we can see the profiles are empty:
NX01# show maintenance profile
[Normal Mode]
[Maintenance Mode]
NX01#
With the default maintenance profile, the active forwarding protocols of the switch are placed in “isolate” state. Here is an example when I type this command on core-sw-02 and abort it:
NX02(config)# system mode maintenance
Following configuration will be applied:
router bgp 1
  isolate
router ospf 1
  isolate
sleep instance 2 20
vpc domain 99
  shutdown
NOTE: If you have vPC orphan interfaces, please ensure 'vpc orphan-port suspend' is configured under them, before proceeding further
Do you want to continue (yes/no)? [no] no
As we can see, the system first puts BGP in isolate mode, then OSPF, then waits 20 seconds and finally does a vPC shutdown. This corresponds to the manual procedure I suggested above.
Routing protocols isolate mode
Isolate mode is used to take the switch out of the active forwarding path. Each protocol uses a different mechanism to influence the forwarding decisions of the remaining devices, so that they do not choose this switch as part of the active path(s):
- RIP: poison route(s) with highest metric.
- OSPF: send OSPF LSAs with max metric.
- EIGRP: poison route(s) with highest metric.
- IS-IS: refresh LSPs with Overload bit on.
- BGP: withdraw BGP route(s) advertisements.
- PIM (in vPC): vPC forwarding role transfer.
- vPC: shutdown the vPC to bring down the vPC domain on the local switch.
GIR in action
Now, we can execute this command on core-sw-02 to see the results:
NX02(config)# system mode maintenance
Following configuration will be applied:
router bgp 1
  isolate
router ospf 1
  isolate
sleep instance 2 20
vpc domain 99
  shutdown
NOTE: If you have vPC orphan interfaces, please ensure 'vpc orphan-port suspend' is configured under them, before proceeding further
Do you want to continue (yes/no)? [no] yes
Generating before_maintenance snapshot before going into maintenance mode
Starting to apply commands...
Applying : router bgp 1
Applying : isolate
Applying : router ospf 1
Applying : isolate
Applying : sleep instance 2 20
Applying : vpc domain 99
Applying : shutdown
Maintenance mode operation successful.
Waiting 120 seconds to allow network re-routing to occur before releasing CLI
........................done
NX02(maint-mode)(config)#
At this point, we are in maintenance mode. We can power off the switch or hot-swap line cards without any impact on the production network. All BGP sessions and OSPF adjacencies are still up, but core-sw-02 no longer sends any route to the external router via BGP. For example, on the external router:
router# show bgp all sum
BGP summary information for VRF default, address family IPv4 Unicast
BGP router identifier 8.8.8.8, local AS number 100
BGP table version is 5, IPv4 Unicast config peers 2, capable peers 2
2 network entries and 2 paths using 472 bytes of memory
BGP attribute entries [2/320], BGP AS path entries [1/6]
BGP community entries [0/0], BGP clusterlist entries [0/0]
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
172.16.1.1 4 1 8 7 5 0 0 00:01:55 1
172.16.2.1 4 1 7 7 5 0 0 00:01:54 0
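The check above (a session that is still up but delivers no prefix) is easy to script. Here is a minimal sketch in Python, assuming the neighbor rows keep the 10-column layout shown in this output. The function names prefixes_received and isolated_neighbors are mine, not part of NX-OS, and the sample text is simply copied from the output above.

```python
import re

# Neighbor rows copied from the "show bgp all sum" output above.
SAMPLE = """\
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
172.16.1.1 4 1 8 7 5 0 0 00:01:55 1
172.16.2.1 4 1 7 7 5 0 0 00:01:54 0
"""

IPV4_RE = re.compile(r"\d+(\.\d+){3}")


def prefixes_received(summary_text):
    """Return {neighbor_ip: prefixes_received} for established sessions."""
    result = {}
    for line in summary_text.splitlines():
        fields = line.split()
        # Assumption: a neighbor row has 10 fields and starts with an IPv4 address.
        if len(fields) != 10 or not IPV4_RE.fullmatch(fields[0]):
            continue
        # The last column is a prefix count when the session is established,
        # otherwise a state name such as "Idle" or "Active".
        if fields[-1].isdigit():
            result[fields[0]] = int(fields[-1])
    return result


def isolated_neighbors(summary_text):
    """Neighbors whose session is up but which send us no prefix."""
    counts = prefixes_received(summary_text)
    return sorted(ip for ip, count in counts.items() if count == 0)


if __name__ == "__main__":
    print(isolated_neighbors(SAMPLE))
```

Run against the sample, this reports 172.16.2.1 (core-sw-02) as the only neighbor that is up with zero prefixes, which is what we expect while it is isolated.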
Now, let’s see what happens when we execute the “graceful insertion”:
NX02(maint-mode)(config)# no system mode maintenance
Following configuration will be applied:
vpc domain 99
  no shutdown
sleep instance 2 20
router ospf 1
  no isolate
router bgp 1
  no isolate
Do you want to continue (yes/no)? [no] yes
Starting to apply commands...
Applying : vpc domain 99
Applying : no shutdown
Applying : sleep instance 2 20
Applying : router ospf 1
Applying : no isolate
Applying : router bgp 1
Applying : no isolate
Maintenance mode operation successful.
Waiting 120 seconds to allow network convergence before generating after_maintenance snapshot
.........................
Generating after_maintenance snapshot
Please use 'show snapshots compare before_maintenance after_maintenance' to check the health of the system
NX02(config)#
Again, we can see the system enters the commands in the reverse order, as suggested in the manual procedure above.
GIR custom profiles
Now, let’s see how to change the maintenance-mode and normal-mode profiles.
As we saw above, we can see the current profiles, generated by the system, with the command:
NX02# show maintenance profile
[Normal Mode]
vpc domain 99
  no shutdown
sleep instance 2 20
router ospf 1
  no isolate
router bgp 1
  no isolate
[Maintenance Mode]
router bgp 1
  isolate
router ospf 1
  isolate
sleep instance 2 20
vpc domain 99
  shutdown
If we need to change or add something, here is the process:
- Go into configuration mode and type: system mode maintenance always-use-custom-profile
This way, the system will not generate a new default profile; it will use the one you defined.
- Update the profiles as needed, with the command: config maintenance profile maintenance-mode | normal-mode
Let’s see an example below:
NX02# config maintenance profile maintenance-mode
Please configure 'system mode maintenance always-use-custom-profile' if you want to use custom profile always for maintenance mode.
Enter configuration commands, one per line. End with CNTL/Z.
NX02(config-mm-profile)#
NX02(config-mm-profile)# interface e1/1
NX02(config-mm-profile-if-verify)# shut
NX02(config-mm-profile-if-verify)# exit
!--- Here we can see the changes with show maint profile:
NX02(config-mm-profile)# show maint profile
[Normal Mode]
vpc domain 99
  no shutdown
sleep instance 2 20
router ospf 1
  no isolate
router bgp 1
  no isolate
[Maintenance Mode]
router bgp 1
  isolate
router ospf 1
  isolate
sleep instance 2 20
vpc domain 99
  shutdown
interface Ethernet1/1
  shutdown
NX02(config-mm-profile)#
NX02(config-mm-profile)# end
Exit maintenance profile mode.
NX02#
Any command is accepted in the maintenance profiles. We can also insert a delay before the next change (sleep instance instance-number seconds), or execute a Python script (python instance instance-number uri [python-arguments]). But remember to add the matching “no” commands to the normal-mode profile to revert your changes.
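To avoid forgetting a “no” command, the normal-mode profile can be derived from the maintenance-mode one: same blocks, reverse order, each command negated. The Python sketch below illustrates the idea. It is not a Cisco tool, the helper names (negate, build_normal_profile, render) are hypothetical, and it assumes every configuration command can be reverted by simply prefixing “no”, which is true for the commands used in this article but not for every NX-OS command.

```python
# Maintenance-mode profile from the example above, as (context, command) pairs.
# A context of None marks a top-level helper line such as "sleep instance".
MAINTENANCE_PROFILE = [
    ("router bgp 1", "isolate"),
    ("router ospf 1", "isolate"),
    (None, "sleep instance 2 20"),
    ("vpc domain 99", "shutdown"),
    ("interface Ethernet1/1", "shutdown"),
]


def negate(command):
    """Naive negation: add or strip a leading 'no ' (an assumption, see above)."""
    if command.startswith("no "):
        return command[3:]
    return "no " + command


def build_normal_profile(maintenance_profile):
    """Reverse the block order and negate each configuration command."""
    normal = []
    for context, command in reversed(maintenance_profile):
        if context is None:
            # Helper lines (delays) are replayed unchanged.
            normal.append((None, command))
        else:
            normal.append((context, negate(command)))
    return normal


def render(profile):
    """Format a profile the way 'show maintenance profile' lays it out."""
    lines = []
    for context, command in profile:
        if context is None:
            lines.append(command)
        else:
            lines.append(context)
            lines.append("  " + command)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(build_normal_profile(MAINTENANCE_PROFILE)))
```

For the four default blocks, this produces the same sequence the system generated for its own normal-mode profile (vPC first, then the delay, then OSPF, then BGP), with the extra interface block restored before all of them.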
To delete the profiles, you can use the command:
NX02# no configure maintenance profile maintenance-mode
Maintenance mode profile maintenance-mode successfully deleted
Enter configuration commands, one per line. End with CNTL/Z.
Exit maintenance profile mode.
NX02#
NX02# no configure maintenance profile normal-mode
Maintenance mode profile normal-mode successfully deleted
Enter configuration commands, one per line. End with CNTL/Z.
Exit maintenance profile mode.
NX02#
NX02# show maintenance profile
[Normal Mode]
[Maintenance Mode]
NX02#
NX02# config t
Enter configuration commands, one per line. End with CNTL/Z.
NX02(config)# no system mode maintenance always-use-custom-profile
NX02(config)#
Snapshots
GIR automatically creates a snapshot before and after the maintenance. A snapshot captures the running state of selected features and stores it on persistent storage. This is useful to compare the state of the switch before graceful removal and after graceful insertion.
By entering show snapshots, we see the list of snapshots:
NX02# show snapshots
Snapshot Name        Time                      Description
------------------------------------------------------------------------------
before_maintenance   Fri Jan 10 18:33:37 2020  system-internal-snapshot
after_maintenance    Fri Jan 10 18:58:42 2020  system-internal-snapshot
Compare snapshots
We can quickly compare the snapshots with the compare summary command:
NX02# show snapshots compare before_maintenance after_maintenance summary
================================================================================
Feature                           before_maintenance  after_maintenance  changed
================================================================================
basic summary
  # of interfaces                 133                 133
  # of vlans                      2                   2
  # of ipv4 routes vrf default    20                  19                 *
  # of ipv4 paths vrf default     22                  21                 *
  # of ipv4 routes vrf keepalive  8                   8
  # of ipv4 paths vrf keepalive   8                   8
  # of ipv6 routes vrf default    3                   3
  # of ipv6 paths vrf default     3                   3
  # of ipv6 routes vrf keepalive  3                   3
  # of ipv6 paths vrf keepalive   3                   3
interfaces
  # of eth interfaces             128                 128
  # of eth interfaces up          7                   7
  # of eth interfaces down        121                 121
  # of eth interfaces other       0                   0
  # of vlan interfaces            2                   2
  # of vlan interfaces up         1                   0                  *
  # of vlan interfaces down       1                   2                  *
  # of vlan interfaces other      0                   0
NX02#
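If you collect this summary from many switches, a small script can pull out only the rows flagged with a star. The sketch below is an illustration in Python, not a Cisco tool. It assumes each counter row looks like “# of something, before value, after value, optional *”, as in the output above, and the function name changed_rows is mine.

```python
import re

# A few rows copied from the compare summary above.
SAMPLE = """\
basic summary
# of interfaces 133 133
# of ipv4 routes vrf default 20 19 *
# of ipv4 paths vrf default 22 21 *
# of ipv6 routes vrf default 3 3
interfaces
# of vlan interfaces up 1 0 *
# of vlan interfaces down 1 2 *
"""

# Assumed row layout: counter name, value before, value after, optional '*'.
ROW_RE = re.compile(r"^\s*(# of .+?)\s+(\d+)\s+(\d+)\s*(\*)?\s*$")


def changed_rows(summary_text):
    """Return (counter, before, after) for every row flagged as changed."""
    changes = []
    for line in summary_text.splitlines():
        match = ROW_RE.match(line)
        if match and match.group(4):
            changes.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return changes


if __name__ == "__main__":
    for counter, before, after in changed_rows(SAMPLE):
        print(f"{counter}: {before} -> {after}")
```

Section titles such as “basic summary” do not match the pattern and are skipped, so only the counters that differ between the two snapshots are returned.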
Custom sections
You can add your own sections to the snapshots. Any command starting with show can be used for a custom section.
For this, use the command: snapshot section add section “show-command” row-id element-key1 [element-key2]
Here, row-id is the tag of each row entry in the show command’s XML output, and element-key1 and element-key2 are the elements used to distinguish the row entries from one another. In most cases only one element needs to be specified.
For example, to add a custom section that tracks IPv4 prefix information:
- First, execute the command you want to analyze with XML output (here is an extract of the result):
NX02# show ip route detail vrf all | xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<nf:rpc-reply xmlns="http://www.cisco.com/nxos:1.0:urib" xmlns:nf="urn:ietf:params:xml:ns:netconf:base:1.0">
 <nf:data>
  <show>
   <ip>
    <route>
     <__XML__OPT_Cmd_urib_show_ip_route_command_ip>
      <__XML__OPT_Cmd_urib_show_ip_route_command_unicast>
       <__XML__OPT_Cmd_urib_show_ip_route_command_topology>
        <__XML__OPT_Cmd_urib_show_ip_route_command_l3vm-info>
         <__XML__OPT_Cmd_urib_show_ip_route_command_rpf>
          <__XML__OPT_Cmd_urib_show_ip_route_command_ip-addr>
           <__XML__OPT_Cmd_urib_show_ip_route_command_protocol>
            <__XML__OPT_Cmd_urib_show_ip_route_command_summary>
             <__XML__OPT_Cmd_urib_show_ip_route_command_vrf>
              <__XML__OPT_Cmd_urib_show_ip_route_command___readonly__>
               <__readonly__>
                <TABLE_vrf>
                 <ROW_vrf>
                  <vrf-name-out>default</vrf-name-out>
                  <TABLE_addrf>
                   <ROW_addrf>
                    <addrf>ipv4</addrf>
                    <TABLE_prefix>
                     <ROW_prefix>
                      <ipprefix>0.0.0.0/32</ipprefix>
                      <ucast-nhops>1</ucast-nhops>
                      <mcast-nhops>0</mcast-nhops>
                      <attached>false</attached>
                      <TABLE_path>
                       <ROW_path>
                        <ifname>Null0</ifname>
                        <uptime>PT1H24M38S</uptime>
                        <pref>220</pref>
                        <metric>0</metric>
                        <clientname>broadcast</clientname>
                        <type>discard</type>
                        <ubest>true</ubest>
                       </ROW_path>
                      </TABLE_path>
                     </ROW_prefix>
                     <ROW_prefix>
                      <ipprefix>127.0.0.0/8</ipprefix>
                      <ucast-nhops>1</ucast-nhops>
                      <mcast-nhops>0</mcast-nhops>
                      <attached>false</attached>
                      <TABLE_path>
                       <ROW_path>
                        <ifname>Null0</ifname>
                        <uptime>PT1H24M38S</uptime>
                        <pref>220</pref>
                        <metric>0</metric>
                        <clientname>broadcast</clientname>
                        <type>discard</type>
                        <ubest>true</ubest>
                       </ROW_path>
                      </TABLE_path>
                     </ROW_prefix>
                     <ROW_prefix>
                      <ipprefix>255.255.255.255/32</ipprefix>
                      <ucast-nhops>1</ucast-nhops>
                      <mcast-nhops>0</mcast-nhops>
                      <attached>false</attached>
- We can see the KEY element for each prefix is: <ROW_prefix>
- And the route itself is: <ipprefix>
So, to add a “route” section into the snapshot, we have to enter the command:
NX02# snapshot section add route "show ip route detail vrf all" ROW_prefix ipprefix
added section "route"
We can see the custom sections of the snapshot with the command:
NX02# show snapshots sections
user-specified snapshot sections
--------------------------------
[route]
  show command: show ip route detail vrf all
  row id: ROW_prefix
  key1: ipprefix
  key2: -
NX02#
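To make the roles of row-id and key1 more concrete, here is a small Python sketch that walks an XML document the same way: it finds every element whose tag is the row-id and collects the text of the child named by the key. This only illustrates the concept and is not how NX-OS implements snapshots. The function name row_keys is mine, and the sample XML is a heavily trimmed, hand-closed version of the output above.

```python
import xml.etree.ElementTree as ET

# Trimmed and hand-closed sample based on the XML extract above.
SAMPLE_XML = """\
<nf:rpc-reply xmlns="http://www.cisco.com/nxos:1.0:urib"
              xmlns:nf="urn:ietf:params:xml:ns:netconf:base:1.0">
 <nf:data>
  <TABLE_prefix>
   <ROW_prefix>
    <ipprefix>0.0.0.0/32</ipprefix>
    <ucast-nhops>1</ucast-nhops>
   </ROW_prefix>
   <ROW_prefix>
    <ipprefix>127.0.0.0/8</ipprefix>
    <ucast-nhops>1</ucast-nhops>
   </ROW_prefix>
  </TABLE_prefix>
 </nf:data>
</nf:rpc-reply>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def row_keys(xml_text, row_id, key):
    """Return the text of <key> for every <row_id> element in the document."""
    root = ET.fromstring(xml_text)
    keys = []
    for element in root.iter():
        if local_name(element.tag) != row_id:
            continue
        for child in element:
            if local_name(child.tag) == key:
                keys.append(child.text)
    return keys


if __name__ == "__main__":
    print(row_keys(SAMPLE_XML, "ROW_prefix", "ipprefix"))
```

With ROW_prefix as the row-id and ipprefix as the key, each route becomes one row identified by its prefix, which is what lets a snapshot comparison report that a given prefix is missing from one of the two snapshots.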
Custom section demo
Now, let’s create two snapshots including this custom section and compare them:
NX02# snapshot create TEST1 my test
Executing 'show interface'... Done
Executing 'show ip route summary vrf all'... Done
Executing 'show ipv6 route summary vrf all'... Done
Executing 'show bgp sessions vrf all'... Done
Feature 'eigrp' not enabled, skipping...
Feature 'eigrp' not enabled, skipping...
Executing 'show vpc'... Done
Executing 'show ip ospf vrf all'... Done
Feature 'ospfv3' not enabled, skipping...
Feature 'isis' not enabled, skipping...
Feature 'rip' not enabled, skipping...
Executing user-specified 'show ip route detail vrf all'... Done
Snapshot 'TEST1' created
NX02#
--- Now I add four static routes to see a difference ---
NX02# config t
Enter configuration commands, one per line. End with CNTL/Z.
NX02(config)# ip route 9.9.9.9 255.255.255.255 172.16.2.2
NX02(config)# ip route 9.9.9.10 255.255.255.255 172.16.2.2
NX02(config)# ip route 9.9.9.11 255.255.255.255 172.16.2.2
NX02(config)# ip route 9.9.9.12 255.255.255.255 172.16.2.2
NX02(config)# end
--- And now I make the 2nd snapshot ---
NX02# snapshot create TEST2 my test
Executing 'show interface'... Done
Executing 'show ip route summary vrf all'... Done
Executing 'show ipv6 route summary vrf all'... Done
Executing 'show bgp sessions vrf all'... Done
Feature 'eigrp' not enabled, skipping...
Feature 'eigrp' not enabled, skipping...
Executing 'show vpc'... Done
Executing 'show ip ospf vrf all'... Done
Feature 'ospfv3' not enabled, skipping...
Feature 'isis' not enabled, skipping...
Feature 'rip' not enabled, skipping...
Executing user-specified 'show ip route detail vrf all'... Done
Snapshot 'TEST2' created
NX02#
Now we can compare them:
NX02# show snapshots compare TEST1 TEST2 ipv4routes
================================================================================
metric              TEST1     TEST2     changed
================================================================================
# of ipv4 routes    27        31        *
Prefix
--------------------
9.9.9.9/32          prefix not in TEST1
9.9.9.10/32         prefix not in TEST1
9.9.9.11/32         prefix not in TEST1
9.9.9.12/32         prefix not in TEST1
NX02#
Delete snapshots
To delete all snapshots:
NX02# snapshot delete ALL
All snapshots are successfully deleted
NX02#
More resources
Cisco Nexus 7K Series NX-OS Configuration Guide, Release 8.x – GIR Chapter
Cisco Nexus 9K Series NX-OS Configuration Guide, Release 7.x – GIR Chapter
Cisco Nexus 9000 Series GIR white paper (the case studies are great)
Cisco-Live Data center Operations and Maintenance Best Practices (BRKDCT-2458)
Hi Jerome,
Interesting content! I have tested this in a simulator and on a real device. When we enter maintenance mode on the local device (vpc domain shutdown), why is the peer switch impacted, and why do we experience downtime on the service?
Hi Riyan,
Thank you for your comment.
Well, it could be due to several causes, I don’t know the context.
On my side, I’ve used the maintenance mode several times in production and it was really transparent for directly attached devices in LACP/PortChannel to the vPC.
Best,
Jerome
Hi Jerome, so I did a bit more digging and found the following command:
“system mode maintenance on-reload reset-reason MAINTENANCE”
This ensures that if the switch is in maint mode before being reset, it will come up in maint mode. There are various other options for ‘reset-reason’, but clearly this is the one we wanted!
This seemed to do the trick and all went well with the move of 2 x 7k chassis.
Hi James,
Thanks a lot for your comment!
I just checked on my lab device, yes this is the option you are looking for:
(config)# system mode maintenance ?
always-use-custom-profile Always use custom profile when entering maintenance mode
dont-generate-profile Do not generate the maintenance/normal-mode profile
maint-delay Delay to allow protocol reroute before releasing CLI
non-interactive Do operation non interactively in background
on-reload On reload maintenance mode configuration
shutdown Issue shutdown instead of isolate (default)
snapshot-delay Delay after which after_maintenance snapshot will be taken
timeout Restart maintenance mode timer with a new value
Thank you for sharing this.
Best Regards,
Jerome
Hi, firstly thanks for the explanation. Do you know if the maintenance mode survives a reboot?
So if we enable it on a switch, power it off, move it and then power it back on, will it still be in maintenance mode so we can gracefully re-insert it?
The above mentions you can power off without impact, but when you power back on what mode should you expect?
Hi James,
Thank you for your comment and question. That’s a very good question. I never tried to bring up a switch with system mode maintenance on after a reboot. When I had to power off a switch, I entered system mode maintenance and, once all the changes were done, I powered it off without saving the config.
But I think you can save the configuration in the system mode maintenance state, yes.
So, enter the system mode maintenance command, wait until all changes are done, do a write mem, and then you can power off the switch.
When you power on the switch again, my guess is the saved config will still have the vPC shut down, the routing protocols in isolate state, and so on.
And then, after all links and routing protocol neighbors are up, you can do a “no system mode maintenance”.
Please, add a comment here with your results if you tried.
Thank you,
Jerome
Great content – much better than the Cisco doc on the same topic.
Does isolate not affect redistributed routes? Not sure if I’m hitting a bug on a N9K – 9.3.5.
When I isolate the BGP process, the routes being redistributed from EIGRP stay in the table and show as advertised prefixes on the neighbor.
Hi Richard,
Thank you for your comment.
I never tried with BGP routes redistributed into EIGRP, so I cannot answer. But my guess is all BGP received routes should be removed, so not redistributed too.
Best,
Jerome
Thanks heaps – very helpful
Thank you, Mark.