This document describes how to troubleshoot unexpected reloads due to stack issues on Catalyst 9000 switches.
Cisco recommends that you have knowledge of these topics.
The information in this document is based on these software and hardware versions:
This document can also be used with these hardware and software versions:
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
The stack reset reasons are described in this table.
Reset Reason | Description |
---|---|
Stack merge | This is observed when at least two stack members claim to be the active switch of the stack. This can be seen when the stack ring is broken or when Stack Discovery Protocol (SDP) messages are lost due to bad stack cables. |
Stack merge due to incompatibility | Same as stack merge. Seen more frequently in half-ring stack configurations. |
Lost both active and standby | When the active switch is lost and if for any reason the standby switch is unable to assume the active role, then all other stack members are reloaded and use this reset reason. This can also be seen when stacks are configured in half-ring configurations. |
Stack cable authentication failure | Usually seen due to a faulty stack cable or stack port. It could also be seen due to a software issue. |
Stack adapter authentication failure | Usually seen due to a faulty stack cable, stack adapter, or stack port. It could also be seen due to a software issue. |
Validate the last reload reason for all members of the stack.
show version
show switch
show logging onboard switch <switch number> uptime detail
In the show version command output, you can identify the different reset reasons for each of the stack members.
switch#show version
<omitted output>
Last reload reason: stack merge                                          <-- Switch 1 Reason
<omitted output>

Switch Ports Model              SW Version        SW Image              Mode
------ ----- -----              ----------        ----------            ----
*    1 53    C9300-48P          17.3.5            CAT9K_IOSXE           INSTALL
     2 53    C9300-48P          17.3.5            CAT9K_IOSXE           INSTALL
     3 53    C9300-48P          17.3.5            CAT9K_IOSXE           INSTALL

Switch 02
---------
Switch uptime                      : 13 hours, 47 minutes
Base Ethernet MAC Address          : aa:aa:aa:aa:aa:aa
Motherboard Assembly Number        : 11-11111-11
Motherboard Serial Number          : AAAAAAAAAAA
Model Revision Number              : F0
Motherboard Revision Number        : C0
Model Number                       : C9300-48P
System Serial Number               : AAAAAAAAAAB
Last reload reason                 : stack merge due to incompatiblity   <-- Switch 2 Reason

Switch 03
---------
Switch uptime                      : 50 minutes
Base Ethernet MAC Address          : bb:bb:bb:bb:bb:bb
Motherboard Assembly Number        : 22-22222-22
Motherboard Serial Number          : BBBBBBBBBBA
Model Revision Number              : E0
Motherboard Revision Number        : C0
Model Number                       : C9300L-48P
System Serial Number               : BBBBBBBBBBB
Last reload reason                 : lost both active and standby        <-- Switch 3 Reason
The show switch command output displays the current role of the stack members.
switch#show switch
Switch/Stack Mac Address : xxxx.xxxx.xxxx - Local Mac Address
Mac persistency wait time: Indefinite
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
-------------------------------------------------------------------------------------
*1       Active   xxxx.xxxx.xxxx     15     V01     Ready
 2       Standby  aaaa.aaaa.aaaa     14     V01     Ready
 3       Member   bbbb.bbbb.bbbb     13     V01     Ready
The last reload reason record can be seen with the next command.
switch#show logging onboard switch 1 uptime detail
--------------------------------------------------------------------------------
UPTIME SUMMARY INFORMATION
--------------------------------------------------------------------------------
First customer power on : 11/15/2019 22:46:33
Total uptime            : 0 years 0 weeks 6 days 20 hours 15 minutes
Total downtime          : 0 years 46 weeks 5 days 23 hours 42 minutes
Number of resets        : 10
Number of slot changes  : 0
Current reset reason    : stack merge                         <--
Current reset timestamp : 10/15/2020 05:44:01                 <--
Current slot            : 1
Chassis type            : 95
Current uptime          : 0 years 0 weeks 0 days 13 hours 0 minutes
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
UPTIME CONTINUOUS INFORMATION
--------------------------------------------------------------------------------
Time Stamp          | Reset                             | Uptime
MM/DD/YYYY HH:MM:SS | Reason                            | years weeks days hours minutes
--------------------------------------------------------------------------------
<omitted output>
10/15/2020 05:44:01   stack merge                         0     0     0    1     0       <--
--------------------------------------------------------------------------------

switch#show logging onboard switch 2 uptime detail
--------------------------------------------------------------------------------
UPTIME SUMMARY INFORMATION
--------------------------------------------------------------------------------
First customer power on : 11/21/2019 17:46:08
Total uptime            : 0 years 0 weeks 6 days 23 hours 21 minutes
Total downtime          : 0 years 46 weeks 0 days 1 hours 36 minutes
Number of resets        : 14
Number of slot changes  : 1
Current reset reason    : stack merge due to incompatiblity   <--
Current reset timestamp : 10/15/2020 05:44:03
Current slot            : 2
Chassis type            : 95
Current uptime          : 0 years 0 weeks 0 days 13 hours 0 minutes
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
UPTIME CONTINUOUS INFORMATION
--------------------------------------------------------------------------------
Time Stamp          | Reset                             | Uptime
MM/DD/YYYY HH:MM:SS | Reason                            | years weeks days hours minutes
--------------------------------------------------------------------------------
<omitted output>
10/15/2020 05:44:03   stack merge due to incompatiblity   0     0     0    1     0       <--
--------------------------------------------------------------------------------

switch#show logging onboard switch 3 uptime detail
--------------------------------------------------------------------------------
UPTIME SUMMARY INFORMATION
--------------------------------------------------------------------------------
First customer power on : 08/13/2019 23:46:07
Total uptime            : 0 years 38 weeks 5 days 11 hours 54 minutes
Total downtime          : 0 years 22 weeks 3 days 7 hours 45 minutes
Number of resets        : 37
Number of slot changes  : 3
Current reset reason    : lost both active and standby        <--
Current reset timestamp : 10/15/2020 18:56:09
Current slot            : 3
Chassis type            : 95
Current uptime          : 0 years 0 weeks 0 days 0 hours 30 minutes
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
UPTIME CONTINUOUS INFORMATION
--------------------------------------------------------------------------------
Time Stamp          | Reset                             | Uptime
MM/DD/YYYY HH:MM:SS | Reason                            | years weeks days hours minutes
--------------------------------------------------------------------------------
<omitted output>
10/15/2020 18:56:09   lost both active and standby        0     0     0    0     35      <--
--------------------------------------------------------------------------------
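When the uptime detail output is collected from many stack members, the reset reason and timestamp can be pulled out with a small script. This is a minimal sketch in Python, not a Cisco tool: the function name `current_reset` and the sample text are illustrative assumptions, and it only relies on the `Current reset reason` and `Current reset timestamp` labels shown in the output above.

```python
import re

# Field labels as they appear in "show logging onboard switch <n> uptime detail".
REASON = re.compile(r"Current reset reason\s*:\s*(.+)")
STAMP = re.compile(r"Current reset timestamp\s*:\s*(\S+ \S+)")


def current_reset(output):
    """Return (reason, timestamp) from the uptime summary, or None for a missing field.

    Hypothetical helper for illustration. The "<--" markers used in this
    document are editorial annotations, so they are stripped if present.
    """
    reason = REASON.search(output)
    stamp = STAMP.search(output)
    return (
        reason.group(1).split("<--")[0].strip() if reason else None,
        stamp.group(1) if stamp else None,
    )


# Abbreviated sample taken from the switch 1 output in this document.
SAMPLE = """\
Number of resets        : 10
Current reset reason    : stack merge                         <--
Current reset timestamp : 10/15/2020 05:44:01                 <--
Current slot            : 1
"""

reason, stamp = current_reset(SAMPLE)
```

Reset timestamps that fall within a few seconds of each other across members, as with switch 1 and switch 2 above, suggest that the members reloaded as part of the same event.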
Note: The errors "stack cable authentication failure" and "stack adapter authentication failure" usually do not allow the affected switch to fully boot up. Therefore, no commands can be collected for further analysis. Check the corresponding section with the steps listed.
Based on the hardware installation guide for Catalyst 9200 and 9300 switches, ensure that the stack follows the supported stack cable setup and that the stack cables are properly installed.
Stack cables must be connected in this manner:
...
With this cabling, the stack setup looks like these images.
Catalyst 9200L and 9200
Catalyst 9300
When you insert the stack adapter and/or the stack cable, use these instructions:
Catalyst 9200L and 9200
1. Ensure stack adapters are properly inserted. The Cisco logo must be on top.
2. Ensure the stack cable is firmly tightened by hand.
Catalyst 9300L
1. Ensure the stack adapters are properly inserted. The Cisco logo must be on top.
2. Ensure the stack cable is firmly tightened by hand.
Catalyst 9300
In most cases, the unexpected reloads shown in this document were triggered by bad stack cables, stack adapters, or stack ports. Regardless of which software version you run, you can be susceptible to this if the stack parts were not installed properly.
Once you have validated the Confirm Stack Cable Setup and Install Stack Cables sections, check the stack cable health with these commands:
show switch neighbors
show switch stack-ring speed
show switch stack-ports summary
show switch stack-ports detail
In this example, there is a stack of three Catalyst 9300 switches. The show switch neighbors command output displays which switches are connected to each stack member:
switch#show switch neighbors
  Switch #    Port 1       Port 2
  --------    ------       ------
      1         2             3
      2         3             1
      3         1             2
When a stack cable is not present, wrongly inserted, or is faulty, None is shown instead of the stack member:
switch#show switch neighbors
  Switch #    Port 1       Port 2
  --------    ------       ------
      1         2           None     <--
      2         3             1
      3        None           2      <--
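The check for None entries can be automated when the output is saved as text. The sketch below is illustrative only: `broken_stack_ports` is a hypothetical helper name, and it assumes each data row carries the switch number followed by the Port 1 and Port 2 neighbors, as in the output above.

```python
def broken_stack_ports(output):
    """Return (switch, stack_port) pairs whose neighbor is reported as None.

    Hypothetical helper: parses the text of "show switch neighbors".
    """
    broken = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with the switch number; skip the prompt, header and rule lines.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        for port, neighbor in enumerate(fields[1:3], start=1):
            if neighbor == "None":
                broken.append((int(fields[0]), port))
    return broken


# Sample copied from the broken-ring output in this document.
SAMPLE = """\
switch#show switch neighbors
  Switch #    Port 1       Port 2
  --------    ------       ------
      1         2           None     <--
      2         3             1
      3        None           2      <--
"""

missing = broken_stack_ports(SAMPLE)
```

For the sample, the result points at stack port 2 of switch 1 and stack port 1 of switch 3, which are the two ends of the missing link.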
The show switch stack-ring speed command provides the stack ring status:
switch#show switch stack-ring speed
Stack Ring Speed : 480G <--
Stack Ring Configuration: Full <--
Stack Ring Protocol : StackWise
If for any reason the stack ring is broken, the output looks like this:
switch#show switch stack-ring speed
Stack Ring Speed : 240G <--
Stack Ring Configuration: Half <--
Stack Ring Protocol : StackWise
Warning: It is never expected to see Half status in a healthy Stack Ring Configuration. Though the stack works, it loses half of the bandwidth as well as redundancy.
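A monitoring script can flag the Half state directly from the command text. This is a minimal sketch with hypothetical helper names (`ring_status`, `is_full_ring`). It assumes the three `label : value` lines shown above and nothing else about the output.

```python
def ring_status(output):
    """Parse "show switch stack-ring speed" text into a {label: value} dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # Drop the editorial "<--" markers used in this document.
            fields[key.strip()] = value.split("<--")[0].strip()
    return fields


def is_full_ring(output):
    """True only when the ring configuration is reported as Full."""
    return ring_status(output).get("Stack Ring Configuration") == "Full"


# Sample copied from the broken-ring output in this document.
SAMPLE = """\
switch#show switch stack-ring speed
Stack Ring Speed : 240G <--
Stack Ring Configuration: Half <--
Stack Ring Protocol : StackWise
"""

status = ring_status(SAMPLE)
```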
A healthy show switch stack-ports summary command output looks like this.
Note: Switch 1 stack port 1 shows two link changes. This is normal.
switch#show switch stack-ports summary
Sw#/Port#  Port Status  Neighbor  Cable Length  Link OK  Link Active  Sync OK  #Changes to LinkOK  In Loopback
-------------------------------------------------------------------------------------------------------------------
1/1        OK           2         50cm          Yes      Yes          Yes      2                   No
1/2        OK           3         100cm         Yes      Yes          Yes      1                   No
2/1        OK           3         50cm          Yes      Yes          Yes      1                   No
2/2        OK           1         50cm          Yes      Yes          Yes      1                   No
3/1        OK           1         100cm         Yes      Yes          Yes      1                   No
3/2        OK           2         50cm          Yes      Yes          Yes      1                   No
If the output shows many flaps on certain ports, it could be a sign of stack instability. This condition could trigger a stack merge. An Unknown cable length can be seen if the stack is not properly cabled.
switch#show switch stack-ports summary
Sw#/Port#  Port Status  Neighbor  Cable Length  Link OK  Link Active  Sync OK  #Changes to LinkOK  In Loopback
-------------------------------------------------------------------------------------------------------------------
1/1        OK           2         50cm          Yes      Yes          Yes      16                  No
<-- 16 flaps on switch 1 stack port 1 facing switch 2
1/2        OK           3         100cm         Yes      Yes          Yes      1                   No
2/1        OK           3         50cm          Yes      Yes          Yes      1                   No
2/2        OK           1         Unknown       Yes      Yes          Yes      16                  No
<-- Cable length 'unknown', 16 flaps on switch 2 stack port 2 facing switch 1
3/1        OK           1         100cm         Yes      Yes          Yes      1                   No
3/2        OK           2         50cm          Yes      Yes          Yes      1                   No
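The same inspection can be scripted against saved output. The sketch below is an illustration under stated assumptions: the names `StackPort`, `parse_summary` and `suspect_ports` are hypothetical, each data row is assumed to hold the nine columns shown above, and the `max_changes=2` threshold is an arbitrary choice based on the healthy example in this document, not a Cisco-defined limit.

```python
import re
from typing import NamedTuple


class StackPort(NamedTuple):
    port: str          # "<switch>/<stack port>", for example "1/1"
    status: str
    neighbor: str
    cable_length: str
    changes: int       # "#Changes to LinkOK" column


def parse_summary(output):
    """Parse the data rows of "show switch stack-ports summary" (hypothetical helper)."""
    ports = []
    for line in output.splitlines():
        f = line.split()
        # Data rows have nine columns and start with <switch>/<port>.
        if len(f) >= 9 and re.fullmatch(r"\d+/\d+", f[0]) and f[7].isdigit():
            ports.append(StackPort(f[0], f[1], f[2], f[3], int(f[7])))
    return ports


def suspect_ports(ports, max_changes=2):
    """Ports with more link changes than the assumed threshold or an Unknown cable length."""
    return [p for p in ports if p.changes > max_changes or p.cable_length == "Unknown"]


# Rows copied from the unstable example in this document.
SAMPLE = """\
1/1 OK 2 50cm Yes Yes Yes 16 No
1/2 OK 3 100cm Yes Yes Yes 1 No
2/1 OK 3 50cm Yes Yes Yes 1 No
2/2 OK 1 Unknown Yes Yes Yes 16 No
3/1 OK 1 100cm Yes Yes Yes 1 No
3/2 OK 2 50cm Yes Yes Yes 1 No
"""

flagged = [p.port for p in suspect_ports(parse_summary(SAMPLE))]
```

Both ends of the same link (1/1 and 2/2) are flagged for the sample, which narrows the investigation to the cable and the two stack ports it connects.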
When excessive link changes are seen, the next step is to check the show switch stack-ports detail command and focus on the CRC Errors counters. CRCs that increment on an interface mean that the packets received on that port are malformed. These conditions can apply:
switch#show switch stack-ports detail
1/1 is OK Loopback No
Cable Length 100cm Neighbor 2
Link Ok Yes Sync Ok Yes Link Active Yes
Changes to LinkOK 16
Five minute input rate 1110 bytes/sec
Five minute output rate 47 bytes/sec
24798951 bytes input
737941 bytes output
CRC Errors
Data CRC 459731      <-- CRCs
Ringword CRC 35156   <-- CRCs
InvRingWord 54951    <-- CRCs
PcsCodeWord 35481    <-- CRCs

1/2 is OK Loopback No
Cable Length 100cm Neighbor 3
Link Ok Yes Sync Ok Yes Link Active Yes
Changes to LinkOK 1
Five minute input rate 164 bytes/sec
Five minute output rate 67 bytes/sec
0 bytes input
0 bytes output
CRC Errors
Data CRC 0
Ringword CRC 0
InvRingWord 0
PcsCodeWord 0

2/1 is OK Loopback No
Cable Length 50cm Neighbor 3
Link Ok Yes Sync Ok Yes Link Active Yes
Changes to LinkOK 0
Five minute input rate 0 bytes/sec
Five minute output rate 0 bytes/sec
0 bytes input
0 bytes output
CRC Errors
Data CRC 0
Ringword CRC 0
InvRingWord 0
PcsCodeWord 0

2/2 is OK Loopback No
Cable Length 50cm Neighbor 1
Link Ok Yes Sync Ok Yes Link Active Yes
Changes to LinkOK 16
Five minute input rate 30 bytes/sec
Five minute output rate 1093 bytes/sec
480028 bytes input
0 bytes output
CRC Errors
Data CRC 0           <-- No CRCs
Ringword CRC 0       <-- No CRCs
InvRingWord 0        <-- No CRCs
PcsCodeWord 0        <-- No CRCs

3/1 is OK Loopback No
Cable Length 100cm Neighbor 1
Link Ok Yes Sync Ok Yes Link Active Yes
Changes to LinkOK 1
Five minute input rate 0 bytes/sec
Five minute output rate 0 bytes/sec
81387545 bytes input
29294666 bytes output
CRC Errors
Data CRC 0
Ringword CRC 0
InvRingWord 0
PcsCodeWord 0

3/2 is OK Loopback No
Cable Length 100cm Neighbor 2
Link Ok Yes Sync Ok Yes Link Active Yes
Changes to LinkOK 1
Five minute input rate 1030 bytes/sec
Five minute output rate 0 bytes/sec
480028 bytes input
0 bytes output
CRC Errors
Data CRC 0
Ringword CRC 0
InvRingWord 0
PcsCodeWord 0
Note: The show switch stack-ports detail command is available in the Cisco IOS XE Release 17.3.x train and later. In order to check the CRC Errors counters on earlier releases, use the legacy commands.
Commands that end in 0 are the CRC counters for stack port 1; commands that end in 1 are the CRC counters for stack port 2. These commands must be entered for all stack members.
show platform hardware fed switch <switch number> fwd-asic register read register-name SifRacDataCrcErrorCnt-0
show platform hardware fed switch <switch number> fwd-asic register read register-name SifRacRwCrcErrorCnt-0
show platform hardware fed switch <switch number> fwd-asic register read register-name SifRacInvalidRingWordCnt-0
show platform hardware fed switch <switch number> fwd-asic register read register-name SifRacPcsCodeWordErrorCnt-0
show platform hardware fed switch <switch number> fwd-asic register read register-name SifRacDataCrcErrorCnt-1
show platform hardware fed switch <switch number> fwd-asic register read register-name SifRacRwCrcErrorCnt-1
show platform hardware fed switch <switch number> fwd-asic register read register-name SifRacInvalidRingWordCnt-1
show platform hardware fed switch <switch number> fwd-asic register read register-name SifRacPcsCodeWordErrorCnt-1
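Because the legacy commands must be repeated for every stack member and both stack ports, it can help to generate the full list rather than type it. This is a small sketch: `legacy_crc_commands` is a hypothetical helper, and the register names and the 0/1 suffix convention are taken verbatim from the commands listed above.

```python
# Register names copied from the legacy commands in this document.
REGISTERS = (
    "SifRacDataCrcErrorCnt",
    "SifRacRwCrcErrorCnt",
    "SifRacInvalidRingWordCnt",
    "SifRacPcsCodeWordErrorCnt",
)


def legacy_crc_commands(switch_numbers):
    """Build the legacy CRC counter commands for each stack member.

    Suffix 0 reads the counters for stack port 1 and suffix 1 reads the
    counters for stack port 2, as described in this document.
    """
    commands = []
    for switch in switch_numbers:
        for suffix in (0, 1):
            for register in REGISTERS:
                commands.append(
                    f"show platform hardware fed switch {switch} "
                    f"fwd-asic register read register-name {register}-{suffix}"
                )
    return commands


# Example: a three-member stack needs 3 members x 2 ports x 4 registers commands.
commands = legacy_crc_commands([1, 2, 3])
```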
Note: The #Changes to LinkOK counter in the show switch stack-ports summary command output and the CRC counters in the show switch stack-ports detail command output must be checked at least two times to validate whether any of them increment. Static counters indicate a stable stack link, whereas an increment in any of these counters indicates stack link instability.
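The two-sample comparison described in the note can be expressed as a simple difference between counter snapshots. The sketch below is illustrative: `counter_increments` is a hypothetical helper, the snapshots are plain dictionaries keyed by (port, counter name), and the values in the second sample are invented for the example.

```python
def counter_increments(first, second):
    """Return the counters that grew between two samples, with the size of the increase.

    Both arguments map (port, counter_name) to an integer value. Counters
    missing from the second sample are ignored.
    """
    return {
        name: second[name] - value
        for name, value in first.items()
        if second.get(name, value) > value
    }


# First sample: values from the stack-ports detail example in this document.
first = {
    ("1/1", "Data CRC"): 459731,
    ("1/1", "Changes to LinkOK"): 16,
    ("1/2", "Data CRC"): 0,
}
# Second sample: hypothetical values collected some minutes later.
second = {
    ("1/1", "Data CRC"): 459800,
    ("1/1", "Changes to LinkOK"): 16,
    ("1/2", "Data CRC"): 0,
}

growing = counter_increments(first, second)
```

An empty result means the counters were static between the two samples. Any entry in the result identifies the port and counter that are still incrementing.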
These logs are seen when stack issues are present.
Aug 9 21:54:22.911: %STACKMGR-6-STACK_LINK_CHANGE: Switch 1 R0/0: stack_mgr: Stack port 1 on Switch 1 is down
Aug 9 21:54:23.011: %STACKMGR-6-STACK_LINK_CHANGE: Switch 1 R0/0: stack_mgr: Stack port 1 on Switch 1 is up
Aug 9 21:54:35.096: %STACKMGR-6-STACK_LINK_CHANGE: Switch 1 R0/0: stack_mgr: Stack port 1 on Switch 1 is down
Aug 9 21:54:35.197: %STACKMGR-6-STACK_LINK_CHANGE: Switch 1 R0/0: stack_mgr: Stack port 1 on Switch 1 is up
Aug 9 21:54:40.334: %STACKMGR-6-STACK_LINK_CHANGE: Switch 2 R0/0: stack_mgr: Stack port 2 on Switch 2 is down
Aug 9 21:54:40.434: %STACKMGR-6-STACK_LINK_CHANGE: Switch 2 R0/0: stack_mgr: Stack port 2 on Switch 2 is up
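To see which stack ports flap most often in a long log, the link-down events can be counted per port. This is a minimal sketch: `count_link_down` is a hypothetical helper, and the regular expression assumes only the `%STACKMGR-6-STACK_LINK_CHANGE` message text shown above.

```python
import re
from collections import Counter

# Matches the message body: "Stack port <p> on Switch <s> is up|down".
LINK_CHANGE = re.compile(
    r"%STACKMGR-6-STACK_LINK_CHANGE: .*Stack port (\d+) on Switch (\d+) is (up|down)"
)


def count_link_down(log_text):
    """Count link-down events per (switch, stack_port) in syslog text."""
    downs = Counter()
    for match in LINK_CHANGE.finditer(log_text):
        port, switch, state = match.groups()
        if state == "down":
            downs[(int(switch), int(port))] += 1
    return downs


# Log lines copied from this document.
SAMPLE = """\
Aug 9 21:54:22.911: %STACKMGR-6-STACK_LINK_CHANGE: Switch 1 R0/0: stack_mgr: Stack port 1 on Switch 1 is down
Aug 9 21:54:23.011: %STACKMGR-6-STACK_LINK_CHANGE: Switch 1 R0/0: stack_mgr: Stack port 1 on Switch 1 is up
Aug 9 21:54:35.096: %STACKMGR-6-STACK_LINK_CHANGE: Switch 1 R0/0: stack_mgr: Stack port 1 on Switch 1 is down
Aug 9 21:54:35.197: %STACKMGR-6-STACK_LINK_CHANGE: Switch 1 R0/0: stack_mgr: Stack port 1 on Switch 1 is up
Aug 9 21:54:40.334: %STACKMGR-6-STACK_LINK_CHANGE: Switch 2 R0/0: stack_mgr: Stack port 2 on Switch 2 is down
Aug 9 21:54:40.434: %STACKMGR-6-STACK_LINK_CHANGE: Switch 2 R0/0: stack_mgr: Stack port 2 on Switch 2 is up
"""

flaps = count_link_down(SAMPLE)
```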
Stack port flaps in half-ring scenarios cause the stack to split and a switch to be removed from the stack. In this scenario, there is a stack of six switches in a half ring. The stack link between switches 1 and 6 is not present and the stack link between switches 5 and 6 constantly flaps. This causes switch member 6 to be removed from the stack.
Apr 9 19:13:25.665: %STACKMGR-6-STACK_LINK_CHANGE: Switch 5 R0/0: stack_mgr: Stack port 1 on Switch 5 is up
Apr 9 19:13:42.513: %STACKMGR-4-SWITCH_REMOVED: Switch 2 R0/0: stack_mgr: Switch 6 has been removed from the stack.
Apr 9 19:13:42.588: %STACKMGR-4-SWITCH_REMOVED: Switch 1 R0/0: stack_mgr: Switch 6 has been removed from the stack.
Apr 9 19:13:42.827: %STACKMGR-4-SWITCH_REMOVED: Switch 5 R0/0: stack_mgr: Switch 6 has been removed from the stack.
Apr 9 19:13:42.999: %STACKMGR-4-SWITCH_REMOVED: Switch 4 R0/0: stack_mgr: Switch 6 has been removed from the stack.
Apr 9 19:13:43.031: %STACKMGR-4-SWITCH_REMOVED: Switch 3 R0/0: stack_mgr: Switch 6 has been removed from the stack.
Apr 9 19:13:47.666: %STACKMGR-6-STACK_LINK_CHANGE: Switch 5 R0/0: stack_mgr: Stack port 1 on Switch 5 is down
Apr 9 19:25:57.715: %STACKMGR-6-STACK_LINK_CHANGE: Switch 5 R0/0: stack_mgr: Stack port 1 on Switch 5 is up
Apr 9 19:26:15.817: %STACKMGR-4-SWITCH_REMOVED: Switch 2 R0/0: stack_mgr: Switch 6 has been removed from the stack.
Apr 9 19:26:15.946: %STACKMGR-4-SWITCH_REMOVED: Switch 1 R0/0: stack_mgr: Switch 6 has been removed from the stack.
Apr 9 19:26:16.290: %STACKMGR-4-SWITCH_REMOVED: Switch 5 R0/0: stack_mgr: Switch 6 has been removed from the stack.
Apr 9 19:26:16.450: %STACKMGR-4-SWITCH_REMOVED: Switch 3 R0/0: stack_mgr: Switch 6 has been removed from the stack.
Apr 9 19:26:16.457: %STACKMGR-4-SWITCH_REMOVED: Switch 4 R0/0: stack_mgr: Switch 6 has been removed from the stack.
Apr 9 19:26:21.717: %STACKMGR-6-STACK_LINK_CHANGE: Switch 5 R0/0: stack_mgr: Stack port 1 on Switch 5 is down
Apr 9 19:38:31.766: %STACKMGR-6-STACK_LINK_CHANGE: Switch 5 R0/0: stack_mgr: Stack port 1 on Switch 5 is up
High hardware interrupts are seen when there are too many CRC errors on the stack port.
Jun 9 09:28:06.723: %SIF_MGR-1-FAULTY_CABLE: Switch 1 R0/0: sif_mgr: High hardware interrupt seen on switch 1
Jun 9 09:29:06.724: %SIF_MGR-1-FAULTY_CABLE: Switch 1 R0/0: sif_mgr: High hardware interrupt seen on switch 1
Jun 9 09:30:06.725: %SIF_MGR-1-FAULTY_CABLE: Switch 1 R0/0: sif_mgr: High hardware interrupt seen on switch 1
Jun 9 09:31:06.726: %SIF_MGR-1-FAULTY_CABLE: Switch 1 R0/0: sif_mgr: High hardware interrupt seen on switch 1
Jun 9 09:33:06.727: %SIF_MGR-1-FAULTY_CABLE: Switch 1 R0/0: sif_mgr: High hardware interrupt seen on switch 1
Jun 9 09:34:06.728: %SIF_MGR-1-FAULTY_CABLE: Switch 1 R0/0: sif_mgr: High hardware interrupt seen on switch 1
This kind of issue can prevent the switch from booting up; therefore, show commands are not an option.
Stack cable authentication failed is shown when the switch is reloaded due to this issue.
Waiting for 120 seconds for other switches to boot
Switch is in STRAGGLER mode, waiting for active Switch to boot
Active Switch has booted up, starting discovery phase
###################
*** Stack cable authentication failed for cable inserted on stack port 2 on switch 1 ***   <--
Reloading chassis because cable auth failed on stack_port 0#
Chassis 1 reloading, reason - stack cable authentication failed
reload fp action requested
rp processes exit with reload switch code
Jul 5 10:43:33.520: %PMAN-3-PROCESS_NOTIFICATION: R0/0: pvp: System report /crashinfo/system-report_local_20201015-165033-Universal.tar.gz (size: 176 KB) generated
Enter the show version command after the reload.
switch#show version
<omitted output>
Last reload reason: Reload Command                                       <-- switch 1
<omitted output>

Switch 02
---------
Switch uptime                      : 60 minutes
Base Ethernet MAC Address          : aa:aa:aa:aa:aa:aa
Motherboard Assembly Number        : 11-11111-11
Motherboard Serial Number          : AAAAAAAAAAA
Model Revision Number              : F0
Motherboard Revision Number        : C0
Model Number                       : C9300-48P
System Serial Number               : AAAAAAAAAAB
Last reload reason                 : Reload slot command

Switch 03
---------
Switch uptime                      : 56 minutes
Base Ethernet MAC Address          : bb:bb:bb:bb:bb:bb
Motherboard Assembly Number        : 22-22222-22
Motherboard Serial Number          : BBBBBBBBBBA
Model Revision Number              : E0
Motherboard Revision Number        : C0
Model Number                       : C9300L-48P
System Serial Number               : BBBBBBBBBBB
Last reload reason                 : stack cable authentication failure  <--

switch#show logging onboard switch 3 uptime detail
--------------------------------------------------------------------------------
UPTIME SUMMARY INFORMATION
--------------------------------------------------------------------------------
First customer power on : 08/13/2019 23:46:07
Total uptime            : 0 years 38 weeks 5 days 11 hours 54 minutes
Total downtime          : 0 years 22 weeks 3 days 7 hours 45 minutes
Number of resets        : 37
Number of slot changes  : 3
Current reset reason    : stack cable authentication failur   <--
Current reset timestamp : 10/15/2020 18:56:09
Current slot            : 3
Chassis type            : 95
Current uptime          : 0 years 0 weeks 0 days 0 hours 56 minutes
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
UPTIME CONTINUOUS INFORMATION
--------------------------------------------------------------------------------
Time Stamp          | Reset                             | Uptime
MM/DD/YYYY HH:MM:SS | Reason                            | years weeks days hours minutes
--------------------------------------------------------------------------------
10/15/2020 18:56:09   stack cable authentication failur   0     0     0    0     35      <--
--------------------------------------------------------------------------------
Stack adapter authentication failed looks like this when the switch gets reloaded due to this software defect.
Both links down, not waiting for other switches
Switch number is X
*** Stack adapter authentication failed on stack port <1|2> on switch X ***   <--
Stack Adapter Auth Fail : SIF_SERDES_CABLE_WESTBOUND
It also can look like this.
Both links down, not waiting for other switches
Switch number is X
*** Stack adapter authentication failed on stack port <1|2> on switch X ***   <--
Stack Adapter Auth Fail : SIF_SERDES_CABLE_EASTBOUND
Note: If a stack adapter or stack cable authentication failure is detected on a switch, only that switch is expected to reload, not the whole stack.
To isolate the issue to the stack cable, the stack adapter, or the switch itself, complete these steps with the next combinations of tests:
Note: There is a well-known bug for Last reload reason: stack cable authentication failure. If the failure happens only one time and you have a Catalyst 9300L switch, validate that you have not hit this bug.
Cisco bug ID CSCvu25094 - 9300L crash due - stack cable authentication failure - reload reason only once.
Revision | Publish Date | Comments |
---|---|---|
3.0 | 07-Nov-2024 | Updated Introduction, Links, Alt Text, Grammar and Formatting. |
1.0 | 24-Aug-2022 | Initial Release |