简介
本文描述问题遇到与连结硬件问题引起的5010/5020交换机的女低音ASIC (错误消息%NOHMS-2-NOHMS_DIAG_ERROR :第 1 单元:运行时diag检测的主要事件:端口故障),并且提供一解决方案给问题。
Cisco建议您有连结CLI的知识。
使用的组件
本文档中的信息根据Cisco连结仅5010/5020交换机。它不影响Cisco连结5548/5596交换机。
本文档中的信息都是基于特定实验室环境中的设备编写的。本文档中使用的所有设备最初均采用原始(默认)配置。如果您使用的是真实网络,请确保您已经了解所有命令的潜在影响。
在卡2的多个接口发生故障,并且您看到此警报:
N5020 %$ VDC-1 %$ %NOHMS-2-NOHMS_DIAG_ERROR: Module 1: Runtime diag detected major event
警报建议卡故障,但是一些端口是UP。即使连结5020交换机联机,在Slot2的光纤通道(FC)模块脱机。输入show module命令为了查看模块的状况:
Mod Ports Module-Type Model Status
--- ----- -------------------------------- ---------------------- ------------
1 40 40x10GE/Supervisor N5K-C5020P-BF-SUP active *
2 8 8x1/2/4G FC Module N5K-M1008 offline <<<<<<
Mod Sw Hw World-Wide-Name(s) (WWN)
--- -------------- ------ --------------------------------------------------
1 4.2(1)N2(1) 1.3 --
2 4.2(1)N2(1) 1.0 77:9f:b7:62:2f:6c:69:62 to 00:00:00:b8:27:0a:08:2c
输入show environment命令为了查看模块环境数据。
Mod Model Power Power Power Power Status
Requested Requested Allocated Allocated
(Watts) (Amp) (Watts) (Amp)
--- ---------------------- ------- ---------- --------- ---------- ----------
1 N5K-C5020P-BF-SUP 625.20 52.10 625.20 52.10 powered-up
2 N5K-M1008 9.96 0.83 9.96 0.83 fail/shutdown
输入show logging nvram命令为了查看此输出:
N5020 %$ VDC-1 %$ %NOHMS-2-NOHMS_DIAG_ERROR: Module 1: Runtime diag detected major event:
Port failure: Ethernet1/1
N5020 %$ VDC-1 %$ last message repeated 2 times
N5020 %$ VDC-1 %$ %NOHMS-2-NOHMS_DIAG_ERROR: Module 1: Runtime diag detected major event:
Port failure: Ethernet1/2
N5020 %$ VDC-1 %$ last message repeated 7 times
N5020 %$ VDC-1 %$ %NOHMS-2-NOHMS_DIAG_ERROR: Module 1: Runtime diag detected major event:
Port failure: Ethernet1/5
N5020 %$ VDC-1 %$ last message repeated 3 times
N5020 %$ VDC-1 %$ %NOHMS-2-NOHMS_DIAG_ERROR: Module 1: Runtime diag detected major event:
Port failure: Ethernet1/13
正如你从日志看到,几个端口失败运行时诊断。并且,因为结构发生故障,从每Gatos ASIC的两个端口报告“硬件故障”。输入show interface摘要命令为了查看此输出:
--------------------------------------------------------------------------------
Ethernet VLAN Type Mode Status Reason Speed Port
Interface Ch #
--------------------------------------------------------------------------------
Eth1/1 1 eth fabric down Hardware failure 10G(D) 138
Eth1/2 1 eth fabric down Hardware failure 10G(D) 138
Eth1/3 1 eth fabric up none 10G(D) 138
Eth1/4 1 eth fabric up none 10G(D) 138
Eth1/5 1 eth fabric down Hardware failure 10G(D) 140
Eth1/6 1 eth fabric down Hardware failure 10G(D) 140
Eth1/7 1 eth fabric up none 10G(D) 140
Eth1/8 1 eth fabric up none 10G(D) 140
Gatos ASIC报告某些的失败端口并且禁用他们。输入error命令show hardware内部gatos的事件历史记录为了查看此输出:
1) Event:E_DEBUG, length:81, at 775734 usecs after Fri May 24 15:28:10 2013
[101] xcvr_set_port_to_hw_failure(): Sending nohms failure notif for port xgb1/13
2) Event:E_DEBUG, length:44, at 775726 usecs after Fri May 24 15:28:10 2013[100] CODE-PATH:
xcvr_set_port_to_hw_failure
935) Event:E_DEBUG, length:34, at 434695 usecs after Fri May 24 15:28:06 2013[100] CODE-PATH:
xcvr_port_disable
936) Event:E_DEBUG, length:38, at 434653 usecs after Fri May 24 15:28:06 2013[100] CODE-PATH:
xcvr_set_port_disable
937) Event:E_DEBUG, length:81, at 408233 usecs after Fri May 24 15:28:06 2013
[101] xcvr_set_port_to_hw_failure(): Sending nohms failure notif for port xgb1/30
938) Event:E_DEBUG, length:44, at 408224 usecs after Fri May 24 15:28:06 2013 [100] CODE-PATH:
xcvr_set_port_to_hw_failure
从女低音ASIC,有许多“错误中断”消息由于导致结构互联的同步问题(FI)重置。输入show hardware内部女低音错误发出命令为了查看此输出的事件历史记录:
1) Event:E_DEBUG, length:131, at 959201 usecs after Fri May 24 14:19:20 2013
[100] Threshold reached for error interrupt - ALT_FIC3_INT_3_XGXS_rx2_loss_of_sync, flags:
0xa8, fabric port: 15, Action: fi-reset
2) Event:E_DEBUG, length:122, at 372727 usecs after Fri May 24 14:15:05 2013
[100] Threshold reached for interrupt - ALT_FIC6_INT_0_XGXS_EXT_serdes_rx2_sync, masking it
(threshold=3 period=10 msecs)
453) Event:E_DEBUG, length:122, at 658189 usecs after Fri May 24 03:38:48 2013
[100] Threshold reached for interrupt - ALT_FIC6_INT_1_XGXS_EXT_serdes_rx0_sync, masking it
(threshold=3 period=10 msecs)
454) Event:E_DEBUG, length:129, at 658137 usecs after Fri May 24 03:38:48 2013
[100] Threshold reached for error interrupt - ALT_FIC6_INT_1_XGXS_rx2_code_eerror, flags:
0xa8, fabric port: 25, Action: fi-reset
问题归结于在女低音ASIC的硬件问题。请与Cisco技术支持中心(TAC)联系为了替换连结5000系列交换机。