此产品的文档集力求使用非歧视性语言。在本文档集中,非歧视性语言是指不隐含针对年龄、残障、性别、种族身份、族群身份、性取向、社会经济地位和交叉性的歧视的语言。由于产品软件的用户界面中使用的硬编码语言、基于 RFP 文档使用的语言或引用的第三方产品使用的语言,文档中可能无法确保完全使用非歧视性语言。 深入了解思科如何使用包容性语言。
思科采用人工翻译与机器翻译相结合的方式将此文档翻译成不同语言,希望全球的用户都能通过各自的语言得到支持性的内容。 请注意:即使是最好的机器翻译,其准确度也不及专业翻译人员的水平。 Cisco Systems, Inc. 对于翻译的准确性不承担任何责任,并建议您总是参考英文原始文档(已提供链接)。
本文档 d说明 如何排除Nexus 9000交换机上的第1层链路抖动问题。
思科建议您先熟悉Cisco Nexus操作系统(NX-OS)和基本Nexus架构,然后再继续本文档中介绍的信息。
本文档中的信息基于以下软件和硬件版本:
本文档中的信息都是基于特定实验室环境中的设备编写的。本文档中使用的所有设备最初均采用原始(默认)配置。如果您的网络处于活动状态,请确保您了解所有命令的潜在影响。
链路抖动是交换机上的物理接口(例如Nexus 9000)在打开和关闭之间不断交替的网络问题。这种破坏行为可能会降低网络性能、破坏网络稳定并中断通信,从而造成极大的不便。链路抖动通常由物理层故障或协议同步问题引起。
当协议同步存在问题时,就会发生协议触发的链路抖动。这可能涉及链路汇聚控制协议(LACP)、虚拟端口通道等协议。问题可能源于协议配置错误或数据包丢失,从而导致链路不稳定。定期监控和及时的软件更新有助于防止此类链路抖动。
链路抖动也可能来自网络物理层第1层。这通常涉及电缆和接口等物理组件。电缆损坏、松动或老化,以及接口故障都可能导致链路抖动。定期物理检查和维护(包括电缆检查和接口测试)有助于在这些问题导致链路抖动之前识别并纠正这些问题。
本文重点介绍第1层物理问题的故障排除。
从日志可以轻松识别链路抖动。此示例显示端口E1/5上的链路抖动事件,在该事件中,端口关闭,然后稍后重新打开。
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel100: first operational port changed from Ethernet1/5 to none
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel100: Ethernet1/5 is down
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel100,bandwidth changed to 100000 Kbit
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet1/5 is down (Link failure)
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-SPEED: Interface Ethernet1/5, operational speed changed to 10 Gbps
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-IF_DUPLEX: Interface Ethernet1/5, operational duplex mode changed to Full
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface Ethernet1/5, operational Receive Flow Control state changed to off
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface Ethernet1/5, operational Transmit Flow Control state changed to off
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-SPEED: Interface port-channel100, operational speed changed to 10 Gbps
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-IF_DUPLEX: Interface port-channel100, operational duplex mode changed to Full
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface port-channel100, operational Receive Flow Control state changed to off
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface port-channel100, operational Transmit Flow Control state changed to off
2024 Jan 21 05:28:02 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-PORT_UP: port-channel100: Ethernet1/5 is up
2024 Jan 21 05:28:02 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel100: first operational port changed from none to Ethernet1/5
2024 Jan 21 05:28:02 N9K-C93180YC-FX %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel100,bandwidth changed to 10000000 Kbit
2024 Jan 21 05:28:02 N9K-C93180YC-FX %ETHPORT-5-IF_UP: Interface Ethernet1/5 is up in mode access
2024 Jan 21 05:28:02 N9K-C93180YC-FX %ETHPORT-5-IF_UP: Interface port-channel100 is up in mode access
以太网端口管理器(Ethpm)是管理以太网接口的过程。Ethpm事件历史记录可用于识别链路抖动的原因。
E1/5在05:28:35遇到链路故障,由ETH_PORT_FSM_EV_LINK_DOWN触发ethpm转换。这表示第1层抖动。
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel100,bandwidth changed to 100000 Kbit
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet1/5 is down (Link failure)
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
N9K-C93180YC-FX# show system internal ethpm event-history interface e1/5
[143] 2024-01-21T05:26:02.100255000+00:00 [-] FSM:<Ethernet1/5> Transition:
Previous state: [ETH_PORT_FSM_ST_WAIT_BUNDLE_MEMBER_BRINGUP]
Triggered event: [ETH_PORT_FSM_EV_FIRST_BRINGUP_BUNDLE_MEMBER_DONE]
Next state: [ETH_PORT_FSM_ST_BUNDLE_MEMBER_UP]
[144] 2024-01-21T05:27:35.783495000+00:00 [-] FSM:<Ethernet1/5> Transition:
Previous state: [ETH_PORT_FSM_ST_BUNDLE_MEMBER_UP]
Triggered event: [ETH_PORT_FSM_EV_LINK_DOWN]
Next state: [FSM_ST_NO_CHANGE]
E1/8在07:40:07进入初始化关闭状态,由ETH_PORT_FSM_EV_EXTERNAL_REINIT_NO_FLAP_REQ触发ethpm转换。这表示链路汇聚控制协议(LACP)触发的链路抖动。
2024 Jan 21 07:37:20 N9K-C93180YC-FX %ETHPORT-5-IF_UP: Interface port-channel200 is up in Layer3
2024 Jan 21 07:40:07 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel200 is down (No operational members)
2024 Jan 21 07:40:07 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel200: first operational port changed from Ethernet1/8 to none
2024 Jan 21 07:40:07 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel200: Ethernet1/8 is down
2024 Jan 21 07:40:07 N9K-C93180YC-FX %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel200,bandwidth changed to 100000 Kbit
2024 Jan 21 07:40:07 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/8 is down (Initializing)
N9K-C93180YC-FX# show system internal ethpm event-history interface e1/8
[218] 2024-01-21T07:37:20.551880000+00:00 [-] FSM:<Ethernet1/8> Transition:
Previous state: [ETH_PORT_FSM_ST_WAIT_BUNDLE_MEMBER_BRINGUP]
Triggered event: [ETH_PORT_FSM_EV_FIRST_BRINGUP_BUNDLE_MEMBER_DONE]
Next state: [ETH_PORT_FSM_ST_BUNDLE_MEMBER_UP]
[219] 2024-01-21T07:40:07.104339000+00:00 [-] FSM:<Ethernet1/8> Transition:
Previous state: [ETH_PORT_FSM_ST_BUNDLE_MEMBER_UP]
Triggered event: [ETH_PORT_FSM_EV_EXTERNAL_REINIT_NO_FLAP_REQ]
Next state: [FSM_ST_NO_CHANGE]
思科提供广泛的光纤模块阵列,可适应各种速度、介质和距离。在将链路连接到Nexus 9000之前,请确保SFP和电缆与当前软件和硬件兼容。您可以通过以下方式验证这一点:
从NX-OS 10.2.1开始,所有Cloudscale ToR和EoR平台都支持平台见解引擎(PIE)。PIE是交换机上的实时根本原因分析应用程序。
三个PIE可以帮助您解决第1层链路抖动问题。
链路抖动PIE分析用户空间驱动程序(USD)发布的链路抖动事件并确定链路抖动的根本原因。PIE将根本原因分析见解发布给代理。链路抖动事件由USD(PIE客户端)在链路抖动时发布。USD从ASIC和USD收集进行根本原因分析所需的所有相关数据,并将数据发布给代理。链路抖动PIE会分析数据并得出抖动最可能的根本原因。
PIE中的链路查找链路未接通的根本原因。当接口配置为up但接口运行状态不是up时,USD会收集有关接口的数据。此数据将发布到PIE应用程序。链路中断PIE会订用这些事件,接收来自Broker的数据,并分析数据以查找根本原因。
光学PIE是一个连续监测引擎,它对定期收集的DOM数据执行时间序列分析。通过在一段时间内跟踪DOM中的各种参数,PIE会到达一个度量来描述每个光纤端口的光学状态。该度量是对光纤收发器趋势健康状况的洞察。
有关详细信息,请参阅此PIE文档:
Cisco Nexus 9000系列NX-OS平台见解引擎指南,版本10.2(x)
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel100: first operational port changed from Ethernet1/5 to none
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel100: Ethernet1/5 is down
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel100,bandwidth changed to 100000 Kbit
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet1/5 is down (Link failure)
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-SPEED: Interface Ethernet1/5, operational speed changed to 10 Gbps
<snip>
2024 Jan 21 05:28:02 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-PORT_UP: port-channel100: Ethernet1/5 is up
N9K-C93180YC-FX# show pie interface ethernet 1/5 link-flap-rca
2024-01-21 05:27:35 Event Id: 00000068 Ethernet1/5 Source Id: 436209664 RCA Code: 41 >>>PIE event time
Reason: Link flapped/down due to Local Fault, check peer >>>PIE link flap reason
N9K-C93180YC-FX# show pie interface ethernet 1/5 transceiver-insights
2024-01-21 05:30:12 Event Id: 00000080 Event Class: xcvr DOM DB Event Interface: Ethernet1/5 Health Metric: --------GOOD------- Mod: 01
2024-01-21 05:28:12 Event Id: 00000072 Event Class: xcvr DOM DB Event Interface: Ethernet1/5 Health Metric: --------GOOD------- Mod: 01
2024 Jan 21 05:48:38 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel100: first operational port changed from Ethernet1/5 to none
2024 Jan 21 05:48:38 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel100: Ethernet1/5 is down
2024 Jan 21 05:48:38 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
2024 Jan 21 05:48:38 N9K-C93180YC-FX %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel100,bandwidth changed to 100000 Kbit
2024 Jan 21 05:48:38 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet1/5 is down (Link failure)
2024 Jan 21 05:48:38 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
N9K-C93180YC-FX# show pie interface ethernet 1/5 link-down-rca
2024-01-21 05:48:48 Event Id: 00000197 Ethernet1/5 Source Id: 436209664 RCA Code: 16 >>>PIE event time
Reason: No PCS alignment detected. Please check Fec, speed, Autoneg configurations with peer >>>Physical layer failed
N9K-C93180YC-FX# show pie interface ethernet 1/5 transceiver-insights
2024-01-21 05:50:12 Event Id: 00000199 Event Class: xcvr DOM DB Event Interface: Ethernet1/5 Health Metric: ********BAD******** Mod: 01
2024-01-21 05:48:12 Event Id: 00000187 Event Class: xcvr DOM DB Event Interface: Ethernet1/5 Health Metric: --------GOOD------- Mod: 01
根据PIE输出,建议更换潜在的故障组件并继续监控。如果链路抖动仍然存在,则需要交换测试缩小故障部分的范围。通过一次更改一个组件,同时保持其他组件不变,可以执行交换测试。最终,在交换出特定故障组件后,链路将稳定下来。
对于10.2(1)之前的NX-OS软件版本,PIE支持不可用。检查第1层链路抖动需要几个手动步骤。
这将列出所连接模块上的所有链路事件。反退回时间是指接口在通知管理引擎链路断开之前等待的持续时间。在此期间,接口会等待查看链路是否恢复正常。这用于确定链路是否已断开或只是出现轻微抖动。
N9K-C93180YC-FX# attach module 1
module-1# show system internal port-client link-event
*************** Port Client Link Events Log ***************
---- ------ ----- ----- ------
Time PortNo Speed Event Stsinfo
---- ------ ----- ----- ------
Jan 21 05:48:38 2024 00122142 Ethernet1/5 ---- DOWN Link down debounce timer stopped and link is down
Jan 21 05:48:37 2024 00993003 Ethernet1/5 ---- DOWN Link down debounce timer started(0x40e50006)
Jan 21 05:45:14 2024 00432606 Ethernet1/5 10G UP SUCCESS(0x0)
这些事件提供关于每个链路事件的详细信息。
N9K-C93180YC-FX# attach module 1
module-1# show hardware internal tah link-events fp-port 5
324) Jan 21 05:48:37 2024 uSec 992843: Fp 5 : tahusd_isr.c #8469
Port Down with an ASIC interrupt
------------- ASIC MAC/PCS/Serdes REGS (Mac Channel 0) -------------
Link flapped due to Local Fault, check peer >>>Local Fault means the local device detected the issue on the receive path.
>>>Remote Fault means a Local Fault is detected across the link.
Intr Regs 00:0x0000, 01:0x0000, 02:0x0000, 03:0x0010, 07:0x0000, 11:0x0000, 15:0x0000
sts2.bercount : 0x0f00 sts2.erroredblocks : 0x0000
bercounthi : 0x0000 erroredblockhi : 0x0000
counters0.syncloss : 0x0001 counters0.blocklockloss: 0x0001
counters1.highber : 0x0000 counters1.vlderr : 0x0000
counters2.unkerr : 0x0012 counters2.invlderr : 0x0000
错误代码 |
说明 |
|
sts2.erroredblocks |
计算误码块(高位数)。 |
|
sts2.bercount |
计算错误的同步报头(低位位)。 |
|
贝孔蒂语 |
对错误的同步报头进行计数(高位数)。 |
|
错误的分区 |
计算误码块(高位数)。 |
|
counters0.syncloss |
同步丢失 |
|
counters0.blocklockloss |
阻止锁定丢失 |
|
counters1.highber |
高BER |
|
counters1.vlderr |
有效错误 |
|
counters2.unkerr |
未知错误 |
|
counters2.invlderr |
无效错误 |
此输出中包含几条小型封装热插拨(SFP)信息。如果有任何值在SFP诊断的可接受范围之外,则SFP被视为可能损坏的组件,需要更换。在本例中,一切正常。
N9K-C93180YC-FX# show interface e1/5 transceiver details
Ethernet1/5
transceiver is present
type is 10Gbase-SR >>>SFP type
name is CISCO-OPLINK >>>SFP vendor
part number is TPP4XGDS0CCISE2G
revision is 02
serial number is OPMXXXXXXXX >>>SFP SN
nominal bitrate is 10300 MBit/sec >>>SFP bitrate
Link length supported for 50/125um OM2 fiber is 82 m
Link length supported for 62.5/125um fiber is 26 m
Link length supported for 50/125um OM3 fiber is 300 m
cisco id is 3
cisco extended id number is 4
cisco part number is 10-2415-03
cisco product id is SFP-10G-SR >>>SFP PID
cisco version id is V03
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
Current Alarms Warnings
Measurement High Low High Low
----------------------------------------------------------------------------
Temperature 36.52 C 75.00 C -5.00 C 70.00 C 0.00 C
Voltage 3.28 V 3.63 V 2.97 V 3.46 V 3.13 V
Current 6.61 mA 12.00 mA 0.50 mA 11.50 mA 1.00 mA
Tx Power -2.70 dBm 1.99 dBm -11.30 dBm -1.00 dBm -7.30 dBm
Rx Power -2.40 dBm 1.99 dBm -13.97 dBm -1.00 dBm -9.91 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
peer side information is snipped.
如果前几次检查后一切正常,则需要进行交换测试来缩小故障部分的范围。通过一次更改一个组件,同时保持其他组件不变,可以执行交换测试。最终,在交换出特定故障组件后,链路将稳定下来。
版本 | 发布日期 | 备注 |
---|---|---|
1.0 |
31-Jan-2024 |
初始版本 |