Troubleshooting FCoE Issues
Fibre Channel over Ethernet (FCoE) provides a method of transporting Fibre Channel traffic over a physical Ethernet connection. FCoE requires that the underlying Ethernet be full duplex and provide lossless behavior for Fibre Channel traffic.
This chapter describes how to identify and resolve problems that can occur with FCoE on the Cisco Nexus 5000 Series switch.
Data Center Bridging
VFC (FCoE) interface not online
General troubleshooting
An FCoE-attached server has no connectivity to FC, or FCoE-attached storage, and the show interface command for the virtual Fibre Channel interface mapped to this server's port reveals that the VFC interface is down.
Note The default setting for VFC is shutdown; however, in the following example it was changed by the setup script.
- Check to ensure that the LLDP Transmit and Receive are enabled on the interface.
Use the show lldp interface ethernet 1/4 command.
If LLDP is disabled, the VFC will not come online.
You can enable LLDP transmit and receive with the lldp transmit and lldp receive commands in interface ethernet 1/4 configuration mode.
- Check that the peer supports LLDP.
Check if remote peers exist. Check if values exist for a peer's LLDP TLVs.
Use the show lldp interface ethernet 1/4 command.
- Check the peer (CNA) to see if it supports DCBX.
Use the show system internal dcbx info interface ethernet 1/4 command.
(For releases earlier than 4.2(1)N1, use the show platform software dcbx internal info interface ethernet x/y command.)
Note In the example, DCBX is enabled and the peer supports CEE.
- In the output from the show system internal dcbx info interface ethernet 1/4 command, check the peer's LLDP values.
Make sure that the mandatory LLDP values exist.
- In the output from the show system internal dcbx info interface ethernet 1/4 command, check the peer's DCBX TLVs.
Make sure that the PFC and FCoE TLVs were negotiated as willing and enabled, and that there are no errors.
Use the show system internal dcbx info interface ethernet 1/4 command.
(For releases earlier than 4.2(1)N1, use the show platform software dcbx internal info interface ethernet x/y command.)
- Check the DCBX counters located at the very bottom of the output display from the show system internal dcbx info interface ethernet 1/4 command. Look for any errors.
- Check for the same values for the FCoE Data Center Bridging and the Type-Length-Value on the host CNA software.
- Ensure that the VSAN trunk protocol has been enabled.
Use the configure terminal command to enter configuration mode, and then use the trunk protocol enable command to enable the trunking protocol.
Use the show system internal dcbx info interface ethernet 1/4 command.
(For releases earlier than 4.2(1)N1, use the show platform software dcbx internal info interface ethernet x/y command.)
– Indicates negotiation error.
– Never expected to happen when connected to CNA.
– When two Nexus 5000 switches are connected back-to-back, and if PFC is enabled on different CoS values, then a negotiation error can occur.
– Indicates negotiation result.
– Absence of operating configuration indicates that the peer does not support the DCBX TLV or that there is a negotiation error.
– The remote_feature_tlv_present message indicates whether the remote peer supports this feature TLV or not.
– Peer does not support the LLDP Protocol.
– Peer does not support the DCBX Protocol.
– Peer does not support some DCBX TLVs.
– Unexpected DCBX negotiation result.
- An option exists to force PFC mode on an interface.
Use the interface ethernet 1/21 command and the priority-flow-control mode command to force the PFC mode.
Note The default setting for this command is auto. The no option returns the mode to auto.
Nexus 5548 Troubleshooting
The type of Converged Network Adapter might not be supported.
Ensure that the type of adapter is supported. The FCoE interface only supports a Generation-2 Converged Network Adapter.
The FCoE class-fcoe system class is not enabled in the QoS configuration.
For a Cisco Nexus 5548 switch, the FCoE class-fcoe system class is not enabled by default in the QoS configuration. Before enabling FCoE, you must include class-fcoe in each of the following policy types:
The following is an example of a service policy that needs to be configured:
FIP
Note FIP Generation-1 CNAs are not supported on the Nexus 2232 FEX. Only FIP Generation-2 CNAs are supported on the Nexus 2232 FEX.
VFC down due to FIP failure
Host is not capable of supporting FIP-related TLVs.
When the connected host does not support FIP, VLAN discovery fails. VLAN discovery is the first FIP step, and the VFC cannot be brought up without it. Use show commands to verify that the three basic TLVs required for FIP are exchanged by DCBX over the bound interface, and that FCOE-MGR is enabled for FIP. The three TLVs are the FCoE TLV, the PriGrp TLV, and the PFC TLV. Check all three for both local and peer values.
Verify the TLVs with the following commands:
- show system internal dcbx info interface <bound-ethernet-interface-id>
- show platform software fcoe_mgr info interface vfc <id>
In the output from the commands:
The state of the VFC never progresses further to solicitation.
Make sure that the CNA is a FIP-capable adapter and that it runs firmware and drivers that support FIP.
VFC down due to FIP solicitation failure
When the FIP solicitation fails, the VFC goes down.
Once the first step of FIP VLAN discovery has succeeded, the host sends FIP solicitations. The switch should respond to each with a detailed FIP advertisement. If the switch does not send an advertisement in response to a received solicitation, the VFC does not come up. The host continues to solicit, but never succeeds.
The following are possible reasons for no response or advertisement:
- No active fabric-provided MAC address exists. (Possible wrong fc-map, etc.)
- Fabric is not available for FLOGI.
- MAC address descriptor may be incorrect. (This is the address the CNA uses as the DMAC when it sends responses.)
Use the show platform software fcoe_mgr info interface vfc <id> command to view the status of the FIP solicitation.
In the output from the command, check for triggered event: [FCOE_MGR_VFC_EV_FIP_VLAN_DISCOVERY];
followed by triggered event: [FCOE_MGR_VFC_EV_FIP_SOLICITATION].
If the solicitation is successful, then triggered event: [FCOE_MGR_VFC_EV_FIP_FLOGI] is displayed.
If the solicitation has failed, then triggered event: [FCOE_MGR_VFC_EV_FIP_FLOGI] is not displayed and no further progress occurs.
Check that the VSAN is active, that the memberships are correct, and that the fabric is available. In NPV mode, also check that an active border/NP port is available.
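The event sequence described above can be checked mechanically. The following is a minimal Python sketch, assuming the command output has been captured as text. Only the "triggered event: [...]" fragments and the three event names come from this section; the function names and the sample excerpt are hypothetical.

```python
import re

# Event names as they appear in the fcoe_mgr output described in this section.
FIP_STAGES = [
    "FCOE_MGR_VFC_EV_FIP_VLAN_DISCOVERY",
    "FCOE_MGR_VFC_EV_FIP_SOLICITATION",
    "FCOE_MGR_VFC_EV_FIP_FLOGI",
]

EVENT_RE = re.compile(r"triggered event:\s*\[(\w+)\]")


def fip_progress(output):
    """Return the last FIP stage reached in order, or 'none'."""
    seen = set(EVENT_RE.findall(output))
    reached = "none"
    for stage in FIP_STAGES:
        if stage not in seen:
            break
        reached = stage
    return reached


def solicitation_failed(output):
    """True when a solicitation was seen but FLOGI never followed."""
    return fip_progress(output) == "FCOE_MGR_VFC_EV_FIP_SOLICITATION"


# Hypothetical excerpt; real output contains many more fields.
sample = """
triggered event: [FCOE_MGR_VFC_EV_FIP_VLAN_DISCOVERY]
triggered event: [FCOE_MGR_VFC_EV_FIP_SOLICITATION]
"""
print(fip_progress(sample))
print(solicitation_failed(sample))
```

A result that stops at the solicitation stage points to the VSAN, membership, and fabric checks listed in this section.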
VFC down because VLAN response not received by CNA
Although the switch sends out a VLAN response, the CNA does not receive it, and the VFC remains down.
A bound interface native VLAN ID should be a non-FCoE VLAN. If not, and the native VLAN matches the FCoE VLAN, the VLAN response sent out will be untagged. However, the FIP adapters expect tagged frames. This means that the native VLAN on the trunk interface should be a non-FCoE VLAN.
Check the configuration on the bound Ethernet trunk interface and ensure that it is a non-FCoE native VLAN.
VFC down because no active STP port-state on the bound Ethernet interface
No active STP port-state on the bound Ethernet interface causes the VFC to be down.
The bound interface should be in an STP-forwarding state for both the native VLAN and the member FCoE VLAN mapped to the active VSAN. If there are no STP active ports on the VLAN, then the switch drops all FIP packets received on the VLAN over the bound interface. As a result, FIP is not initiated to bring up the VFC.
Check the STP port state on the bound Ethernet trunk interface for both the non-FCoE native VLAN and the FCoE member VLAN. If the port is in a blocked, inconsistent, or error-disabled state, fix the cause and move the port to forwarding.
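The forwarding requirement can be expressed as a small check. This is an illustrative Python sketch only: the state strings and the dictionary layout are assumptions, not the format of any switch command output.

```python
def non_forwarding_vlans(stp_states, native_vlan, fcoe_vlan):
    """Return the VLANs (native and FCoE) that are not STP-forwarding.

    stp_states maps a VLAN ID to a state string for the bound interface.
    The strings used here ("forwarding", "blocking") are assumed labels.
    """
    problems = []
    for vlan in (native_vlan, fcoe_vlan):
        # A VLAN missing from the mapping has no active STP port at all.
        if stp_states.get(vlan, "absent") != "forwarding":
            problems.append(vlan)
    return problems


# Hypothetical states for a bound trunk interface.
print(non_forwarding_vlans({1: "forwarding", 1002: "blocking"}, 1, 1002))
print(non_forwarding_vlans({1: "forwarding", 1002: "forwarding"}, 1, 1002))
```

An empty list means both VLANs are forwarding, so STP is not the reason the VFC is down.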
VFC down due to FIP keepalive misses
The VFC goes down due to FIP keepalive misses.
The VFC is brought down when FIP keepalives (FKAs) are missed for approximately 22 seconds, which means that approximately three consecutive FKAs were not received from the host. Missed FKAs can occur for many reasons, including congestion or link issues.
FKA timeout = 2.5 * FKA_adv_period
The FKA_adv_period is exchanged and agreed upon with the host in the FIP advertisement that the switch sends when responding to a solicitation.
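As a worked example of the timeout formula, the following Python sketch computes the FKA timeout and tests whether a given silence period exceeds it. The 8-second advertisement period is an assumed value chosen for illustration; the period actually in effect is whatever was agreed upon in the FIP advertisement.

```python
def fka_timeout_seconds(fka_adv_period_s):
    """Return the FKA timeout, 2.5 * FKA_adv_period, in seconds."""
    if fka_adv_period_s <= 0:
        raise ValueError("FKA_adv_period must be positive")
    return 2.5 * fka_adv_period_s


def fka_expired(seconds_since_last_fka, fka_adv_period_s):
    """True when the silence from the host exceeds the FKA timeout."""
    return seconds_since_last_fka > fka_timeout_seconds(fka_adv_period_s)


# Assumed advertisement period of 8 seconds (illustrative only).
period = 8.0
print(fka_timeout_seconds(period))  # 20.0
print(fka_expired(22.0, period))    # True
print(fka_expired(10.0, period))    # False
```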
Observe the output from the following commands to confirm FKA misses:
- show platform software fcoe_mgr info interface vfc <id>
- show platform software fcoe_mgr event-history errors
- show platform software fcoe_mgr event-history lock
- show platform software fcoe_mgr event-history msgs
- show platform fwm info pif ethernet <bound-ethernet-interface-id>
Sometimes when congestion is relieved, the VFC comes back up. If the symptom persists, then additional analysis is required. The possible considerations are:
CNA
This section includes an overview of best practices for the topology of the Converged Network Adapter (CNA), a description of troubleshooting with host-based tools, followed by a description of common problems and their solutions.
Best practice topology for CNA
Best Practice Topology for Direct Connected CNA
- A unique dedicated VLAN must be configured at every converged access switch to carry traffic for each virtual fabric (VSAN) in the SAN (for example, VLAN 1002 for VSAN 1, VLAN 1003 for VSAN 2, and so on). If MSTP is enabled, a separate MST instance must be used for FCoE VLANs.
- Unified Fabric (UF) links must be configured as trunk ports. An FCoE VLAN must not be configured as the native VLAN. All FCoE VLANs must be configured as members of the UF links. This keeps the configuration extensible for VF_Port trunking and VSAN management for the VFC interfaces.
- UF links must be configured as spanning tree edge ports.
- FCoE VLANs must not be configured as members of Ethernet links that are not designated to carry FCoE traffic. This limits the scope of the Spanning Tree Protocol for FCoE VLANs to UF links only.
- If the converged access switches (in the same SAN fabric or in the other) need to be connected to each other over Ethernet links for the purposes of LAN alternate pathing, then such links must explicitly be configured to exclude all FCoE VLANs from membership. This limits the scope of the Spanning Tree Protocol for FCoE VLANs to UF links only.
- Separate FCoE VLANs must be used for FCoE in SAN-A and SAN-B.
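Several of the rules above lend themselves to a quick sanity check of a planned configuration. The following Python sketch is one hedged way to do that; the function name, the argument layout, and the VLAN numbers 2002 and 2003 are hypothetical, and the check covers only the VLAN rules, not the spanning-tree settings.

```python
def check_fcoe_plan(vlan_to_vsan, native_vlan, trunk_vlans, other_fabric_vlans):
    """Return a list of best-practice violations for one access switch.

    vlan_to_vsan maps each FCoE VLAN ID to the VSAN it carries.
    """
    issues = []
    vsans = list(vlan_to_vsan.values())
    # Each VSAN needs its own dedicated VLAN.
    if len(set(vsans)) != len(vsans):
        issues.append("a VSAN is mapped to more than one FCoE VLAN")
    fcoe_vlans = set(vlan_to_vsan)
    # An FCoE VLAN must not be the native VLAN of the UF trunk.
    if native_vlan in fcoe_vlans:
        issues.append("native VLAN is an FCoE VLAN")
    # All FCoE VLANs must be members of the UF link.
    missing = fcoe_vlans - set(trunk_vlans)
    if missing:
        issues.append("FCoE VLANs missing from UF trunk: %s" % sorted(missing))
    # SAN-A and SAN-B must use separate FCoE VLANs.
    shared = fcoe_vlans & set(other_fabric_vlans)
    if shared:
        issues.append("FCoE VLANs shared with the other fabric: %s" % sorted(shared))
    return issues


# SAN-A plan checked against hypothetical SAN-B VLANs 2002 and 2003.
print(check_fcoe_plan({1002: 1, 1003: 2}, 1, [1, 1002, 1003], [2002, 2003]))
print(check_fcoe_plan({1002: 1}, 1002, [1002], [1002]))
```

The first call returns an empty list. The second reports that the native VLAN is an FCoE VLAN and that VLAN 1002 is shared with the other fabric.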
Best Practice Topology for Remote Connected CNAs
- A unique dedicated VLAN must be configured at every converged access switch and every blade switch to carry traffic for each virtual fabric (VSAN) in the SAN (for example, VLAN 1002 for VSAN 1, VLAN 1003 for VSAN 2, and so on). If MSTP is enabled, a separate MST instance must be used for FCoE VLANs.
- Unified Fabric (UF) links must be configured as trunk ports. An FCoE VLAN must not be configured as the native VLAN. All FCoE VLANs must be configured as members of the UF links. This keeps the configuration extensible for VF_Port trunking and VSAN management for the VFCs.
- UF links between the CNAs and the blade switches must be configured as spanning tree edge ports.
- A blade switch must connect to exactly one converged access switch, preferably over an Ethernet port channel to avoid disruption due to STP reconvergence on events such as provisioning of new links or blade switches.
Troubleshooting with Host tools
You can troubleshoot the CNA with the following host-based tools:
– Emulex provides the OneCommand GUI tool to manage Emulex CNAs. The CEE tab of this tool displays details about DCB configurations and FIP settings within the FC interface.
– QLogic provides the SANsurfer tool. The Data Center Bridging tab of this tool displays the DCB configuration learned from the switch along with TLV exchange data. The DCE Statistics tab of this tool displays the Ethernet statistics.
– Microsoft Windows provides tools to view the configuration and registers for many CNA vendor products.
CNA not recognized by Host OS
Although the Converged Network Adapter (CNA) is installed on the host, the host OS does not recognize it.
The host operating system may not have the appropriate drivers to support the installed Converged Network Adapter model.
Step 1 Obtain the following information:
a. Operating system of the host.
b. Specific model of installed CNA.
Step 2 Reference the appropriate vendor support page for the CNA model and host OS.
Step 3 Determine if an existing driver is already installed on the host OS.
Step 4 Ensure that the latest driver is installed from the CNA vendor support page or the host OS support page.
PFC
This section includes an overview of how to view standard pause frames, followed by a description of common problems and their solutions.
Standard pause frames
For ports with standard, non-CNA type host connections, the Nexus 5000 supports standard pause frames. These are enabled with the interface setting, as shown in the following example:
To view standard pause frames, use the show interface flowcontrol command.
PFC not negotiated with FCOE-capable adapters (CNA)
Priority flow control (PFC) is not negotiated with FCOE-capable adapters (CNA).
This causes packet drops on FCoE traffic from the servers.
The CNA may not support DCBX and the PFC TLV is not negotiated.
Use the following information to verify DCBX support and that the PFC TLV is negotiated:
- Check the status of the PFC. Use the show int ethx/x priority-flow-control command.
(Connected to CNA.)
- Check for LLDP neighbor or PFC/DCBX TLV advertised by the peer. Use the show system internal dcbx info int ethx/x command.
Switch Interface connected to CNA receives constant pause frames (PFC)
Constant pause frames (PFC) are received when the switch interface is connected to a CNA.
If the Nexus 5000 switch is connected to a CNA, then the CNA might be sending Xon PFC frames to the switch. This increments pause counters when using the show interface ethx/x command.
To verify this situation, perform the following:
- For a few iterations, run the show interface ethx/x | grep -i pause command and make sure that the pause frame count is incrementing.
- For a few iterations, run the show interface ethx/x priority-flow-control command and make sure that the PFC frame count is incrementing.
- For a few iterations, use the show queuing interface ethx/x command to check the pause status.
If the Rx (Inactive) and pause counter increment over time (as shown with the show interface ethx/x priority-flow-control command), then this indicates that the issue is due to Xon frames received from the CNA.
If the Nexus 5000 switch is connected to a CNA along with slow servers that are not able to handle the traffic from the switch port, then the server sends Xoff pause frames to the switch to slow it down. This increments the pause counters when using the show interface ethx/x command.
To verify this situation, perform the following:
- For a few iterations, check using the show interface ethx/x | grep -i pause command and ensure that the pause frame count is incrementing.
- For a few iterations, check using the show interface ethx/x priority-flow-control command and ensure that the PFC frame count is incrementing.
- For a few iterations, use the show queuing interface ethx/x command and check the pause status.
If the Rx (Active) and pause counter increment (as shown with the show interface ethx/x priority-flow-control command), this indicates that the issue is due to Xoff frames received from the server.
Xoff pause frames from the server pause the Nexus 5000 interface and reduce the throughput from the switch to the CNA. On the server, investigate the OS and the PCI slot to confirm that the server can sustain high-speed traffic. Replace slow servers with servers that can sustain 10-Gb throughput.
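The two cases in this section differ only in the Rx state seen while the pause counter grows. The following Python sketch captures that decision, assuming the counter values and the Rx state have already been read manually from the command output over several iterations. The function names and sample numbers are hypothetical.

```python
def is_incrementing(samples):
    """True when the counter grew between the first and last sample."""
    return len(samples) >= 2 and samples[-1] > samples[0]


def classify_rx_pause(rx_state, rx_pause_samples):
    """Map the Rx state and the pause counter trend to the likely cause."""
    if not is_incrementing(rx_pause_samples):
        return "no pause activity"
    if rx_state == "Active":
        return "Xoff frames from a slow server"
    if rx_state == "Inactive":
        return "Xon frames from the CNA"
    return "unknown"


# Hypothetical counter readings taken over three iterations.
print(classify_rx_pause("Inactive", [100, 250, 400]))
print(classify_rx_pause("Active", [100, 250, 400]))
print(classify_rx_pause("Active", [100, 100, 100]))
```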
Check if switch is sending pause frames or getting paused
FCoE throughput on servers is very low due to pause frames from the switch. It is then necessary to check if the switch is sending pause frames or if it is getting paused.
If the egress FC port is congested, the switch sends PFC frames to the servers so that they reduce their FCoE rate and drops are avoided. If the server is slow or congested, the server sends PFC frames to the switch interface.
To verify this situation, perform the following:
- For a few iterations, check using the show interface ethx/x | grep -i pause command and ensure that the pause frame count (Rx/Tx) is incrementing.
- For a few iterations, check using the show interface ethx/x priority-flow-control command and ensure that the PFC frame count (Rx/Tx) is incrementing.
- For a few iterations, use the show queuing interface ethx/x command to check the pause status.
Note PFC frames are a MAC-level type of packet and cannot be viewed using the SPAN feature. An in-line analyzer is required to see the PFC frames on the wire.
If the Rx (Active) and pause RX counter increment (as shown with the show interface ethx/x priority-flow-control command), then this indicates that this issue is due to Xoff frames received from the server.
If the Tx(Active) and pause TX counter increment (as shown with the show interface ethx/x priority-flow-control command), this indicates that this issue is due to Xoff frames transmitted by the switch.
Identify the source of the congestion and try to resolve it by increasing the FC bandwidth or by replacing the server with a more powerful one. If congestion is expected, then pause is expected for FCoE traffic.
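The Rx and Tx checks in this section reduce to comparing two readings of each counter. A minimal Python sketch follows, assuming the values were copied by hand from two runs of the priority-flow-control command. The function name and the numbers are hypothetical.

```python
def pause_direction(rx_before, rx_after, tx_before, tx_after):
    """Report who is pausing whom based on counter growth between two reads."""
    findings = []
    if rx_after > rx_before:
        # Rx pause growth: the server is pausing the switch.
        findings.append("switch is being paused by the server")
    if tx_after > tx_before:
        # Tx pause growth: the switch is pausing the server.
        findings.append("switch is sending pause frames")
    return findings


# Hypothetical readings: only the Tx counter grew.
print(pause_direction(500, 500, 1000, 4200))
```

Both findings can appear together when the server and the egress FC port are congested at the same time.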
Switch ports err-disabled due to pause rate-limit
Switch ports go into error-disable state due to pause rate limit.
If the switch interface receives excessive Xoff pause frames from the server, ports become error-disabled due to the high rate of pause frames received. Usually the port goes into an err-disable state due to pause frames, only if the drain rate is less than 5Mbps on a 10Gb port. This means that the server is very slow and is sending a large number of pause frames to the switch ports.
To verify this situation, use the show interface ethernet slot/port brief command.
The following example displays the show interface ethernet slot/port brief command output when an interface goes down due to pause rate-limit err-disabled:
switch# show interface ethernet1/27 brief
%NOHMS-2-NOHMS_DIAG_ERROR: Module 1: Runtime diag detected pause rate limit event: Port failure:
%ETHPORT-5-IF_DOWN_ERROR_DISABLED: Interface Ethernet1/27 is down (Error disabled. Reason:error)
Ethernet interfaces have two methods of flow control:
Link-level pause is used on interfaces where PFC is not enabled. Link-level pause is not configured on interfaces that use class-based QoS or FCoE. By default, link-level pause is enabled on some interfaces and disabled on others. PFC pause, by contrast, is typically used on FCoE interfaces and class-based QoS interfaces. Excessive pause frames of either type can cause an interface to become error-disabled.
To view if a link-level pause is enabled on an interface, use the show interface ethx/x command.
To view if a PFC pause is enabled on an interface, use the show interface <ethx/y> priority-flow-control command.
- Check if the RX pause count is a large value. RxPPP is the PFC Pause frames received on the interface and TxPPP are the PFC Pause frames sent on that interface.
- Check for pause error-disable logs using the show hardware internal gatos event-history errors |grep -i err command.
If the port is error-disabled due to a transient condition, pause error-disable recovery can be enabled to move the port out of this state.
If there is a consistent port error-disable condition due to the pause rate limit, determine if the issue is that the server is too slow. Replace the slow server.
How to enable link pause (flow control) on switch that connects DCBX capable devices
Link pause is not enabled on the switch ports that are connected to servers. It is necessary to enable link pause (flow control) on a Nexus 5000 switch that connects DCBX-capable devices.
If the peer supports the PFC TLV with DCBX, then configuring flowcontrol send on and flowcontrol receive on does not enable link pause. You have to disable the PFC TLV sent by DCBX on the interface.
To verify this situation, perform one of the following:
- Check if the operating state is off using the show interface ethx/y flowcontrol command.
- Check if the operating state is on using the show interface ethx/y priority-flow-control command.
Use the following commands under the interface ethx/y command to enable link pause instead of PFC with DCBX capable devices:
How to clear PFC counters
How to clear priority flow counters.
Use the clear qos statistics command to clear the PFC counters.
Another workaround is to clear interface counters and then enter the show interface ethx/x flowcontrol command to see the PFC frame count.
Note The PFC frame count also increments in the output of the show int ethx/x flowcontrol command. This is a known bug.
Registers and Counters
Interface level errors
To view any interface level errors, use the show interface counters errors command.
Packet byte counts
To view packet byte counts, use the show interface counters detailed command.
Verification of SNMP readouts
To verify SNMP readouts, use the show interface ethernet 1/11 counters snmp command.
Traffic rates
To view traffic rates, use the show interface ethernet 1/11 counters brief command.