Introduction
This document describes a problem encountered on Cisco Multilayer Data Switch (MDS) 9000 Series Fibre Channel (FC) ports and provides a solution to the problem.
Problem
The Link Events log displays this entry:
*************** Port Config Link Events Log ***************
----                          ------   -----  -----  ------
Time                          PortNo   Speed  Event  Reason
----                          ------   -----  -----  ------
...
Jul 28 00:46:39 2012 00670297 fc11/25  ---    DOWN   LR Rcvd B2B
The LR Rcvd B2B (or Link failure Link Reset failed nonempty recv queue) message indicates that the device attached to the port transmits a Link Reset (LR) to the MDS, but the MDS does not respond with a Link Reset Response (LRR) due to internal congestion on the port. The port has packets queued that are received from the attached device, but the MDS cannot deliver them to the appropriate egress port. Since they are still queued at the ingress port, the MDS cannot send back an LRR, and the link fails.
These error messages accompany the previous event log:
%PORT-2-IF_DOWN_LINK_FAILURE: %$VSAN 93%$ Interface fc11/25 is down (Link failure)
%PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 100%$ Interface fc5/32 is down (Link failure Link Reset failed nonempty recv queue)
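The two reason strings, "LR Rcvd B2B" and "Link failure Link Reset failed nonempty recv queue", are the signature of a port that was affected by congestion elsewhere. The Python sketch below shows one way to pick them out of collected log text. The function and constant names are hypothetical, and the regular expressions assume the line layouts shown in the samples above (each message on one line); they are not an official NX-OS log grammar.

```python
import re
from typing import Optional

# Reason strings that, per this document, mark a port affected by a slow/stuck
# port elsewhere rather than the slow-drain source itself.
VICTIM_REASONS = (
    "LR Rcvd B2B",
    "Link failure Link Reset failed nonempty recv queue",
)

# Assumed layouts, taken from the sample output in this document:
#   Link Events log: "<Mon> <DD> <HH:MM:SS> <YYYY> <usec> <port> <speed> DOWN <reason>"
#   Syslog text:     "... Interface <port> is down (<reason>)"
EVENT_LOG_RE = re.compile(
    r"^(?P<time>\w{3} +\d+ [\d:]{8} \d{4} \d+) +(?P<port>fc\d+/\d+) +\S+ +DOWN +(?P<reason>.+?)\s*$"
)
SYSLOG_RE = re.compile(r"Interface (?P<port>fc\d+/\d+) is down \((?P<reason>[^)]*)\)")


def parse_link_down(line: str) -> Optional[tuple[str, str]]:
    """Return (port, reason) for a link-down line, or None if it does not match."""
    for pattern in (EVENT_LOG_RE, SYSLOG_RE):
        match = pattern.search(line)
        if match:
            return match.group("port"), match.group("reason")
    return None


def is_victim(reason: str) -> bool:
    """True if the reason says the port was affected by a slow/stuck port."""
    return reason in VICTIM_REASONS


samples = [
    "Jul 28 00:46:39 2012 00670297 fc11/25 --- DOWN LR Rcvd B2B",
    "%PORT-2-IF_DOWN_LINK_FAILURE: %$VSAN 93%$ Interface fc11/25 is down (Link failure)",
    "%PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 100%$ Interface fc5/32 is down "
    "(Link failure Link Reset failed nonempty recv queue)",
]
for line in samples:
    parsed = parse_link_down(line)
    if parsed:
        port, reason = parsed
        print(port, "victim" if is_victim(reason) else "reason not specific", sep=": ")
```

With these samples, the fc11/25 Link Events entry and the fc5/32 syslog message classify as victims, while the bare "(Link failure)" syslog message for fc11/25 does not carry the detailed reason; the Link Events log for that port is where the LR Rcvd B2B reason appears.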
Note: This scenario assumes that the MDS grants the FC device three buffer-to-buffer (B2B) credits, and that the FC device's packets are switched to the egress FC port.
    |<---------------- MDS ---------------->|
    FC Port                         FC Port
    (Egress)        Arbiter         (Ingress)               FC device
    --------        -------         ---------               ---------
 1)                                     <---- FC packet 1 ---
 2)                    <-- Grant Request
 3)                    Grant ---------->
 4)     <--------- FC packet 1 ---------
 5)                                     ----- R_RDY -------->  Tx B2B=3
 6)                                     <---- FC packet 2 ---  Tx B2B=2
 7)                    <-- Grant Request
 8)                                     <---- FC packet 3 ---  Tx B2B=1
 9)                    <-- Grant Request
10)                                     <---- FC packet 4 ---  Tx B2B=0
11)                    <-- Grant Request
12) Time elapses (duration depends on the attached HBA type)
13)                                     <-- Link Reset (LR) -
14)                                 Start 90 ms "LR Rcvd B2B" timer
15)                                 "LR Rcvd B2B" timer expires
16)                                     <------- NOS ------->
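The Tx B2B column can be replayed with a small credit counter. This Python sketch models only the attached FC device's transmit credits under the three-credit assumption from the note; it does not model the arbiter or the XBAR, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeviceTxCredits:
    """Transmit buffer-to-buffer (B2B) credits held by the attached FC device.

    Assumption from the scenario: the MDS grants the device 3 credits.
    """
    granted: int = 3
    tx_b2b: int = 3

    def send_frame(self) -> bool:
        """Spend one credit to send a frame; return False if Tx B2B is already 0."""
        if self.tx_b2b == 0:
            return False  # the device has to wait for an R_RDY
        self.tx_b2b -= 1
        return True

    def receive_r_rdy(self) -> int:
        """An R_RDY from the ingress port returns one credit (never above the grant)."""
        self.tx_b2b = min(self.granted, self.tx_b2b + 1)
        return self.tx_b2b


def replay_scenario() -> list[int]:
    """Replay steps 1-10 of the diagram and return the Tx B2B values it labels."""
    device = DeviceTxCredits()
    labeled = []
    device.send_frame()                     # step 1: FC packet 1 (3 -> 2)
    labeled.append(device.receive_r_rdy())  # step 5: R_RDY after packet 1 leaves (2 -> 3)
    # Steps 6, 8, 10: packets 2-4 are sent, but with the egress port congested
    # no Grant comes back, the ingress buffers stay occupied, and no R_RDY is returned.
    for _ in range(3):
        device.send_frame()
        labeled.append(device.tx_b2b)
    return labeled


print(replay_scenario())  # [3, 2, 1, 0]
```

After step 10 any further send_frame() call returns False: the device sits at Tx B2B=0 until, after an HBA-dependent interval, it starts credit loss recovery with an LR.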
Explanation
This section explains each step in the previous diagram:
- The FC device transmits FC packet 1 to the ingress port, destined for the egress port.
- The MDS ingress Line Card (LC) port determines the Destination Index (DI) and transmits a Grant Request to the arbiter (Bellagio2) on the Active Supervisor.
- The arbiter returns a Grant to the ingress port, which gives it permission to transmit FC packet 1 to the egress port through the crossbar (XBAR).
- The ingress LC transmits FC packet 1 through the XBAR to the egress port, which frees the ingress buffer that held it.
- The ingress port transmits an R_RDY back to the FC device, which replenishes one credit (Tx B2B returns to 3).
Note: The first five steps show normal operation without congestion. From this point on, assume that the egress port queues are full and cannot accept any more packets.
- The FC device transmits FC packet 2 to the ingress port, destined for the egress port (Tx B2B=2).
- The MDS ingress LC port determines the DI and transmits a Grant Request to the arbiter (Bellagio2) on the Active Supervisor. Because the egress port is congested, no Grant is returned.
- The FC device transmits FC packet 3 to the ingress port, destined for the egress port (Tx B2B=1).
- The MDS ingress LC port determines the DI and transmits a Grant Request to the arbiter. Again, no Grant is returned.
- The FC device transmits FC packet 4 to the ingress port, destined for the egress port (Tx B2B=0).
- The MDS ingress LC port determines the DI and transmits a Grant Request to the arbiter. No Grant is returned, so FC packets 2, 3, and 4 stay in the ingress buffers and no R_RDY is sent back to the FC device.
- Time elapses; how long depends on the attached HBA type.
- After some time at Tx B2B=0, the FC device initiates Credit Loss Recovery, and transmits a Link Reset (LR).
- When the ingress port receives the LR, it checks its ingress buffers and determines that there is at least one packet queued. It then starts a 90 ms LR Rcvd B2B timer.
- If the Grants are received and the three FC packets are transmitted to the egress port before the timer expires, the LR Rcvd B2B timer is canceled and a Link Reset Response (LRR) is sent back to the FC device. In this case, however, the egress port remains congested, so the three FC packets stay queued at the ingress port. The LR Rcvd B2B timer expires, and no LRR is transmitted back to the FC device.
- Both the ingress port and the FC device then initiate a link failure by transmitting the Not Operational Sequence (NOS).
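The ingress port's decision after it receives the LR can be condensed into one function. This is a simplified Python model built from the steps above, not MDS firmware logic; the names are hypothetical, and the immediate LRR when nothing is queued is an assumption (ordinary Link Reset handling) rather than something this document states.

```python
from enum import Enum
from typing import Optional

# Timer value stated in this document for the "LR Rcvd B2B" check.
LR_RCVD_B2B_TIMEOUT_MS = 90


class LrOutcome(Enum):
    LRR_SENT = "Link Reset Response (LRR) sent"
    LINK_FAILURE = "LR Rcvd B2B timer expired: no LRR, NOS exchanged"


def handle_link_reset(queued_frames: int, drain_time_ms: Optional[float]) -> LrOutcome:
    """Model the ingress port's response to a Link Reset (LR) from the FC device.

    queued_frames: frames still in the ingress buffers when the LR arrives.
    drain_time_ms: milliseconds after the LR until the arbiter grants those frames
        and they leave through the XBAR, or None if the egress port stays congested.
    """
    if queued_frames == 0:
        # Assumption: with nothing queued, the port can answer the LR right away.
        return LrOutcome.LRR_SENT
    # At least one frame is queued, so the 90 ms "LR Rcvd B2B" timer starts.
    if drain_time_ms is not None and drain_time_ms < LR_RCVD_B2B_TIMEOUT_MS:
        # The frames reached the egress port in time: cancel the timer, send the LRR.
        return LrOutcome.LRR_SENT
    # The egress port is still congested when the timer expires.
    return LrOutcome.LINK_FAILURE


# The scenario above: three frames queued behind a congested egress port.
print(handle_link_reset(queued_frames=3, drain_time_ms=None).value)
# prints "LR Rcvd B2B timer expired: no LRR, NOS exchanged"
```

Treating a drain at exactly 90 ms as too late is a modeling choice; the document does not specify the boundary case.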
Solution
If the link failed with an LR Rcvd B2B or a Link failure Link Reset failed nonempty recv queue message, the port that failed is not the source of the slow drain; it was only affected by a slow or stuck port elsewhere. To identify the slow/stuck port that caused the link failure, complete these steps:
- Determine whether more than one link failed for this reason. If several links fail at approximately the same time, the cause might be that all of those ports are trying to transmit packets to a common egress port.
- Check the zoning database for the VSAN to see which devices the attached FC device is zoned with, and map those devices to egress E ports or local F ports. To map them to egress E ports, use the show fspf internal route vsan <vsan> domain <dom> command. To map them to local F ports, use the show flogi database vsan <vsan> command. If more than one link failed with the LR Rcvd B2B message, combine the egress E or local F ports found for each link and check for overlaps; overlapping ports are the likely slow/stuck ports.
- Check the ports found in the previous step for indications of slow drain, such as:
- Credit Loss (AK_FCP_CNTR_CREDIT_LOSS / FCP_SW_CNTR_CREDIT_LOSS)
- 100 ms Tx B2B Zero (AK_FCP_CNTR_TX_WT_AVG_B2B_ZERO / FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO)
- Timeout Discards (AK_FCP_CNTR_LAF_TOTAL_TIMEOUT_FRAMES / THB_TMM_TOLB_TIMEOUT_DROP_CNT / F16_TMM_TOLB_TIMEOUT_DROP_CNT)
- If the slow/stuck port is an egress E port, continue troubleshooting the slow drain on the adjacent switch indicated by the FSPF next-hop interface.
- If the slow/stuck port is an FCIP link or port-channel, check the FCIP links for signs of IP retransmissions or other problems, such as link failures. To check for these problems, enter the show ips stats all command.
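The bookkeeping in the first three steps (grouping failures by time, finding shared egress ports, and flagging slow-drain counters) can be done offline once the data is collected. The Python sketch below assumes you have already gathered, by hand, the failure times, the candidate egress E or local F ports for each failed link (from the zoning database and the show fspf internal route and show flogi database output), and the counter values. The function names, input shapes, and time window are hypothetical, and nothing here talks to a switch.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Optional

# Slow-drain counters named in this document. Which names a port reports
# depends on the line card, so a port is flagged if any of them is non-zero.
SLOW_DRAIN_COUNTERS = (
    "AK_FCP_CNTR_CREDIT_LOSS",
    "FCP_SW_CNTR_CREDIT_LOSS",
    "AK_FCP_CNTR_TX_WT_AVG_B2B_ZERO",
    "FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO",
    "AK_FCP_CNTR_LAF_TOTAL_TIMEOUT_FRAMES",
    "THB_TMM_TOLB_TIMEOUT_DROP_CNT",
    "F16_TMM_TOLB_TIMEOUT_DROP_CNT",
)


def group_by_time(failures: list[tuple[datetime, str]],
                  window: timedelta) -> list[list[str]]:
    """Group failed ports whose failures follow each other within `window`."""
    groups: list[list[str]] = []
    previous: Optional[datetime] = None
    for when, port in sorted(failures):
        if previous is None or when - previous > window:
            groups.append([])  # gap too large: start a new group
        groups[-1].append(port)
        previous = when
    return groups


def overlapping_ports(candidates: dict[str, set[str]]) -> list[str]:
    """Return egress E or local F ports shared by more than one failed link."""
    counts = Counter(port for ports in candidates.values() for port in ports)
    return sorted(port for port, n in counts.items() if n > 1)


def slow_drain_suspects(counters: dict[str, dict[str, int]]) -> list[str]:
    """Return ports with any non-zero slow-drain counter."""
    return sorted(
        port for port, values in counters.items()
        if any(values.get(name, 0) > 0 for name in SLOW_DRAIN_COUNTERS)
    )


# Hypothetical input: egress ports found for each failed link (not real data).
candidates = {"fc11/25": {"fc2/1", "fc3/4"}, "fc5/32": {"fc3/4", "fc6/7"}}
print(overlapping_ports(candidates))  # ['fc3/4']
```

group_by_time chains events: a failure joins the current group if it follows the previous failure within the window, so a long run of closely spaced failures forms one group. Ports that appear in both overlapping_ports and slow_drain_suspects are the strongest candidates to examine first.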
Configuration Options
Here are two possible system configuration options:
Related Information