Internal CRC Detection and Isolation
Beginning with Cisco MDS NX-OS Release 6.2(13), internal Cyclic Redundancy Check (CRC) detection and isolation is supported on the Cisco MDS 9700 Series switches.
This functionality enables the Cisco MDS switches to detect CRC errors that occur internally within a switch and isolate the source of these errors.
Note: Internal CRC Detection and Isolation is supported only on the Cisco MDS 9700 Series Multilayer Directors.
By default, internal CRC detection and isolation is disabled.
The modules that support this functionality are:
- Cisco MDS 9700 48-Port 16-Gbps Fibre Channel Switching Module
- Cisco MDS 9700 48-Port 10-Gbps Fibre Channel over Ethernet Switching Module
- Cisco MDS 9700 Fabric Module 1
- Cisco MDS 9700 Supervisor Module 3
Note: Module refers to either a switching module or a supervisor module.
Internal CRC errors are a separate class of errors from the CRC errors on frames that arrive from outside the switch already corrupted. In store-and-forward mode, frames that arrive with CRC errors are dropped at the ingress port and do not propagate through the system. Internal CRC errors occur when frames are received without errors but are corrupted as they pass through the switching path.
Internal CRC errors are usually caused by a fault in the system. Such faults may be transient, such as an ungracefully removed module, or permanent, such as a badly seated module, or, in rare cases, a failing or failed hardware component. The rate of errors depends on many factors and may range from very high to very low.
The error-rate threshold is configurable as a system-wide value, but separate error counts are maintained for each module to identify an error source.
Note: The counters are reset 24 hours after the Internal CRC detection and isolation feature was first configured.
Stages of Internal CRC Detection and Isolation
Internal CRC errors may occur at five possible stages in a switch:
Stage 1—Ingress buffer of a module
Stage 2—Ingress crossbar of a module
Stage 3—Crossbar of a fabric module
Stage 4—Egress crossbar of a module
Stage 5—Egress buffer of a module
Errors on each module are handled individually when the error count exceeds the threshold.
Note: The total number of errors across all applicable ASICs on the module must exceed the threshold.
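The accounting described above (one system-wide threshold, separate error counts per module, and a per-module total summed across ASICs) can be sketched as a small model. This is only an illustration and not how NX-OS implements the feature: the class name, method names, ASIC labels, and threshold value are all hypothetical, and the strict "greater than" comparison is an assumption based on the wording "must exceed the threshold".

```python
from collections import defaultdict


class InternalCrcMonitor:
    """Toy model: one system-wide threshold, separate error totals per module."""

    def __init__(self, threshold):
        self.threshold = threshold  # system-wide value (hypothetical units)
        # module name -> ASIC name -> error count
        self.counts = defaultdict(lambda: defaultdict(int))

    def record(self, module, asic, errors=1):
        """Add errors seen on one ASIC of one module."""
        self.counts[module][asic] += errors

    def module_total(self, module):
        """Sum errors across all ASICs on the module."""
        return sum(self.counts.get(module, {}).values())

    def modules_over_threshold(self):
        """Modules whose summed error count exceeds the threshold."""
        return sorted(
            m for m in self.counts if self.module_total(m) > self.threshold
        )

    def reset(self):
        """Clear all counters (the document describes a 24-hour reset)."""
        self.counts.clear()


monitor = InternalCrcMonitor(threshold=3)
monitor.record("Module-1", "asic0", 2)
monitor.record("Module-1", "asic1", 2)  # Module-1 total is 4
monitor.record("Module-5", "asic0", 3)  # Module-5 total is 3, not over
flagged = monitor.modules_over_threshold()
```

In this sketch neither ASIC on Module-1 exceeds the threshold alone, but their sum does, so only Module-1 is flagged.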
When errors cross the specified threshold, the syslog message XBAR_MONITOR_INTERNAL_CRC_ERR is logged. This message specifies the location of the error and the type of action taken.
Example: Error Messages
switch# show logging logfile | inc MONITOR_INTERNAL_CRC_ERR
2015 May 25 21:20:41 switch %XBAR-2-XBAR_MONITOR_INTERNAL_CRC_ERR: Module-1 detects CRC Error:4 at Egress Q-engine, putting it in failure state
2015 May 25 21:15:35 switch %XBAR-2-XBAR_MONITOR_INTERNAL_CRC_ERR: Fab_slot-12 detects CRC error:1 at ingress stage2, putting it in failure state
2015 May 25 15:47:10 switch %XBAR-2-XBAR_MONITOR_INTERNAL_CRC_ERR: Module-5 detects CRC error:2 at Ingress Qengine, Only one Sup is present, bringing down the active VSAN
2015 May 25 15:08:17 switch %XBAR-2-XBAR_MONITOR_INTERNAL_CRC_ERR: Module-5 detects CRC error:1 at Ingress Qengine, putting it in failure state
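When these messages need to be collected from a saved log, a small parser can split each one into the reporting slot, the error location, and the action taken. The following is a minimal sketch: the regular expression is inferred only from the four sample lines above and may not match other variants of the message, and the function name and field names are hypothetical. The number after "error:" is kept as an integer without interpreting it, since the source does not state what it represents.

```python
import re

# Pattern inferred from the sample messages only; treat it as an assumption.
LOG_PATTERN = re.compile(
    r"%XBAR-2-XBAR_MONITOR_INTERNAL_CRC_ERR:\s+"
    r"(?P<source>\S+) detects CRC [Ee]rror:(?P<error_value>\d+) at "
    r"(?P<location>[^,]+),\s*(?P<action>.+)$"
)


def parse_crc_syslog(line):
    """Return a dict of fields for a matching line, or None otherwise."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["error_value"] = int(fields["error_value"])
    fields["location"] = fields["location"].strip()
    fields["action"] = fields["action"].strip()
    return fields


sample = (
    "2015 May 25 15:47:10 switch %XBAR-2-XBAR_MONITOR_INTERNAL_CRC_ERR: "
    "Module-5 detects CRC error:2 at Ingress Qengine, "
    "Only one Sup is present, bringing down the active VSAN"
)
parsed = parse_crc_syslog(sample)
```

Everything after the first comma is treated as the action text, so an action that itself contains a comma, as in this sample, stays in one piece.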
Stage 1—Ingress Buffer of a Module
There are multiple ingress buffers on each module. When the CRC error rate of an ingress buffer on a switching module reaches the threshold, the entire module is shut down. See Actions Taken on a Supervisor when the Threshold Exceeded for more information.
Stage 2—Ingress Crossbar of a Module
Ingress crossbar is an ASIC complex on an ingress module that switches traffic from ingress buffers to fabric modules. When the CRC error rate of an ingress switching module crossbar reaches the threshold, the entire module is shut down. See Actions Taken on a Supervisor when the Threshold Exceeded for more information.
Stage 3—Crossbar of a Fabric Module
Crossbar is an ASIC complex on a fabric module that switches traffic from an ingress module to an egress module.
When the CRC error rate of a crossbar reaches the threshold, if there is more than one fabric module in the corresponding switch, the host fabric module is shut down. If the switch has only one fabric module, the module connected to the fabric module link on which the errors occurred is shut down.
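The Stage 3 rule can be expressed as a short decision function. This is a sketch of the behavior described in the paragraph above, not the switch's actual implementation; the function name, parameter names, and slot labels are hypothetical.

```python
def stage3_shutdown_target(fabric_module_count, faulty_fabric_module, linked_module):
    """Pick which module is shut down when a fabric crossbar hits the threshold.

    fabric_module_count: number of fabric modules installed in the switch.
    faulty_fabric_module: the fabric module hosting the crossbar with errors.
    linked_module: the module on the fabric link where the errors occurred.
    """
    if fabric_module_count < 1:
        raise ValueError("a Stage 3 error requires at least one fabric module")
    if fabric_module_count > 1:
        # Other fabric modules remain, so the faulty one is shut down.
        return faulty_fabric_module
    # Only one fabric module: shut down the module on the affected link instead.
    return linked_module


with_redundancy = stage3_shutdown_target(3, "Fab_slot-12", "Module-5")
single_fabric = stage3_shutdown_target(1, "Fab_slot-12", "Module-5")
```

With three fabric modules the faulty fabric module is selected; with a single fabric module the linked module is selected, which keeps the only fabric module in service.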
Stage 4—Egress Crossbar of a Module
Egress crossbar is an ASIC complex on an egress module that switches traffic from fabric modules to egress buffers. When the CRC error rate of an egress switching module crossbar reaches the threshold, the entire module is shut down. See Actions Taken on a Supervisor when the Threshold Exceeded for more information.
Stage 5—Egress Buffer of a Module
There are multiple egress buffers on each module. When the CRC error rate of an egress buffer on a switching module reaches the threshold, the entire module is shut down. See Actions Taken on a Supervisor when the Threshold Exceeded for more information.
Actions Taken on a Supervisor when the Threshold Exceeded
Actions are taken on a supervisor when the threshold is exceeded during the following stages of internal CRC detection and isolation:
Stage 1—Ingress Buffer of a Module
Stage 2—Ingress Crossbar of a Module
Stage 4—Egress Crossbar of a Module
Stage 5—Egress Buffer of a Module
Note |
|