Introduction
This document describes how to troubleshoot Cyclic Redundancy Check (CRC) errors on the ASR5500 Data Processing Card (DPC) and Management Input/Output (MIO) card.
Background Information
Upon detection of CRC errors, the ASR5500 is designed to perform self-healing and auto-recovery. In most cases, the system recovers from such packet corruption through a non-intrusive soft reset of internal processes and an automatic card switchover.
Problem
When a soft error (CRC error) is detected, StarOS first attempts to recover from the fault with a soft reset of the relevant internal processes, such as npumgr, and a DDF reload. If that recovery is not successful, the card is automatically restarted to clear the soft error and perform a full hardware check of the card.
When a DPC/UDPC/DPC2/UDPC2/MIO/UMIO card detects CRC errors, one of the first recovery steps that the system performs is a soft reset of the processes associated with the affected chipset. In this example, the output of show logs, the syslog, and the debug console show that card 8 detected a CRC error and was able to recover.
2021-Aug-01+01:01:01.711 [drvctrl 39204 error]
[8/0/7058 <hwmgr:80> hw_common_lib.c:492]
[software internal system syslog] hw_mon_elem_changed:
Detected DDF RELOAD on CRC error: card 8, device DDF1
2021-Aug-01+01:01:01.727 card 8-cpu0: [23552535.124999]
DF2 Complex-0 Program DDF2 CAF_DF1_PROG_ERR error detected on FLM123456AB
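When many log files need to be reviewed, a short script can pick out the cards that went through a DDF reload. The following is a minimal Python sketch, not a Cisco tool. The function name find_ddf_reloads and the regular expression are assumptions modeled only on the sample message above; other StarOS releases may word the message differently.

```python
import re

# Pattern modeled on the sample hwmgr message shown above (an assumption:
# the exact wording may differ between StarOS releases).
DDF_RELOAD_RE = re.compile(
    r"Detected DDF RELOAD on CRC error: card (\d+), device (\w+)"
)


def find_ddf_reloads(log_text):
    """Return a (card, device) tuple for each DDF reload message in log_text."""
    return [(int(card), device) for card, device in DDF_RELOAD_RE.findall(log_text)]


# Sample text copied from the log excerpt above.
sample = (
    "2021-Aug-01+01:01:01.711 [drvctrl 39204 error]\n"
    "[8/0/7058 <hwmgr:80> hw_common_lib.c:492]\n"
    "[software internal system syslog] hw_mon_elem_changed:\n"
    "Detected DDF RELOAD on CRC error: card 8, device DDF1\n"
)

print(find_ddf_reloads(sample))  # prints [(8, 'DDF1')]
```

A card that appears here but has no later card error or shutdown message most likely recovered through the soft reset alone.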
In certain situations, if the process restart does not recover the system, the DPC/UDPC/DPC2/UDPC2/MIO/UMIO card is automatically restarted. In this example, the output of show logs, the system syslog, and/or the debug console logs show that the system automatically restarted the affected card upon detection of CRC errors. In these logs, card 6 was restarted and came back in the standby state.
2021-Jun-20+10:11:12.150 [hat 3033 error]
[5/0/7094 <hatsystem:0> atsystem_fail.c:1470]
[hardware internal system critical-info diagnostic]
Card error detected on card 6 device DDF reason DDF_CRC_ERROR
2021-Jun-20+10:11:12.201 [rct 13013 info]
[software internal system critical-info syslog] Card 6 shutdown started
2021-Jun-20+10:11:12.201 [afctrl 186001 error]
[5/0/7169 <afctrl:0> l_msg_handler.c:277]
[software internal system critical-info syslog]
afctrl_bcf_scrmem_doorbell_callback: Slot 6 scratch memory driver error
******** show rct stats *******
RCT stats Details (Last 1 Actions)
Action            Type      From To   Start Time               Duration
----------------- --------- ---- ---- ------------------------ ----------
Shutdown          N/A       6    0    2021-Jun-20+10:11:12.201 0.002 sec
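To confirm which cards were restarted because of a CRC error, the card error message can be correlated with the shutdown message for the same slot. The following Python sketch illustrates one way to do that. The function name crc_restarted_cards and both regular expressions are assumptions derived only from the sample log lines above, not from a documented log format.

```python
import re

# Both patterns are modeled on the sample messages above (assumptions).
CARD_ERROR_RE = re.compile(
    r"Card error detected on card (\d+) device (\S+) reason (\S+)"
)
SHUTDOWN_RE = re.compile(r"Card (\d+) shutdown started")


def crc_restarted_cards(log_text):
    """Return sorted card numbers that logged both a CRC card error and a shutdown."""
    crc_cards = {
        int(match.group(1))
        for match in CARD_ERROR_RE.finditer(log_text)
        if "CRC" in match.group(3)  # keep only CRC-related reasons
    }
    shutdown_cards = {int(m.group(1)) for m in SHUTDOWN_RE.finditer(log_text)}
    return sorted(crc_cards & shutdown_cards)


# Sample text copied from the log excerpt above.
sample_restart = (
    "[hardware internal system critical-info diagnostic]\n"
    "Card error detected on card 6 device DDF reason DDF_CRC_ERROR\n"
    "2021-Jun-20+10:11:12.201 [rct 13013 info]\n"
    "[software internal system critical-info syslog] Card 6 shutdown started\n"
)

print(crc_restarted_cards(sample_restart))  # prints [6]
```

The result can then be checked against the Shutdown rows in show rct stats for the same slot and time.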
Solution
Most CRC errors detected on the DPC and MIO cards are transient and are automatically recovered by the system. If the card restarts successfully and comes back into service, including when it comes back in the standby state, no further action is required. If the system cannot automatically recover from these errors, it takes the impacted data processing card offline after 3 reset attempts. In the rare case that the system cannot automatically recover from the CRC errors, contact Cisco TAC.
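The guidance above can be summarized as a small decision helper. This is a minimal Python sketch under stated assumptions: the function name recommend_action, the state labels "active", "standby", and "offline", and the "monitor" result are hypothetical and are used only for illustration. Only the limit of 3 reset attempts comes from this document.

```python
MAX_RESET_ATTEMPTS = 3  # limit stated in this document


def recommend_action(card_state, reset_attempts):
    """Map a card state and reset count to the follow-up described above.

    card_state uses hypothetical labels ("active", "standby", "offline");
    adapt them to whatever your own tooling reports.
    """
    state = card_state.strip().lower()
    if state in ("active", "standby"):
        # The card restarted and is back in service: nothing more to do.
        return "no action"
    if state == "offline" or reset_attempts >= MAX_RESET_ATTEMPTS:
        # Auto-recovery did not succeed.
        return "contact Cisco TAC"
    # Recovery may still be in progress (assumed intermediate case).
    return "monitor"


print(recommend_action("standby", 1))
print(recommend_action("offline", 3))
```

In the example from the logs above, card 6 came back in the standby state, so the helper would report that no action is needed.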