Introduction
This document describes how to resolve card failures caused by XFAB_FAP_FAILURE on the Aggregation Services Router (ASR) 5500. The switch fabric provides backplane connectivity among all the cards in the ASR 5500 chassis. Both control plane and data plane traffic within the chassis pass through the switch fabric. When the switch fabric experiences a fatal soft error, the affected card is reset and re-initialized, and the event is reported as XFAB_FAP_FAILURE in several logs and command outputs.
Problem
The XFAB_FAP_FAILURE messages are reported on DPC/UDPC/DPC2/UDPC2/MIO/UMIO cards and can be seen in these outputs:
1. show card diag <Card Number>: In this sample output from an ASR 5500 chassis, the card experienced XFAB_FAP_FAILURE on FABRIC_1. The card passed diagnostics after it was rebooted and is usable.
Boot Mode : Normal
Card Diagnostics : Pass
Current Failure : None
Last Failure : Failure: Device=FABRIC_1, Reason=XFAB_FAP_FAILURE, (0x03003156)
(last at <timestamp>)
Card Usable : Yes
2. show logs and syslogs: In this sample output from an ASR 5500 chassis, the card in slot N with serial number SADxxxxxxxxx experienced XFAB_FAP_FAILURE.
[csp 7019 critical] [5/0/1185 <cspctrl:0> spctrl_events.c:4514] [hardware internal system diagnostic] The Data Processing Card 2 with serial number SADxxxxxxxxx in slot <N> has failed and will be reset and brought back online. (Device=FABRIC_1, Reason=XFAB_FAP_FAILURE, Status=[BOARD:] [CPU0 MB: Boot Done HB_cpu: 0C:BA Error ID: None BPL: None] [CPU1 MB: Boot Done HB_cpu: 0D:BA Error ID: None BPL: None] [CPU2 MB: Boot Done HB_cpu: 0F:BC Error ID: None BPL: None])
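To decide whether a failure is a one-time event or a repeat, it can help to count XFAB_FAP_FAILURE events per slot in exported logs. The sketch below is one possible way to do that in Python. The regular expression is derived only from the sample message above, so it is an assumption that real log lines on a given release follow the same wording; the function names parse_card_failure and count_fap_failures are hypothetical helpers, not part of any ASR 5500 tooling.

```python
import re

# Pattern inferred from the sample log message in this document.
# Assumption: other releases word the message the same way.
FAILURE_RE = re.compile(
    r"serial number (?P<serial>\S+) in slot (?P<slot>\S+) has failed"
    r".*?Device=(?P<device>[^,]+), Reason=(?P<reason>[^,)]+)"
)


def parse_card_failure(line):
    """Return serial, slot, device and reason as a dict, or None if no match."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    return match.groupdict()


def count_fap_failures(lines):
    """Count XFAB_FAP_FAILURE events per slot across an iterable of log lines."""
    counts = {}
    for line in lines:
        info = parse_card_failure(line)
        if info and info["reason"] == "XFAB_FAP_FAILURE":
            counts[info["slot"]] = counts.get(info["slot"], 0) + 1
    return counts
```

Feeding the lines of a saved syslog file to count_fap_failures returns a dictionary keyed by slot, which makes cards with more than one occurrence easy to spot.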
Solution
1. Check the operational state of the card with show card table.
2. If the card status is Standby or Active, ensure that the card is usable and does not have current failures with 'show card diag <Card Number>'.
- Do not replace the card if it is a one-time failure.
- Monitor the card for repeated occurrences; replace the card if multiple occurrences are seen.
3. If the card status is Offline, perform these steps sequentially:
- Issue 'card reboot <slot> -force'.
- If card reboot does not bring the card back online, re-seat the card - slide it out and slide it back into the chassis.
- If the card is back in active or standby state after one of the two initial attempts, monitor it for a few days.
- If the card is still offline after the first two procedures, replace the card according to the ASR 5500 Card Replacements Method of Procedure (MOP).
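The decision flow in the steps above can be summarized as a small Python function. This is only an illustrative sketch: the function name next_action, its parameters, and the returned strings are hypothetical, and repeat_threshold=2 is an assumption, because the procedure says "multiple occurrences" without giving a number.

```python
def next_action(state, usable=True, current_failure="None", occurrences=1,
                reboot_tried=False, reseat_tried=False, repeat_threshold=2):
    """Map the observed card condition to the next step of the procedure.

    state: card state as read from 'show card table' (Active, Standby, Offline).
    usable / current_failure: values read from 'show card diag <Card Number>'.
    occurrences: XFAB_FAP_FAILURE events seen so far on this card.
    repeat_threshold: assumed meaning of "multiple occurrences" (not from the source).
    """
    state = state.strip().lower()

    if state in ("active", "standby"):
        if not usable or current_failure != "None":
            # The procedure does not say what to do in this case.
            return "outside this procedure"
        if occurrences >= repeat_threshold:
            return "replace card"
        return "monitor card, do not replace"

    if state == "offline":
        if not reboot_tried:
            return "card reboot <slot> -force"
        if not reseat_tried:
            return "re-seat card"
        return "replace card per MOP"

    # Any other state is not covered by the steps above.
    return "outside this procedure"
```

For example, next_action("Offline", reboot_tried=True) returns "re-seat card", which matches the second sub-step for an offline card.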