Troubleshooting Some Line Card (LC) issues on NCS4016

Updated: November 24, 2015

Document ID: 200272

Introduction

This document describes how to troubleshoot line card (LC) issues on a Cisco Network Convergence System 4000 Series chassis (NCS4016): the faulty states in which a line card can get stuck, the possible reasons, and the recovery actions.

Background Information

The NCS4016 is a 16-slot LC chassis (slots 0-15), and each LC has a capacity of 200G. The basic sequence of events when an LC boots up on an NCS4016 chassis is as follows.

The LC is divided into 9 power zones, numbered 0 to 8. All of these power zones are controlled by the CCC (Card Controller Chip).
Zone 0 is the first zone to come up. It brings up the CPU complex and boots the basic logic for the LC.
Once Zone 0 is powered on, the CCC executes the power-on interpreter and configures the basic devices before it brings the CPU out of the RESET state. (If the CPU is powered off, it remains in the RESET state.)
These are the basic functions performed during LC bootup. If there is an issue in Zones 1 to 8, only the slice that corresponds to the affected zone fails to power on. However, if there is an issue in Zone 0, the whole LC is powered off.
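
The zone behaviour described above can be summarised in a few lines of Python. This is only an illustrative sketch of the rule stated in the text (a Zone 0 fault takes the whole LC down, a fault in Zones 1 to 8 affects only the matching slice). The function name boot_impact and its return strings are hypothetical and are not part of any Cisco tool.

```python
def boot_impact(failed_zones):
    """Return the expected impact of power-zone faults on an LC.

    Hypothetical helper: it only encodes the rule from the text above.
    """
    bad = sorted(set(failed_zones))
    if any(z < 0 or z > 8 for z in bad):
        raise ValueError("power zones are numbered 0 to 8")
    if not bad:
        return "all zones up"
    if 0 in bad:
        # Zone 0 powers the CPU complex, so the whole card stays off.
        return "whole LC powered off"
    # Zones 1-8 only affect the slice that the zone feeds.
    return "slices for zones %s not powered on" % ", ".join(map(str, bad))


print(boot_impact([5, 2]))
print(boot_impact([0, 3]))
```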

Before You Begin:

Before you start troubleshooting, take note of the points below.

Attach (or log in) to the sysadmin (Calvados) VM. A card that failed to boot is not shown in the XR VM, so the status and the reason for the failure can be seen only in the sysadmin VM.
Only cards that have a CPU on them are expected to show a Software state of OPERATIONAL. For all other cards the Software state is N/A (not applicable), but their Hardware state should be “OPERATIONAL”.

With all LCs and the RP operational, the output should look like this.

sysadmin-vm:0_RP0# show platform
Tue Aug 18 19:57:02.631 UTC
Location Card Type               HW State      SW State      Config State
----------------------------------------------------------------------------
0/0       NCS4K-2H-O-K            OPERATIONAL   N/A           NSHUT
0/5       NCS4K-24LR-O-S          OPERATIONAL   N/A           NSHUT
0/6       NCS4K-20T-O-S           OPERATIONAL   N/A           NSHUT
0/8       NCS4K-2H-O-K            OPERATIONAL   N/A           NSHUT
0/RP0     NCS4K-RP                OPERATIONAL   OPERATIONAL   NSHUT
0/FC1     NCS4016-FC-M            OPERATIONAL   N/A           NSHUT
0/CI0     NCS4K-CRAFT             OPERATIONAL   N/A           NSHUT
0/FT0     NCS4K-FTA               OPERATIONAL   N/A           NSHUT
0/FT1     NCS4K-FTA               OPERATIONAL   N/A           NSHUT
0/PT0     NCS4K-AC-PEM            OPERATIONAL   N/A           NSHUT
0/PT1     NCS4K-AC-PEM            OPERATIONAL   N/A           NSHUT
0/EC0     NCS4K-ECU               OPERATIONAL   N/A           NSHUT
sysadmin-vm:0_RP0#
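
If the show platform output is captured to a text file, a short script can flag the cards that need attention. The sketch below is not a Cisco utility. It assumes the five-column layout shown above, and the names parse_show_platform and faulty_cards are hypothetical. The POWERED_ON row in the sample is made up for illustration.

```python
import re

# One data row: location, card type, HW state, SW state, config state.
ROW = re.compile(r"^(\d+/\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def parse_show_platform(text):
    """Parse captured 'show platform' text into a list of dicts."""
    cards = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            loc, card_type, hw, sw, cfg = m.groups()
            cards.append({"location": loc, "type": card_type,
                          "hw": hw, "sw": sw, "config": cfg})
    return cards


def faulty_cards(cards):
    """HW must be OPERATIONAL; SW must be OPERATIONAL or N/A (CPU-less cards)."""
    return [c for c in cards
            if c["hw"] != "OPERATIONAL" or c["sw"] not in ("OPERATIONAL", "N/A")]


# Illustrative sample only; the 0/FC1 row is modified to show a fault.
SAMPLE = """\
Location Card Type               HW State      SW State      Config State
----------------------------------------------------------------------------
0/0       NCS4K-2H-O-K            OPERATIONAL   N/A           NSHUT
0/RP0     NCS4K-RP                OPERATIONAL   OPERATIONAL   NSHUT
0/FC1     NCS4016-FC-M            POWERED_ON    N/A           NSHUT
"""

for card in faulty_cards(parse_show_platform(SAMPLE)):
    print(card["location"], card["hw"], card["sw"])
```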

Below are a few common faulty HW and SW states in which an LC can get stuck, and the reasons for them.

State-1: HW_FAILED

This state indicates that the card failed to boot due to a power issue, or that the CCC power-on interpreter prevented completion of the power-up sequence.

Recommended actions:

Check the output of the command below.

sysadmin-vm:0_RP1# show platform detail location <location of card>

In the output of this command, look for “Last Event” and “Last Event Reason”. These fields give the reason for the failure.

sysadmin-vm:0_RP1# show platform detail location 0/fc1
Sat Jul 4 13:52:14.782 UTC
Platform Information for 0/FC1
PID : NCS4016-FC-M
Description : "NCS 4016 Agnostic Cross Connect - Multichassis "
VID/SN : V01
HW Oper State : OPERATIONAL
SW Oper State : N/A
Configuration : "NSHUT RST"
HW Version : 1.0
Last Event : HW_EVENT_FAILURE
Last Event Reason : "Intial discovery FAIL EXIT0 , power request on, but not finish ccc-pon startup power_control 0x00000001"
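
To pull the two fields of interest out of a saved copy of this output, a small parser along these lines may help. It is a sketch that assumes each field is printed as "name : value" on one line, as in the capture above. The function name parse_platform_detail is hypothetical.

```python
def parse_platform_detail(text):
    """Split 'name : value' lines of 'show platform detail' into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            # Drop the surrounding quotes that some values carry.
            fields[key.strip()] = value.strip().strip('"').strip()
    return fields


SAMPLE = '''\
Platform Information for 0/FC1
HW Oper State : OPERATIONAL
Last Event : HW_EVENT_FAILURE
Last Event Reason : "Intial discovery FAIL EXIT0 , power request on, but not finish ccc-pon startup power_control 0x00000001"
'''

detail = parse_platform_detail(SAMPLE)
print(detail["Last Event"])
print(detail["Last Event Reason"])
```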

For this failure state you can also check the status of the CCC controller for the particular location. Check the status of the power zones that are marked “SET”, because different LCs use different power zones to boot up.

sysadmin-vm:0_RP0# show controller ccc power detail location 0/RP0
Tue Aug 18 18:33:30.245 UTC
Power detail : Zone information for 0/RP0:
---------------------------------------------------------
---------------------------------------------------------
| 0 | OK | SET | -- |
| 1 | OK | -- | -- |
| 2 | OK | SET | -- |
| 3 | OK | -- | -- |
| 4 | OK | SET | -- |
| 5 | -- | -- | -- |
| 6 | OK | -- | -- |
| 7 | -- | -- | -- |
| 8 | OK | SET | -- |
sysadmin-vm:0_RP0#
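
The zone check can be scripted against a saved copy of this output. The header row is missing from the capture above, so the sketch below makes an assumption: the first column is the zone number, the second is the zone status, and the third carries the “SET” marker. The function names are hypothetical, and the faulty row in the sample is invented for illustration.

```python
def parse_power_zones(text):
    """Parse '| zone | status | SET | -- |' rows into a dict keyed by zone."""
    zones = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) >= 3 and cells[0].isdigit():
            # Assumed column order: zone, status, SET marker.
            zones[int(cells[0])] = {"status": cells[1], "set": cells[2] == "SET"}
    return zones


def set_zones_not_ok(zones):
    """Zones the card uses (marked SET) whose status is not OK."""
    return sorted(z for z, info in zones.items()
                  if info["set"] and info["status"] != "OK")


# Illustrative sample; zone 2 is altered to show a fault.
SAMPLE = """\
| 0 | OK | SET | -- |
| 1 | OK | -- | -- |
| 2 | -- | SET | -- |
"""

print(set_zones_not_ok(parse_power_zones(SAMPLE)))
```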

Recovery Actions:

Try to soft reset the LC with the command below.

sysadmin-vm:0_RP1# hw-module location <location of card> reload

If the soft reset does not resolve the issue, perform a physical Online Insertion and Removal (OIR) of the card.

State-2: POWERED_ON

This state is seen on cards that have no CPU. All LCs in the NCS4K are CPU-less.

Recommended actions:

sysadmin-vm:0_RP1# show platform
0/FC0     NC4K-FC     OPERATIONAL   N/A   NSHUT
0/FC1     NC4K-FC     POWERED_ON    N/A   NSHUT
0/FC2     NC4K-FC     OPERATIONAL   N/A   NSHUT

In this case the fabric driver tries to recover the card on its own, but if it cannot detect the ASIC within 3 minutes, the card lands in the POWERED_ON state.

Check the output below, which shows that all cards present in the chassis powered on successfully.

sysadmin-vm:0_RP0# show controller ccc power summary
Tue Aug 18 19:09:37.575 UTC
CCC Power Summary :
Location    Card Type         Power State
----------------------------------------------------------------
0/0         NCS4K-2H-O-K      ON
0/FC1       NCS4016-FC-M      ON
0/5         NCS4K-24LR-O-S    ON
0/6         NCS4K-20T-O-S     ON
0/RP0       NCS4K-RP          ON
0/8         NCS4K-2H-O-K      ON
sysadmin-vm:0_RP0#

Recovery Actions:

If the POWERED_ON state (State-2) persists for any LC/FC, try to soft reset the card with the command below.

sysadmin-vm:0_RP1# hw-module location <location of card> reload

If the soft reset does not resolve the issue, perform a physical OIR of the card.

State-3: PRESENT

This means that the card has been detected and is powered off. This can be a valid state when the card is configured to be powered off. The card might also have been forced to shut down due to an environmental alarm, or the CCC driver might have failed to detect the card due to I2C failures.

Recommended actions:

sysadmin-vm:0_RP1# show platform detail location <location of card>

In the output, check “Last Event” and “Last Event Reason”.

If the card was shut down due to an alarm condition, you can confirm the alarms with the command below. The output shows the alarm conditions for the respective card locations.

sysadmin-vm:0_RP0# show alarms
Tue Aug 18 18:03:35.421 UTC
-------------------------------------------------------------------------------
Active Alarms
-------------------------------------------------------------------------------
Location     Severity  Group    Set time           Description
-------------------------------------------------------------------------------
0/PT0-PM0    major     environ  05/22/70 04:56:45  Power Module Error (PM_NO_INPUT_DETECTED).
0/PT0-PM0    major     environ  05/22/70 04:56:45  Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI).
0/PT0-PM2    major     environ  05/22/70 04:56:45  Power Module Error (PM_NO_INPUT_DETECTED).
0/PT0-PM2    major     environ  05/22/70 04:56:45  Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI).
0/PT0-PM3    major     environ  05/22/70 04:56:45  Power Module Error (PM_NO_INPUT_DETECTED).
0/PT0-PM3    major     environ  05/22/70 04:56:45  Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI).
0/PT1-PM1    major     environ  05/22/70 04:56:45  Power Module Error (PM_NO_INPUT_DETECTED).
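
When the alarm list is long, grouping the entries by location makes it easier to see which card or power module is affected. The following sketch works on saved show alarms text and assumes the column order shown above (location, severity, group, set date, set time, description). The name alarms_by_location is hypothetical.

```python
import re
from collections import defaultdict

# location, severity, group, date (MM/DD/YY), time (HH:MM:SS), description
ALARM = re.compile(
    r"^(\S+)\s+(\S+)\s+(\S+)\s+(\d{2}/\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2})\s+(.+)$"
)


def alarms_by_location(text):
    """Group alarm rows of captured 'show alarms' text by location."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        m = ALARM.match(line.strip())
        if m:
            location, severity, group, _date, _time, description = m.groups()
            grouped[location].append((severity, group, description))
    return dict(grouped)


SAMPLE = """\
Location Severity Group Set time Description
0/PT0-PM0 major environ 05/22/70 04:56:45 Power Module Error (PM_NO_INPUT_DETECTED).
0/PT0-PM0 major environ 05/22/70 04:56:45 Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI).
0/PT0-PM2 major environ 05/22/70 04:56:45 Power Module Error (PM_NO_INPUT_DETECTED).
"""

for location, alarms in alarms_by_location(SAMPLE).items():
    print(location, len(alarms))
```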

You can also check the alarms for a specific card location.

sysadmin-vm:0_RP1# show alarms brief card location <location of card>

Recovery Actions:

Try to soft reset the LC with the command below.

sysadmin-vm:0_RP1# hw-module location <location of card> reload

If the soft reset does not resolve the issue, perform a physical OIR of the card.

State-4: UNKNOWN

The most common reason for this state is that the CCC driver failed to read the IDPROM from the card, or that the CCC driver detected IDPROM corruption, which prevented the card from being detected.

sysadmin-vm:0_RP1# show platform
Sat Jul 4 15:27:50.478 UTC
Location  Card Type               HW State      SW State      Config State
----------------------------------------------------------------------------
0/1       UNKNOWN                 POWERED_ON    OPERATIONAL   NSHUT

Recovery Actions:

Try to soft reset the LC with the command below.

sysadmin-vm:0_RP1# hw-module location <location of card> reload

If the soft reset does not resolve the issue, perform a physical OIR of the card.
If the physical OIR does not help, an RMA of the card is suggested.

State-5: SW_INACTIVE

Note that for a card to enter the SW_INACTIVE state, its HW state must first be operational. A common reason for a card to enter this state is that the host OS is unable to access the SSD.

Recommended actions:

Check whether the card has a control Ethernet connection.

sysadmin-vm:0_RP1# show controller switch reachable
Sat Jul 4 16:31:33.690 UTC
Rack  Card  Switch
--------------------
0     RP0   RP-SW
0     RP1   RP-SW
0     LC0   LC-SW
0     LC1   LC-SW
0     LC2   LC-SW
0     LC4   LC-SW

If the card does not have a control Ethernet connection, execute the command below to check the Ethernet protocol state toward the card. The protocol state should be either “Active” or “Standby”. Any other state indicates a connection issue.

sysadmin-vm:0_RP0# show controller switch mlap location 0/RP0/RP-SW
Tue Aug 18 18:08:22.343 UTC
Rack  Card  Switch  Rack Serial Number
--------------------------------------
0     RP0   RP-SW   SAL19058RDF

      Phys   Admin  Protocol     Forward     Protocol
Port  State  State  State        State       Type      Connects To
--------------------------------------------------------------------------
0     Down   Up     Down         -           Internal  LC15
1     Down   Up     Down         -           Internal  LC7
2     Down   Up     Down         -           Internal  LC13
3     Down   Up     Down         -           Internal  LC12
4     Down   Up     Down         -           Internal  LC14
5     Down   Up     Down         -           Internal  LC11
6     Up     Up     Active       Forwarding  Internal  LC6
7     Up     Up     Active       Forwarding  Internal  LC5
8     Down   Up     Down         -           Internal  LC1
9     Down   Up     Down         -           Internal  LC4
10    Down   Up     Down         -           Internal  LC3
11    Down   Up     Down         -           Internal  LC10
16    Up     Up     Active       Forwarding  Internal  LC0
17    Up     Up     Active       Forwarding  Internal  LC8
26    Down   Up     Down         -           Internal  LC2
27    Down   Up     Down         -           Internal  LC9
32    Down   Up     Down         -           Internal  MATESC (RP0 Ctrl)
33    Down   Up     Down         -           Internal  MATESC (RP1 Ctrl)
36    Up     Up     Active       Forwarding  Internal  CCC (RP0 Ctrl)
37    Up     Up     Rem Managed  Forwarding  Internal  CCC (RP1 Ctrl)
52    Down   Up     Down         -           External  SFP+ 1
54    Down   Up     Down         -           External  SFP+ 0
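
The protocol-state rule above (anything other than “Active” or “Standby” points to a connection issue) can be applied to a saved copy of this output. The sketch below is an assumption-laden example, not a Cisco tool. It relies on the column order shown in the capture and on the type column being either Internal or External. The names parse_mlap_ports and has_link_issue are hypothetical.

```python
import re

# port, phys state, admin state, protocol state (may contain a space,
# e.g. "Rem Managed"), forward state, type, connects-to
PORT = re.compile(
    r"^(\d+)\s+(\S+)\s+(\S+)\s+(.+?)\s+(\S+)\s+(Internal|External)\s+(.+)$"
)


def parse_mlap_ports(text):
    """Parse port rows of captured 'show controller switch mlap' text."""
    ports = []
    for line in text.splitlines():
        m = PORT.match(line.strip())
        if m:
            port, phys, admin, protocol, forward, ptype, peer = m.groups()
            ports.append({"port": int(port), "phys": phys, "admin": admin,
                          "protocol": protocol, "forward": forward,
                          "type": ptype, "connects_to": peer.strip()})
    return ports


def has_link_issue(ports, card):
    """True if the port toward 'card' is neither Active nor Standby.

    Returns False when no row for the card is found.
    """
    return any(p["connects_to"] == card
               and p["protocol"] not in ("Active", "Standby")
               for p in ports)


SAMPLE = """\
6 Up Up Active Forwarding Internal LC6
8 Down Up Down - Internal LC1
37 Up Up Rem Managed Forwarding Internal CCC (RP1 Ctrl)
"""

ports = parse_mlap_ports(SAMPLE)
print(has_link_issue(ports, "LC1"), has_link_issue(ports, "LC6"))
```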

Recovery Actions:

If you have confirmed that the port is down, you can also try to access the card CPU console to check whether the card is responsive. On access, the card prints messages that indicate why it went to the SW_INACTIVE state.

sysadmin-vm:0_RP1# attach location <location of card>

As a last resort, re-image the card. Consult a technical expert before you perform this step.

#reimage_chassis -s <slot id>

Contributed by Cisco Engineers

This Document Applies to These Products

Network Convergence System 4000 Series