Cisco Nexus 7000 Series NX-OS System Management Configuration Guide

Finding Feature Information

Your software release might not support all the features documented in this module. For the latest caveats and feature information, see the Bug Search Tool at https://tools.cisco.com/bugsearch/ and the release notes for your software release. To find information about the features documented in this module, and to see a list of the releases in which each feature is supported, see the "New and Changed Information" chapter or the Feature History table in this chapter.

Information About Online Diagnostics

Online diagnostics help you verify that hardware and internal data paths are operating as designed so that you can rapidly isolate faults.

Online Diagnostics Overview

With online diagnostics, you can test and verify the hardware functionality of the device while the device is connected to a live network.

The online diagnostics contain tests that check different hardware components and verify the data path and control signals. Disruptive online diagnostic tests (such as the disruptive loopback test) and nondisruptive online diagnostic tests (such as the ASIC register check) run during bootup, line module online insertion and removal (OIR), and system reset. The nondisruptive online diagnostic tests run as part of the background health monitoring and you can run these tests on demand.

Online diagnostics are categorized as bootup diagnostics, runtime (health-monitoring) diagnostics, and on-demand diagnostics. Bootup diagnostics run during bootup, health-monitoring tests run in the background, and on-demand diagnostics run once or at user-designated intervals when the device is connected to a live network.

Bootup Diagnostics

Bootup diagnostics run during bootup and detect faulty hardware before Cisco NX-OS brings a module online. For example, if you insert a faulty module in the device, bootup diagnostics test the module and take it offline before the device uses the module to forward traffic.

Bootup diagnostics also check the connectivity between the supervisor and module hardware and the data and control paths for all the ASICs.

Bootup diagnostics log failures to onboard failure logging (OBFL) and syslog and trigger a diagnostic LED indication (on, off, pass, or fail).

You can configure the device to either bypass the bootup diagnostics or to run the complete set of bootup diagnostics.

Note

Bootup tests are not available on demand.

The following tables describe the bootup diagnostic tests for a module and a supervisor:

Table 1. Bootup Diagnostic Tests for Modules
Test Name	Description	Supported Modules	Unsupported Modules
EOBCPortLoopback	Disruptive test, not an on-demand test. Ethernet out of band.	All F1, M1, M3, F2, and F2e modules	—
OBFL	Verifies the integrity of the onboard failure logging (OBFL) flash.	All F1, M1, M3, F2, and F2e modules	—
FIPS	Disruptive test; run only when FIPS is enabled on the system. An internal test that runs during module bootup to validate the security device on the module.	N7K-M148GS-11 N7K-M148GS-11L N7K-M108X2-12L N7K-M132XP-12 N7K-M132XP-12L All M2 Modules	N7K-M148GT-11 N7K-M148GT-11L All F1 Modules All F2 Modules N7K-F248XT-25E All F3 Modules All M3 Modules
BootupPortLoopback	Disruptive test, not an on-demand test. A PortLoopback test that runs only during module bootup.	N7K-M148GS-11 N7K-M148GS-11L N7K-M108X2-12L N7K-M132XP-12 N7K-M132XP-12L All M2 Modules All F1 Modules All F2 Modules All F2e Modules N77-M348XP-23L N77-M324FQ-25L	N7K-M148GT-11 N7K-M148GT-11L All F3 Modules

Table 2. Bootup Diagnostic Tests for Supervisors
Test Name	Description	Supported Modules	Unsupported Modules
USB	Nondisruptive test. Checks the USB controller initialization on a module.	Sup1, Sup2, and Sup2E	—
CryptoDevice	Nondisruptive test. Checks the Cisco Trusted Security (CTS) device initialization on a module.	Sup1	Sup2 and Sup2E
ManagementPortLoopback	Disruptive test, not an on-demand test. Tests loop back on the management port of a module.	Sup1, Sup2, and Sup2E	—
EOBCPortLoopback	Disruptive test, not an on-demand test. Ethernet out of band.	Sup1, Sup2, and Sup2E	—
OBFL	Verifies the integrity of the onboard failure logging (OBFL) flash.	Sup1, Sup2, and Sup2E	—

Runtime or Health Monitoring Diagnostics

Runtime diagnostics are also called health monitoring (HM) diagnostics. These diagnostic tests provide information about the health of a live device. They detect runtime hardware errors, memory errors, the degradation of hardware modules over time, software faults, and resource exhaustion.

Runtime diagnostics are nondisruptive and run in the background to ensure the health of a device that is processing live network traffic. You can enable or disable runtime tests. You can change the runtime interval for a runtime test.

Note

Recommended best practice: Do not change the runtime interval from the default value.

The following tables describe the runtime diagnostic tests for a module and a supervisor.

Table 3. Runtime Diagnostic Tests for Modules
Test Name	Description	Default Interval	Supported Modules	Unsupported Modules
ASICRegisterCheck	Checks read/write access to scratch registers for the ASICs on a module.	1 min	All modules	—
PrimaryBootROM	Verifies the integrity of the primary boot device on a module.	30 min	All modules	—
SecondaryBootROM	Verifies the integrity of the secondary boot device on a module.	30 min	All modules	—
PortLoopback	Checks diagnostics at a per-port basis on all Admin Down ports.	15 min	N7K-M148GS-11 RF N7K-M148GS-11L N7K-M108X2-12L N7K-M132XP-12 RF N7K-M132XP-12L N77-F348XP-23 All M2, F1, F2, F3, and F2e modules N77-M348XP-23L N77-M324FQ-25L	N7K-M148GT-11 N7K-M148GT-11L
RewriteEngineLoopback	This is a nondisruptive per-port loopback test, and hence can run on ports that are up as well. It is designed to monitor the fabric to LC connectivity and can detect supervisor and fabric failures.	1 min	All M1, M2, F2, and F2e modules N77-M348XP-23L N77-M324FQ-25L	All F1 and F3 modules
SnakeLoopback	Performs a nondisruptive loopback on all ports, even those ports that are not in the shut state. The ports are formed into a snake during module boot up, and the supervisor checks the snake connectivity periodically.	20 min	All F1, F2, and F2e modules	All M1, M2, M3, and F3 modules
InternalPortLoopback	Nondisruptive per-port loopback test, and hence can run on ports that are up as well.	5 min	All M2, F2, and F2e modules N77-M348XP-23L N77-M324FQ-25L	All M1, F1, and F3 modules

Table 4. Runtime Diagnostic Tests for Supervisors
Test Name	Description	Default Interval	Supported Supervisors	Unsupported Supervisors
ASICRegisterCheck	Checks read/write access to scratch registers for the ASICs on a module.	20 sec	Sup1, Sup2, and Sup2E	—
NVRam	Verifies the sanity of the NVRAM blocks on a supervisor.	5 min	Sup1, Sup2, and Sup2E	—
RealTimeClock	Verifies that the real-time clock on the supervisor is ticking.	5 min	Sup1, Sup2, and Sup2E	—
PrimaryBootROM	Verifies the integrity of the primary boot device on a module.	30 min	Sup1, Sup2, and Sup2E	—
SecondaryBootROM	Verifies the integrity of the secondary boot device on a module.	30 min	Sup1, Sup2, and Sup2E	—
CompactFlash	Verifies access to the internal compact flash devices.	30 min	Sup1, Sup2, and Sup2E	—
ExternalCompactFlash	Verifies access to the external compact flash devices.	30 min	Sup1, Sup2, and Sup2E	—
PwrMgmtBus	Verifies the standby power management control bus.	30 sec	Sup1, Sup2, and Sup2E	—
SpineControlBus	Verifies the availability of the standby spine module control bus.	30 sec	Sup1 and Sup2	Sup2E
SystemMgmtBus	Verifies the availability of the standby system management bus.	30 sec	Sup1, Sup2, and Sup2E	—
StatusBus	Verifies the status transmitted by the status bus for the supervisor, modules, and fabric cards.	30 sec	Sup1, Sup2, and Sup2E	—
StandbyFabricLoopback	Verifies the connectivity of the standby supervisor to the crossbars on the spine card.	30 sec	Sup1, Sup2, and Sup2E	—
PCIeBus	Verifies PCIe connectivity from the supervisor to the crossbar ASICs on the fabric cards.	30 sec	Sup2 and Sup2E	—

Recovery Actions for Specified Health-Monitoring Diagnostics

Before Cisco NX-OS Release 6.2(8), runtime tests did not take corrective recovery actions when they detected a hardware failure. The default action through EEM included generating alerts (callhome, syslog) and logging (OBFL, exception logs). These actions were informative, but they did not remove faulty devices from the network, which could lead to network disruption, traffic black holing, and so forth. Before Cisco NX-OS Release 6.2(8), you had to manually shut down the faulty devices to recover the network.

In Cisco NX-OS Release 6.2(8) and later releases, you can configure the system to take disruptive action if the system detects failure on one of the following runtime, or health-monitoring, tests:

PortLoopback test
RewriteEngineLoopback test
SnakeLoopback test
StandbyFabricLoopback test

The recovery actions feature is disabled by default. With this feature, you can configure the system to take disruptive action as a result of repeated failures on the health-monitoring, or runtime, tests. This feature enables or disables the corrective, conservative action on all four tests simultaneously; the corrective action taken differs for each test. The system takes corrective action after a test crosses its maximum consecutive failure count.

With the recovery actions feature enabled, the corrective action for each test is as follows:

PortLoopback test—The system moves the port registering faults to an error-disabled state.
RewriteEngineLoopback test—The system takes different corrective action depending on whether the fault is with the supervisor, the fabric, or the port, as follows:
- On a chassis with a standby supervisor, when the system detects a fault with the supervisor, the system switches over to the standby supervisor. If there is no standby supervisor in the chassis, the system does not take any action.
- After failures on the fabric, the system will reload the fabric 3 times. If failure persists, the system powers down the fabric.
- After the failures on a port, the system moves the faulty port to the error-disabled state.

SnakeLoopback test—After the test detects 10 consecutive failures with any port on the module, the system will move the faulty port to an error-disabled state.
StandbyFabricLoopback test—The system attempts to reload the standby supervisor if it receives an error on this test, and it continues to reload the standby supervisor if the failure persists after the reload. The system cannot power off the standby supervisor.

Finally, the system maintains a history of the recovery actions that includes details of each action, the testing type, and the severity. You can display these counters.
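
As a minimal sketch, the recovery action history described above could be displayed with the action-log command listed under Verifying the Online Diagnostics Configuration. The gold_port_failure event type is chosen here only for illustration, and the output is omitted because it depends on the system.

```text
! Illustrative only: list recovery actions recorded for port failures
switch# show event manager events action-log event-type gold_port_failure
```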

On-Demand Diagnostics

On-demand tests help localize faults and are usually needed in one of the following situations:

To respond to an event that has occurred, such as isolating a fault.
In anticipation of an event that may occur, such as a resource exceeding its utilization limit.

You can run all the health monitoring tests on demand. You can schedule on-demand diagnostics to run immediately.

You can also modify the default interval for a health monitoring test.

High Availability

A key part of high availability is detecting hardware failures and taking corrective action while the device runs in a live network. Online diagnostics in high availability detect hardware failures and provide feedback to high availability software components to make switchover decisions.

Cisco NX-OS supports stateless restarts for online diagnostics. After a reboot or supervisor switchover, Cisco NX-OS applies the running configuration.

Virtualization Support

Cisco NX-OS supports online diagnostics in the default virtual device context (VDC) or, beginning with Cisco NX-OS Release 6.1, in the admin VDC. By default, Cisco NX-OS places you in the default VDC.

Online diagnostics are virtual routing and forwarding (VRF) aware. You can configure online diagnostics to use a particular VRF to reach the online diagnostics SMTP server.

Guidelines and Limitations for Online Diagnostics

Online diagnostics has the following configuration guidelines and limitations:

You cannot run disruptive online diagnostic tests on demand.
The F1 Series modules support the following tests: ASICRegisterCheck, PrimaryBootROM, SecondaryBootROM, EOBCPortLoopback, PortLoopback, and BootupPortLoopback.
Support for the RewriteEngineLoopback and SnakeLoopback tests on F1 Series modules is deprecated in Cisco NX-OS Release 5.2.
Beginning with Cisco NX-OS Release 6.1, F2 Series modules support the RewriteEngineLoopback and SnakeLoopback tests.
Beginning with Cisco NX-OS Release 7.3(0)DX(1), M3 Series modules support generic online diagnostics.

The following generic online diagnostics are supported on M3 Series modules:

Table 5. Generic Online Diagnostics Supported on M3 Series
ASICRegisterCheck	Health Monitoring/On Demand
PrimaryBootROM	Health Monitoring/On Demand
SecondaryBootROM	Health Monitoring/On Demand
EOBCPortLoopback	Bootup test only
OBFL	Bootup test only
PortLoopback	Health Monitoring/On Demand when port admin down only
RewriteEngineLoopback	Health Monitoring/On Demand
IntPortLoopback	Health Monitoring/On Demand
BootupPortLoopback	Bootup test only

Default Settings for Online Diagnostics

The following table lists the default settings for online diagnostic parameters.


Parameters	Default
Bootup diagnostics level	complete
Nondisruptive tests	active

Configuring Online Diagnostics

Note

Be aware that the Cisco NX-OS commands for this feature may differ from those commands used in Cisco IOS.

Setting the Bootup Diagnostic Level

You can configure the bootup diagnostics to run the complete set of tests or you can bypass all bootup diagnostic tests for a faster module bootup time.

Note

We recommend that you set the bootup online diagnostics level to complete. We do not recommend bypassing the bootup online diagnostics.

Before you begin

Make sure that you are in the correct VDC. To change the VDC, use the switchto vdc command.

Procedure

	Command or Action	Purpose
Step 1	switch# configure terminal	Enters global configuration mode.
Step 2	switch(config)# diagnostic bootup level {complete \| bypass}	Configures the bootup diagnostic level to trigger diagnostics as follows when the device boots: complete—Perform all bootup diagnostics. The default is complete. bypass—Do not perform any bootup diagnostics.
Step 3	(Optional) switch(config)# show diagnostic bootup level	(Optional) Displays the bootup diagnostic level (bypass or complete) that is currently in place on the device.
Step 4	(Optional) switch(config)# copy running-config startup-config	(Optional) Copies the running configuration to the startup configuration.
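
The procedure above can be sketched as the following session. It assumes that you are already in the correct VDC, and it applies the recommended complete level; substitute bypass only if faster module bootup matters more than bootup fault detection.

```text
switch# configure terminal
! Run the full set of bootup diagnostics (the default and recommended level)
switch(config)# diagnostic bootup level complete
switch(config)# show diagnostic bootup level
switch(config)# copy running-config startup-config
```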

Activating a Diagnostic Test

You can set a diagnostic test as active and optionally modify the interval (in hours, minutes, and seconds) at which the test runs.

Note

Recommended best practice: Do not change the runtime interval from the default value.

Before you begin

Make sure that you are in the correct VDC. To change the VDC, use the switchto vdc command.

Procedure

	Command or Action	Purpose
Step 1	switch# configure terminal	Enters global configuration mode.
Step 2	(Optional) switch(config)# diagnostic monitor interval module `slot` test [`test-id` \| `name` \| all] hour `hour` min `minute` second `second`	(Optional) Configures the interval at which the specified test is run. If no interval is set, the test runs at the interval set previously, or the default interval. The argument ranges are as follows: `slot` —The range is from 1 to 10. `test-id` —The range is from 1 to 14. `name` —Can be any case-sensitive, alphanumeric string up to 32 characters. `hour` —The range is from 0 to 23 hours. `minute` —The range is from 0 to 59 minutes. `second` —The range is from 0 to 59 seconds.
Step 3	switch(config)# [no] diagnostic monitor module `slot` test [`test-id` \| `name` \| all]	Activates the specified test. The argument ranges are as follows: `slot` —The range is from 1 to 10. `test-id` —The range is from 1 to 14. `name` —Can be any case-sensitive, alphanumeric string up to 32 characters. The [no] form of this command inactivates the specified test. Inactive tests keep their current configuration but do not run at the scheduled interval.
Step 4	(Optional) switch(config)# show diagnostic content module {`slot` \| all}	(Optional) Displays information about the diagnostics and their attributes.
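
A hedged sketch of this procedure follows. It activates every test on one module and leaves the intervals at their defaults, in line with the recommended best practice. Module 6 is an illustrative slot number within the documented range of 1 to 10.

```text
switch# configure terminal
! Illustrative slot: activate all tests on module 6 at their default intervals
switch(config)# diagnostic monitor module 6 test all
! Confirm which tests are now active and what their attributes are
switch(config)# show diagnostic content module 6
```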

Setting a Diagnostic Test as Inactive

You can set a diagnostic test as inactive. Inactive tests keep their current configuration but do not run at the scheduled interval.

Use the following command in global configuration mode to set a diagnostic test as inactive:


Command	Purpose
no diagnostic monitor module `slot` test [`test-id` \| `name` \| all]	Inactivates the specified test. The following ranges are valid for each argument: `slot` —The range is from 1 to 10. `test-id` —The range is from 1 to 14. `name` —Can be any case-sensitive, alphanumeric string up to 32 characters.
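
For illustration, the following sketch inactivates a single test and then checks the result. Module 6 and test ID 2 are example values within the documented ranges, not recommendations.

```text
switch# configure terminal
! Illustrative values: stop running test 2 on module 6 (its configuration is kept)
switch(config)# no diagnostic monitor module 6 test 2
switch(config)# show diagnostic content module 6
```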

Configuring Corrective Action

You can configure the device to take corrective action when it detects failures on any of the following runtime diagnostic tests:

PortLoopback
RewriteEngineLoopback
SnakeLoopback
StandbyFabricLoopback

Note

This feature enables or disables the corrective, conservative action on all four tests, simultaneously; the corrective action taken differs for each test.

Before you begin

Make sure that you are in the correct VDC. To change the VDC, use the switchto vdc command.

Procedure

	Command or Action	Purpose
Step 1	switch# configure terminal	Enters global configuration mode.
Step 2	switch(config)# [no] diagnostic eem action conservative	Enables or disables corrective actions when the system detects failures on port loopback, rewrite engine loopback, snake loopback, internal port loopback and standby fabric loopback tests. Note: Use the no form of the command to disable these corrective actions.
Step 3	switch# event gold [failure-type {sup \| fabric \| lc \| port}] module {`module` \| all} test {`test-name` \| `test-id`} [severity {major \| minor \| moderate}] testing-type {bootup \| monitoring \| ondemand \| scheduled} consecutive-failure `count`	Triggers an event if the named online diagnostic test experiences the configured failure severity for the configured number of consecutive failures. The `module` specifies the number of the module that needs to be monitored. The `test-name` is the name of a configured online diagnostic test. The `test-id` specifies the test ID of the event criteria. The range is from 1 to 30. The `count` range is from 1 to 1000. Note: This CLI command can be used to modify the consecutive failure count for GOLD system default policies.
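
As a minimal sketch, enabling the conservative corrective actions and confirming the resulting EEM action level might look like the following. It assumes that you are already in the correct VDC. The event gold step is left out because its arguments depend on the policy being tuned.

```text
switch# configure terminal
! Enable corrective (conservative) actions for the supported loopback tests
switch(config)# diagnostic eem action conservative
! Check the EEM action level now in effect
switch(config)# show diagnostic eem action
! To return to the default (feature disabled), use the no form:
! switch(config)# no diagnostic eem action conservative
```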

Starting or Stopping an On-Demand Diagnostic Test

You can start or stop an on-demand diagnostic test. You can optionally modify the number of iterations to repeat this test, and the action to take if the test fails.

We recommend that you only manually start a disruptive diagnostic test during a scheduled network maintenance time.

Before you begin

Make sure that you are in the correct VDC. To change the VDC, use the switchto vdc command.

Procedure

	Command or Action	Purpose
Step 1	(Optional) switch# diagnostic ondemand iteration `number`	(Optional) Configures the number of times that the on-demand test runs. The range is from 1 to 999. The default is 1.
Step 2	(Optional) switch# diagnostic ondemand action-on-failure {continue failure-count `num-fails` \| stop}	(Optional) Configures the action to take if the on-demand test fails. The `num-fails` range is from 1 to 999. The default is 1.
Step 3	switch# diagnostic start module `slot` test [`test-id` \| `name` \| all \| non-disruptive] [port `port-number` \| all]	Starts one or more diagnostic tests on a module. The module slot range is from 1 to 10. The `test-id` range is from 1 to 14. The test name can be any case-sensitive, alphanumeric string up to 32 characters. The port range is from 1 to 48.
Step 4	switch# diagnostic stop module `slot` test [`test-id` \| `name` \| all]	Stops one or more diagnostic tests on a module. The module slot range is from 1 to 10. The `test-id` range is from 1 to 14. The test name can be any case-sensitive, alphanumeric string up to 32 characters.
Step 5	(Optional) switch# show diagnostic status module `slot`	(Optional) Verifies that the diagnostic has been scheduled.
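
The steps above can be sketched as the following session. The iteration count of 3 and module 6 are illustrative values. The non-disruptive keyword is used so that the example stays clear of disruptive tests.

```text
! Illustrative values: repeat each on-demand test 3 times and stop on the first failure
switch# diagnostic ondemand iteration 3
switch# diagnostic ondemand action-on-failure stop
switch# show diagnostic ondemand setting
! Start only the nondisruptive tests on module 6, then check status and results
switch# diagnostic start module 6 test non-disruptive
switch# show diagnostic status module 6
switch# show diagnostic result module 6 test all detail
```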

Clearing Diagnostic Results

You can clear diagnostic test results.

Use the following command in any mode to clear the diagnostic test results:


Command	Purpose
diagnostic clear result module [`slot` \| all] test {`test-id` \| all}	Clears the test result for the specified test. The valid ranges are as follows: `slot` —The range is from 1 to 10. `test-id` —The range is from 1 to 14.
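
For example, under the assumption that module 6 holds results you no longer need, the following sketch clears every test result on that module and then displays the result table.

```text
! Illustrative slot: clear all stored test results for module 6
switch# diagnostic clear result module 6 test all
switch# show diagnostic result module 6
```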

Simulating Diagnostic Results

You can simulate diagnostic test results.

Use the following command in any mode to simulate a diagnostic test result or clear the simulated test results:


Command	Purpose
diagnostic test simulation module `slot` test `test-id` {fail \| random-fail \| success} [port `number` \| all]	Simulates the test result for the specified test. The valid ranges are as follows: `slot` —The range is from 1 to 10. `test-id` —The range is from 1 to 14. port `number` —The range is from 1 to 48.
diagnostic test simulation module `slot` test `test-id` clear	Clears the simulated results for the specified test. The valid ranges are as follows: `slot` —The range is from 1 to 10. `test-id` —The range is from 1 to 14.
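
A hedged sketch of a simulation cycle follows: force a failure for one test, inspect the simulation state, and then clear it so that real results are reported again. Module 6 and test ID 2 are illustrative values. Because a simulated failure can trigger the configured event handling, this is best tried outside production hours.

```text
! Illustrative values: simulate a failure of test 2 on module 6
switch# diagnostic test simulation module 6 test 2 fail
switch# show diagnostic simulation module 6
! Remove the simulated result when finished
switch# diagnostic test simulation module 6 test 2 clear
```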

Verifying the Online Diagnostics Configuration

To display online diagnostics configuration information, perform one of the following tasks:


Command	Purpose
show diagnostic bootup level	Displays information about bootup diagnostics.
show diagnostic content module {`slot` \| all}	Displays information about diagnostic test content for a module.
show diagnostic description module `slot` test [`test-name` \| all]	Displays the diagnostic description.
show diagnostic eem [action [description] \| policy module {`module number` \| all}]	Displays the Embedded Event Manager (EEM) action level and the EEM policies configured for the level.
show diagnostic events [error \| info]	Displays diagnostic events by error and information event type.
show diagnostic ondemand setting	Displays information about on-demand diagnostics.
show diagnostic result module `slot` [test [`test-name` \| all]] [detail]	Displays information about the results of a diagnostic.
show diagnostic simulation module `slot`	Displays information about a simulated diagnostic.
show diagnostic status module `slot`	Displays the test status for all tests on a module.
show event manager events action-log event-type [gold \| gold_sup_failure \| gold_fabric_failure \| gold_module_failure \| gold_port_failure]	Displays the recovery action history for the specified failure, including the number of switchovers, reloads, and poweroffs, as well as timestamp, failure reason, module-id, port list, test name, testing type, and severity. This data is maintained across ungraceful reloads.
show hardware capacity [eobc \| forwarding \| interface \| module \| power]	Displays information about the hardware capabilities and current hardware utilization by the system.
show module	Displays module information including the online diagnostic test status.
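
As one possible verification pass, the commands in the table can be combined as follows. Module 6 is an illustrative slot, and the output is omitted because it varies by hardware and release.

```text
! Overall module state, including online diagnostic status
switch# show module
! Which tests exist on each module and what each one does
switch# show diagnostic content module all
switch# show diagnostic description module 6 test all
! Diagnostic events of the error type
switch# show diagnostic events error
```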

Configuration Examples for Online Diagnostics

This example shows how to start all on-demand tests on module 6:

diagnostic start module 6 test all

This example shows how to activate test 2 and set the test interval on module 6:

configure terminal
diagnostic monitor module 6 test 2
diagnostic monitor interval module 6 test 2 hour 3 min 30 second 0

Additional References

For additional information related to implementing online diagnostics, see the following sections:

Topics	Document Title
Online diagnostics CLI commands	Cisco Nexus 7000 Series NX-OS System Management Command Reference
VDCs and VRFs	Cisco Nexus 7000 Series NX-OS Virtual Device Context Configuration Guide

Feature History Table for Online Diagnostics

The following table lists the release history for this feature.


Feature Name	Releases	Feature Information
Online diagnostics (GOLD)	7.3(0)DX(1)	Added support for M3 Series modules for the following diagnostic tests: ASICRegisterCheck, PrimaryBootROM, SecondaryBootROM, EOBCPortLoopback, OBFL, PortLoopback, RewriteEngineLoopback, IntPortLoopback, and BootupPortLoopback.
Online diagnostics (GOLD)	7.2(0)D1(1)	This feature was introduced.
Online diagnostics (GOLD)	6.2(10)	Added support for the N77-F348XP-23 module for the PortLoopback test. Added support for all M2, F2, and F2e modules for the InternalPortLoopback test.
Recovery actions on specified health-monitoring diagnostics	6.2(8)	Enables you to configure recovery actions for the following runtime diagnostic tests: PortLoopback, RewriteEngineLoopback, SnakeLoopback, and StandbyFabricLoopback.
Online diagnostics (GOLD)	6.2(6)	Added support to all F3 modules except for N77-F348XP-23.
Online diagnostics (GOLD)	6.1(1)	Added support for Supervisor 2 and M2 Series modules. Added support for F2 Series modules for the RewriteEngineLoopback and SnakeLoopback tests. Added support for configuring online diagnostics in the admin VDC.
Online diagnostics (GOLD)	5.2(1)	Enabled the SpineControlBus test on the standby supervisor. Deprecated the SnakeLoopback test on F1 Series modules.
Online diagnostics (GOLD)	5.1(2)	Added support for the SnakeLoopback test on F1 Series modules.
Online diagnostics (GOLD)	5.1(1)	Added support for the FIPS and BootupPortLoopback tests.
Online diagnostics (GOLD)	4.2(1)	Added support for the PortLoopback, StatusBus, and StandbyFabricLoopback tests.
Online diagnostics (GOLD)	4.0(1)	This feature was introduced.
