Diagnostic Tools


You can use diagnostic tools to diagnose hardware problems with your Cisco servers. The user interface displays the status of each test run and lets you examine log files to troubleshoot hardware issues.

This chapter contains the following sections:

Diagnostic Tools Functions

Using Diagnostic Tools

Diagnostic Tools Functions

Diagnostic tools allow you to:

Run tests on various server components to find hardware issues, and view an analysis of the test results in a tabular format.

Run all the tests using the Quick Tasks functionality without browsing through available tests.

Run tests serially, as running some tests in parallel may interfere with other tests.

Configure the test by entering different argument values other than the default ones.

Select tests you want to run using the Test Suite functionality.

Save all the test logs, such as SEL logs, to an external USB flash drive.

Probe the current state of the server and view hardware issues.

Table 7-1 describes when you should use a specific diagnostic functionality.

Table 7-1 Using Diagnostics

Diagnostic Component
Function

F7 option

Use this option to run a specific set of tests when the server is booting up. The components that are tested are memory, processor, cache, Smart disk, QPI, memory pattern, and RAID adapter.

Quick Test

Use this test when you want to quickly check the status of a subsystem within a stipulated period. The components that can be tested under the quick test are processor, cache, memory, disk, video, network, QPI, CIMC, RAID, and chipset.

Comprehensive Test

Use this test when you want to test a subsystem in detail. These tests are designed to stress the subsystems and report the error. The tests that can be run are processor, memory, QPI, disk, and NUMA.

Quick Tasks

Use Quick Tasks to run both the quick tests and the comprehensive tests from a single place.

Test Suite

All the tests available under the quick and comprehensive tests are available here. The test suite lets you select as many tests as you like (using check boxes) and run them together.

Tests Log Summary

Use the test log summary to view the log, error log, and analysis of all the tests you have run. You can use four filters to sort the logs.

Tests Summary

This table in the left navigation pane shows the results of the tests you have run, grouped as passed tests, tests in queue, and failed tests.


Using Diagnostic Tools

This section describes the procedures to use the diagnostic tool components and contains the following sections:

Using the F7 Diagnostic Option

Quick Test

Comprehensive Test

Quick Tasks

Tests Suite

Tests Log Summary

Non-Interactive Offline Diagnostics

Using the F7 Diagnostic Option

UCS-SCU provides you with an option to run a few pre-defined diagnostic tests on the server when it is booting. You can initiate these diagnostic tests by using the F7 option. This F7 option boots the SCU image available on the Secure Digital (SD) memory card and automatically runs a set of pre-defined diagnostic tests.

If no SD card is available on the server, you must have mapped the SCU image using vMedia. If there is no SD card with an SCU image on the server and you have not mapped the SCU image using vMedia, these diagnostic tests cannot be completed. After the tests are completed, the SCU interface appears and displays the test results. The interface displays a progress report indicating which diagnostic tests have passed, which have failed, and which are queued for completion.


Note You can use this option only when the server is booting.


Quick Test

You can run these tests to check quickly for hardware issues. These tests usually take 20 to 30 minutes to run and test limited functionality for a few subsystems. The comprehensive test provides more exhaustive diagnostics.

To run the quick test, follow these steps:


Step 1 Click Diagnostic Tools from the left navigation pane.

Step 2 Click Tests.

Step 3 Click the Quick Test collapsible button to view the types of quick tests available for you to run.

Step 4 Click a subsystem (such as memory, video, or network).

Step 5 On the content pane, click Run Test.

The test runs and its status is displayed in the Tests Status area.

Table 7-2 describes the subsystems covered under the quick test.

Table 7-2 Quick Tests

Test
Description

Processor Test

Runs processor-specific tests. This test performs arithmetic and floating point operations on all available cores. You can also specify the duration of the tests.

Cache Test

Runs a test to exercise the CPU caches and checks for correctable and uncorrectable cache errors.

Memory Test

Tests DIMMs and memory controllers.

Disk Test

Tests the available disks in the system by reading each disk block-by-block.

Video Test

Runs a test to stress the video memory.

Network Test

Tests the available network interfaces by running the internal loopback test, register test, Electrically Erasable Programmable Read-Only Memory (EEPROM) test, and interrupt test.

QPI Test

Tests the QuickPath Interconnect (QPI) fabric.

CIMC Test

Runs CIMC self-test through the IPMI interface and also checks for SEL fullness.

Chipset Test

Runs a test to check the chipset for any errors logged in the chipset RAS registers.

RAID Adapter Test

Runs diagnostics on the LSI MegaRAID 926x and 8708 controllers and on the battery backup unit.


Comprehensive Test

The comprehensive test can run for hours and is usually run when the quick tests cannot diagnose the issue with your server. It is designed to test multiple hardware components and to find issues that may be caused by more than one component on your server.

You can customize the individual tests to check user-defined conditions. You can also select a group of tests to run.

To run the comprehensive test, follow these steps:


Step 1 Click Diagnostic Tools from the left navigation pane.

Step 2 Click Tests.

Step 3 Click the Comprehensive Test collapsible button to view the types of comprehensive tests available for you to run.

Step 4 Click a subsystem (such as processor, memory, or network).

Step 5 On the content pane, click Run Tests.

The test runs and its status is displayed in the Tests Status area.

Table 7-3 describes the subsystems covered under the comprehensive tests.

Table 7-3 Comprehensive Tests

Test
Description

Processor Stress Test

Imposes maximum stress on CPU and memory on the system. You can set the time (in minutes) that you want this test to run for.

Memory Pattern Test

Tests the available free memory by writing and reading various patterns to the memory.

QPI Stress Test

Runs a test to stress the QPI interconnect by generating traffic between the NUMA nodes.

Smart Disk Test

Tests the available disks in the system by reading each disk block by block.

NUMA Test

Runs a test to stress the NUMA memory access patterns and checks for errors.

VDisk Stress Test

Runs a test to stress the virtual disks in the system. The larger the virtual disk, the longer this test runs.
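
To illustrate the idea behind a memory pattern test such as the one listed in Table 7-3, the following Python sketch writes several fixed patterns to a buffer, reads them back, and reports any offsets that did not retain the value. It is only a conceptual illustration and is not how UCS-SCU implements its test. The names pattern_test and StuckBitMemory, the pattern values, and the simulated fault are all assumptions made for this example.

```python
def pattern_test(memory, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each pattern to every byte, read it back, and return the
    sorted offsets where the value read differs from the value written."""
    bad_offsets = set()
    for pattern in patterns:
        # Write pass: fill the whole region with the current pattern.
        for offset in range(len(memory)):
            memory[offset] = pattern
        # Read pass: verify that every byte still holds the pattern.
        for offset in range(len(memory)):
            if memory[offset] != pattern:
                bad_offsets.add(offset)
    return sorted(bad_offsets)


class StuckBitMemory:
    """Hypothetical simulated memory in which some bits of one byte are stuck at 0."""

    def __init__(self, size, bad_offset, stuck_mask):
        self._data = bytearray(size)
        self._bad_offset = bad_offset
        self._stuck_mask = stuck_mask

    def __len__(self):
        return len(self._data)

    def __getitem__(self, offset):
        return self._data[offset]

    def __setitem__(self, offset, value):
        if offset == self._bad_offset:
            # Clear the stuck bits so the stored value differs from the written one.
            value &= ~self._stuck_mask & 0xFF
        self._data[offset] = value


healthy_result = pattern_test(bytearray(64))
faulty_result = pattern_test(StuckBitMemory(64, bad_offset=10, stuck_mask=0x01))
```

Using complementary patterns such as 0x55 and 0xAA means every bit is exercised in both states, which is why a single stuck bit is caught by at least one pattern.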


Quick Tasks

Quick Tasks allow you to get started with diagnostic tools immediately. You can run all the tests (quick and comprehensive) from here and send the results to Cisco, which can use the logs to troubleshoot problems with your system. To use this feature, follow these steps:


Step 1 Click Diagnostic Tools from the left navigation pane.

Step 2 Click Quick Tasks.

Step 3 Select either Run Quick Tests or Run Comprehensive Test from the toolbar.

The status appears in the Test Status pane. You can also view detailed test results under Tests Log Summary.


Tests Suite

The Test Suite allows you to run the quick test and comprehensive test in a batch. It lists the various tests available, along with the test type and description of the test. You can select any number of tests you want to run from the list and view the result in the Tests Status column.

To run the test suite, follow these steps:


Step 1 Click Tests Suite from the left navigation pane.

Step 2 Select the tests you want to run by clicking the required check boxes.

Step 3 Click Run Tests Suite to run the tests you added to the test suite.

The status appears in the Tests Status pane along with the name, suite ID, result, start time, and end time. You can also open the Tests Log Summary to view the execution status of the tests in the test suite.


Tests Log Summary

Use the Tests Log Summary functionality to examine the test logs for troubleshooting. To view the Tests Log Summary, follow these steps:


Step 1 Click Diagnostic Tools on the left navigation pane.

Step 2 Click Tests Log Summary on the left navigation pane.

Step 3 Select a filter from the filter drop-down list and click Go. The status, result, start time, and end time of each test are displayed.

Step 4 Click a specific log entry (for example, click memory test) for more details.

The log, the error log (if the test failed), and the analysis of the specific test appear in the content pane.


Tests Summary

The Tests Summary table in the left navigation area provides you with a quick view of the tests that have passed, the tests in queue, and the tests that have failed.

Non-Interactive Offline Diagnostics

Cisco UCS C-Series servers with CIMC version 1.5(2) or later support an XML API interface for running server snapshot tests without any manual intervention. You can use an XML API client to run the server snapshot process on a C-Series server and copy the resulting server snapshot output, unattended, to another machine (Windows or UNIX) using either SFTP or SCP. Non-Interactive Offline Diagnostics can be run simultaneously on multiple C-Series servers, with the logs archived automatically on a remote server.
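
As a starting point for an XML API client, the following Python sketch shows only the session login step. It assumes that the CIMC XML API accepts an aaaLogin request posted over HTTPS to the /nuova path and returns a session cookie in the outCookie attribute. Confirm these names against the programmer's guide before relying on them. The helper names build_login_request, parse_login_response, and post_xml are hypothetical, and the requests that actually start the server snapshot are not shown here.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_login_request(username, password):
    """Return the XML body of an aaaLogin request (attribute names assumed)."""
    element = ET.Element("aaaLogin", {"inName": username, "inPassword": password})
    return ET.tostring(element, encoding="unicode")


def parse_login_response(body):
    """Return the session cookie from a login reply, or raise if the login failed."""
    root = ET.fromstring(body)
    if root.get("errorCode"):
        raise RuntimeError("login failed: " + root.get("errorDescr", "unknown error"))
    cookie = root.get("outCookie")
    if not cookie:
        raise RuntimeError("login reply did not contain a session cookie")
    return cookie


def post_xml(host, body, timeout=30.0):
    """POST an XML document to the assumed /nuova endpoint and return the reply text."""
    request = urllib.request.Request(
        "https://" + host + "/nuova",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return reply.read().decode("utf-8")
```

A client would call post_xml with the output of build_login_request, pass the reply to parse_login_response, and then include the returned cookie in later requests. Repeating this for each server address in a list is one way to cover multiple C-Series servers from a single script.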

For more information on using the XML API, see the Cisco UCS Rack-Mount Servers CIMC XML API Programmer's Guide available at:

http://www.cisco.com/en/US/docs/unified_computing/ucs/c/sw/api/b_cimc_api_book.html