Onboard Failure Logging

Onboard Failure Logging (OBFL) gathers boot, environmental, and critical hardware data for field-replaceable units (FRUs), and stores the information in the nonvolatile memory of the FRU. This information is used for troubleshooting, testing, and diagnosis if a failure or other error occurs, providing improved accuracy in hardware troubleshooting and root cause isolation analysis. Stored OBFL data can be retrieved in the event of a failure and is accessible even if the card does not boot.

Because OBFL is on by default, data is collected and stored as soon as the card is installed. If a problem occurs, the data can provide information about historical environmental conditions, uptime, downtime, errors, and other operating conditions.

OBFL also provides a generic library that different clients can use to log string messages.


Caution


OBFL is activated by default in FRUs and can be deactivated by stopping the obflmgr process. Do not deactivate OBFL without a specific reason, because OBFL data is used to diagnose and resolve problems in FRUs.


Prerequisites

You must be in a user group associated with a task group that includes the proper task IDs. The command reference guides include the task IDs required for each command. If you suspect user group assignment is preventing you from using a command, contact your AAA administrator for assistance.

Information About OBFL

OBFL is enabled by default. It collects and stores both baseline and event-driven information in the nonvolatile memory of each supported card where OBFL is enabled. The data collected includes the following:

  • Boot time

  • FRU part serial number

  • OS version

  • Temperature and voltage at boot

  • Temperature and voltage history

  • Total run time

This data is collected in two ways: as baseline data and as event-driven data.

Baseline Data Collection

Baseline data is stored independently of hardware or software failures and includes the information given in the following table.

Table 1. Data Types for Baseline Data Collection

Data Type | Details
Installation | Chassis serial number and slot number are stored at initial boot.
Temperature | Information on temperature sensors is recorded after boot. Subsequent recordings are made when readings vary beyond preset thresholds.
Run-time | Total run-time is limited to the size of the history buffer used for logging. It is based on the local router clock, with a logging granularity of 30 minutes.
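The run-time entry in Table 1 can be pictured as a bounded history buffer that receives one entry per 30-minute interval. The following is a minimal sketch, not the OBFL implementation: the class name `RuntimeHistory`, its methods, and the buffer capacity are hypothetical and chosen only for illustration. Only the 30-minute granularity comes from the table.

```python
from collections import deque

GRANULARITY_MINUTES = 30  # logging granularity stated in Table 1


class RuntimeHistory:
    """Hypothetical bounded run-time log; the capacity is an assumed value."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry once it is full.
        self._intervals = deque(maxlen=capacity)

    def record_interval(self, clock_minutes):
        """Store one entry for a completed 30-minute interval."""
        self._intervals.append(clock_minutes)

    def total_runtime_minutes(self):
        """Run time that the buffer can still account for."""
        return len(self._intervals) * GRANULARITY_MINUTES


history = RuntimeHistory(capacity=4)
for tick in range(6):  # six intervals elapse, but only four fit in the buffer
    history.record_interval(tick * GRANULARITY_MINUTES)

print(history.total_runtime_minutes())
```

In this sketch the reported total stops growing once the buffer is full, which is one way to read the statement that total run-time is limited by the size of the history buffer.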

Event-Driven Data Collection

Event-driven data includes card failure events, such as card crashes, memory errors, ASIC resets, and similar hardware failure indications.

Table 2. Data Types for Event-Driven Data Collection

Data Type | Details
Environmental Factors | Temperature Value: Inlet and hot point temperature values that change beyond the threshold set in the hardware inventory XML files.
Environmental Factors | Voltage Value
Calendar Time | Cleared: The time when OBFL logging was cleared.
Calendar Time | Disabled: The time when OBFL logging was disabled.
Calendar Time | Reset to 0: The time when the total line card runtime was reset to zero.

An environmental reading is logged when any of the following temperature or voltage events occurs:

  • The value exceeds the normal range.

  • The value changes by more than 10%.

  • The value returns to within range for more than five minutes.

On reboot, these environmental readings are consolidated into a single environmental history record that shows the duration and extent out of normal range for a consecutive set of environmental readings.
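The logging conditions for environmental readings and the consolidation performed on reboot can be sketched as follows. This is an illustration under stated assumptions, not the OBFL algorithm: the names `Reading`, `should_log`, and `consolidate` are hypothetical, the 10% change is assumed to be measured against the last logged value, and the example range of 0 to 70 is invented for the demonstration.

```python
from dataclasses import dataclass

CHANGE_FRACTION = 0.10     # "change more than 10%"
RETURN_HOLD_SECONDS = 300  # "within range for more than five minutes"


@dataclass
class Reading:
    time_s: float
    value: float


def should_log(reading, last_logged, low, high, in_range_since=None):
    """Return True if the reading meets one of the three logging conditions.

    low and high bound the normal range. in_range_since is the time at which
    the value came back into range after an excursion, or None otherwise.
    """
    if not (low <= reading.value <= high):
        return True  # the value exceeds the normal range
    if last_logged is not None and last_logged.value != 0:
        # Assumption: the change is relative to the last logged value.
        change = abs(reading.value - last_logged.value) / abs(last_logged.value)
        if change > CHANGE_FRACTION:
            return True
    if in_range_since is not None:
        # The value has stayed in range for longer than the hold time.
        return reading.time_s - in_range_since > RETURN_HOLD_SECONDS
    return False


def consolidate(run, low, high):
    """Summarize a consecutive set of readings as one history record."""
    # Extent is the largest distance outside the normal range (0 if in range).
    extent = max(max(r.value - high, low - r.value, 0.0) for r in run)
    duration_s = run[-1].time_s - run[0].time_s
    return {"duration_s": duration_s, "extent": extent}


# Example with an assumed normal range of 0 to 70.
run = [Reading(0, 72), Reading(60, 80), Reading(120, 75)]
record = consolidate(run, low=0, high=70)
print(record)
```

Keeping the decision (`should_log`) separate from the summary (`consolidate`) mirrors the text: readings are logged as events occur, and the single history record is produced later, on reboot.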

Supported Cards and Platform

FRUs that have sufficient nonvolatile memory available for OBFL data storage support OBFL. The following table provides information about the OBFL support for different FRUs on the Cisco NCS 5500 Series router.

Table 3. OBFL Support on Cisco NCS 5500 Series Router

Card Type | Cisco NCS 5500 Series Router
Route processor | Supported
Fabric cards | Supported
Line card | Supported
Power supply cards | Not Supported
Fan tray | Supported
System Controller | Supported