UCS Fault Suppression

Global Fault Policy

The global fault policy controls the lifecycle of a fault in a Cisco UCS domain, including when faults are cleared, the flapping interval (the length of time a fault waits after its condition is alleviated before it is cleared), and the retention interval (the length of time a cleared fault is retained in the system).

A fault in Cisco UCS has the following lifecycle:

  1. A condition occurs in the system and Cisco UCS Manager raises a fault. This is the active state.

  2. When the condition is alleviated, the fault enters a flapping (or soaking) interval that is designed to prevent flapping. Flapping occurs when a fault is raised and cleared several times in rapid succession. The fault retains its severity for the duration of the flapping interval, whose length is specified in the global fault policy.

  3. If the condition reoccurs during the flapping interval, the fault returns to the active state. If the condition does not reoccur during the flapping interval, the fault is cleared.

  4. The cleared fault enters the retention interval. This interval ensures that the fault reaches the attention of an administrator even if the condition that caused it has been alleviated, and that the fault is not deleted prematurely. The cleared fault is retained for the length of time specified in the global fault policy.

  5. If the condition reoccurs during the retention interval, the fault returns to the active state. If the condition does not reoccur, the fault is deleted.
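The five lifecycle steps above can be modeled as a small state machine. This is a hypothetical sketch for illustration only: the state and event names are invented here and are not Cisco UCS Manager identifiers, and it assumes the Clear Action is Retain, so that a cleared fault passes through the retention interval instead of being deleted immediately.

```python
from enum import Enum


class FaultState(Enum):
    ACTIVE = "active"        # step 1: the condition exists and the fault is raised
    FLAPPING = "flapping"    # step 2: condition alleviated, fault is soaking
    CLEARED = "cleared"      # step 4: fault cleared and held for the retention interval
    DELETED = "deleted"      # step 5: retention interval over, fault removed


def next_state(state: FaultState, event: str) -> FaultState:
    """Return the next state of a hypothetical fault.

    Hypothetical event names:
      "alleviated"     - the underlying condition goes away
      "reoccurred"     - the condition comes back
      "flap_expired"   - the flapping interval ends with no reoccurrence
      "retain_expired" - the retention interval ends with no reoccurrence
    """
    # Steps 3 and 5: a reoccurring condition sends the fault back to active.
    if event == "reoccurred" and state in (FaultState.FLAPPING, FaultState.CLEARED):
        return FaultState.ACTIVE
    if state is FaultState.ACTIVE and event == "alleviated":
        return FaultState.FLAPPING
    if state is FaultState.FLAPPING and event == "flap_expired":
        return FaultState.CLEARED
    if state is FaultState.CLEARED and event == "retain_expired":
        return FaultState.DELETED
    # Any other event leaves the fault where it is.
    return state


# Walk one fault through the full lifecycle without any reoccurrence.
s = FaultState.ACTIVE
for ev in ("alleviated", "flap_expired", "retain_expired"):
    s = next_state(s, ev)
# s is now FaultState.DELETED
```

Treating every unrecognized event as a no-op keeps the model total: an event that does not apply in the current state cannot move the fault to an unexpected state.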

Configuring the Global Fault Policy

Procedure


Step 1

In the Navigation pane, click Admin.

Step 2

Expand All > Faults, Events, and Audit Log.

Step 3

Click Settings.

Step 4

In the Work pane, click the Global Fault Policy tab.

Step 5

In the Global Fault Policy tab, complete the following fields:

Name Description

Flapping Interval field

Flapping occurs when a fault is raised and cleared several times in rapid succession. To prevent this, Cisco UCS Manager does not allow a fault to change its state until this amount of time has elapsed since the last state change.

If the condition reoccurs during the flapping interval, the fault returns to the active state. If the condition does not reoccur during the flapping interval, the fault is cleared. What happens at that point depends on the setting in the Clear Action field.

Enter an integer between 5 and 3,600 (seconds). The default is 10.

Initial Severity field

This can be one of the following:

  • Info

  • Condition

  • Warning

Action on Acknowledgment field

Acknowledged faults are always deleted when the log is cleared. This option cannot be changed.

Clear Action field

The action Cisco UCS Manager takes when a fault is cleared. This can be one of the following:

  • Retain: Cisco UCS Manager GUI displays the Length of time to retain cleared faults section.

  • Delete: Cisco UCS Manager immediately deletes all fault messages as soon as they are marked as cleared.

Clear Interval field

Indicates whether Cisco UCS Manager automatically clears faults after a certain length of time. This can be one of the following:

  • Never: Cisco UCS Manager does not automatically clear any faults.

  • other: Cisco UCS Manager GUI displays the dd:hh:mm:ss field.

dd:hh:mm:ss field

The number of days, hours, minutes, and seconds that must pass before Cisco UCS Manager automatically marks a fault as cleared. What happens then depends on the setting in the Clear Action field.

Step 6

Click Save Changes.
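The values entered in Step 5 can be summarized as a small settings record that enforces the documented constraints before they are saved. This is a hypothetical Python sketch, not the Cisco UCS Manager API or its SDK: the class, method, and attribute names are invented here, the flapping interval is assumed to be in seconds, and `auto_clear_due` assumes the Clear Interval is counted from the moment the fault is raised, which the field descriptions do not state.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITIES = ("Info", "Condition", "Warning")
CLEAR_ACTIONS = ("Retain", "Delete")


def parse_clear_interval(text: str) -> Optional[int]:
    """Convert a Clear Interval entry to seconds; "Never" becomes None."""
    text = text.strip()
    if text.lower() == "never":
        return None
    parts = text.split(":")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError("Clear Interval must be 'Never' or dd:hh:mm:ss")
    days, hours, minutes, seconds = (int(p) for p in parts)
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


@dataclass(frozen=True)
class GlobalFaultPolicySettings:
    """Hypothetical record of the values on the Global Fault Policy tab."""
    initial_severity: str            # one of SEVERITIES
    clear_action: str                # one of CLEAR_ACTIONS
    flapping_interval_s: int = 10    # documented range 5-3,600, default 10
    clear_interval: str = "Never"    # "Never" or dd:hh:mm:ss

    def __post_init__(self) -> None:
        if self.initial_severity not in SEVERITIES:
            raise ValueError(f"Initial Severity must be one of {SEVERITIES}")
        if self.clear_action not in CLEAR_ACTIONS:
            raise ValueError(f"Clear Action must be one of {CLEAR_ACTIONS}")
        if not 5 <= self.flapping_interval_s <= 3600:
            raise ValueError("Flapping Interval must be between 5 and 3,600")
        parse_clear_interval(self.clear_interval)  # raises on a malformed entry

    @property
    def clear_interval_s(self) -> Optional[int]:
        return parse_clear_interval(self.clear_interval)

    def may_change_state(self, last_change_s: float, now_s: float) -> bool:
        """Flapping rule: no state change until the interval has elapsed."""
        return now_s - last_change_s >= self.flapping_interval_s

    def auto_clear_due(self, raised_at_s: float, now_s: float) -> bool:
        """ASSUMPTION: the Clear Interval is counted from when the fault was raised."""
        interval = self.clear_interval_s
        return interval is not None and now_s - raised_at_s >= interval


policy = GlobalFaultPolicySettings("Info", "Retain", clear_interval="00:01:00:00")
print(policy.clear_interval_s)  # 3600
```

Keeping the Clear Interval as the dd:hh:mm:ss text typed into the GUI and converting it on demand keeps the record close to what the administrator actually entered, while validation in `__post_init__` rejects a bad value before anything is saved.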


What to do next

For more information on fault suppression, see the Cisco UCS System Monitoring Guide.