Overview of Faults
About Faults in FXOS
A fault is a mutable object that is managed by the Cisco Firepower eXtensible Operating System (FXOS). Each fault represents a failure or an alarm threshold that has been raised. During its lifecycle, a fault can change from one state or severity to another.
Each fault includes information about the operational state of the affected object at the time the fault was raised. If the fault is transitional and the failure is resolved, then the object transitions to a functional state.
A fault remains in FXOS until the fault is cleared and deleted according to the settings in the fault collection policy.
You can view all faults from either the FXOS CLI or the Firepower Chassis Manager. You can also configure the fault collection policy to determine how an FXOS instance collects and retains faults.
Note: All Cisco Firepower eXtensible Operating System faults can be trapped by SNMP.
Fault Severities
A fault can transition through more than one severity during its lifecycle. The following table describes the possible fault severities in alphabetical order.
Severity | Description
---|---
Cleared | A notification that the condition that caused the fault has been resolved, and the fault has been cleared.
Condition | An informational message about a condition, possibly independently insignificant.
Critical | A service-affecting condition that requires immediate corrective action. For example, this severity could indicate that the managed object is out of service and its capability must be restored.
Info | A basic notification or informational message, possibly independently insignificant.
Major | A service-affecting condition that requires urgent corrective action. For example, this severity could indicate a severe degradation in the capability of the managed object and that its full capability must be restored.
Minor | A non-service-affecting fault condition that requires corrective action to prevent a more serious fault from occurring. For example, this severity could indicate that the detected alarm condition is not currently degrading the capacity of the managed object.
Warning | A potential or impending service-affecting fault that currently has no significant effects in the system. Action should be taken to further diagnose, if necessary, and correct the problem to prevent it from becoming a more serious service-affecting fault.
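When faults are processed in a script, it can help to model the severities as an ordered type. The following is a minimal sketch, not an FXOS API: the numeric ranks are an assumption made for illustration (the table lists severities alphabetically and does not define an ordering), and the names `Severity`, `is_service_affecting`, and `summarize` are hypothetical.

```python
from collections import Counter
from enum import IntEnum


class Severity(IntEnum):
    # Numeric ranks are an assumption for this sketch; the severity table
    # is alphabetical and does not define an ordering.
    CLEARED = 0
    INFO = 1
    CONDITION = 2
    WARNING = 3
    MINOR = 4
    MAJOR = 5
    CRITICAL = 6


def is_service_affecting(severity: Severity) -> bool:
    """Critical and Major are the severities described as service-affecting."""
    return severity in (Severity.CRITICAL, Severity.MAJOR)


def summarize(severities: list[Severity]) -> dict[str, int]:
    """Count faults per severity name, most severe first (by assumed rank)."""
    counts = Counter(severities)
    return {s.name.lower(): counts[s] for s in sorted(counts, reverse=True)}


observed = [Severity.MINOR, Severity.CRITICAL, Severity.INFO,
            Severity.MAJOR, Severity.MAJOR]
print(summarize(observed))
print(sum(is_service_affecting(s) for s in observed))
```

A summary like this mirrors the count of critical and major faults that a dashboard would typically surface first.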
Fault Types
A fault can be one of the types described in the following table.
Type | Description
---|---
fsm | An FSM task has failed to complete successfully, or FXOS is retrying one of the stages of the FSM.
equipment | FXOS has detected that a physical component is inoperable or has another functional issue.
server | FXOS is unable to complete a server task, such as associating a service profile with a server.
configuration | FXOS is unable to successfully configure a component.
environment | FXOS has detected a power problem, thermal problem, voltage problem, or a loss of CMOS settings.
management | FXOS has detected a serious management issue.
connectivity | FXOS has detected a connectivity problem, such as an unreachable adapter.
network | FXOS has detected a network issue, such as a link down.
operational | FXOS has detected an operational problem, such as a log capacity issue or a failed server discovery.
Properties of Faults
FXOS provides detailed information about each fault raised on the security appliance. The following table describes the fault properties that can be viewed in the FXOS CLI or the Firepower Chassis Manager.
Property Name | Description
---|---
Severity | The current severity level of the fault. This can be any of the severities described in Fault Severities.
Last Transition | The day and time on which the severity for the fault last changed. If the severity has not changed since the fault was raised, this property displays the original creation date.
Affected Object | The component that is affected by the condition that raised the fault.
Description | The description of the fault.
ID | The unique identifier assigned to the fault.
Status | Additional information about the fault state. This can be any of the states described in Lifecycle of Faults.
Type | The type of fault that has been raised. This can be any of the types described in Fault Types.
Cause | The unique identifier associated with the condition that caused the fault.
Created at | The day and time when the fault occurred.
Code | The unique identifier assigned to the fault.
Number of Occurrences | The number of times the event that raised the fault occurred.
Original Severity | The severity assigned to the fault on the first time that it occurred.
Previous Severity | If the severity has changed, this is the previous severity.
Highest Severity | The highest severity encountered for this issue.
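The relationship between the severity-related properties can be illustrated with a small record type. This is a sketch under stated assumptions rather than an FXOS data model: the class name `FaultRecord`, its field names, the `RANK` ordering, and all sample values (including the code `F9999` and the object path) are hypothetical and chosen only to mirror the properties in the table.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Assumed ranking for this sketch only; the source does not define an order.
RANK = {"cleared": 0, "info": 1, "condition": 2, "warning": 3,
        "minor": 4, "major": 5, "critical": 6}


@dataclass
class FaultRecord:
    fault_id: int
    code: str
    cause: str
    fault_type: str
    affected_object: str
    description: str
    severity: str
    created_at: datetime
    occurrences: int = 1
    status: str = ""
    original_severity: str = field(init=False, default="")
    previous_severity: str = field(init=False, default="")
    highest_severity: str = field(init=False, default="")
    last_transition: Optional[datetime] = field(init=False, default=None)

    def __post_init__(self) -> None:
        # Until the severity changes, Last Transition shows the creation time.
        self.original_severity = self.severity
        self.highest_severity = self.severity
        self.last_transition = self.created_at

    def change_severity(self, new_severity: str, when: datetime) -> None:
        """Record a severity change and update the derived properties."""
        if new_severity == self.severity:
            return
        self.previous_severity = self.severity
        self.severity = new_severity
        self.last_transition = when
        if RANK[new_severity] > RANK[self.highest_severity]:
            self.highest_severity = new_severity


# Hypothetical sample values, for illustration only.
fault = FaultRecord(fault_id=1, code="F9999", cause="example-cause",
                    fault_type="equipment", affected_object="sys/example",
                    description="Example fault", severity="minor",
                    created_at=datetime(2024, 1, 1, 12, 0))
fault.change_severity("major", datetime(2024, 1, 1, 12, 5))
fault.change_severity("warning", datetime(2024, 1, 1, 12, 9))
print(fault.severity, fault.previous_severity,
      fault.highest_severity, fault.original_severity)
```

Keeping Original, Previous, and Highest Severity as separate fields means that a fault which has since dropped to a lower severity still shows how serious it became.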
Lifecycle of Faults
FXOS faults are stateful, and a fault transitions through more than one state during its lifecycle. In addition, only one instance of a given fault can exist on each object. If the same fault occurs a second time, FXOS increases the number of occurrences by one.
A fault has the following lifecycle:
- A condition occurs in the system and FXOS raises a fault in the active state.
- If the fault is alleviated within a short period of time known as the flap interval, the fault severity remains at its original active value but the fault enters the soaking state. The soaking state indicates that the condition that raised the fault has cleared, but the system is waiting to see whether the fault condition reoccurs.
- If the condition reoccurs during the flap interval, the fault enters the flapping state. Flapping occurs when a fault is raised and cleared several times in rapid succession. If the condition does not reoccur during the flap interval, the fault is cleared.
- Once cleared, the fault enters the retention interval. This interval ensures that the fault reaches the attention of an administrator even if the condition that caused the fault has been alleviated, and that the fault is not deleted prematurely. The retention interval retains the cleared fault for the length of time specified in the fault collection policy.
- If the condition reoccurs during the retention interval, the fault returns to the active state. If the condition does not reoccur, the fault is deleted.
When a fault is active, the additional lifecycle state information listed in the following table may be provided in the Status field of the fault notification.
State | Description
---|---
Soaking | A fault was raised and then cleared within a short time known as the flap interval. Since this may be a flapping condition, the fault severity remains at its original active value, but this state indicates that the condition that raised the fault has cleared. If the fault does not reoccur, the fault moves into the cleared state. Otherwise, the fault moves into the flapping state.
Flapping | A fault was raised, cleared, and then raised again within a short time known as the flap interval.
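The lifecycle described in this section can be sketched as a small state machine. This is an illustration under stated assumptions, not the FXOS implementation: the names `Policy` and `FaultLifecycle`, the use of seconds for both intervals, and the choice to treat any clear of an active or flapping fault as the start of soaking are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    flap_interval: float       # seconds; hypothetical field name
    retention_interval: float  # seconds; hypothetical field name


class FaultLifecycle:
    def __init__(self, policy: Policy, now: float) -> None:
        self.policy = policy
        self.state = "active"   # a newly raised fault starts in the active state
        self.occurrences = 1
        self._since = now       # time at which the current state was entered

    def _enter(self, state: str, now: float) -> None:
        self.state = state
        self._since = now

    def tick(self, now: float) -> None:
        """Apply timer-driven transitions up to time `now`."""
        if self.state == "soaking" and now - self._since >= self.policy.flap_interval:
            self._enter("cleared", self._since + self.policy.flap_interval)
        if self.state == "cleared" and now - self._since >= self.policy.retention_interval:
            self._enter("deleted", self._since + self.policy.retention_interval)

    def condition_cleared(self, now: float) -> None:
        self.tick(now)
        # Assumption: any clear of an active or flapping fault starts soaking.
        if self.state in ("active", "flapping"):
            self._enter("soaking", now)

    def condition_raised(self, now: float) -> None:
        self.tick(now)
        if self.state == "deleted":
            raise ValueError("fault was deleted; a new fault must be raised")
        if self.state == "soaking":
            self._enter("flapping", now)   # reoccurred within the flap interval
        elif self.state == "cleared":
            self._enter("active", now)     # reoccurred within the retention interval
        # Only one instance exists per object, so a repeat bumps the counter.
        self.occurrences += 1


fault = FaultLifecycle(Policy(flap_interval=10, retention_interval=100), now=0)
fault.condition_cleared(3)
fault.condition_raised(8)
print(fault.state, fault.occurrences)
```

Driving the timers through an explicit `tick` call keeps the sketch deterministic, so each transition can be checked without waiting for real time to pass.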
Fault Collection Policy
The fault collection policy controls the lifecycle of a fault, including the length of time that each fault remains in the flapping and retention intervals.
Faults in Cisco Firepower eXtensible Operating System
Faults in Firepower Chassis Manager
To view the faults for all objects in the system, navigate to the Overview page in the Firepower Chassis Manager. Each fault severity is represented by a different icon. Above the fault listing you can see how many critical and major faults have occurred in the system. When you double-click a specific fault, Firepower Chassis Manager opens the Faults Properties dialog box and displays details for that fault.
Faults in FXOS CLI
If you want to view the faults for all objects in the system, at the top-level scope, enter the show fault command. If you want to view faults for a specific object, scope to that object and then enter the show fault command.
If you want to view all of the available details about a fault, enter the show fault detail command.