Faults, Errors, Events, Audit Logs
Note: For information about faults, events, errors, and system messages, see the Cisco APIC Faults, Events, and System Messages Management Guide and the Cisco APIC Management Information Model Reference, a web-based application.
The APIC maintains a comprehensive, current run-time representation of the administrative and operational state of the ACI Fabric system in the form of a collection of MOs. The system generates faults, errors, events, and audit log data according to the run-time state of the system and the policies that the system and user create to manage these processes.
The APIC GUI enables you to create custom "historical record groups" of fabric switches. You can then assign switch policies to those groups that specify the size and retention periods for the audit logs, event logs, health logs, and fault logs maintained for the switches in each group.
The APIC GUI also enables you to customize a global controller policy that specifies size and retention periods for the audit logs, event logs, health logs, and fault logs maintained for the controllers on this fabric.
Faults
Based on the run-time state of the system, the APIC automatically detects anomalies and creates fault objects to represent them. Fault objects contain various properties that are meant to help users diagnose the issue, assess its impact and provide a remedy.
The life cycle represents the current state of the issue. It begins with a soak time when the issue is first detected, then changes to raised and remains in that state while the issue is still present. When the condition clears, the fault moves to a state called "raised-clearing", in which the condition is still considered potentially present. It then goes through a clearing time and finally moves to "retaining". At this point, the issue is considered resolved, and the fault object is retained only to give the user visibility into recently resolved issues.
Each time that a life-cycle transition occurs, the system automatically creates a fault record object to log it. Fault records are never modified after they are created and they are deleted only when their number exceeds the maximum value specified in the fault retention policy.
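The life cycle and its fault records can be pictured with a minimal sketch. This is not APIC code: the state labels follow the paragraphs above, the clearing time is folded into the last transition, and the `Fault` and `FaultRecord` classes are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field

# Simplified linear life cycle taken from the text above (labels are illustrative).
NEXT_STATE = {
    "soaking": "raised",
    "raised": "raised-clearing",
    "raised-clearing": "retaining",
}


@dataclass(frozen=True)
class FaultRecord:
    """Immutable log entry for one life-cycle transition."""
    code: str
    old_state: str
    new_state: str


@dataclass
class Fault:
    code: str
    state: str = "soaking"
    records: list = field(default_factory=list)

    def advance(self) -> FaultRecord:
        """Move to the next state and log the transition as a new record."""
        if self.state not in NEXT_STATE:
            raise ValueError(f"no transition out of {self.state!r}")
        new_state = NEXT_STATE[self.state]
        record = FaultRecord(self.code, self.state, new_state)
        self.records.append(record)
        self.state = new_state
        return record


fault = Fault(code="F103824")
while fault.state != "retaining":
    fault.advance()
print([r.new_state for r in fault.records])
```

Making `FaultRecord` frozen mirrors the statement that fault records are never modified after they are created.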
The severity is an estimate of the impact of the condition on the capability of the system to provide service. Possible values are warning, minor, major, and critical. A fault with a severity of warning indicates a potential issue (for example, an incomplete or inconsistent configuration) that is not currently affecting any deployed service. Minor and major faults indicate potential degradation in the service being provided. Critical means that a major outage is severely degrading a service or impairing it altogether. The description is human-readable text about the issue that provides additional information and helps with troubleshooting.
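As a rough illustration of how a client script might rank these severities, the sketch below orders the four values named above. The numeric ranks and the helper names are assumptions made for this example, not values defined by the APIC.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Ranks are arbitrary; only their relative order matters here.
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


def may_affect_service(severity: Severity) -> bool:
    """Warning faults do not currently affect any deployed service."""
    return severity > Severity.WARNING


def worst(severities):
    """Return the most severe value in a non-empty collection."""
    return max(severities)


observed = [Severity.WARNING, Severity.MAJOR, Severity.MINOR]
print(worst(observed).name, may_affect_service(Severity.WARNING))
```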
The following figure shows the process for fault and event reporting:
-
Process detects a faulty condition.
-
Process notifies Event and Fault Manager.
-
Event and Fault Manager processes the notification according to the fault rules.
-
Event and Fault Manager creates a fault instance in the MIM and manages its life cycle according to the fault policy.
-
Event and Fault Manager notifies the APIC and connected clients of the state transitions.
-
Event and Fault Manager triggers further actions (such as syslog or call home).
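The steps above can be sketched as a small notification pipeline. This is only a conceptual model under stated assumptions: the class name, the rule format (a condition name mapped to a fault code), the fault code `F-EXAMPLE`, and the DN string are all hypothetical and do not reflect the internal APIC implementation.

```python
from typing import Callable, Dict, List, Optional


class EventAndFaultManager:
    def __init__(self, fault_rules: Dict[str, str]):
        self.fault_rules = fault_rules          # condition name -> fault code (hypothetical format)
        self.faults: List[dict] = []            # stands in for fault instances in the MIM
        self.subscribers: List[Callable[[dict], None]] = []  # APIC and connected clients
        self.actions: List[Callable[[dict], None]] = []      # for example, syslog or call home

    def notify(self, condition: str, dn: str) -> Optional[dict]:
        """Handle a notification from a process that detected a faulty condition."""
        code = self.fault_rules.get(condition)
        if code is None:
            return None                         # no fault rule matches this condition
        fault = {"code": code, "dn": dn, "lifecycle": "soaking"}
        self.faults.append(fault)               # create the fault instance
        for subscriber in self.subscribers:     # report the state transition
            subscriber(fault)
        for action in self.actions:             # trigger further actions
            action(fault)
        return fault


syslog_lines: List[str] = []
manager = EventAndFaultManager({"link-down": "F-EXAMPLE"})
manager.actions.append(lambda f: syslog_lines.append(f"{f['code']} on {f['dn']}"))
manager.notify("link-down", "example/node-101/port-1")
print(syslog_lines)
```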
Log Record Objects
About the Log Record Objects
All activity in the Cisco Application Centric Infrastructure (ACI) fabric, such as faults being generated, faults being cleared, and events on the Cisco Application Policy Infrastructure Controller (APIC) or switches, is recorded in the database so that users can review the historical status transitions and events. Both the Cisco APIC and the switch nodes generate and store their own faults, events, and other log records. However, log records from the switch nodes are also duplicated on the Cisco APICs so that you can view the log records of the entire fabric, including the Cisco APIC nodes and switch nodes, from the Cisco APICs. In addition, the Cisco APIC database retains the log records of both the Cisco APIC nodes and the switch nodes even after you upgrade the Cisco APIC. In contrast, the log records on a switch are lost when you upgrade the switch.
A log record object is created by the system and cannot be modified or deleted by a user. The lifecycle of a log record object is controlled by a retention policy. When the number of log record objects of a class reaches the maximum limit in the retention policy, the oldest log record objects are purged from the database to make room for new records.
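The purge behavior described above resembles a bounded first-in, first-out buffer per record class. The following sketch illustrates that idea only; the `LogRecordStore` class and the limit of 3 are hypothetical, and real retention limits come from the retention policy.

```python
from collections import deque


class LogRecordStore:
    def __init__(self, max_per_class: int):
        self.max_per_class = max_per_class
        self._records = {}  # record class name -> bounded deque of records

    def add(self, record_class: str, record) -> None:
        queue = self._records.setdefault(
            record_class, deque(maxlen=self.max_per_class)
        )
        # A deque with maxlen silently drops its oldest entry when it is full.
        queue.append(record)

    def records(self, record_class: str) -> list:
        return list(self._records.get(record_class, ()))


store = LogRecordStore(max_per_class=3)
for n in range(5):
    store.add("faultRecord", n)
store.add("eventRecord", "card removed")
print(store.records("faultRecord"))
```

Because each class has its own buffer, a burst of fault records does not push out event records.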
The log record objects are divided into the following log record classes:
-
Fault Records: Fault records show the history of lifecycle changes. A fault rule is defined on a managed object class. When a managed object has a faulty condition, a fault is raised and becomes associated with the managed object. When the faulty condition is gone, the fault is cleared. Every time a fault is raised or cleared, or its lifecycle state changes, a fault record object is created to record the change of the fault state.
-
Event Records: These are events managed by the Cisco APIC. Each event record represents an event that occurred on the switches or on the Cisco APIC nodes. An event rule is defined on a managed object class. When a managed object state matches the event rule, an event (or eventRecord object) is created. For example, if you unplug a card from a switch, the switch event manager generates an event notification for the user operation.
-
Audit Logs: Audit logs are historical records that log when a managed object is changed and include which user made the change. Audit logs also record managed objects that the system changes internally.
-
Session Logs: Session logs are historical records that log when a user logged in to or out of the Cisco APIC or a switch and include the client's IP address.
-
Health Records: Health records are historical records of the health score change on a managed object. Every time a managed object's health score changes by 5 points, a health record object is created.
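The 5-point rule for health records can be illustrated with a short sketch. It assumes that the change is measured against the score in the most recent health record, which the text does not state explicitly; the `HealthTracker` class is hypothetical.

```python
class HealthTracker:
    THRESHOLD = 5  # points of change that trigger a new health record

    def __init__(self, initial_score: int = 100):
        self.last_recorded = initial_score
        self.records = []  # list of (previous score, new score) tuples

    def update(self, score: int) -> bool:
        """Record the score if it moved at least THRESHOLD points; return True if recorded."""
        if abs(score - self.last_recorded) >= self.THRESHOLD:
            self.records.append((self.last_recorded, score))
            self.last_recorded = score
            return True
        return False


tracker = HealthTracker()
results = [tracker.update(score) for score in (98, 95, 93, 100)]
print(results, tracker.records)
```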
The maximum number of log record objects of each class in the fabric is defined by the retention policy and can reach millions across the fabric. When you query such large amounts of data, the response may be slow and, in the worst case, the query may fail. To prevent that, beginning with Cisco APIC release 5.1(1), the reader process was enhanced specifically for log record objects to respond much faster. However, as a trade-off, sorting across queries (pages) is not guaranteed.
The query performance improvement and the new limitation apply only to the query for the log record objects mentioned in this section.
Beginning with Cisco APIC release 5.2(3), with the new API query option time-range that is supported only for log record objects, the Cisco APIC can respond to the API query for the log record objects much faster while maintaining sorting across pages. The Cisco APIC GUI also uses the time-range option for improved performance and sorting. For more information about querying log record objects, see the Cisco APIC REST API Configuration Guide, Release 4.2(x) and Later.
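As a hedged illustration, the sketch below only builds a query URL that combines time-range with paging and ordering options; it does not contact an APIC. The host name is a placeholder, the time-range value `24h` is an assumption about the accepted format, and the class names should be checked against the Cisco APIC Management Information Model Reference. Consult the REST API configuration guide for the authoritative syntax.

```python
from urllib.parse import urlencode

# Log record classes as named in the object model (verify against the MIM reference).
LOG_RECORD_CLASSES = {
    "fault": "faultRecord",
    "event": "eventRecord",
    "audit": "aaaModLR",
    "session": "aaaSessionLR",
    "health": "healthRecord",
}


def log_record_url(apic_host, kind, time_range="24h", page=0, page_size=100):
    """Build a class-level query URL for one page of log records."""
    cls = LOG_RECORD_CLASSES[kind]
    params = {
        "time-range": time_range,            # assumed value format
        "order-by": f"{cls}.created|desc",   # newest records first
        "page": page,
        "page-size": page_size,
    }
    # Keep "|" unescaped so the order-by value stays readable.
    return f"https://{apic_host}/api/class/{cls}.json?{urlencode(params, safe='|')}"


url = log_record_url("apic.example.com", "fault")
print(url)
```

A client would typically increase `page` until a response returns fewer than `page_size` records.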
Viewing the Log Record Objects Using the GUI
You can use the Cisco Application Policy Infrastructure Controller (APIC) GUI to view the log record objects from the database of either the Cisco APIC or a switch. Beginning with the 5.2(3) release, use one of the following methods to view the log record objects:
-
For all Cisco APICs and switches in the fabric, go to the tab, then choose one of the log record tabs in the Work pane.
-
For a specific switch, go to the Navigation pane, go to . In the Work pane, choose the History tab, then choose one of the log record subtabs.
The records are displayed in descending order based on the created time and date. You can narrow the displayed log records based on a time period by clicking the down arrow to the right of the History within the last x time_measurement and choosing a time period. The custom choice enables you to specify any range of dates.
You can also narrow the displayed log records by creating one or more filters. Click in the Filter by attributes field, choose an attribute, choose an operator, then choose or enter a value (depending on the attribute). Repeat this process for each filter that you want to create.
Alternatively, hover over a value in the table of records, which causes a filter icon (represented by a funnel) to appear to the right of the value, then click the icon. Doing so automatically creates a filter with the appropriate parameters. For example, if you are viewing the fault records and you click the filter icon for fault code F103824, a filter is created with the following parameters: Code == F103824. The automatically created filter supports only the == operator.
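For readers who script against the REST API instead of the GUI, an equality filter such as Code == F103824 is roughly comparable to a query-target-filter expression. The sketch below builds such expressions as strings; how the GUI translates its filters internally is not documented here, so treat the mapping and the attribute names as assumptions.

```python
def eq(cls: str, attribute: str, value: str) -> str:
    """Build an equality expression such as eq(faultRecord.code,"F103824")."""
    return f'eq({cls}.{attribute},"{value}")'


def all_of(*expressions: str) -> str:
    """Combine one or more expressions so that every one of them must match."""
    if not expressions:
        raise ValueError("at least one expression is required")
    if len(expressions) == 1:
        return expressions[0]
    return "and(" + ",".join(expressions) + ")"


code_filter = eq("faultRecord", "code", "F103824")
combined = all_of(code_filter, eq("faultRecord", "severity", "major"))
print(f"query-target-filter={combined}")
```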
Use the Rows drop-down list at the bottom of the Work pane to choose how many records you want to view per page. Higher Rows values might result in a slower GUI response time. The Rows value resets to the default of 10 if you click on a different log record class.
The Actions menu enables you to perform the following action:
-
Download All: Downloads all of the records of the chosen class to your local system. The time range and filters that you specified are ignored. You can download the records as an XML or JSON file.
If you are viewing the log record objects from the tab, you can click the three dots at the right end of a row to perform additional actions with that specific record. For the event records, the possible actions are as follows:
-
Change Severity: Changes the severity of the event to the severity that you choose. All new events with the same event code will also have the chosen severity. The severity of all other existing events with the same event code is not changed.
-
Ignore Event: The event will no longer be displayed, and all new events with the same event code will not be displayed. All other existing events with the same event code continue to be displayed.
-
Open in Object Store Browser: Opens the specific record in the Object Store Browser in a new Web browser tab.
-
Save As: Downloads the specific record to your local system. You can download the records as an XML or JSON file.
For all other log record classes, the possible actions are as follows:
-
Open in Object Store Browser: Opens the specific record in the Object Store Browser in a new Web browser tab.
-
Save As: Downloads the specific record to your local system. You can download the records as an XML or JSON file.
Errors
APIC error messages are typically displayed in the APIC GUI and the APIC CLI. These error messages are specific to the action that a user is performing or the object that a user is configuring or administering. These messages can be the following:
-
Informational messages that provide assistance and tips about the action being performed
-
Warning messages that provide information about system errors related to an object, such as a user account or service profile, that the user is configuring or administering
-
Finite state machine (FSM) status messages that provide information about the status of an FSM stage
Many error messages contain one or more variables. The information that the APIC uses to replace these variables depends upon the context of the message. Some messages can be generated by more than one type of error.