Requirements and Prerequisites for Health Monitoring
Model Support
Any
Supported Domains
Any
User Roles
Admin
Maintenance User
The following topics describe how to use health monitoring:
The health monitor on the management center tracks various health indicators to ensure that the hardware and software in the system are working correctly. You can use the health monitor to check the status of critical functionality across your deployment.
You can configure the frequency for running the health modules for alerting. The Management Center also supports time series data collection. You can configure the frequency of collecting the time series data on the device and its health modules. The device monitor reports these metrics in several predefined health monitor dashboards by default. The metric data is collected for analysis and hence no alerting is associated with it.
You can use the health monitor to create a collection of tests, referred to as a health policy, and apply the health policy to one or more appliances. The tests, referred to as health modules, are scripts that test for the criteria you specify. You can modify a health policy by enabling or disabling tests or by changing test settings, and you can delete health policies that you no longer need. You can also suppress messages from selected appliances by excluding them.
The health monitoring system runs the tests in a health policy at the configured intervals. You can also run all tests, or a specific test, on demand. The health monitor collects health events based on the test conditions configured.
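The difference between interval-based and on-demand runs can be illustrated with a minimal Python sketch. It is not the product's implementation: the `HealthModule` class, the `run_due` and `run_on_demand` helpers, and the 300-second interval are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class HealthModule:
    """Hypothetical stand-in for one health test in a policy."""
    name: str
    interval_s: int                    # configured run interval (assumed unit: seconds)
    test: Callable[[], str]            # returns a status string such as "normal"
    last_run: float = float("-inf")    # never run yet


def run_due(modules: List[HealthModule], now: float) -> List[Tuple[str, str]]:
    """Run only the modules whose configured interval has elapsed."""
    events = []
    for module in modules:
        if now - module.last_run >= module.interval_s:
            module.last_run = now
            events.append((module.name, module.test()))
    return events


def run_on_demand(modules: List[HealthModule],
                  name: Optional[str] = None) -> List[Tuple[str, str]]:
    """Run every module, or just the named one, regardless of interval."""
    return [(m.name, m.test()) for m in modules if name is None or m.name == name]


modules = [
    HealthModule("CPU Usage", 300, lambda: "normal"),
    HealthModule("Disk Usage", 300, lambda: "warning"),
]
```

In this sketch an on-demand run does not reset the interval timer; whether the real system does so is not stated in this guide.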
The health modules are of two types: legacy-based and telegraph-based.
The legacy-based health module monitors the health status of certain systems, such as fans, power supplies, and database integrity. When the conditions specified in the health policy for these monitored systems are met, the legacy infrastructure-based health modules directly raise alerts (green, red, or orange) with a short message.
The telegraph-based health module monitors the telegraph plug-ins that retrieve metric information from the monitored system. You can create custom dashboards with your preferred health metrics for telegraph-based health modules, allowing you to monitor specific statistics or troubleshoot specific issues.
Note | All appliances automatically report their hardware status via the Hardware Alarms health module. The management center also automatically reports status using the modules configured in the default health policy. Some health modules, such as the Appliance Heartbeat module, run on the management center and report the status of the management center's managed devices. For the health modules to provide managed device status, you must deploy all health policies to the device. |
You can use the health monitor to access health status information for the entire system, for a particular appliance, or, in a multi-domain deployment, a particular domain. Hexagon charts and status tables on the Health Monitor page provide a visual summary of the status of all appliances on your network, including the management center. Individual appliance health monitors let you drill down into health details for a specific appliance.
Fully customizable event views allow you to quickly and easily analyze the health status events gathered by the health monitor. These event views allow you to search and view event data and to access other information that may be related to the events you are investigating. For example, if you want to see all the occurrences of CPU usage with a certain percentage, you can search for the CPU usage module and enter the percentage value.
You can also configure email, SNMP, or syslog alerting in response to health events. A health alert is an association between a standard alert and a health status level. For example, if you want to make sure an appliance never fails due to hardware overload, you can set up an email alert. You can then create a health alert that triggers an email alert whenever CPU, disk, or memory usage reaches the Warning level you configure in the health policy applied to that appliance. You can set alerting thresholds to minimize the number of repeating alerts you receive.
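As a rough illustration of how a health alert ties an alert response to a status level, consider the following Python sketch. The `Status`, `HealthAlert`, `classify`, and `responses_for` names are hypothetical, the 80 and 90 percent thresholds are example values rather than product defaults, and matching on the exact severity (rather than "at or above") is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet, List


class Status(IntEnum):
    NORMAL = 0
    WARNING = 1
    CRITICAL = 2


def classify(usage_pct: float, warning_pct: float, critical_pct: float) -> Status:
    """Map a usage percentage to a status using policy thresholds."""
    if usage_pct >= critical_pct:
        return Status.CRITICAL
    if usage_pct >= warning_pct:
        return Status.WARNING
    return Status.NORMAL


@dataclass(frozen=True)
class HealthAlert:
    """Association between a status level, a set of modules, and an alert response."""
    severity: Status
    modules: FrozenSet[str]
    alert_response: str   # name of an email, SNMP, or syslog alert (hypothetical)


def responses_for(alerts: List[HealthAlert], module: str, status: Status) -> List[str]:
    """Return the alert responses to fire for this module at this status."""
    return [a.alert_response for a in alerts
            if a.severity == status and module in a.modules]


alerts = [
    HealthAlert(Status.WARNING,
                frozenset({"CPU Usage", "Disk Usage", "Memory Usage"}),
                "email-ops"),
]
status = classify(85.0, warning_pct=80.0, critical_pct=90.0)
fired = responses_for(alerts, "CPU Usage", status)
```

Here a CPU reading of 85 percent lands in the Warning band, so the single email response is selected; a Normal reading or a module outside the alert's module set selects nothing.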
Note | Health monitoring can take 5 to 6 minutes from the occurrence of a health event to generate the corresponding health alert. |
You can also generate troubleshooting files for an appliance if you are asked to do so by Support.
Only users with administrator user role privileges can access system health data.
In a management center high-availability deployment running Version 6.7 or higher, the active management center creates a health monitor page that uses REST APIs to show detailed metric-based information. The standby management center creates a health monitor page that shows the alert information and provides a visual summary of the status of all appliances on your network using pie charts and status tables. The standby management center does not display the metric-based information.
Health modules, or health tests, test for the criteria you specify in a health policy.
Module | Module Type | Description |
---|---|---|
CPU Usage (per core) | Telegraph | This module checks that the CPU usage on all the cores is not overloaded and alerts when CPU usage exceeds the thresholds configured for the module. The Warning Threshold % default value is |
Disk Status | Legacy | This module examines the performance of the hard disk and malware storage pack (if installed) on the appliance. This module generates a Warning (yellow) health alert when the hard disk and RAID controller (if installed) are in danger of failing, or if an additional hard drive is installed that is not a malware storage pack. This module generates an Alert (red) health alert when an installed malware storage pack cannot be detected. |
Disk Usage | Telegraph | This module compares disk usage on the appliance’s hard drive and malware storage pack to the limits configured for the module and alerts when usage exceeds the thresholds configured for the module. This module also alerts when the system excessively deletes files in monitored disk usage categories, or when disk usage excluding those categories reaches excessive levels, based on module thresholds. See Disk Usage and Drain of Events Health Monitor Alerts for information about troubleshooting scenarios for Disk Usage alerts. The Disk Usage module sends a health alert if the size of device configuration history files exceeds the allowed limit. See Disk Usage for Device Configuration History Files Health Monitoring Alert for information about troubleshooting scenarios for the disk usage alerts. This health alert is not supported on Secure Firewall Management Center Versions 7.2.0-7.2.5, 7.3.x, and 7.4.0. Use the Disk Usage health status module to monitor disk usage for the |
File System Integrity Check | Legacy | This module performs a file system integrity check and runs if the system has CC mode or UCAPL mode enabled, or if the system runs an image signed with a DEV key. This module is enabled by default. |
Health Monitor Process | Legacy | This module monitors the status of the health monitor itself and alerts if the number of minutes since the last health event received by the management center exceeds the Warning or Critical limits. |
Interface Status | Legacy | This module determines if the device currently collects traffic and alerts based on the traffic status of physical interfaces and aggregate interfaces. For physical interfaces, the information includes interface name, link state, and bandwidth. For aggregate interfaces, the information includes interface name, number of active links, and total aggregate bandwidth. |
Local Malware Analysis | Legacy | This module monitors ClamAV updates for Local Malware Analysis. |
Memory Usage | Legacy | This module compares memory usage on the appliance to the limits configured for the module and alerts when usage exceeds the levels configured for the module. For appliances with more than 4 GB of memory, the preset alert thresholds are based on a formula that accounts for proportions of available memory likely to cause system problems. On >4 GB appliances, because the interval between Warning and Critical thresholds may be very narrow, it is recommended that you manually set the Warning Threshold % value to Beginning with Version 6.6.0, the minimum required RAM for management center virtual upgrades to Version 6.6.0+ is 28 GB, and the recommended RAM for management center virtual deployments is 32 GB. We recommend you do not decrease the default settings: 32 GB RAM for most management center virtual instances, 64 GB for the management center virtual 300 (VMware only). Complex access control policies and rules can command significant resources and negatively affect performance. |
Process Status | Legacy | This module determines if processes on the appliance exit or terminate outside of the process manager. If a process is deliberately exited outside of the process manager, the module status changes to Warning and the health event message indicates which process exited, until the module runs again and the process has restarted. If a process terminates abnormally or crashes outside of the process manager, the module status changes to Critical and the health event message indicates the terminated process, until the module runs again and the process has restarted. |
Threat Data Updates on Devices | Legacy | Certain intelligence data and configurations that devices use to detect threats are updated on the management center from the cloud every 30 minutes. This module alerts you if this information has not been updated on the devices within the time period you have specified. Monitored updates include: By default, this module sends a warning after 1 hour and a critical alert after 24 hours. If this module indicates failure on the management center or on any devices, verify that the management center can reach the devices. |
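The default timing described for the Threat Data Updates on Devices module (a warning after 1 hour, a critical alert after 24 hours) amounts to a staleness check. The Python sketch below illustrates that logic only; the `update_status` function is hypothetical, and treating the boundaries as inclusive is an assumption.

```python
from datetime import datetime, timedelta


def update_status(last_update: datetime,
                  now: datetime,
                  warning_after: timedelta = timedelta(hours=1),
                  critical_after: timedelta = timedelta(hours=24)) -> str:
    """Classify how stale a device's threat data is.

    The defaults mirror the 1-hour and 24-hour values given in the table;
    both are passed as parameters because the time period is configurable.
    """
    age = now - last_update
    if age >= critical_after:
        return "critical"
    if age >= warning_after:
        return "warning"
    return "normal"


now = datetime(2024, 1, 2, 12, 0, 0)
recent = update_status(now - timedelta(minutes=30), now)
stale = update_status(now - timedelta(hours=2), now)
very_stale = update_status(now - timedelta(hours=25), now)
```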
Module | Module Type | Description |
---|---|---|
AMP for Endpoints Status | Legacy | The module alerts if the management center cannot connect to the AMP cloud or Cisco AMP Private Cloud after an initial successful connection, or if the private cloud cannot contact the public AMP cloud. It also alerts if you deregister an AMP cloud connection using the Secure Endpoint management console. |
AMP for Firepower Status | Legacy | This module alerts if: If your management center loses connectivity to the Internet, the system may take up to 30 minutes to generate a health alert. |
Appliance Heartbeat | Legacy | This module determines if an appliance heartbeat is being heard from the appliance and alerts based on the appliance heartbeat status. |
Database Size | Legacy | This module checks the size of the configuration database and alerts when the size exceeds the values (in gigabytes) configured for the module. |
Discovery Host Limit | Legacy | This module determines if the number of hosts the management center can monitor is approaching the limit and alerts based on the warning level configured for the module. For more information, see Host Limit. |
Event Backlog Status | Legacy | This module alerts if the backlog of event data awaiting transmission from the device to the management center has grown continuously for more than 30 minutes. To reduce the backlog, evaluate your bandwidth and consider logging fewer events. |
Event Monitor | Telegraph | This module monitors the overall incoming event rate to the management center. |
Event Stream Status | Legacy | This module monitors connections to third-party client applications that use the Event Streamer on the management center. |
ISE Connection Monitor | Legacy | This module monitors the status of the server connections between the Cisco Identity Services Engine (ISE) and the management center. ISE provides additional user data, device type data, device location data, SGTs (Security Group Tags), and SXP (Security Exchange Protocol) services. |
License Monitor | Legacy | This module monitors license expiration. |
Management Center HA Status | Legacy | This module monitors and alerts on the high availability status of the management center. If you have not established management center high availability, the HA Status is Not in HA. |
MySQL Statistics | Telegraph | This module monitors the status of the MySQL database, including the database size, number of active connections, and memory use. Disabled by default. |
RabbitMQ Status | Telegraph | This module collects various statistics for RabbitMQ. |
RRD Server Process | Legacy | This module determines if the round robin data server that stores time series data is running properly. The module alerts if the RRD server has restarted since the last time it updated; it enters Critical or Warning status if the number of consecutive updates with an RRD server restart reaches the numbers specified in the module configuration. |
Realm | Legacy | Enables you to set a warning threshold for realm or user mismatches, which are: For more information, see the Cisco Secure Firewall Management Center Device Configuration Guide. This module also displays health alerts when you try to download more users than the maximum number of downloaded users supported per realm. The maximum number of downloaded users for a single realm depends on your management center model. For more information, see User Limit in the Cisco Secure Firewall Management Center Device Configuration Guide. |
Security Intelligence | Legacy | This module alerts if Security Intelligence is in use and the management center cannot update a feed, or feed data is corrupt or contains no recognizable IP addresses. See also the Threat Data Updates on Devices module. |
Smart License Monitor | Legacy | This module monitors Smart Licensing status and alerts if: |
Sybase Statistics | Telegraph | This module monitors the status of the Sybase database on the management center, including the database size, number of active connections, and memory use. |
Time Series Data (RRD) Monitor | Legacy | This module tracks the presence of corrupt files in the directory where time series data (such as correlation event counts) are stored and alerts when files are flagged as corrupt and removed. |
Time Server Status | Legacy | This module monitors the configuration of the NTP servers and alerts when the NTP server is unavailable or if the NTP server configuration is invalid. If you receive a critical alert from this module, choose System () and check the configuration of the NTP server specified in the alert. |
Time Synchronization Status | Legacy | This module tracks the synchronization of a device clock that obtains time using NTP with the clock on the NTP server and alerts if the difference in the clocks is more than ten seconds. |
Unresolved Groups Monitor | Legacy | Monitors unresolved groups used in policies. |
URL Filtering Monitor | Legacy | This module alerts if the management center fails to: You can configure time thresholds for these alerts. See also the Threat Data Updates on Devices module. |
VPN Status | Legacy | This module alerts when one or more VPN tunnels between threat defense devices are down. This module tracks: |
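The Event Backlog Status condition in the table above ("grown continuously for more than 30 minutes") can be read as a check over a series of backlog samples. The following Python sketch shows one possible reading; the `backlog_growing` function, the sample format, and the interpretation of "continuously" as "every consecutive sample is larger than the previous one" are assumptions, not documented behavior.

```python
from typing import List, Tuple


def backlog_growing(samples: List[Tuple[int, int]], window_min: int = 30) -> bool:
    """Return True if the backlog grew without interruption for more than window_min.

    samples: (minute, backlog_size) pairs sorted by time (hypothetical format).
    Any sample that is not larger than its predecessor resets the growth streak.
    """
    streak_start = None
    for (t0, size0), (t1, size1) in zip(samples, samples[1:]):
        if size1 > size0:
            if streak_start is None:
                streak_start = t0
            if t1 - streak_start > window_min:
                return True
        else:
            streak_start = None
    return False


# Backlog sampled every 5 minutes, growing steadily for 35 minutes.
steady_growth = [(minute, 100 + minute) for minute in range(0, 36, 5)]
# Growth interrupted by a drain at minute 20, so the streak restarts.
interrupted = [(0, 10), (10, 20), (20, 5), (30, 6), (40, 7), (50, 8)]
```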
Module | Module Type | Description |
---|---|---|
AMP Connection Status | Telegraph | The module alerts if the threat defense cannot connect to the AMP cloud or Cisco AMP Private Cloud after an initial successful connection, or if the private cloud cannot contact the public AMP cloud. Disabled by default. |
AMP Threat Grid Connectivity | Telegraph | The module alerts if the threat defense cannot connect to the AMP Threat Grid cloud after an initial successful connection. |
ASP Drop | Telegraph | This module monitors the connections dropped by the data plane accelerated security path. |
Automatic Application Bypass | Legacy | This module monitors bypassed detection applications. |
Chassis Environment Status | Legacy | This module monitors chassis parameters such as fan speed and chassis temperature, and enables you to set a warning threshold and critical threshold for temperature. The Critical Chassis Temperature (Celsius) default value is |
Cluster/HA Failover Status | Legacy | This module monitors the status of device clusters. The module alerts if: |
Configuration Resource Utilization | Legacy | This module alerts if the size of your deployed configurations puts a device at risk of running out of memory. The alert shows you how much memory your configurations require, and by how much this exceeds the available memory. If this happens, re-evaluate your configurations. Most often you can reduce the number or complexity of access control rules or intrusion policies. Snort Memory Allocation |
Connection Statistics | Telegraph | This module monitors the connection statistics and NAT translation counts. |
Data Plane CPU Usage | Telegraph | This module checks that the average CPU usage of all data plane processes on the device is not overloaded and alerts when CPU usage exceeds the percentages configured for the module. The Warning Threshold % default value is |
Snort CPU Usage | Telegraph | This module checks that the average CPU usage of the Snort processes on the device is not overloaded and alerts when CPU usage exceeds the percentages configured for the module. The Warning Threshold % default value is |
System CPU Usage | Telegraph | This module checks that the average CPU usage of all system processes on the device is not overloaded and alerts when CPU usage exceeds the percentages configured for the module. The Warning Threshold % default value is |
Critical Process Statistics | Telegraph | This module monitors the state of critical processes, their resource consumption, and the restart counts. |
Deployed Configuration Statistics | Telegraph | This module monitors statistics about the deployed configuration, such as the number of ACEs and IPS rules. |
Firewall Threat Defense Platform Faults | Legacy | This module generates an alert for platform faults for Firepower 1000, 2100 and Secure Firewall 3100 devices. A fault is a mutable object that is managed by the management center. Each fault represents a failure in the threat defense instance or an alarm threshold that has been raised. During the lifecycle of a fault, it can change from one state or severity to another. Each fault includes information about the operational state of the affected object at the time the fault was raised. If the fault is transitional and the failure is resolved, then the object transitions to a functional state. For more information, see the Cisco Firepower 1000/2100 FXOS Faults and Error Messages Guide. |
Management Center Access Configuration Changes | Legacy | This module monitors access configuration changes made on the management center directly using the configure network management-data-interface command. |
Flow Offload Statistics | Telegraph | This module monitors hardware flow offload statistics for a managed device. |
Hardware Alarms | Legacy | This module determines if hardware needs to be replaced on a physical managed device and alerts based on the hardware status. The module also reports on the status of hardware-related daemons. |
Inline Link Mismatch Alarms | Legacy | This module monitors the ports associated with inline sets and alerts if the two interfaces of an inline pair negotiate different speeds. |
Intrusion and File Event Rate | Legacy | This module compares the number of intrusion events per second to the limits configured for this module and alerts if the limits are exceeded. If the Intrusion and File Event Rate is zero, the intrusion process may be down or the managed device may not be sending events. Select to check if events are being received from the device. Typically, the event rate for a network segment averages 20 events per second. For a network segment with this average rate, Events per second (Critical) should be set to The maximum number of events you can set for either limit is 999, and the Critical limit must be higher than the Warning limit. |
Link State Propagation | Legacy | ISA 3000 only. This module determines when a link in a paired inline set fails and triggers the link state propagation mode. If a link state propagates to the pair, the status classification for that module changes to Critical and the state reads: where |
Memory Usage Data Plane | Telegraph | This module checks the percentage of allocated memory used by the Data Plane processes and alerts when memory usage exceeds the percentages configured for the module. The Warning Threshold % default value is |
Memory Usage Snort | Telegraph | This module checks the percentage of allocated memory used by the Snort process and alerts when memory usage exceeds the percentages configured for the module. The Warning Threshold % default value is |
Network Card Reset | Legacy | This module checks for network cards that have restarted due to hardware failure and alerts when a reset occurs. |
NTP Statistics | Telegraph | This module monitors the NTP clock synchronization status of the managed device. Disabled by default. |
Power Supply | Legacy | This module determines if power supplies on the appliance require replacement and alerts based on the power supply status. |
Routing Statistics | Telegraph | This module monitors the current state of the routing table. |
Snort3 Statistics | Telegraph | This module collects and monitors the Snort 3 statistics for events, flows, and packets. |
Snort Identity Memory Usage | Legacy | Enables you to set a warning threshold for Snort identity processing and alerts when memory usage exceeds the level configured for the module. The Critical Threshold % default value is This health module specifically keeps track of the total space used for the user identity information in Snort. It displays the current memory usage details, the total number of user-to-IP bindings, and user-group mapping details. Snort records these details in a file. If the memory usage file is not available, the Health Alert for this module displays Waiting for data. This could happen during a Snort restart due to a new install or a major update, a switch from Snort 2 to Snort 3 or back, or a major policy deployment. Depending on the health monitoring cycle, and when the file is available, the warning disappears, and the health monitor displays the details for this module with its status turned Green. |
Snort Reconfiguring Detection | Telegraph | This module alerts if a device reconfiguration has failed. This module detects reconfiguration failure for both Snort 2 and Snort 3 instances. |
Snort Statistics | Telegraph | This module monitors the Snort statistics for events, flows, and packets. |
Security Services Exchange Connection Status | Telegraph | The module alerts if the threat defense cannot connect to the security services exchange cloud after an initial successful connection. Disabled by default. |
Threat Defense HA (Split-brain check) | Legacy | This module monitors and alerts on the high availability status of the threat defense and provides a health alert for a split brain scenario. If you have not established threat defense high availability, the HA Status is Not in HA. |
VPN Statistics | Telegraph | This module monitors site-to-site and remote access VPN tunnels between threat defense devices. |
XTLS Counters | Telegraph | This module monitors XTLS/SSL flows, memory and cache effectiveness. Disabled by default. |
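The two constraints stated for the Intrusion and File Event Rate limits (a maximum of 999 for either limit, and a Critical limit higher than the Warning limit) can be expressed as a small validation routine. This Python sketch is illustrative only; `validate_event_rate_limits` is a hypothetical helper, not a product API, and the example values of 30 and 50 are arbitrary.

```python
from typing import List


def validate_event_rate_limits(warning: int, critical: int,
                               max_limit: int = 999) -> List[str]:
    """Return a list of problems with the configured events-per-second limits."""
    errors = []
    if warning > max_limit:
        errors.append(f"Warning limit {warning} exceeds the maximum of {max_limit}")
    if critical > max_limit:
        errors.append(f"Critical limit {critical} exceeds the maximum of {max_limit}")
    if critical <= warning:
        errors.append("Critical limit must be higher than the Warning limit")
    return errors


ok = validate_event_rate_limits(warning=30, critical=50)
inverted = validate_event_rate_limits(warning=50, critical=30)
too_high = validate_event_rate_limits(warning=30, critical=1000)
```

An empty list means the pair of limits satisfies both documented constraints.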
Step 1 | Determine which health modules you want to monitor as discussed in Health Modules. You can set up specific policies for each kind of appliance, enabling only the appropriate tests for that appliance. |
Step 2 | Apply a health policy to each appliance where you want to track health status as discussed in Creating Health Policies. |
Step 3 | (Optional.) Configure health monitor alerts as discussed in Creating Health Monitor Alerts. You can set up email, syslog, or SNMP alerts that trigger when the health status level reaches a particular severity level for specific health modules. |
A health policy contains configurable health test criteria for several modules. You can control which health modules run against each of your appliances and configure the specific limits used in the tests run by each module.
When you configure a health policy, you decide whether to enable each health module for that policy. You also select the criteria that control which health status each enabled module reports each time it assesses the health of a process.
You can create one health policy that can be applied to every appliance in your system, customize each health policy to the specific appliance where you plan to apply it, or use the default health policy provided for you.
Note | When you register an appliance, the management center automatically assigns it the default health policy. To disassociate a health policy from an appliance, you must first associate a different health policy with it. An appliance must have at least one health policy assigned. |
The management center setup process creates and applies an initial health policy, in which most—but not all—available health modules are enabled. The system also applies this initial policy to devices added to the management center.
This initial health policy is based on a default health policy, which you can neither view nor edit, but which you can copy when you create a custom health policy.
When you upgrade the management center, any new health modules are added to all health policies, including the initial health policy, default health policy, and any other custom health policies. Usually, new health modules are added in an enabled state.
Note | For a new health module to begin monitoring and alerting, reapply health policies after the upgrade. |
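The upgrade behavior described above (new modules are added to every health policy, and policies must then be reapplied) can be pictured with a short Python sketch. The dictionary layout, the `add_new_modules` helper, and the module name used in the example are hypothetical; the enabled default reflects the statement that new modules are usually added in an enabled state.

```python
from typing import Dict, List


def add_new_modules(policies: Dict[str, Dict[str, dict]],
                    new_modules: List[str],
                    enabled: bool = True) -> List[str]:
    """Add any missing modules to every policy.

    Returns the names of policies that changed, which in this sketch are
    the ones that would need to be reapplied before the new modules start
    monitoring and alerting.
    """
    changed = []
    for policy_name, modules in policies.items():
        before = len(modules)
        for module in new_modules:
            # Existing module settings are left untouched.
            modules.setdefault(module, {"enabled": enabled})
        if len(modules) != before:
            changed.append(policy_name)
    return sorted(changed)


policies = {
    "Initial Health Policy": {"CPU Usage": {"enabled": True}},
    "Branch Policy": {"CPU Usage": {"enabled": False},
                      "Example New Module": {"enabled": False}},
}
needs_reapply = add_new_modules(policies, ["Example New Module"])
```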
If you want to customize a health policy to use with your appliances, you can create a new policy. The settings in the policy initially populate with the settings from the health policy you choose as a basis for the new policy. You can edit the policy to specify your preferences, such as enable or disable modules within the policy, change the alerting criteria for each module as needed, and specify the run time intervals.
Step 1 | Choose System (). |
Step 2 | Click Create Policy. |
Step 3 | Enter a name for the policy. |
Step 4 | Choose the existing policy that you want to use as the basis for the new policy from the Base Policy drop-down list. |
Step 5 | Enter a description for the policy. |
Step 6 | Choose Save. |
Apply the health policy on devices as described in Apply a Health Policy.
Edit the policy to specify the module-level policy settings as described in Edit a Health Policy.
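Conceptually, creating a policy from a base policy copies the base settings and then lets you override individual modules. The Python sketch below models that idea with plain dictionaries; the `create_policy` function, the field names, and the threshold values are hypothetical and do not represent the management center's data model or API.

```python
import copy
from typing import Dict, Optional


def create_policy(name: str,
                  base: Dict,
                  overrides: Optional[Dict[str, dict]] = None,
                  description: str = "") -> Dict:
    """Build a new policy whose module settings start as a copy of the base policy."""
    modules = copy.deepcopy(base["modules"])   # the base policy stays unchanged
    for module, settings in (overrides or {}).items():
        modules.setdefault(module, {}).update(settings)
    return {"name": name, "description": description, "modules": modules}


base_policy = {
    "name": "Base",
    "modules": {
        "CPU Usage": {"enabled": True, "warning_pct": 80, "critical_pct": 90},
        "NTP Statistics": {"enabled": False},
    },
}
custom = create_policy(
    "Data Center Devices",
    base_policy,
    overrides={"CPU Usage": {"warning_pct": 70}, "NTP Statistics": {"enabled": True}},
    description="Tighter CPU warning threshold",
)
```

The deep copy matters in this sketch: editing the new policy must not silently change the policy it was based on.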
When you apply a health policy to an appliance, the health tests for all the modules you enabled in the policy automatically monitor the health of the processes and hardware on the appliance. Health tests then continue to run at the intervals you configured in the policy, collecting health data for the appliance and forwarding that data to the management center.
If you enable a module in a health policy and then apply the policy to an appliance that does not require that health test, the health monitor reports the status for that health module as disabled.
If you apply a policy with all modules disabled to an appliance, it removes all applied health policies from the appliance, so no health policy is applied. However, you must have at least one health policy assigned to an appliance.
When you apply a different policy to an appliance that already has a policy applied, expect some latency in the display of new data based on the newly applied tests.
Step 1 | Choose System (). |
Step 2 | Click the Deploy health policy () next to the policy you want to apply. |
Step 3 | Choose the appliances where you want to apply the health policy. |
Step 4 | Click Apply to apply the policy to the appliances you chose. |
Optionally, monitor the task status; see View Task Messages.
Monitoring of the appliance starts when the policy is successfully applied.
You can edit a health policy that you want to modify.
Step 1 | Choose System (). |
Step 2 | Click Edit () next to the policy you want to modify. |
Step 3 | To edit the policy name and its description, click the Edit () icon provided against the policy name. |
Step 4 | The Health Modules tab displays all the device modules and their attributes. Configure your health modules using the following actions: For information on the modules, see Health Modules. |
Step 5 | Where appropriate, set the Critical and Warning threshold percentages. |
Step 6 | In the Run Time Intervals tab, enter the relevant values in the fields: |
Step 7 | To view and modify the devices to which the policy is assigned, do the following: Alternatively, you can apply the health policy to your appliance as described in Apply a Health Policy. Apply the health policy to each appliance where you want to track health status. When you apply the health policy to an appliance, all the modules you enabled in the policy monitor the health of the processes and hardware on the appliance, and forward that data to the management center. |
Step 8 | Click Save. |
You can delete health policies that you no longer need. However, an appliance must have at least one health policy assigned to it. If you delete a policy that is still applied to an appliance, the policy settings remain in effect until you apply a different policy. In addition, if you delete a health policy that is applied to a device, any health monitoring alerts in effect for the device remain active until you disable the underlying associated alert response.
Tip | To stop health monitoring for an appliance, create a health policy with all modules disabled and apply it to the appliance. |
Step 1 | Choose System (). |
Step 2 | Click Delete () next to the policy that you want to delete, and then click Delete health policy to delete it. |
In the course of normal network maintenance, you may disable appliances or make them temporarily unavailable. Because those outages are deliberate, you do not want the health status from those appliances to affect the summary health status on your management center.
You can use the health monitor exclude feature to disable health monitoring status reporting on an appliance or module. For example, if you know that a segment of your network will be unavailable, you can temporarily disable health monitoring for a managed device on that segment to prevent the health status on the management center from displaying a warning or critical state because of the lapsed connection to the device.
When you disable health monitoring status, health events are still generated, but they have a disabled status and do not affect the health status for the health monitor. If you remove the appliance or module from the excluded list, the events that were generated during the exclusion continue to show a status of disabled.
To temporarily disable health events from an appliance, go to the exclusion configuration page and add an appliance to the device exclude list. After the setting takes effect, the system no longer considers the excluded appliance when calculating the overall health status. The Health Monitor Appliance Status Summary lists the appliance as disabled.
You can also disable an individual health module. For example, when you reach the host limit on the management center, you can disable Host Limit status messages. Excluding health modules for individual interfaces is not supported on devices operating in transparent mode.
Note that on the main Health Monitor page you can distinguish between appliances that are excluded if you expand to view the list of appliances with a particular status by clicking the arrow in that status row.
Note |
On the management center, Health Monitor exclusion settings are local configuration settings. Therefore, if you exclude a device, then delete it and later re-register it with the management center, the exclusion settings persist and the re-registered device remains excluded. |
You can exclude appliances individually or by group, model, or associated health policy.
If you need to set the events and health status for an individual appliance to disabled, you can exclude the appliance. After the exclusion settings take effect, the appliance shows as disabled in the Health Monitor Appliance Module Summary, and health events for the appliance have a status of disabled.
Step 1 |
Choose System (). |
Step 2 |
Click Add Device. |
Step 3 |
In the Device Exclusion dialog box, under Available Devices, click Add () against the device that you want to exclude from health monitoring. |
Step 4 |
Click Exclude. The selected device is displayed in the exclusion main page. |
Step 5 |
To remove the device from the exclusion list, click Delete (). |
Step 6 |
Click Apply. |
You can exclude individual health policy modules on appliances. You may want to do this to prevent events from the module from changing the status for the appliance to warning or critical.
Note |
Excluding health modules for individual interfaces is not supported on devices operating in transparent mode. |
After the exclusion settings take effect, the appliance shows the number of modules that are excluded from health monitoring on the device.
Tip |
Make sure that you keep track of individually excluded modules so you can reactivate them when you need them. You may miss necessary warning or critical messages if you accidentally leave a module disabled. |
Step 1 |
Choose System (). |
Step 2 |
Click Edit () next to the appliance you want to modify. |
Step 3 |
In the Exclude Health Modules dialog box, by default, all the modules of the device are excluded from health monitoring. Certain modules are applicable to specific devices only; for more information, see Health Modules. |
Step 4 |
To specify the duration of the exclusion for the device, from the Exclude Period drop-down list, select the duration. |
Step 5 |
To choose modules to be excluded from health monitoring, click the Enable Module Level Exclusion link. The Exclude Health Modules dialog box displays all the modules of the device. The modules that are not applicable for the associated health policies are disabled by default. To exclude a module, perform the following:
|
Step 6 |
If you select an Exclude Period other than Permanent for your exclusion configuration, you can choose to automatically delete the configuration when it expires. To enable this setting, check the Auto-delete expired configurations check box. |
Step 7 |
Click OK. |
Step 8 |
In the device exclusion main page, click Apply. |
When the exclusion period for a device or modules lapses, you can choose to clear or renew the exclusion.
Step 1 |
Choose System (). A Warning () icon next to a device indicates that the exclusion period for the device or its modules has expired. |
Step 2 |
To renew the exclusion of the device, click Edit () next to the appliance. In the Exclude Health Modules dialog box, click the Renew link. The exclusion period of the device is extended with the current value. |
Step 3 |
To clear the device from being excluded, click Delete () next to the appliance, click Remove the device from exclusion, and then click Apply. |
Step 4 |
To renew or clear the modules from exclusion, click Edit () next to the appliance. In the Exclude Health Modules dialog box, click the Enable Module Level Exclusion link, and then click the Renew or Clear link against the modules. When you click Renew, the exclusion period is extended on the module with the current value. |
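The lifecycle of a timed exclusion (expiry, renewal, and optional automatic deletion) can be sketched as below. This is a hedged model rather than product behavior: the `Exclusion` class and `prune` helper are hypothetical, and it assumes that renewing restarts the exclusion with the same period value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Exclusion:
    device: str
    started: datetime
    period: Optional[timedelta]   # None models a Permanent exclusion
    auto_delete: bool = False     # models "Auto-delete expired configurations"

    def expired(self, now: datetime) -> bool:
        """A Permanent exclusion never expires."""
        if self.period is None:
            return False
        return now >= self.started + self.period

    def renew(self, now: datetime) -> None:
        """Restart the exclusion, reusing the current period value."""
        self.started = now


def prune(exclusions, now):
    """Drop expired exclusions that are flagged for auto-delete."""
    return [e for e in exclusions if not (e.auto_delete and e.expired(now))]
```

An expired exclusion that is not flagged for auto-delete stays in the list, which corresponds to the case where you choose to renew or clear it manually.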
You can set up alerts to notify you through email, through SNMP, or through the syslog when the status changes for the modules in a health policy. You can associate an existing alert response with health event levels to trigger an alert when health events of a particular level occur.
For example, if you are concerned that your appliances may run out of hard disk space, you can automatically send an email to a system administrator when the remaining disk space reaches the warning level. If the hard drive continues to fill, you can send a second email when the hard drive reaches the critical level.
The alerts generated by the health monitor contain the following information:
Severity, which indicates the severity level of the alert.
Module, which specifies the health module whose test results triggered the alert.
Description, which includes the health test results that triggered the alert.
The table below describes these severity levels.
Severity |
Description |
---|---|
Critical |
The health test results met the criteria to trigger a Critical alert status. |
Warning |
The health test results met the criteria to trigger a Warning alert status. |
Normal |
The health test results met the criteria to trigger a Normal alert status. |
Error |
The health test did not run. |
Recovered |
The health test results met the criteria to return to a normal alert status, following a Critical or Warning alert status. |
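The three pieces of information carried by a health alert, together with the severity levels in the table, can be modeled as a small record. This is a sketch for orientation only: the `HealthAlert` class and the one-line `render` layout are hypothetical and do not reproduce the product's message format.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "Critical"
    WARNING = "Warning"
    NORMAL = "Normal"
    ERROR = "Error"          # the health test did not run
    RECOVERED = "Recovered"  # back to normal after Critical or Warning


@dataclass(frozen=True)
class HealthAlert:
    severity: Severity
    module: str       # health module whose test results triggered the alert
    description: str  # health test results that triggered the alert

    def render(self) -> str:
        # Hypothetical layout chosen for the example.
        return f"[{self.severity.value}] {self.module}: {self.description}"


# The description text below is made up for illustration.
alert = HealthAlert(Severity.WARNING, "Disk Usage", "usage above warning limit")
print(alert.render())
```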
You must be an Admin user to perform this procedure.
When you create a health monitor alert, you create an association between a severity level, a health module, and an alert response. You can use an existing alert or configure a new one specifically to report on system health. When the severity level occurs for the selected module, the alert triggers.
If you create or update a threshold in a way that duplicates an existing threshold, you are notified of the conflict. When duplicate thresholds exist, the health monitor uses the threshold that generates the fewest alerts and ignores the others. The timeout value for the threshold must be between 5 and 4,294,967,295 minutes.
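The association between a severity level, health modules, and an alert response, along with the timeout range stated above, can be sketched as follows. The `HealthAlertRule` class and its field names are hypothetical; only the 5 to 4,294,967,295 minute range comes from the text, and that upper bound equals 2^32 - 1.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

MIN_TIMEOUT_MINUTES = 5
MAX_TIMEOUT_MINUTES = 4_294_967_295  # equals 2**32 - 1


@dataclass(frozen=True)
class HealthAlertRule:
    name: str
    severity: str
    modules: FrozenSet[str]
    alert_response: str
    threshold_timeout: Optional[int] = None  # minutes; None means not set

    def __post_init__(self):
        t = self.threshold_timeout
        if t is not None and not MIN_TIMEOUT_MINUTES <= t <= MAX_TIMEOUT_MINUTES:
            raise ValueError(
                f"threshold timeout must be between {MIN_TIMEOUT_MINUTES} "
                f"and {MAX_TIMEOUT_MINUTES} minutes, got {t}"
            )

    def triggers_on(self, severity: str, module: str) -> bool:
        """True when the severity occurs for one of the selected modules."""
        return severity == self.severity and module in self.modules
```

For example, a rule built with severity `"Warning"` and modules `frozenset({"Disk Usage"})` triggers for a Disk Usage warning but not for a Disk Usage critical event, and constructing a rule with a timeout of 4 minutes raises `ValueError`.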
Configure an alert response that governs the management center's communication with the SNMP, syslog, or email server where you send the health alert; see Secure Firewall Management Center Alert Responses.
Step 1 |
Choose System (). |
Step 2 |
Click Add. |
Step 3 |
In the Add Health Alert dialog box, enter a name for the health alert in the Health Alert Name field. |
Step 4 |
From the Severity drop-down list, choose the severity level you want to use to trigger the alert. |
Step 5 |
From the Alert drop-down list, choose the alert response that you want to trigger when the specified severity level is reached. If you have not yet configured the alert responses, click Alerts to visit the Alerts page and set them. |
Step 6 |
From the Health Modules list, choose the health policy modules for which you want the alert to apply. |
Step 7 |
Optionally, in the Threshold Timeout field, enter the number of minutes that should elapse before each threshold period ends and the threshold count resets. Even if the policy run time interval value is less than the threshold timeout value, the interval between two reported health events from a given module is always greater. For example, if you change the threshold timeout to 8 minutes and the policy run time interval is 5 minutes, there is a 10-minute interval (5 x 2) between reported events. |
Step 8 |
Click Save to save the health alert. |
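The arithmetic in the Threshold Timeout example (8-minute timeout, 5-minute run interval, 10 minutes between reported events) is consistent with rounding the timeout up to the next whole number of policy runs. The helper below is a sketch under that assumption; the function name is hypothetical, and the source does not state the general rule or what happens when the timeout is an exact multiple of the run interval.

```python
def reported_event_interval(run_interval_min: int, threshold_timeout_min: int) -> int:
    """Round the threshold timeout up to a whole number of policy runs.

    Assumption: events can only be reported when the policy runs, so the
    gap between reported events is a multiple of the run interval.
    """
    if run_interval_min <= 0 or threshold_timeout_min <= 0:
        raise ValueError("both values must be positive")
    # Integer ceiling division avoids floating-point rounding.
    cycles = -(-threshold_timeout_min // run_interval_min)
    return cycles * run_interval_min


print(reported_event_interval(5, 8))  # the example from the text: 5 x 2
```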
You must be an Admin user to perform this procedure.
You can edit existing health monitor alerts to change the severity level, health module, or alert response associated with the health monitor alert.
Step 1 |
Choose System (). |
Step 2 |
Click Edit () next to the health alert that you want to modify. |
Step 3 |
In the Edit Health Alert dialog box, from the Alert drop-down list, select the required alert entry, or click the Alerts link to configure a new alert entry. |
Step 4 |
Click Save. |
Step 1 |
Choose System (). |
Step 2 |
Click Delete () next to the health alert you want to delete, and then click Delete health alert to delete it. |
Disable or delete the underlying alert response to ensure that alerting does not continue; see Secure Firewall Management Center Alert Responses.
You must be an Admin, Maintenance, or Security Analyst user to perform this procedure.
The health monitor provides the compiled health status for all devices managed by the management center, plus the management center itself. The health monitor is composed of:
The Health Status summary page ― Provides you with an at-a-glance view of the health of the management center and all of the devices that the management center manages. Devices are listed individually, or grouped according to their geolocation, high availability, or cluster status where applicable.
Hover over the hexagon that represents the management center or a device to view its health summary.
The dot to the left of a device indicates its health:
Green ― No alarms.
Orange ― At least one health warning.
Red ― At least one critical health alarm.
The Monitoring navigation pane ― Allows you to navigate the device hierarchy. You can view health monitors for individual devices from the navigation pane.
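The dot colors listed above reduce to a simple rule on alarm counts. The function below is a sketch: its name and parameters are hypothetical, and it assumes that a critical alarm takes precedence when a device has both warnings and critical alarms, which the text implies but does not state.

```python
def status_dot(warnings: int, criticals: int) -> str:
    """Map a device's alarm counts to the color of its status dot."""
    if criticals > 0:
        return "red"      # at least one critical health alarm
    if warnings > 0:
        return "orange"   # at least one health warning
    return "green"        # no alarms


for counts in [(0, 0), (3, 0), (3, 1)]:
    print(counts, status_dot(*counts))
```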
Step 1 |
Choose System (). |
Step 2 |
View the status of the management center and its managed devices in the Health Status landing page. |
Step 3 |
Use the Monitoring navigation pane to access device-specific health monitors. When you use the Monitoring navigation pane: |
See Device Health Monitors for information about the compiled health status and metrics for any device managed by the management center.
See Using Management Center Health Monitor for information about the health status of the management center.
To return to the Health Status landing page at any time, click Home.
You must be an Admin, Maintenance, or Security Analyst user to perform this procedure.
The management center monitor provides a detailed view of the health status of the management center. The health monitor is composed of:
High Availability (if configured)―The High Availability (HA) panel displays the current HA status, including the status of the Active and Standby units, the last sync time, and overall device health.
Event Rate―The Event Rate panel shows the maximum event rate as a base line as well as the overall event rate received by the management center.
Event Capacity―The Event Capacity panel shows the current consumption by event categories, including the retention time of events, the current vs. maximum event capacity, and a capacity overflow mechanism where you are alerted when events are stored beyond the configured maximum capacity of the management center.
Process Health―The Process Health panel has an at-a-glance view of the critical processes as well as a tab that lets you see the state of all processes, including the CPU and memory usage for each process.
CPU―The CPU panel lets you toggle between the average CPU usage (default) and the CPU usage of all cores.
Memory―The Memory panel shows the overall memory usage on the management center.
Interface―The Interface panel shows average input and output rate of all interfaces.
Disk Usage―The Disk Usage panel shows the usage of the entire disk and of the critical partitions where management center data is stored.
Tip |
Your session normally logs you out after 1 hour of inactivity (or another configured interval). If you plan to passively monitor health status for long periods of time, consider exempting some users from session timeout, or changing the system timeout settings. See Add or Edit an Internal User and Configure Session Timeouts for more information. |
Step 1 |
Choose System (). |
Step 2 |
Use the Monitoring navigation pane to access the management center and device-specific health monitors.
|
Step 3 |
Explore the management center dashboard. The management center dashboard includes a summary view of the HA state of the management center (if configured), as well as at-a-glance views of management center processes and device metrics such as CPU, memory, and disk usage. |
You must be an Admin, Maintenance, or Security Analyst user to perform this procedure.
Health module tests run automatically at the policy run time interval you configure when you create a health policy. However, you can also run all health module tests on demand to collect up-to-date health information for the appliance.
Step 1 |
View the health monitor for the appliance. |
||
Step 2 |
Click Run All Modules. The status bar indicates the progress of the tests, then the Health Monitor Appliance page refreshes.
|
You must be an Admin, Maintenance, or Security Analyst user to perform this procedure.
Health module tests run automatically at the policy run time interval you configure when you create a health policy. However, you can also run a health module test on demand to collect up-to-date health information for that module.
Step 1 |
View the health monitor for the appliance. |
||
Step 2 |
In the Module Status Summary graph, click the color for the health alert status category you want to view. |
||
Step 3 |
In the Alert Detail row for the alert for which you want to view a list of events, click Run. The status bar indicates the progress of the test, then the Health Monitor Appliance page refreshes.
|
You must be an Admin, Maintenance, or Security Analyst user to perform this procedure.
You can graph the results of a particular health test over a period of time for a specific appliance.
Step 1 |
View the health monitor for the appliance. |
||
Step 2 |
In the Module Status Summary graph of the Health Monitor Appliance page, click the color for the health alert status category you want to view. |
||
Step 3 |
In the Alert Detail row for the alert for which you want to view a list of events, click Graph.
|
The device health monitor provides the compiled health status for any device managed by the management center. The device health monitor collects health metrics for Secure Firewall devices in order to predict and respond to system events. The device health monitor consists of the following components:
System Details ― Displays information about the managed device, including the installed Secure Firewall version and other deployment details.
Troubleshooting & Links ― Provides convenient links to frequently used troubleshooting topics and procedures.
Health alerts ― A health alert monitor provides an at-a-glance view of the health of the device.
Time range ― An adjustable time window to constrain the information that appears in the various device metrics windows.
Device metrics ― An array of key firewall device health metrics categorized across predefined dashboards, including:
CPU ― CPU utilization, including the CPU usage by process and by physical cores.
Memory ― Device memory utilization, including data plane and Snort memory usage.
Interfaces ― Interface status and aggregate traffic statistics.
Connections ― Connection statistics (such as elephant flows, active connections, peak connections, and so on) and NAT translation counts.
Snort ― Statistics related to the Snort process.
Disk Usage ― Device disk usage, including the disk size and disk utilization per partition.
Critical Processes ― Statistics related to managed processes, including process restarts and other select health monitors such as CPU and memory utilization.
See Cisco Secure Firewall Threat Defense Health Metrics for a comprehensive list of the supported device metrics.
You must be an Admin, Maintenance, or Security Analyst user to perform this procedure.
The System Details section provides general system information for a selected device. You can also launch troubleshooting tasks for that device.
Step 1 |
Choose System (). Use the Monitoring navigation pane to access device-specific health monitors. |
Step 2 |
In the device list, click Expand () and Collapse () to expand and collapse the list of managed devices. |
Step 3 |
Click on a device to view a device-specific health monitor. |
Step 4 |
Click the link for View System & Troubleshoot Details … This panel is collapsed by default. Clicking on the link expands the collapsed section to see System Details and Troubleshooting & Links for the device. The system details include:
|
Step 5 |
You have the following troubleshoot choices:
|
You must be an Admin, Maintenance, or Security Analyst user to perform this procedure.
The device health monitor provides a detailed view of the health status of a firewall device. The device health monitor compiles device metrics and provides health status and trends of the device in an array of dashboards.
Step 1 |
Choose System (). Use the Monitoring navigation pane to access device-specific health monitors. |
Step 2 |
In the device list, click Expand () and Collapse () to expand and collapse the list of managed devices. |
Step 3 |
View the Health Alerts for the device in the alert notification at the top of page, directly to the right of the device name. Hover your pointer over the Health Alerts to view the health summary of the device. The popup window shows a truncated summary of the top five health alerts. Click on the popup to open a detailed view of the health alert summary. |
Step 4 |
You can configure the time range from the drop-down in the upper-right corner. The time range can reflect a period as short as the last hour (the default) or as long as two weeks. Select Custom from the drop-down to configure a custom start and end date. Click the refresh icon to set auto refresh to 5 minutes or to toggle off auto refresh. |
Step 5 |
Click the Show Deployment Info () icon for a deployment overlay on the trend graph, with respect to the selected time range. The Show Deployment Info () icon indicates the number of deployments during the selected time-range. A vertical band indicates the deployment start and end time. In the case of multiple deployments, multiple bands/lines can appear. Click the icon on top of the dotted line to view the deployment details. |
Step 6 |
The device monitor reports health and performance metrics in several predefined dashboards by default. The metrics dashboards include:
You can navigate through the various metrics dashboards by clicking on the labels. See Cisco Secure Firewall Threat Defense Health Metrics for a comprehensive list of the supported device metrics. |
Step 7 |
Click Add Dashboard (+) to create a custom correlation dashboard by building your own variable set from the available metric groups; see Correlating Device Metrics. |
The device health monitor includes an array of key threat defense device metrics that serve to predict and respond to system events. The health of any threat defense device can be determined by these reported metrics.
The device monitor reports these metrics in several predefined dashboards by default. These dashboards include:
Overview ― Highlights key metrics from the other predefined dashboards, including CPU, memory, interfaces, and connection statistics, plus disk usage and critical process information.
CPU ― CPU utilization, including the CPU usage by process and by physical cores.
Memory ― Device memory utilization, including data plane and Snort memory usage.
Interfaces ― Interface status and aggregate traffic statistics.
Connections ― Connection statistics (such as elephant flows, active connections, peak connections, and so on) and NAT translation counts.
Snort ― Statistics related to the Snort process.
ASP Drops ― Statistics related to the Accelerated Security Path (ASP) performance and behavior.
You can add custom dashboards to correlate metrics that are interrelated. Select from predefined correlation groups, such as CPU and Snort, or create a custom correlation dashboard by building your own variable set from the available metric groups. See Cisco Secure Firewall Threat Defense Health Metrics for a comprehensive list of the supported device metrics.
To view and correlate the time series data (device metrics) in the health monitor dashboard, enable REST API ().
You must be an Admin, Maintenance, or Security Analyst user to perform this procedure.
Note |
Correlating device metrics is available only for threat defense 6.7 and later versions. Hence, for threat defense versions earlier than 6.7, the health monitor dashboard does not display these metrics even if you enable REST API. |
Step 1 |
Choose System (). Use the Monitoring navigation pane to access device-specific health monitors. |
Step 2 |
In the Devices list, click Expand () and Collapse () to expand and collapse the list of managed devices. |
Step 3 |
Choose the device for which you want to modify the dashboard. |
Step 4 |
Click the Add Dashboard (+) icon in the upper right corner of the device monitor to add a new dashboard. |
Step 5 |
From the Select Correlation Group drop-down, choose a predefined correlation group or choose to create a custom group. |
Step 6 |
To create a dashboard from a predefined correlation group, choose the group and click Add. |
Step 7 |
To create a custom correlation dashboard:
See Cisco Secure Firewall Threat Defense Health Metrics for a comprehensive list of the supported device metrics. |
Step 8 |
Click Add Metrics to add and select metrics from another group. |
Step 9 |
To remove an individual metric, click the x icon on the right side of the item. Click the delete icon to remove the entire group. |
Step 10 |
Click Add to add the dashboard to the health monitor. |
Step 11 |
You can Edit or Delete custom correlation dashboards. |
Available status categories are listed by severity in the table below.
Status Level |
Status Icon |
Status Color in Pie Chart |
Description |
---|---|---|---|
Error |
Error () |
Black |
Indicates that at least one health monitoring module has failed on the appliance and has not been successfully re-run since the failure occurred. Contact your technical support representative to obtain an update to the health monitoring module. |
Critical |
Critical () |
Red |
Indicates that the critical limits have been exceeded for at least one health module on the appliance and the problem has not been corrected. |
Warning |
Warning () |
Yellow |
Indicates that warning limits have been exceeded for at least one health module on the appliance and the problem has not been corrected. This status also indicates a transitional state in which the required data is temporarily unavailable or could not be processed because of changes in the device configuration. Depending on the monitoring cycle, this transitional state is corrected automatically. |
Normal |
Normal () |
Green |
Indicates that all health modules on the appliance are running within the limits configured in the health policy applied to the appliance. |
Recovered |
Recovered () |
Green |
Indicates that all health modules on the appliance are running within the limits configured in the health policy applied to the appliance, including modules that were in a Critical or Warning state. |
Disabled |
Disabled () |
Blue |
Indicates that an appliance is disabled or excluded, that the appliance does not have a health policy applied to it, or that the appliance is currently unreachable. |
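The status table can be turned into a lookup that tallies appliances per pie chart color and orders statuses by severity. This is a sketch based only on the table above; the function names are hypothetical, and the way the product actually computes its pie chart is not described in the source.

```python
from collections import Counter

# Status level -> pie chart color, in the table's severity order.
STATUS_COLOR = {
    "Error": "black",
    "Critical": "red",
    "Warning": "yellow",
    "Normal": "green",
    "Recovered": "green",
    "Disabled": "blue",
}
SEVERITY_ORDER = list(STATUS_COLOR)  # dicts preserve insertion order


def pie_slices(statuses):
    """Count appliances per pie chart color (Normal and Recovered share green)."""
    return Counter(STATUS_COLOR[s] for s in statuses)


def sort_by_severity(statuses):
    """Order statuses as the table lists them, most severe first."""
    return sorted(statuses, key=SEVERITY_ORDER.index)


print(pie_slices(["Normal", "Recovered", "Critical", "Disabled"]))
print(sort_by_severity(["Normal", "Error", "Warning"]))
```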
The Health Event View page allows you to view health events logged by the health monitor on the management center. The fully customizable event views allow you to quickly and easily analyze the health status events gathered by the health monitor. You can search event data to easily access other information that may be related to the events you are investigating. If you understand what conditions each health module tests for, you can more effectively configure alerting for health events.
You can perform many of the standard event view functions on the health event view pages.
You must be an Admin, Maintenance, or Security Analyst user to perform this procedure.
The Table View of Health Events page provides a list of all health events on the specified appliance.
When you access health events from the Health Monitor page on your management center, you retrieve all health events for all managed appliances.
Tip |
You can bookmark this view to allow you to return to the page in the health events workflow containing the Health Events table of events. The bookmarked view retrieves events within the time range you are currently viewing, but you can then modify the time range to update the table with more recent information if needed. |
Choose System ().
|
Step 1 |
View the health monitor for the appliance; see Viewing the Device Health Monitor. |
Step 2 |
In the Module Status Summary graph, click the color for the event status category you want to view. The Alert Detail list toggles the display to show or hide events. |
Step 3 |
In the Alert Detail row for the alert for which you want to view a list of events, click Events. The Health Events page appears, containing results for a query with the name of the appliance and the name of the specified health alert module as constraints. If no events appear, you may need to adjust the time range. |
Step 4 |
If you want to view all health events for the specified appliance, expand Search Constraints, and click the Module Name constraint to remove it. |
You can view and modify the Health Events Table.
Step 1 |
Choose System (). |
Step 2 |
You have the following choices:
|
The Health Monitor modules you choose to enable in your health policy run various tests to determine appliance health status. When the health status meets criteria that you specify, a health event is generated.
The table below describes the fields that can be viewed and searched in the health events table.
Field |
Description |
---|---|
Module Name |
Specify the name of the module which generated the health events
you want to view. For example, to view events that measure CPU performance,
type
|
Test Name (Search only) |
The name of the health module that generated the event. |
Time (Search only) |
The timestamp for the health event. |
Description |
The description of the health module that generated the event.
For example, health events generated when a process was unable to execute are
labeled
|
Value |
The value (number of units) of the result obtained by the health test that generated the event. For example, if the management center generates a health event whenever a device it is monitoring is using 80 percent or more of its CPU resources, the value could be a number from 80 to 100. |
Units |
The units descriptor for the result. You can use the asterisk (*) to create wildcard searches. For example, if the management center generates a health event when a device it is monitoring is using 80 percent or more of its CPU resources, the units descriptor is a percentage sign (%). |
Status |
The status (Critical, Yellow, Green, or Disabled) reported for the appliance. |
Device |
The appliance where the health event was reported. |
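Searching the fields described in the table, including asterisk wildcards, can be sketched with Python's standard `fnmatch` module. This is an illustration rather than the product's search engine: the `HealthEvent` class, the `search` helper, and the sample events are hypothetical, and the sketch applies wildcards to every field as a simplification.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class HealthEvent:
    module_name: str
    description: str
    value: float
    units: str
    status: str
    device: str


def search(events, **constraints):
    """Keep events whose fields match every constraint.

    '*' matches any run of characters. Note that fnmatchcase also gives
    '?' and '[...]' special meaning, which the source does not mention.
    """
    def matches(event):
        return all(
            fnmatchcase(str(getattr(event, field)), pattern)
            for field, pattern in constraints.items()
        )
    return [e for e in events if matches(e)]


# Sample data invented for the example.
events = [
    HealthEvent("CPU Usage", "cpu above limit", 85, "%", "Critical", "fw-1"),
    HealthEvent("Disk Usage", "disk above limit", 91, "%", "Warning", "fw-2"),
]
print(search(events, module_name="CPU*"))
```

Removing a constraint widens the result set, which mirrors removing the Module Name constraint to see all events for an appliance.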
Feature |
Minimum Management Center |
Minimum Threat Defense |
Details |
||
---|---|---|---|---|---|
Health alerts for NTP server sync issues. |
Any |
Introduced the Time Server Status module in the Secure Firewall Management Center Health Policy. When enabled, this module monitors the configuration of the NTP servers and alerts when the NTP server is unavailable or if the NTP server configuration is invalid. New/modified screens: System () . |
|||
Health alert for device configuration history files size |
7.2.6 |
Any |
The Disk Usage module sends a health alert when the size of device configuration history files on the management center exceeds the allowed limit. This alert is enabled by default. The health alert for exceeding the configuration versions size is not supported on Secure Firewall Management Center versions 7.3.0 and 7.4.0. |
||
Health monitor usability enhancements. |
7.1 |
Any |
The following UI pages were improved for better usability and presentation of data:
New/modified screens: .
|
||
Elephant flow detection. |
7.1 |
Any |
The health monitor includes the following enhancements:
The Elephant Flow Detection feature is not supported on the Cisco Firepower 2100 series. |
||
Discontinued high unmanaged disk usage alerts. |
7.0.6 |
Any |
The Disk Usage health module no longer alerts with high unmanaged disk usage. After upgrade, you may continue to see these alerts until you either deploy health policies to managed devices (stops the display of alerts) or upgrade the devices (stops the sending of alerts).
|
||
New health modules. |
7.0 |
Any |
We added the following health modules:
|
||
Health monitor enhancements. |
7.0 |
Any |
The health monitor adds the following enhancements:
|
||
New health modules. |
6.7 |
Any |
The CPU Usage module is no longer used. Instead, see the following modules for CPU usage:
The following modules were added to track statistics:
The following modules were added to track memory usage:
|
||
Health monitor enhancements. |
6.7 |
Any |
The health monitor adds the following enhancements:
|
||
Functionality moved to the Threat Data Updates on Devices module. |
6.7 |
Any |
The Local Malware Analysis module is no longer used. Instead, see the Threat Data Updates on Devices module for this information. Some information formerly provided by the Security Intelligence module and the URL Filtering Module is now provided by the Threat Data Updates on Devices module. |
||
New health module: Configuration Memory Allocation. |
7.0 6.6.3 |
Any |
Version 6.6.3 improves device memory management and introduces a new health module: Configuration Memory Allocation. This module alerts when the size of your deployed configurations puts a device at risk of running out of memory. The alert shows you how much memory your configurations require, and by how much this exceeds the available memory. If this happens, re-evaluate your configurations. Most often you can reduce the number or complexity of access control rules or intrusion policies. |
||
URL Filtering Monitor improvements. |
6.5 |
Any |
The URL Filtering Monitor module now alerts if the management center fails to register to the Cisco cloud. |
||
URL Filtering Monitor improvements. |
6.4 |
Any |
You can now configure time thresholds for URL Filtering Monitor alerts. |
||
New health module: Threat Data Updates on Devices. |
6.3 |
Any |
A new module, Threat Data Updates on Devices, was added. This module alerts you if certain intelligence data and configurations that devices use to detect threats have not been updated on the devices within the time period you specify. |