System Health Check
Proactively monitoring the systems in a network helps you detect potential issues early and take preventive action. This section describes how to monitor system health using the health check service. The service analyzes system health by monitoring and tracking metrics that are critical to the functioning of the router.
System health is gauged from the values that these metrics report. A metric signals a problem when its value exceeds, or is nearing, the configured threshold.
This table describes the metrics that the health check service tracks.
Metric | Parameter Tracked | Considered Unhealthy When |
---|---|---|
Critical System Resources | CPU, free memory, file system, shared memory | The respective metric has exceeded the threshold |
Infrastructure Services | Field Programmable Device (FPD), fabric health, platform, redundancy | Any component of the service is down or in an error state |
Counters | Interface-counters, fabric-statistics, asic-errors | Any specific counter exhibits a consistent increase in drop/error count over the last n runs (n is configurable through CLI, default is 10) |
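The counter rule in the table can be illustrated with a minimal Python sketch. This is not the router's implementation. The function name `is_counter_unhealthy` is hypothetical, and the sketch assumes that a "consistent increase" means every one of the last n readings is higher than the reading before it, and that a counter with fewer than n readings is not flagged.

```python
from typing import Sequence


def is_counter_unhealthy(samples: Sequence[int], n: int = 10) -> bool:
    """Return True if the drop/error count rose across each of the last n readings.

    Assumption: `samples` holds one cumulative drop/error count per
    health check run, oldest first. The default n of 10 matches the
    default stated in the table.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to compare readings")
    # Without n readings of history, make no judgement (an assumption).
    if len(samples) < n:
        return False
    window = samples[-n:]
    # Every reading in the window must exceed the one before it.
    return all(later > earlier for earlier, later in zip(window, window[1:]))
```

For example, ten readings that rise on every run are flagged, while a series that stays flat for even one run inside the window is not.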
By default, metrics for system resources are configured with preset threshold values. You can customize the metrics to be monitored by disabling or enabling metrics of interest based on your requirement.
Each metric is tracked and compared with its configured threshold, and the state of the resource is classified accordingly.
The system resources exhibit one of these states:
- Normal: The resource usage is less than the minor threshold value.
- Minor: The resource usage is more than the minor threshold, but less than the severe threshold value.
- Severe: The resource usage is more than the severe threshold, but less than the critical threshold value.
- Critical: The resource usage is more than the critical threshold value.
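The four states can be expressed as a short Python sketch. It is illustrative only: the function name `classify_usage` and the threshold values in the usage note are hypothetical, and because the text does not say how a value exactly equal to a threshold is treated, the sketch assumes it stays in the lower state.

```python
def classify_usage(usage: float, minor: float, severe: float, critical: float) -> str:
    """Map a resource usage value to one of the four states described above.

    Assumption: thresholds are ordered minor <= severe <= critical, and a
    usage value equal to a threshold does not cross it.
    """
    if not minor <= severe <= critical:
        raise ValueError("thresholds must satisfy minor <= severe <= critical")
    # Check the highest threshold first so the most serious state wins.
    if usage > critical:
        return "Critical"
    if usage > severe:
        return "Severe"
    if usage > minor:
        return "Minor"
    return "Normal"
```

With hypothetical thresholds of 20, 50, and 75, a usage of 60 is classified as Severe and a usage of 10 as Normal.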
The infrastructure services show one of these states:
- Normal: The resource operation is as expected.
- Warning: The resource needs attention. For example, a warning is displayed when the FPD needs an upgrade.
The health check service is packaged as an optional RPM. It is not part of the base package, so you must install it explicitly.
You can configure the metrics and their values using the CLI. In addition to the CLI, the service supports NETCONF clients, which can apply configuration (Cisco-IOS-XR-healthcheck-cfg.yang) and retrieve operational data (Cisco-IOS-XR-healthcheck-oper.yang) using YANG data models. It also supports subscribing to metrics and their reports to stream telemetry data. For more information about streaming telemetry data, see Telemetry Configuration Guide for Cisco 8000 Series Routers.