Introduction
This document describes how task resource usage works in StarOS and lists the logs to collect when you troubleshoot high CPU, memory, or file usage events. On StarOS, the Resource Management Subsystem (resctrl / resmgr) assigns a set of resource limits to each task in the system. It monitors each task's resource usage to ensure that the usage stays within those limits. When a task exceeds its limits, Syslog messages or Simple Network Management Protocol (SNMP) traps are generated to notify network operations.
Resource Monitoring Mechanism
Many tasks run on StarOS, for example, sessmgr, aaamgr, and vpnmgr. Each task is assigned limits for CPU, memory, and file usage, and resource management monitors those limits. The limits can differ by task type (for example, sessmgr and aaamgr have different limits), StarOS version, and hardware type. The limits are defined by the system and cannot be configured by users.
The description for each task on StarOS can be found in the StarOS Tasks chapter of the System Administration Guide.
Basic resource usage information is shown in the output of the show task resources CLI command, which includes these fields:
| Field | Description |
|---|---|
| cputime used | CPU usage of the task |
| cputime allc | Allocated CPU usage limit for the task |
| memory used | Memory usage of the task |
| memory alloc | Allocated memory usage limit for the task |
| files used | Number of files the task has open |
| files allc | Allocated files usage limit for the task |
| status | Status of the task: good / warn / over |
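To make the used/allocated pairs in the table concrete, here is a minimal Python sketch that models one row and reports which resources are above their allocation. The class and method names are hypothetical, not part of StarOS. The sketch deliberately does not reproduce the good / warn / over status: the thresholds resmgr uses are not documented here, and the trap examples in this document show a MemoryWarn being raised while usage is already above the allocated value.

```python
from dataclasses import dataclass


@dataclass
class TaskResources:
    """Hypothetical model of one `show task resources` row (fields from the table)."""
    facility: str
    instance: int
    cputime_used: float
    cputime_allc: float
    memory_used: int
    memory_alloc: int
    files_used: int
    files_allc: int

    def utilization(self) -> dict:
        """Return used/allocated ratios; 1.0 means usage equals the limit."""
        return {
            "cpu": self.cputime_used / self.cputime_allc,
            "memory": self.memory_used / self.memory_alloc,
            "files": self.files_used / self.files_allc,
        }

    def exceeds_allocation(self) -> list:
        """Resources whose usage is above the allocated value.

        This is NOT resmgr's warn/over logic; those thresholds are not documented.
        """
        return [name for name, ratio in self.utilization().items() if ratio > 1.0]


# CPU and memory values come from the cli instance 5010046 traps in this document;
# files_used=100 is a made-up illustrative value.
row = TaskResources("cli", 5010046, 60.9, 60.0, 89940, 66560, 100, 2000)
print(row.exceeds_allocation())  # ['cpu', 'memory']
```

A row that exceeds its allocation is only a starting point; as the next paragraph explains, crossing a limit does not by itself mean the task is malfunctioning.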
It is important to understand that the purpose of resource management is to monitor resource usage, not to restrict task functionality. A task continues to work even after it consumes more CPU, memory, or files than its limit. Syslog messages and SNMP traps are generated when a limit is crossed, but they do not always indicate an issue.
Suspected Cause
In many cases, a temporary usage spike is not a problem. However, persistent high usage, for example, CPU usage that stays at 100% or memory usage that keeps growing and is never reduced, needs to be investigated.
The typical causes for a temporary spike are:
- A CLI command that generates very large output (cli task)
- A large amount of log information held in the system (evlogd task)
The cases that need to be investigated are:
- High CPU usage caused by an internal infinite loop (CPU usage stays at 100%)
- A constant increase in memory usage caused by a memory leak or fragmentation
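The difference between a temporary spike and a case worth investigating can be expressed as a check over repeated samples, for example, values read from several show task resources outputs taken over time. This is a minimal sketch under that assumption; the function names, the 100% threshold default, and the sampling scheme are illustrative choices, not StarOS behavior. A True result suggests a case to investigate, not proof of a loop or a leak.

```python
from typing import Sequence


def cpu_pegged(cpu_percent: Sequence[float], threshold: float = 100.0) -> bool:
    """True if every sample is at or above `threshold` (CPU stuck at 100%)."""
    return len(cpu_percent) > 0 and all(s >= threshold for s in cpu_percent)


def memory_keeps_growing(memory_used: Sequence[int]) -> bool:
    """True if usage never drops between samples and ends above where it started."""
    if len(memory_used) < 2:
        return False
    never_reduced = all(
        later >= earlier for earlier, later in zip(memory_used, memory_used[1:])
    )
    return never_reduced and memory_used[-1] > memory_used[0]


# A spike that recovers is not flagged; steady growth is.
print(cpu_pegged([100.0, 40.0, 100.0]))          # False
print(memory_keeps_growing([200, 210, 210, 230]))  # True
```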
Here are examples of SNMP traps from the sessmgr, npudrv, and cli facilities:
Mon Aug 26 11:32:19 2013 Internal trap notification 1221 (MemoryOver) facility sessmgr instance 16 card 1 cpu 0 allocated 204800 used 220392
Mon Aug 26 11:32:29 2013 Internal trap notification 1222 (MemoryOverClear) facility sessmgr instance 16 card 1 cpu 0 allocated 1249280 used 219608
Fri Dec 20 13:52:20 2013 Internal trap notification 1217 (MemoryWarn) facility npudrv instance 401 card 5 cpu 0 allocated 112640 used 119588
Fri Dec 20 14:07:26 2013 Internal trap notification 1218 (MemoryWarnClear) facility cli instance 5011763 card 5 cpu 0 allocated 56320 used 46856
Wed Dec 25 12:24:16 2013 Internal trap notification 1220 (CPUOverClear) facility cli instance 5010294 card 5 cpu 0 allocated 600 used 272
Wed Dec 25 12:24:16 2013 Internal trap notification 1216 (CPUWarnClear) facility cli instance 5010294 card 5 cpu 0 allocated 600 used 272
Wed Dec 25 17:04:56 2013 Internal trap notification 1215 (CPUWarn) facility cli instance 5010317 card 5 cpu 0 allocated 600 used 595
Wed Dec 25 17:05:36 2013 Internal trap notification 1216 (CPUWarnClear) facility cli instance 5010317 card 5 cpu 0 allocated 600 used 220
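When many of these trap lines are collected (for example, from show snmp trap history), parsing them into fields makes it easier to group them by facility and instance. The sketch below assumes only the line layout shown above. The regular expression, class, and function names are hypothetical, and the leading timestamp is treated as optional because some examples in this document omit it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the trap text shown in this document; the timestamp prefix is optional.
TRAP_RE = re.compile(
    r"(?:(?P<timestamp>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) )?"
    r"Internal trap notification (?P<trap_id>\d+) \((?P<name>\w+)\) "
    r"facility (?P<facility>\S+) instance (?P<instance>\d+) "
    r"card (?P<card>\d+) cpu (?P<cpu>\d+) "
    r"allocated (?P<allocated>\d+) used (?P<used>\d+)"
)


@dataclass
class Trap:
    timestamp: Optional[str]
    trap_id: int
    name: str
    facility: str
    instance: int
    card: int
    cpu: int
    allocated: int
    used: int

    @property
    def is_clear(self) -> bool:
        # CPUWarnClear, MemoryOverClear, and so on mark the end of a condition.
        return self.name.endswith("Clear")


def parse_trap(line: str) -> Optional[Trap]:
    """Return a Trap for a matching line, or None if the line does not match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    g = m.groupdict()
    return Trap(
        timestamp=g["timestamp"],
        trap_id=int(g["trap_id"]),
        name=g["name"],
        facility=g["facility"],
        instance=int(g["instance"]),
        card=int(g["card"]),
        cpu=int(g["cpu"]),
        allocated=int(g["allocated"]),
        used=int(g["used"]),
    )
```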
CPU Usage
When a task's CPU usage is close to or over its limit, the CPUWarn or CPUOver SNMP trap is generated, along with a Syslog warning.
SNMP Traps
Internal trap notification 1215 (CPUWarn) facility sct instance 0 card 8 cpu 0 allocated 500 used 451
Internal trap notification 1219 (CPUOver) facility cli instance 5010046 card 5 cpu 0 allocated 600 used 609
In the CPUOver example, instance 5010046 uses 60.9% CPU while its limit is 60%; the trap reports both values in tenths of a percent (used 609, allocated 600).
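The conversion from trap values to percentages can be written out as a small helper. This sketch assumes the tenths-of-a-percent unit inferred from the CPUOver example (609 and 600 correspond to 60.9% and 60%); that unit is read from this example rather than from a published specification, and the function names are hypothetical.

```python
def cpu_trap_percent(value: int) -> float:
    """Convert a CPU trap value to percent, assuming units of 0.1%."""
    return value / 10.0


def cpu_points_over(allocated: int, used: int) -> float:
    """Percentage points above the allocation (negative means below it)."""
    return (used - allocated) / 10.0


# CPUOver example: allocated 600, used 609
print(cpu_trap_percent(609), cpu_trap_percent(600), cpu_points_over(600, 609))
```

Subtracting the raw integers before dividing avoids the small floating-point error that subtracting two converted percentages would introduce.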
Syslog
[resmgr 14502 warning] [2/0/2352 <rmmgr:20> _resource_cpu.c:2876] [software internal system] The task ipsecmgr-202 is over it's cputime limit. Allocated 50.0%, Using 51.8%
Note: This Syslog message is at warning level and is not generated with the default logging settings. To generate it, set the logging level for the resmgr facility to warning.
Memory Usage
When a task's memory usage is close to or over its limit, the MemoryWarn or MemoryOver SNMP trap is generated, along with a Syslog warning.
SNMP Traps
Internal trap notification 1217 (MemoryWarn) facility cli instance 5005588 card 5 cpu 0 allocated 66560 used 70212
Internal trap notification 1221 (MemoryOver) facility cli instance 5010046 card 5 cpu 0 allocated 66560 used 89940
In the MemoryOver example, instance 5010046 uses 89940 of memory while its limit is 66560.
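Expressing each memory trap as a relative overage shows how far past the allocation a task is. The sketch below applies this to the two cli memory traps above, which share the same allocated value of 66560. The helper name is hypothetical, and the comparison only describes these two examples: the actual Warn and Over thresholds resmgr uses are not documented here, so do not treat the numbers as a rule.

```python
def memory_over_fraction(allocated: int, used: int) -> float:
    """Fraction by which usage exceeds the allocation; 0.0 if within it."""
    return max(0.0, (used - allocated) / allocated)


# Values from the two cli memory traps in this section.
warn = memory_over_fraction(66560, 70212)  # MemoryWarn, instance 5005588
over = memory_over_fraction(66560, 89940)  # MemoryOver, instance 5010046
print(f"MemoryWarn: {warn:.1%} over, MemoryOver: {over:.1%} over")
# MemoryWarn: 5.5% over, MemoryOver: 35.1% over
```

In these examples, a MemoryWarn was raised even though usage was already above the allocated value, so a used value above the allocation does not by itself mean the trap is MemoryOver.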
Syslog
[resmgr 14500 warning] [8/0/4054 <rmmgr:80> _resource_cpu.c:3622] [software internal system syslog] The task bulkstat-0 is over its memory limit. Allocated 46080K, Using 48120K
Note: This Syslog message is at warning level and is not generated with the default logging settings. To generate it, set the logging level for the resmgr facility to warning.
Files Usage
The files value indicates the number of open files (file descriptors) that a task uses. There is no SNMP trap for file usage, but a Syslog message is generated when the limit is crossed.
2013-May-28+14:16:18.746 [resmgr 14517 warning] [8/0/4440 <rmmgr:80> _resource_cpu.c:3558] [software internal system syslog] The task cli-8031369 is over its open files limit. Allocated 2000, Using 2499
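The three resmgr Syslog warnings in this document (CPU, memory, and open files) share one sentence pattern, so a single parser can extract the task, instance, resource, and values from each. This is a sketch based only on the three messages quoted here; other StarOS versions may word them differently, and the names used are hypothetical. The CPU message is quoted above with "it's", so the pattern accepts both spellings.

```python
import re
from typing import Optional

# The CPU warning reads "it's" in the quoted log, so both spellings are accepted.
SYSLOG_RE = re.compile(
    r"The task (?P<task>\S+)-(?P<instance>\d+) is over (?:its|it's) "
    r"(?P<resource>cputime|memory|open files) limit\. "
    r"Allocated (?P<allocated>[\d.]+)(?P<unit>%|K)?, Using (?P<used>[\d.]+)"
)


def parse_resmgr_syslog(line: str) -> Optional[dict]:
    """Return the fields of a resmgr over-limit warning, or None if no match."""
    m = SYSLOG_RE.search(line)
    if m is None:
        return None
    return {
        "task": m.group("task"),
        "instance": int(m.group("instance")),
        "resource": m.group("resource"),
        "allocated": float(m.group("allocated")),
        "used": float(m.group("used")),
        # "%" for CPU, "K" for memory, "" for open files (no unit printed)
        "unit": m.group("unit") or "",
    }
```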
Information Needed to Troubleshoot Issues
This section describes the information to collect before you open a Technical Assistance Center (TAC) Service Request when further investigation is needed. The logs to collect depend on the type of resource usage.
Note: In addition to the commands listed, the output of the show support detail command is always required.
CPU Usage
Enter these commands in the StarOS CLI and capture the output:
show task resources
show task resource max
show snmp trap history
show logs
show profile facility <task name> instance <instance number> depth 4
Note: The show profile command is a hidden-mode CLI command.
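If the CPU-usage commands are run from a script or pasted into a session, building the list per task instance avoids typos in the show profile arguments. This sketch only generates command strings; it does not connect to the device. The constant and function names are hypothetical, and the sketch assumes you enter hidden mode yourself before running show profile (the procedure is not covered here).

```python
from typing import List

# Commands listed in this section; "show support detail" is always required.
COMMON_COMMANDS = [
    "show support detail",
    "show task resources",
    "show task resource max",
    "show snmp trap history",
    "show logs",
]


def cpu_collection_commands(task: str, instance: int) -> List[str]:
    """Return the CPU-usage command list for one task instance.

    show profile is a hidden-mode command, so the session must already be in
    hidden mode when it runs; this sketch does not handle that step.
    """
    profile = f"show profile facility {task} instance {instance} depth 4"
    return COMMON_COMMANDS + [profile]


# Example for the ipsecmgr-202 task from the CPU Syslog example.
for cmd in cpu_collection_commands("ipsecmgr", 202):
    print(cmd)
```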
Memory Usage
Enter these commands in the StarOS CLI and capture the output:
show task resources
show task resource max
show snmp trap history
show logs
Collect the output of the heap and system heap commands several times at regular intervals, for example, every 15 minutes for a total of four outputs:
show messenger proclet facility <task name> instance <instance number> heap
show messenger proclet facility <task name> instance <instance number> system heap
Note: The show messenger proclet command is a hidden-mode CLI command.
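The repeated heap collection above is easier to follow with an explicit schedule. This sketch produces (time, command) pairs for both heap commands at each interval, using the example cadence of every 15 minutes, four times. The function name and the start time are hypothetical; the sketch only plans the collection and does not run anything on the device.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def heap_collection_plan(task: str, instance: int, start: datetime,
                         interval_minutes: int = 15,
                         rounds: int = 4) -> List[Tuple[datetime, str]]:
    """Return (time, command) pairs: both heap commands at each interval."""
    commands = [
        f"show messenger proclet facility {task} instance {instance} heap",
        f"show messenger proclet facility {task} instance {instance} system heap",
    ]
    return [
        (start + timedelta(minutes=interval_minutes * i), cmd)
        for i in range(rounds)
        for cmd in commands
    ]


# Example for the bulkstat-0 task from the memory Syslog example.
for when, cmd in heap_collection_plan("bulkstat", 0, datetime(2013, 5, 28, 14, 0)):
    print(when.strftime("%H:%M"), cmd)
```

Comparing the used values across the four rounds shows whether memory is steadily growing, which is the pattern this document flags for investigation.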
Files Usage
Enter these commands in the StarOS CLI and capture the output:
show task resources
show task resource max
show snmp trap history
show logs