When PAM detects a process crash, a traceback, a potential memory leak, a CPU hog, or a full file system, it automatically collects
logs and saves them (along with the core file, where applicable) as a .tgz file in the harddisk:/cisco_support/ or /misc/disk1/cisco_support/ directory. PAM also generates a syslog message with severity
level warning that identifies the issue.
The format of the .tgz file is:
PAM-<platform>-<PAM event>-<node-name>-<PAM process>-<YYYYMMDD>-<checksum>.tgz
For example, PAM-asr9k-crash-xr_0_RP0_CPU0-ipv4_rib-2016Aug16-210405.tgz is the file collected when PAM detects a process crash.
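To make the naming scheme concrete, here is a minimal Python sketch that splits an archive name into its fields. It is an illustration under stated assumptions, not a Cisco tool. The regular expression follows the sample names in this section rather than the format string: the samples carry a date such as 2016Aug16 and a six-digit final field, where the format lists <YYYYMMDD>-<checksum>. The process field is treated as optional because the disk-usage sample in File System Monitoring has none. The names PamArchive and parse_pam_archive_name are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern derived from the sample archive names in this section.
# The process field is optional because the disk_usage sample omits it.
PAM_TGZ_RE = re.compile(
    r"^PAM-(?P<platform>[^-]+)"
    r"-(?P<event>[a-z_]+)"
    r"-(?P<node>[^-]+)"
    r"(?:-(?P<process>[^-]+))?"
    r"-(?P<date>\d{4}[A-Z][a-z]{2}\d{2})"  # e.g. 2016Aug16, as in the samples
    r"-(?P<suffix>\d{6})\.tgz$"            # six-digit final field, as in the samples
)


@dataclass
class PamArchive:
    platform: str
    event: str
    node: str
    process: Optional[str]
    date: str
    suffix: str


def parse_pam_archive_name(name: str) -> Optional[PamArchive]:
    """Split a PAM archive file name into its fields, or return None."""
    m = PAM_TGZ_RE.match(name)
    if m is None:
        return None
    return PamArchive(**m.groupdict())
```

For example, parse_pam_archive_name("PAM-asr9k-disk_usage-xr_0_1_CPU0-2016Jun20-135904.tgz") returns an object with event "disk_usage", node "xr_0_1_CPU0", and process None.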
Because PAM assumes that core files are saved to the default archive folder (harddisk:/ or
/misc/disk1/), you must not change the location of the core archive (by configuring
exception filepath) or remove the core files generated after PAM detects an
event. Otherwise, PAM does not detect the process crash. Also, once PAM reports an issue,
it does not report the same issue for the same process on the same node again.
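The report-once behavior can be modeled as a set of keys that have already been reported. The following Python sketch illustrates that rule only; the OneShotReporter class and the (event, process, node) key are hypothetical and do not describe how PAM actually stores its state.

```python
from typing import Set, Tuple

ReportKey = Tuple[str, str, str]  # (event, process, node); hypothetical key


class OneShotReporter:
    """Reports an issue only the first time it is seen for a process on a node."""

    def __init__(self) -> None:
        self._reported: Set[ReportKey] = set()

    def should_report(self, event: str, process: str, node: str) -> bool:
        key = (event, process, node)
        if key in self._reported:
            return False  # already reported once; suppress repeats
        self._reported.add(key)
        return True
```

A second crash of ipv4_rib on 0_RP0_CPU0 is suppressed, while the same crash on another node, or a different event for the same process, is still reported.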
For the list of commands used while collecting logs, refer to
Files Collected by PAM Tool.
The sections below describe the main PAM events:
Crash Monitoring
PAM monitors process crashes on all nodes in real time. This is a sample syslog generated
when PAM detects a process crash:
RP/0/RP0/CPU0:Aug 16 21:04:06.442 : logger[69324]: %OS-SYSLOG-4-LOG_WARNING : PAM detected crash for ipv4_rib on 0_RP0_CPU0.
All necessary files for debug have been collected and saved at
0/RP0/CPU0 : harddisk:/cisco_support/PAM-asr9k-crash-xr_0_RP0_CPU0-ipv4_rib-2016Aug16-210405.tgz
(Please copy tgz file out of the router and send to Cisco support. This tgz file will be removed after 14 days.)
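If you forward these syslog messages to a collector, a pattern such as the one below can pull out the event, process, node, and archive path. This sketch assumes the "PAM detected <event> for <process> on <node>" wording used in the crash, traceback, and CPU hog samples; the memory-leak and file-system messages are worded differently and are not matched. DETECT_RE, ARCHIVE_RE, and parse_pam_syslog are hypothetical names.

```python
import re
from typing import Dict, Optional

# Wording shared by the crash, traceback, and CPU hog samples.
DETECT_RE = re.compile(
    r"PAM detected (?P<event>.+?) for (?P<process>\S+) on (?P<node>\S+?)\.?(?:\s|$)"
)
# Archive path reported after "saved at <node> :".
ARCHIVE_RE = re.compile(r"\S+/cisco_support/\S+\.tgz")


def parse_pam_syslog(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Return event, process, node, and archive path, or None if no match."""
    m = DETECT_RE.search(text)
    if m is None:
        return None
    archive = ARCHIVE_RE.search(text)
    return {
        "event": m.group("event"),
        "process": m.group("process"),
        "node": m.group("node"),
        "archive": archive.group(0) if archive else None,
    }
```

Because "." does not match a newline, the event text is taken only from the line that begins with "PAM detected", so the later "files for debug" line does not confuse the match.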
Traceback Monitoring
PAM monitors tracebacks on all nodes in real time. This is a sample syslog generated when
PAM detects a traceback:
RP/0/RP0/CPU0:Aug 16 21:42:42.320 : logger[66139]: %OS-SYSLOG-4-LOG_WARNING : PAM detected traceback for ipv4_rib on 0_RP0_CPU0.
All necessary files for debug have been collected and saved at
0/RP0/CPU0 : harddisk:/cisco_support/PAM-asr9k-traceback-xr_0_RP0_CPU0-ipv4_rib-2016Aug16-214242.tgz
(Please copy tgz file out of the router and send to Cisco support. This tgz file will be removed after 14 days.)
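Each sample message states that the .tgz file is removed after 14 days. The Python sketch below estimates that deadline from the archive name. It assumes that the six-digit field after the date is the collection time as HHMMSS, which is what the samples suggest (for example, 214242 in the traceback archive name matches the 21:42:42 syslog timestamp); the exact moment PAM deletes the file is not documented here. removal_deadline is a hypothetical helper.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=14)  # "This tgz file will be removed after 14 days."


def removal_deadline(date_field: str, time_field: str) -> datetime:
    """Estimate when an archive is removed, from its name's date and time fields.

    Assumes date_field looks like "2016Aug16" and time_field is HHMMSS,
    as in the sample archive names. %b expects English month abbreviations
    (the C locale).
    """
    created = datetime.strptime(date_field + time_field, "%Y%b%d%H%M%S")
    return created + RETENTION
```

For the traceback archive above, removal_deadline("2016Aug16", "214242") returns datetime(2016, 8, 30, 21, 42, 42), so the file should be copied off the router before then.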
Memory Usage Monitoring
PAM monitors process memory usage on all nodes. It detects potential memory leaks
by monitoring the memory usage trend and applying a proprietary algorithm to
the collected data. By default, it collects top output on all nodes
every 30 minutes.
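PAM's leak-detection algorithm is proprietary, so the following is only a stand-in that shows the general idea of trend analysis: fit a least-squares line through memory samples taken at the documented 30-minute interval, and flag the process when the slope exceeds a threshold. The 1 MB per hour threshold and the function names are hypothetical.

```python
from typing import Sequence

SAMPLE_INTERVAL_H = 0.5  # documented default: one sample every 30 minutes


def growth_rate_mb_per_hour(samples_mb: Sequence[float]) -> float:
    """Least-squares slope of memory usage over time, in MB per hour."""
    n = len(samples_mb)
    if n < 2:
        return 0.0  # not enough data to estimate a trend
    xs = [i * SAMPLE_INTERVAL_H for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(samples_mb) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples_mb))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def looks_like_leak(samples_mb: Sequence[float], threshold_mb_per_hour: float = 1.0) -> bool:
    """Hypothetical rule: flag a sustained upward trend above the threshold."""
    return growth_rate_mb_per_hour(samples_mb) > threshold_mb_per_hour
```

Samples that rise by 1 MB every half hour, such as [13, 14, 15, 16, 17], give a slope of 2.0 MB per hour and are flagged; a flat series is not. A least-squares fit is used rather than a first-to-last difference so that a single spike has less influence on the result.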
This is a sample syslog generated when PAM detects a potential memory leak:
RP/0/RP0/CPU0:Aug 17 05:13:32.684 : logger[67772]: %OS-SYSLOG-4-LOG_WARNING : PAM detected significant memory increase
(from 13.00MB at 2016/Aug/16/20:42:41 to 28.00MB at 2016/Aug/17/04:12:55) for pam_memory_leaker on 0_RP0_CPU0.
All necessary files for debug have been collected and saved at
0/RP0/CPU0 : harddisk:/cisco_support/PAM-asr9k-memory_leak-xr_0_RP0_CPU0-pam_memory_leaker-2016Aug17-051332.tgz
(Please copy tgz file out of the router and send to Cisco support. This tgz file will be removed after 14 days.)
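The memory-leak message reports the start and end values with timestamps. The sketch below extracts them and computes the growth and the elapsed time. The pattern assumes the exact "(from <n>MB at <YYYY/Mon/DD/HH:MM:SS> to <n>MB at ...)" wording in the sample; INCREASE_RE and memory_increase are hypothetical names.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple

INCREASE_RE = re.compile(
    r"from (?P<start_mb>[\d.]+)MB at (?P<start>\S+) "
    r"to (?P<end_mb>[\d.]+)MB at (?P<end>[^)\s]+)"
)
TS_FMT = "%Y/%b/%d/%H:%M:%S"  # e.g. 2016/Aug/16/20:42:41


def memory_increase(text: str) -> Optional[Tuple[float, timedelta]]:
    """Return (growth in MB, elapsed time) from a PAM memory message, or None."""
    m = INCREASE_RE.search(text)
    if m is None:
        return None
    growth = float(m.group("end_mb")) - float(m.group("start_mb"))
    start = datetime.strptime(m.group("start"), TS_FMT)
    end = datetime.strptime(m.group("end"), TS_FMT)
    return growth, end - start
```

For the sample above, it yields 15.0 MB over 7 hours, 30 minutes, and 14 seconds, or roughly 2 MB per hour.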
CPU Monitoring
PAM monitors CPU usage on all nodes every 30 minutes. PAM
reports a CPU hog in either of these scenarios:
- When a process constantly consumes high CPU (that is, more than the threshold of 90 percent)
- When high CPU usage lasts for more than 60 minutes
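The two CPU hog scenarios can be expressed as threshold and duration checks over the 30-minute samples. The sketch below is one interpretation, not PAM's logic. It assumes that "constantly" means every sample in the observed window (at least two) exceeds 90 percent, that "high CPU" in the second scenario uses the same 90 percent threshold, and that each sample stands for the preceding 30-minute interval. MIN_SAMPLES_CONSTANT and all function names are hypothetical.

```python
from typing import Sequence

CPU_THRESHOLD_PCT = 90.0  # documented threshold
SAMPLE_INTERVAL_MIN = 30  # documented polling interval
MAX_HIGH_MINUTES = 60     # documented duration
MIN_SAMPLES_CONSTANT = 2  # hypothetical: minimum window for "constantly"


def constantly_high(samples_pct: Sequence[float]) -> bool:
    """Scenario 1 (interpreted): every sample in the window is above 90%."""
    return len(samples_pct) >= MIN_SAMPLES_CONSTANT and all(
        s > CPU_THRESHOLD_PCT for s in samples_pct
    )


def high_for_too_long(samples_pct: Sequence[float]) -> bool:
    """Scenario 2 (interpreted): consecutive high samples cover > 60 minutes."""
    run = 0
    for s in samples_pct:
        run = run + 1 if s > CPU_THRESHOLD_PCT else 0
        # Each sample is assumed to represent the preceding 30-minute interval.
        if run * SAMPLE_INTERVAL_MIN > MAX_HIGH_MINUTES:
            return True
    return False


def is_cpu_hog(samples_pct: Sequence[float]) -> bool:
    """A process is reported if either scenario applies."""
    return constantly_high(samples_pct) or high_for_too_long(samples_pct)
```

With these assumptions, three consecutive samples above 90 percent (90 minutes) trigger the second scenario, while a dip below the threshold resets the run.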
This is a sample syslog generated when PAM detects a CPU hog:
RP/0/RP0/CPU0:Aug 16 00:56:00.819 : logger[68245]: %OS-SYSLOG-4-LOG_WARNING : PAM detected CPU hog for cpu_hogger on 0_RP0_CPU0.
All necessary files for debug have been collected and saved at 0/RP0/CPU0 :
harddisk:/cisco_support/PAM-asr9k-cpu_hog-xr_0_RP0_CPU0-cpu_hogger-2016Aug16-005600.tgz
(Please copy tgz file out of the router and send to Cisco support. This tgz file will be removed after 14 days.)
RP/0/RP0/CPU0:Jun 21 15:33:54.517 : logger[69042]: %OS-SYSLOG-1-LOG_ALERT : PAM detected ifmgr is hogging CPU on 0_RP0_CPU0!
File System Monitoring
PAM monitors disk usage on all nodes every 30 minutes. This is a
sample syslog generated when PAM detects that a file system is full:
RP/0/RP0/CPU0:Jun 20 13:59:04.986 : logger[66125]: %OS-SYSLOG-4-LOG_WARNING : PAM detected /misc/config is full on 0_1_CPU0
(please clean up to avoid any fault caused by this). All necessary files for debug have been collected and saved at
0/RP0/CPU0 : harddisk:/cisco_support/PAM-asr9k-disk_usage-xr_0_1_CPU0-2016Jun20-135904.tgz
(Please copy tgz file out of the router and send to Cisco support. This tgz file will be removed after 14 days.)
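PAM's threshold for declaring a file system full is not stated here. As an illustration only, the Python sketch below computes usage for a mount point with shutil.disk_usage and compares it against a hypothetical 95 percent threshold, the kind of check you could run on a Linux host when deciding whether a directory such as /misc/config needs cleanup. FULL_THRESHOLD_PCT and the function names are assumptions.

```python
import shutil

FULL_THRESHOLD_PCT = 95.0  # hypothetical; not PAM's documented threshold


def usage_percent(path: str) -> float:
    """Percentage of space in use, similar to df's Use% (used / (used + available))."""
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0.0  # avoid dividing by zero on an empty or virtual file system
    return 100.0 * usage.used / denominator


def is_full(path: str, threshold_pct: float = FULL_THRESHOLD_PCT) -> bool:
    """True when usage of the file system holding path meets the threshold."""
    return usage_percent(path) >= threshold_pct
```

Dividing by used plus available space, rather than by the total size, ignores blocks reserved for the root user, which matches how df reports usage.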