This document describes how to troubleshoot CPU usage concerns, primarily due to interrupts, on the new Cisco IOS® XE platform.
It is important to understand how Cisco IOS® XE is built. With Cisco IOS® XE, Cisco has moved to a Linux kernel, and all of the subsystems have been broken down into processes. All of the subsystems that were inside Cisco IOS before, such as the module drivers, High Availability (HA), and so on, now run as software processes within the Linux Operating System (OS). Cisco IOS itself runs as a daemon within the Linux OS (IOSd). Cisco IOS® XE retains not only the same look and feel of the classic Cisco IOS®, but also its operation, support, and management.
Additionally, this document introduces several new commands on this platform that are integral to troubleshooting CPU usage problems.
Here are some useful definitions:
- Forwarding Engine Driver (FED): This is the heart of the Cisco Catalyst switch and is responsible for all hardware programming/forwarding.
- Cisco IOSd: This is the Cisco IOS daemon that runs on the Linux kernel. It runs as a software process within the kernel.
- Packet Delivery System (PDS): This is the architecture and process of how packets are delivered to and from various subsystems. As an example, it controls how packets are delivered from the FED to the IOSd and vice versa.
- Handle: A handle can be thought of as a pointer. It is a means to find more detailed information about specific variables that are used in the outputs that the box produces. This is similar to the concept of Local Target Logic (LTL) indices on the Cisco Catalyst 6500 Series Switch.
The troubleshooting and verification process in this section can be used broadly for high CPU usage due to interrupts.
The show processes cpu command displays the current state of the CPU. Notice that the Cisco Catalyst 3850 Series Switch uses four cores, and the CPU usage is listed for all four cores:
3850-2#show processes cpu sorted | exclude 0.0
Core 0: CPU utilization for five seconds: 53%; one minute: 39%; five minutes: 41%
Core 1: CPU utilization for five seconds: 43%; one minute: 57%; five minutes: 54%
Core 2: CPU utilization for five seconds: 95%; one minute: 60%; five minutes: 58%
Core 3: CPU utilization for five seconds: 32%; one minute: 31%; five minutes: 29%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
8525 472560 2345554 7525 31.37 30.84 30.83 0 iosd
5661 2157452 9234031 698 13.17 12.56 12.54 1088 fed
6206 19630 74895 262 1.83 0.43 0.10 0 eicored
6197 725760 11967089 60 1.41 1.38 1.47 0 pdsd
From the output, it is clear that the Cisco IOS® daemon consumes a major portion of the CPU along with the FED, which is the heart of this box. When CPU usage is high due to interrupts, you see that Cisco IOSd and FED use a major portion of the CPU, and these subprocesses (or a subset of them) use the CPU:
- FED Punject TX
- FED Punject RX
- FED Punject replenish
- FED Punject TX complete
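If you capture this output repeatedly during an incident, a short script can flag the heavy consumers for you. Here is a minimal Python sketch (not a Cisco tool); the embedded sample text and the 10% threshold are illustrative assumptions:

import re

# Parse saved "show processes cpu sorted" text and flag processes whose
# five-second CPU usage exceeds a threshold. Sample abbreviated from above.
SAMPLE = '''\
8525   472560    2345554  7525  31.37  30.84  30.83  0     iosd
5661   2157452   9234031  698   13.17  12.56  12.54  1088  fed
6206   19630     74895    262   1.83   0.43   0.10   0     eicored
'''

THRESHOLD = 10.0
for line in SAMPLE.splitlines():
    m = re.match(r"\s*(\d+)\s+\d+\s+\d+\s+\d+\s+([\d.]+)\s+[\d.]+\s+[\d.]+\s+\S+\s+(.+)", line)
    if m and float(m.group(2)) > THRESHOLD:
        print(f"PID {m.group(1)} ({m.group(3).strip()}) at {m.group(2)}% over five seconds")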
You can zoom into any of these processes with the show processes cpu detailed <process> command. Since Cisco IOSd is responsible for the majority of the CPU usage, here is a closer look at it.
3850-2#show processes cpu detailed process iosd sorted | ex 0.0
Core 0: CPU utilization for five seconds: 36%; one minute: 39%; five minutes: 40%
Core 1: CPU utilization for five seconds: 73%; one minute: 52%; five minutes: 53%
Core 2: CPU utilization for five seconds: 22%; one minute: 56%; five minutes: 58%
Core 3: CPU utilization for five seconds: 46%; one minute: 40%; five minutes: 31%
PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
(%) (%) (%)
8525 L 556160 2356540 7526 30.42 30.77 30.83 0 iosd
8525 L 1 8525 712558 284117 0 23.14 23.33 23.38 0 iosd
59 I 1115452 4168181 0 42.22 39.55 39.33 0 ARP Snoop
198 I 3442960 4168186 0 25.33 24.22 24.77 0 IP Host Track Proce
30 I 3802130 4168183 0 24.66 27.88 27.66 0 ARP Input
283 I 574800 3225649 0 4.33 4.00 4.11 0 DAI Packet Process
3850-2#show processes cpu detailed process fed sorted | ex 0.0
Core 0: CPU utilization for five seconds: 45%; one minute: 44%; five minutes: 44%
Core 1: CPU utilization for five seconds: 38%; one minute: 44%; five minutes: 45%
Core 2: CPU utilization for five seconds: 42%; one minute: 41%; five minutes: 40%
Core 3: CPU utilization for five seconds: 32%; one minute: 30%; five minutes: 31%
PID T C TID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
(%) (%) (%)
5638 L 612840 1143306 536 13.22 12.90 12.93 1088 fed
5638 L 3 8998 396500 602433 0 9.87 9.63 9.61 0 PunjectTx
5638 L 3 8997 159890 66051 0 2.70 2.70 2.74 0 PunjectRx
The Cisco IOSd CPU output shows that the ARP Snoop, IP Host Track Process, and ARP Input processes are high. This is commonly seen when the CPU is interrupted due to ARP packets.
The Cisco Catalyst 3850 Series Switch has a number of queues that cater to different types of packets (the FED maintains 32 RX CPU queues, which are queues that go directly to the CPU). It is important to monitor these queues in order to discover which packets are punted to the CPU and which are processed by the Cisco IOSd. These queues are per ASIC.
Note: There are two ASICs: 0 and 1. Ports 1 through 24 belong to ASIC 0.
In order to look at the queues, enter the show platform punt statistics port-asic <port-asic> cpuq <queue> direction <rx|tx> command.
In the show platform punt statistics port-asic 0 cpuq -1 direction rx command, the -1 argument lists all of the queues. Therefore, this command lists all receive queues for Port-ASIC 0.
Now, you must identify which queue pushes a large number of packets at a high rate. In this example, an examination of the queues revealed this culprit:
<snip>
RX (ASIC2CPU) Stats (asic 0 qn 16 lqn 16):
RXQ 16: CPU_Q_PROTO_SNOOPING
----------------------------------------
Packets received from ASIC : 79099152
Send to IOSd total attempts : 79099152
Send to IOSd failed count : 1240331
RX suspend count : 1240331
RX unsuspend count : 1240330
RX unsuspend send count : 1240330
RX unsuspend send failed count : 0
RX dropped count : 0
RX conversion failure dropped : 0
RX pkt_hdr allocation failure : 0
RX INTACK count : 0
RX packets dq'd after intack : 0
Active RxQ event : 9906280
RX spurious interrupt : 0
<snip>
The queue number is 16 and the queue name is CPU_Q_PROTO_SNOOPING.
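Because these counters are cumulative, a single snapshot does not reveal the rate. Here is a minimal Python sketch, under the assumption that you have saved two copies of the show platform punt statistics port-asic 0 cpuq -1 direction rx output a known number of seconds apart:

import re

def parse_queues(text):
    # Return {queue_name: packets_received_from_asic} from the punt statistics output.
    counts, current = {}, None
    for line in text.splitlines():
        m = re.search(r"RXQ\s+\d+:\s+(\S+)", line)
        if m:
            current = m.group(1)
        m = re.search(r"Packets received from ASIC\s*:\s*(\d+)", line)
        if m and current:
            counts[current] = int(m.group(1))
    return counts

def top_queue(snap_before, snap_after, interval_sec):
    # Compute per-queue packet rates and return the fastest-growing queue.
    before, after = parse_queues(snap_before), parse_queues(snap_after)
    rates = {q: (after[q] - before.get(q, 0)) / interval_sec for q in after}
    return max(rates.items(), key=lambda kv: kv[1])

# Example with two tiny snapshots taken 30 seconds apart:
SNAP_T0 = "RXQ 16: CPU_Q_PROTO_SNOOPING\nPackets received from ASIC : 79099152\n"
SNAP_T30 = "RXQ 16: CPU_Q_PROTO_SNOOPING\nPackets received from ASIC : 79399152\n"
print(top_queue(SNAP_T0, SNAP_T30, 30))  # ('CPU_Q_PROTO_SNOOPING', 10000.0)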
Another way to discover the culprit queue is to enter the show platform punt client command.
3850-2#show platform punt client
tag buffer jumbo fallback packets received failures
alloc free bytes conv buf
27 0/1024/2048 0/5 0/5 0 0 0 0 0
65536 0/1024/1600 0/0 0/512 0 0 0 0 0
65537 0/ 512/1600 0/0 0/512 1530 1530 244061 0 0
65538 0/ 5/5 0/0 0/5 0 0 0 0 0
65539 0/2048/1600 0/16 0/512 0 0 0 0 0
65540 0/ 128/1600 0/8 0/0 0 0 0 0 0
65541 0/ 128/1600 0/16 0/32 0 0 0 0 0
65542 0/ 768/1600 0/4 0/0 0 0 0 0 0
65544 0/ 96/1600 0/4 0/0 0 0 0 0 0
65545 0/ 96/1600 0/8 0/32 0 0 0 0 0
65546 0/ 512/1600 0/32 0/512 0 0 0 0 0
65547 0/ 96/1600 0/8 0/32 0 0 0 0 0
65548 0/ 512/1600 0/32 0/256 0 0 0 0 0
65551 0/ 512/1600 0/0 0/256 0 0 0 0 0
65556 0/ 16/1600 0/4 0/0 0 0 0 0 0
65557 0/ 16/1600 0/4 0/0 0 0 0 0 0
65558 0/ 16/1600 0/4 0/0 0 0 0 0 0
65559 0/ 16/1600 0/4 0/0 0 0 0 0 0
65560 0/ 16/1600 0/4 0/0 0 0 0 0 0
s65561 421/ 512/1600 0/0 0/128 79565859 131644697 478984244 0 37467
65563 0/ 512/1600 0/16 0/256 0 0 0 0 0
65564 0/ 512/1600 0/16 0/256 0 0 0 0 0
65565 0/ 512/1600 0/16 0/256 0 0 0 0 0
65566 0/ 512/1600 0/16 0/256 0 0 0 0 0
65581 0/ 1/1 0/0 0/0 0 0 0 0 0
131071 0/ 96/1600 0/4 0/0 0 0 0 0 0
fallback pool: 98/1500/1600
jumbo pool: 0/128/9300
Determine the tag for which the most packets have been allocated. In this example, it is 65561.
Then, enter this command:
3850-2#show pds tag all | in Active|Tags|65561
Active Client Client
Tags Handle Name TDA SDA FDA TBufD TBytD
65561 7296672 Punt Rx Proto Snoop 79821397 79821397 0 79821397 494316524
This output shows that the queue is Rx Proto Snoop.
The s before the 65561 in the output of the show platform punt client command means that the FED handle is suspended and overwhelmed by the number of incoming packets. If the s does not vanish, it means the queue is stuck permanently.
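A small script can automate this inspection: find the tag with the most allocated packets and flag suspended handles. This is a Python sketch under the assumption that the output follows the column layout shown above (the packet allocation count is the fifth field from the end):

def busiest_tag(text):
    # Return (tag, packets_allocated, suspended?) for the busiest punt client.
    best = (None, -1, False)
    for line in text.splitlines():
        tok = line.split()
        # Data rows start with the tag, optionally prefixed with 's' for suspended.
        if len(tok) < 6 or not tok[0].lstrip("s").isdigit():
            continue
        suspended = tok[0].startswith("s")
        tag = int(tok[0].lstrip("s"))
        packets_alloc = int(tok[-5])  # "packets alloc" column
        if packets_alloc > best[1]:
            best = (tag, packets_alloc, suspended)
    return best

SAMPLE = '''\
 65536 0/1024/1600 0/0 0/512 0 0 0 0 0
s65561 421/ 512/1600 0/0 0/128 79565859 131644697 478984244 0 37467
'''
print(busiest_tag(SAMPLE))  # (65561, 79565859, True)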
In the results of the show pds tag all command, notice a handle, 7296672, is reported next to the Punt Rx Proto Snoop.
Use this handle in the show pds client <handle> packet last sink command. Notice that you must enable debug pds pktbuf-last before you use the command. Otherwise, you encounter this error:
3850-2#show pds client 7296672 packet last sink
% switch-2:pdsd:This command works in debug mode only. Enable debug using
"debug pds pktbuf-last" command
With the debug enabled, you see this output:
3850-2#show pds client 7296672 packet last sink
Dumping Packet(54528) # 0 of Length 60
-----------------------------------------
Meta-data
0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0010 00 00 16 1d 00 00 00 00 00 00 00 00 55 5a 57 f0 ............UZW.
0020 00 00 00 00 fd 01 10 df 00 5b 70 00 00 10 43 00 .........[p...C.
0030 00 10 43 00 00 41 fd 00 00 41 fd 00 00 00 00 00 ..C..A...A......
0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0050 00 00 00 3c 00 00 00 00 00 01 00 19 00 00 00 00 ...<............
0060 01 01 b6 80 00 00 00 4f 00 00 00 00 00 00 00 00 .......O........
0070 01 04 d8 80 00 00 00 33 00 00 00 00 00 00 00 00 .......3........
0080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0090 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00a0 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00 00 ................
Data
0000 ff ff ff ff ff ff aa bb cc dd 00 00 08 06 00 01 ................
0010 08 00 06 04 00 01 aa bb cc dd 00 00 c0 a8 01 0a ................
0020 ff ff ff ff ff ff c0 a8 01 14 00 01 02 03 04 05 ................
0030 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 ............
This command dumps the last packet received by the sink, which is Cisco IOSd in this example. The output shows that it dumps the header, which can be decoded with terminal-based Wireshark (TShark). The Meta-data is for internal use by the system, but the Data output provides actual packet information. The Meta-data, however, remains extremely useful.
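One option is to feed the Data section to Wireshark's text2pcap utility and decode the resulting file with TShark. Alternatively, the 60 bytes shown above can be decoded by hand; this minimal Python sketch confirms they are an ARP request from 192.168.1.10 looking for 192.168.1.20 (the hex is copied from the Data output above):

import struct

HEX = (
    "ff ff ff ff ff ff aa bb cc dd 00 00 08 06 00 01 "
    "08 00 06 04 00 01 aa bb cc dd 00 00 c0 a8 01 0a "
    "ff ff ff ff ff ff c0 a8 01 14 00 01 02 03 04 05 "
    "06 07 08 09 0a 0b 0c 0d 0e 0f 10 11"
)
pkt = bytes.fromhex(HEX.replace(" ", ""))

# Ethernet header: destination MAC, source MAC, EtherType.
dst, src = pkt[0:6], pkt[6:12]
ethertype = struct.unpack("!H", pkt[12:14])[0]
mac = lambda b: ":".join(f"{x:02x}" for x in b)
print(f"dst {mac(dst)}  src {mac(src)}  ethertype 0x{ethertype:04x}")

if ethertype == 0x0806:  # ARP
    opcode = struct.unpack("!H", pkt[20:22])[0]
    sender_ip = ".".join(str(x) for x in pkt[28:32])
    target_ip = ".".join(str(x) for x in pkt[38:42])
    print(f"ARP {'request' if opcode == 1 else 'reply'}: "
          f"who has {target_ip}? tell {sender_ip}")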
Notice the line in the Meta-data that starts with 0070. Use the first eight bytes (16 hexadecimal digits) after that offset as the interface ID, as shown here:
3850-2#show platform port-asic ifm iif-id 0x0104d88000000033
Interface Table
Interface IIF-ID : 0x0104d88000000033
Interface Name : Gi2/0/20
Interface Block Pointer : 0x514d2f70
Interface State : READY
Interface Stauts : IFM-ADD-RCVD, FFM-ADD-RCVD
Interface Ref-Cnt : 6
Interface Epoch : 0
Interface Type : ETHER
Port Type : SWITCH PORT
Port Location : LOCAL
Slot : 2
Unit : 20
Slot Unit : 20
Acitve : Y
SNMP IF Index : 22
GPN : 84
EC Channel : 0
EC Index : 0
ASIC : 0
ASIC Port : 14
Port LE Handle : 0x514cd990
Non Zero Feature Ref Counts
FID : 48(AL_FID_L2_PM), Ref Count : 1
FID : 77(AL_FID_STATS), Ref Count : 1
FID : 51(AL_FID_L2_MATM), Ref Count : 1
FID : 13(AL_FID_SC), Ref Count : 1
FID : 26(AL_FID_QOS), Ref Count : 1
Sub block information
FID : 48(AL_FID_L2_PM), Private Data : 0x54072618
FID : 26(AL_FID_QOS), Private Data : 0x514d31b8
The culprit interface is identified here. Gi2/0/20 is where a traffic generator pumps ARP traffic. If you shut this interface down, the problem is resolved and CPU usage is minimized.
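For reference, the byte-to-IIF-ID mapping used above is simple to script. A minimal Python sketch, where meta_0070 is the Meta-data line copied from the earlier dump:

meta_0070 = "01 04 d8 80 00 00 00 33 00 00 00 00 00 00 00 00"
# The first eight bytes after the 0070 offset form the interface ID.
iif_id = "0x" + "".join(meta_0070.split()[:8])
print(iif_id)  # 0x0104d88000000033 -> Gi2/0/20 in this example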
The only drawback with the method discussed in the last section is that it only dumps the last packet that goes into the sink, and that packet might not be from the culprit.
A better way to troubleshoot this is to use a feature called FED tracing. Tracing is a packet-capture method (with various filters) for packets that the FED pushes to the CPU. FED tracing is not as simple as the Netdr feature on the Cisco Catalyst 6500 Series Switch, however.
Here the process is broken into steps:
1. Enable detailed FED punt/inject tracing:
3850-2#set trace control fed-punject-detail enable
2. Verify the trace settings:
3850-2#show mgmt-infra trace settings fed-punject-detail
One shot Trace Settings:
Buffer Name: fed-punject-detail
Default Size: 32768
Current Size: 32768
Traces Dropped due to internal error: No
Total Entries Written: 0
One shot mode: No
One shot and full: No
Disabled: False
3. Increase the trace buffer size if required:
3850-2#set trace control fed-punject-detail buffer-size <buffer size>
3850-2#set trace control fed-punject-detail buffer-size ?
<8192-67108864> The new desired buffer size, in bytes
default Reset trace buffer size to default
4. Add a receive-direction filter. Here, the filter captures CPU queue 16 (CPU_Q_PROTO_SNOOPING):
3850-2#set trace fed-punject-detail direction rx filter_add <filter>
3850-2#set trace fed-punject-detail direction rx filter_add ?
cpu-queue rxq 0..31
field field
offset offset
3850-2#set trace fed-punject-detail direction rx filter_add cpu-queue <start queue>
<end queue>
3850-2#set trace fed-punject-detail direction rx filter_add cpu-queue 16 16
5. Apply the match criteria and enable the filter:
3850-2#set trace fed-punject-detail direction rx match_all
3850-2#set trace fed-punject-detail direction rx filter_enable
6. Display the captured trace messages:
3850-2#show mgmt-infra trace messages fed-punject-detail
[11/25/13 07:05:53.814 UTC 2eb0c9 5661]
00 00 00 00 00 4e 00 40 07 00 02 08 00 00 51 3b
00 00 00 00 00 01 00 00 03 00 00 00 00 00 00 01
00 00 00 00 20 00 00 0e 00 00 00 00 00 01 00 74
00 00 00 04 00 54 41 02 00 00 00 00 00 00 00 00
[11/25/13 07:05:53.814 UTC 2eb0ca 5661]
ff ff ff ff ff ff aa bb cc dd 00 00 08 06 00 01
08 00 06 04 00 01 aa bb cc dd 00 00 c0 a8 01 0a
ff ff ff ff ff ff c0 a8 01 14 00 01 02 03 04 05
06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 f6 b9 10 32
[11/25/13 07:05:53.814 UTC 2eb0cb 5661] Frame descriptors:
[11/25/13 07:05:53.814 UTC 2eb0cc 5661]
=========
fdFormat=0x4 systemTtl=0xe
loadBalHash1=0x8 loadBalHash2=0x8
spanSessionMap=0x0 forwardingMode=0x0
destModIndex=0x0 skipIdIndex=0x4
srcGpn=0x54 qosLabel=0x41
srcCos=0x0 ingressTranslatedVlan=0x3
bpdu=0x0 spanHistory=0x0
sgt=0x0 fpeFirstHeaderType=0x0
srcVlan=0x1 rcpServiceId=0x2
wccpSkip=0x0 srcPortLeIndex=0xe
cryptoProtocol=0x0 debugTagId=0x0
vrfId=0x0 saIndex=0x0
pendingAfdLabel=0x0 destClient=0x1
appId=0x0 finalStationIndex=0x74
decryptSuccess=0x0 encryptSuccess=0x0
rcpMiscResults=0x0 stackedFdPresent=0x0
spanDirection=0x0 egressRedirect=0x0
redirectIndex=0x0 exceptionLabel=0x0
destGpn=0x0 inlineFd=0x0
suppressRefPtrUpdate=0x0 suppressRewriteSideEfects=0x0
cmi2=0x0 currentRi=0x1
currentDi=0x513b dropIpUnreachable=0x0
srcZoneId=0x0 srcAsicId=0x0
originalDi=0x0 originalRi=0x0
srcL3IfIndex=0x2 dstL3IfIndex=0x0
dstVlan=0x0 frameLength=0x40
fdCrc=0x7 tunnelSpokeId=0x0
=========
[11/25/13 07:05:53.814 UTC 2eb0cd 5661]
[11/25/13 07:05:53.814 UTC 2eb0ce 5661] PUNT PATH (fed_punject_rx_process_packet:
830):RX: Q: 16, Tag: 65561
[11/25/13 07:05:53.814 UTC 2eb0cf 5661] PUNT PATH (fed_punject_get_physical_iif:
579):RX: Physical IIF-id 0x104d88000000033
[11/25/13 07:05:53.814 UTC 2eb0d0 5661] PUNT PATH (fed_punject_get_src_l3if_index:
434):RX: L3 IIF-id 0x101b6800000004f
[11/25/13 07:05:53.814 UTC 2eb0d1 5661] PUNT PATH (fed_punject_fd_2_pds_md:478):
RX: l2_logical_if = 0x0
[11/25/13 07:05:53.814 UTC 2eb0d2 5661] PUNT PATH (fed_punject_get_source_cos:638):
RX: Source Cos 0
[11/25/13 07:05:53.814 UTC 2eb0d3 5661] PUNT PATH (fed_punject_get_vrf_id:653):
RX: VRF-id 0
[11/25/13 07:05:53.814 UTC 2eb0d4 5661] PUNT PATH (fed_punject_get_src_zoneid:667):
RX: Zone-id 0
[11/25/13 07:05:53.814 UTC 2eb0d5 5661] PUNT PATH (fed_punject_fd_2_pds_md:518):
RX: get_src_zoneid failed
[11/25/13 07:05:53.814 UTC 2eb0d6 5661] PUNT PATH (fed_punject_get_acl_log_direction:
695): RX: : Invalid CMI2
[11/25/13 07:05:53.814 UTC 2eb0d7 5661] PUNT PATH (fed_punject_fd_2_pds_md:541):RX:
get_acl_log_direction failed
[11/25/13 07:05:53.814 UTC 2eb0d8 5661] PUNT PATH (fed_punject_get_acl_full_direction:
724):RX: DI 0x513b ACL Full Direction 1
[11/25/13 07:05:53.814 UTC 2eb0d9 5661] PUNT PATH (fed_punject_get_source_sgt:446):
RX: Source SGT 0
[11/25/13 07:05:53.814 UTC 2eb0da 5661] PUNT PATH (fed_punject_get_first_header_type:680):
RX: FirstHeaderType 0
[11/25/13 07:05:53.814 UTC 2eb0db 5661] PUNT PATH (fed_punject_rx_process_packet:916):
RX: fed_punject_pds_send packet 0x1f00 to IOSd with tag 65561
[11/25/13 07:05:53.814 UTC 2eb0dc 5661] PUNT PATH (fed_punject_rx_process_packet:744):
RX: **** RX packet 0x2360 on qn 16, len 128 ****
[11/25/13 07:05:53.814 UTC 2eb0dd 5661]
buf_no 0 buf_len 128
<snip>
ff ff ff ff ff ff - destination MAC address
aa bb cc dd 00 00 - source MAC address
The trace also identifies the queue and tag on which the packet arrived and the physical interface (IIF-ID) on which it was received:
[11/25/13 07:05:53.814 UTC 2eb0ce 5661] PUNT PATH (fed_punject_rx_process_packet:
830):RX: Q: 16, Tag: 65561
[11/25/13 07:05:53.814 UTC 2eb0cf 5661] PUNT PATH (fed_punject_get_physical_iif:
579):RX: Physical IIF-id 0x104d88000000033
3850-2#show platform port-asic ifm iif-id 0x0104d88000000033
Interface Table
Interface IIF-ID : 0x0104d88000000033
Interface Name : Gi2/0/20
Interface Block Pointer : 0x514d2f70
Interface State : READY
Interface Stauts : IFM-ADD-RCVD, FFM-ADD-RCVD
Interface Ref-Cnt : 6
Interface Epoch : 0
Interface Type : ETHER
Port Type : SWITCH PORT
Port Location : LOCAL
Slot : 2
Unit : 20
Slot Unit : 20
Active : Y
SNMP IF Index : 22
GPN : 84
EC Channel : 0
EC Index : 0
ASIC : 0
ASIC Port : 14
Port LE Handle : 0x514cd990
Non Zero Feature Ref Counts
FID : 48(AL_FID_L2_PM), Ref Count : 1
FID : 77(AL_FID_STATS), Ref Count : 1
FID : 51(AL_FID_L2_MATM), Ref Count : 1
FID : 13(AL_FID_SC), Ref Count : 1
FID : 26(AL_FID_QOS), Ref Count : 1
Sub block information
FID : 48(AL_FID_L2_PM), Private Data : 0x54072618
FID : 26(AL_FID_QOS), Private Data : 0x514d31b8
You have once again identified the source interface and culprit.
Tracing is a powerful tool, critical for troubleshooting high CPU usage problems, and it provides plenty of information with which to successfully resolve such a situation.
Use this command in order to trigger a log to be generated at a specific threshold:
process cpu threshold type total rising <CPU %> interval <interval in seconds> switch <switch number>
The log generated with the command looks like this:
*Jan 13 00:03:00.271: %CPUMEM-5-RISING_THRESHOLD: 1 CPUMEMd[6300]: Threshold: : 50, Total CPU Utilzation(total/Intr) :50/0, Top 3 processes(Pid/Util) : 8622/25, 5753/12, 9663/0
The log generated provides this information:
- 8622/25: 8622 is the PID for Cisco IOSd, and 25 implies that this process uses 25% CPU.
- 5753/12: 5753 is the PID for the FED, and 12 implies that this process uses 12% CPU.
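If these syslog messages are collected centrally, a short script can extract the fields. A minimal Python sketch, assuming the message format shown above (the Utilzation spelling comes from the log itself):

import re

LOG = ("%CPUMEM-5-RISING_THRESHOLD: 1 CPUMEMd[6300]: Threshold: : 50, "
       "Total CPU Utilzation(total/Intr) :50/0, "
       "Top 3 processes(Pid/Util) : 8622/25, 5753/12, 9663/0")

total, intr = map(int, re.search(r"\(total/Intr\) :(\d+)/(\d+)", LOG).groups())
top3 = re.findall(r"(\d+)/(\d+)", LOG.split("(Pid/Util)")[1])
print(f"total {total}% (interrupt {intr}%)")
for pid, util in top3:
    print(f"  PID {pid}: {util}%")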
The EEM script is shown here:
event manager applet highcpu
event syslog pattern "%CPUMEM-5-RISING_THRESHOLD"
action 0.1 syslog msg "high CPU detected"
action 0.2 cli command "enable"
action 0.3 cli command "show process cpu sorted | append nvram:<filename>.txt"
action 0.4 cli command "show process cpu detailed process <process name|process ID>
sorted | append nvram:<filename>.txt"
action 0.5 cli command "show platform punt statistics port-asic 0 cpuq -1
direction rx | append nvram:<filename>.txt"
action 0.6 cli command "show platform punt statistics port-asic 1 cpuq -1
direction rx | append nvram:<filename>.txt"
action 0.7 cli command "conf t"
action 0.8 cli command "no event manager applet highcpu"
Note: The process cpu threshold command does not currently work in the 3.2.X train. Another point to remember is that this command looks at the average CPU utilization among the four cores and generates a log when this average reaches the percentage defined in the command. For example, with the four cores at 53%, 43%, 95%, and 32%, the average is 55.75%, so a rising threshold of 50 triggers the log.
If you have Catalyst 3850 switches that run Cisco IOS® XE Software Release 16.x or later, see Troubleshoot High CPU Usage in Catalyst Switch Platforms Running IOS-XE 16.x.
Revision | Publish Date | Comments
---|---|---
3.0 | 12-Dec-2023 | Recertification
2.0 | 21-Nov-2022 | Updated title and introduction, style requirements, machine translation, gerunds, SEO and formatting.
1.0 | 04-Apr-2014 | Initial Release