This document lists the causes of %SYS-3-CPUHOG error messages, and explains how to troubleshoot them.
There are no specific requirements for this document.
This document is not restricted to specific software and hardware versions.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
For more information on document conventions, refer to the Cisco Technical Tips Conventions.
To reduce the impact of runaway processes, Cisco IOS® software uses a process watchdog timer that lets the scheduler periodically poll the currently active process. This is not the same as preemption. It is a fail-safe mechanism that keeps the system from becoming unresponsive or locking up completely when a single process consumes all of the CPU.
If a process appears to hang (for example, if it continues to run for a long time), the scheduler can force the process to terminate.
Every time the scheduler allows a process to run on the CPU, it starts a watchdog timer for that process. After a preset period, if the process continues to run, the watchdog process generates an interrupt and causes a router restart by a "software forced crash" (the stack trace shows a watchdog process as the trigger of the crash).
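The watchdog behavior described above can be illustrated with a small simulation. This is a minimal Python sketch, not how Cisco IOS software is implemented. Each process is modeled as a generator that yields the number of milliseconds it held the CPU before suspending, and time is simulated rather than measured. The 2000 ms warning threshold comes from this document. The crash threshold (CRASH_MS) is a hypothetical value chosen only for illustration. A real watchdog interrupts the process while it is still running, whereas this model checks only when the slice ends.

```python
from typing import Iterator, List

WARN_MS = 2000          # CPUHOG warning threshold stated in this document
CRASH_MS = 4 * WARN_MS  # hypothetical: the real crash timeout is not given here


class WatchdogCrash(Exception):
    """Stands in for the watchdog's 'software forced crash'."""


def run_slice(name: str, proc: Iterator[int], log: List[str]) -> bool:
    """Run `proc` until it next suspends; return False once it has finished.

    `proc` yields how many (simulated) milliseconds it held the CPU before
    suspending. The scheduler cannot preempt it; it can only compare that
    figure with the watchdog thresholds.
    """
    try:
        ran_ms = next(proc)
    except StopIteration:
        return False
    if ran_ms > CRASH_MS:
        raise WatchdogCrash(f"{name} still running after {ran_ms} msec")
    if ran_ms > WARN_MS:
        log.append(f"%SYS-3-CPUHOG: Task ran for {ran_ms} msec, Process = {name}")
    return True


def well_behaved() -> Iterator[int]:
    for _ in range(3):
        yield 50            # short slices: suspends often


def hog() -> Iterator[int]:
    yield 2148              # one long slice, like the IP Input example


log: List[str] = []
for name, proc in [("Good Proc", well_behaved()), ("IP Input", hog())]:
    while run_slice(name, proc, log):
        pass
print(log)  # ['%SYS-3-CPUHOG: Task ran for 2148 msec, Process = IP Input']
```

Only the process that held the CPU past the warning threshold produces a message. A slice longer than CRASH_MS raises WatchdogCrash instead.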
The first time the watchdog expires, the scheduler prints a warning message such as:
%SYS-3-CPUHOG: Task ran for 2148 msec (20/13), Process = IP Input, PC = 3199482 -Traceback= 314B5E6 319948A
This message indicates that a process has held the CPU. In this example, the process is "IP Input". The message usually appears under transient conditions, such as when the router boots up, during an Online Insertion and Removal (OIR), or under heavy traffic. The "%SYS-3-CPUHOG" messages should not appear during normal operation of the router.
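When many of these messages appear in a log, it can help to extract their fields programmatically. The following Python sketch parses the message layout shown in this document. The regular expression is an assumption based on the examples here, and other Cisco IOS releases may format the message differently. The parenthesized counter pair, such as (20/13), is kept as a raw string because this document does not explain its meaning.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed layout, based on the CPUHOG examples in this document.
CPUHOG_RE = re.compile(
    r"%SYS-3-CPUHOG: Task ran for (?P<msec>\d+) msec "
    r"\((?P<counters>\d+/\d+)\), [Pp]rocess = (?P<process>[^,]+), "
    r"PC = ?(?P<pc>[0-9A-Fa-f]+)\.?"
    r"(?:\s*-Traceback=(?P<traceback>(?: +[0-9A-Fa-f]+)+))?"
)


@dataclass
class CpuHog:
    msec: int              # how long the task held the CPU
    counters: str          # raw "(a/b)" pair; meaning not covered here
    process: str           # e.g. "IP Input"
    pc: str                # program counter, hex
    traceback: List[str]   # hex return addresses, may be empty


def parse_cpuhog(line: str) -> Optional[CpuHog]:
    """Return the parsed message, or None if `line` is not a CPUHOG message."""
    m = CPUHOG_RE.search(line)
    if m is None:
        return None
    tb = m.group("traceback")
    return CpuHog(
        msec=int(m.group("msec")),
        counters=m.group("counters"),
        process=m.group("process").strip(),
        pc=m.group("pc"),
        traceback=tb.split() if tb else [],
    )


msg = parse_cpuhog(
    "%SYS-3-CPUHOG: Task ran for 2148 msec (20/13), Process = IP Input, "
    "PC = 3199482 -Traceback= 314B5E6 319948A"
)
print(msg.process, msg.msec, msg.traceback)  # IP Input 2148 ['314B5E6', '319948A']
```

The process name is what matters most for the troubleshooting steps that follow, because it identifies which part of the software held the CPU.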
If the router is busy at interrupt level after a process has been scheduled to run, the reported run time of that process can be inaccurate. This is because the CPUHOG mechanism tracks only process-level tasks. It does not account for interrupt-level tasks, which are allowed to interrupt the running process and take control of the CPU.
Packet switching is the most common work performed at interrupt level.
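The accounting problem can be made concrete with a simplified timeline model. In this hypothetical Python sketch, the time charged to a process is the wall-clock time from dispatch until the process suspends, so any interrupt-level work (such as packet switching) that runs in between is counted against the process. The function name and the (start, duration) representation of interrupts are illustrative assumptions, not Cisco IOS internals.

```python
from typing import List, Tuple

WARN_MS = 2000  # CPUHOG threshold stated in this document


def charged_run_time(process_ms: int, interrupts: List[Tuple[int, int]]) -> int:
    """Return the elapsed time charged to a process (simplified model).

    `process_ms` is the CPU time the process itself needs. `interrupts` lists
    (start_ms, duration_ms) pairs, relative to dispatch, for interrupt-level
    work. The process makes no progress while an interrupt runs, so the
    elapsed time is its own work plus every interrupt that arrived first.
    """
    elapsed = 0               # wall-clock ms since the process was dispatched
    remaining = process_ms    # process work still to do
    for start, duration in sorted(interrupts):
        start = max(start, elapsed)  # simplification: overlapping interrupts are serialized
        if start >= elapsed + remaining:
            break                    # the process suspended before this interrupt
        remaining -= start - elapsed # process work done before the interrupt
        elapsed = start + duration   # the interrupt holds the CPU
    return elapsed + remaining


charged = charged_run_time(600, [(100, 800), (1000, 800)])
print(charged, charged > WARN_MS)  # 2200 True
```

Here a process that needed only 600 ms of CPU time is charged 2200 ms, because 1600 ms of interrupt work ran while it was dispatched, which is enough to cross the 2000 ms threshold.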
This section explains how you can troubleshoot CPUHOG messages in different scenarios.
CPUHOG messages during the boot sequence are fairly common. Here the message means only that the boot process held the CPU slightly longer than the system expected, and the system reported this on the console. The process in this case is "Boot Load", which shows where the CPUHOG occurred:
System Bootstrap, Version 11.1(12)XA, EARLY DEPLOYMENT RELEASE SOFTWARE (fc1)
Copyright (c) 1997 by cisco Systems, Inc.
C1600 processor with 16384 Kbytes of main memory
program load complete, entry point: 0x4018060, size: 0x108968
%SYS-3-CPUHOG: Task ran for 2040 msec (6/6), Process = Boot Load, PC =40B513A
-Traceback= 407EB6E 407F628 407D118 40180E0 40005B0 4015C3E 40152B2 4014ED4 40025B8 4003086 4015636 40021A8 400C616
program load complete, entry point: 0x2005000, size: 0x4195b9
Self decompressing the image : ############################################################################
############################################################################
################################################################## [OK]
You can safely ignore this error message. At the time of the boot process, the boot loader uses the CPU for 2-4 seconds, and does not release it. This is not a problem at boot time, because the CPU needs to run only the boot loader at that point. More recent boot ROMs suppress the printing of that particular message.
You can also encounter a CPUHOG message from the boot helper image whenever the router loads a large image, for example on Cisco 1600 Series Routers configured with more than 16 MB of DRAM.
This message occurs only while the image is being loaded and has no effect on the loading process or on the normal operation of the system, so it is a cosmetic problem.
CPUHOG messages are common at the time of an OIR, because the router has to perform a set of complicated and relatively long tasks. There is no need to worry about CPUHOG messages that occur during OIRs, as long as the card that was inserted comes up properly.
A CPUHOG message can appear when you attempt to access a Flash device (such as a Flash card, or a Flash single inline memory module (SIMM)) when the device is defective or when it does not respond. If the problem recurs, please contact your TAC representative.
Note: If a Catalyst 6500 that runs Integrated Cisco IOS software (Native Mode) or Hybrid Mode reports CPUHOG messages when you format the MSFC (RP) bootflash:, the cause can be Cisco bug ID CSCdw53175, which is resolved in Cisco IOS Software Releases 12.1.11b, 12.1(12c)E5, 12.1(13)E, and later.
On the Cisco 12000 Series Internet Router, the forwarding information base (FIB) is maintained on each line card for use in packet switching. Due to the structure of the FIB tree, routing changes with short subnet masks (between /1 and /4) can cause messages like this in the console log:
SLOT 1: %SYS-3-CPUHOG: Task ran for 4024 msec (690/0), process = CEF IPC Background, PC = 400B8908. -Traceback= 400B8910 408FF588 408FF6F4 408FFE8C 400A404C 400A4038
When a process in Cisco IOS software runs for longer than 2000 ms (2 seconds), a CPUHOG message is displayed. For Cisco Express Forwarding (CEF) updates with very short subnet masks, the required processing can exceed 2000 ms, which triggers these messages. The "CEF IPC Background" process is the parent process that controls the addition and removal of prefixes in the forwarding tree.
Additionally, if the CPU is held for an extended period, the line card can crash because of a fabric ping failure, or the FIB can become disabled because of IPC communication timeouts. To troubleshoot these problems, see Troubleshooting Fabric Ping Timeouts and Failures on the Cisco 12000 Series Internet Router.
In general, routing updates with masks shorter than /7 are erroneous or malicious. Cisco recommends that all customers configure adequate route filtering to prevent the processing and propagation of such updates. If you need help to configure routing filters, contact your technical support representative.
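The filtering rule above can be expressed as a short check. The following Python sketch keeps IPv4 prefixes of length /7 or longer and drops shorter ones. It only illustrates the decision: on a router, the filter is configured as routing policy (for example, with IP prefix lists), not in an external script. The allow_default parameter is an assumption added here, because many networks legitimately receive the default route 0.0.0.0/0, which the /7 rule would otherwise drop.

```python
import ipaddress
from typing import Iterable, List

MIN_PREFIX_LEN = 7  # per this document: masks shorter than /7 are suspect


def filter_short_prefixes(prefixes: Iterable[str], allow_default: bool = True) -> List[str]:
    """Return the IPv4 prefixes that pass the minimum-length check.

    allow_default (an assumption, not from this document) exempts 0.0.0.0/0.
    """
    kept = []
    for text in prefixes:
        net = ipaddress.IPv4Network(text, strict=False)
        if net.prefixlen == 0 and allow_default:
            kept.append(text)          # default route, exempted by assumption
        elif net.prefixlen >= MIN_PREFIX_LEN:
            kept.append(text)          # /7 or longer: accept
        # anything else (/1 to /6) is dropped
    return kept


updates = ["0.0.0.0/0", "0.0.0.0/1", "64.0.0.0/2", "10.0.0.0/8", "192.0.2.0/24"]
print(filter_short_prefixes(updates))  # ['0.0.0.0/0', '10.0.0.0/8', '192.0.2.0/24']
```

Filtering at the edge means the line cards never have to rebuild large parts of the FIB tree for these prefixes in the first place.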
The "CEF IPC Background" process can also trigger a CPUHOG message when you clear Border Gateway Protocol (BGP) sessions or the routing table.
In most other cases, these error messages are caused by an internal software bug in Cisco IOS Software.
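The scenarios in this section can be summarized as a rough triage function. This Python sketch is a hypothetical aid, not a Cisco tool. The context flags (oir_in_progress, flash_access) are information the operator must supply, and the returned text only restates the guidance given in this section.

```python
import re

# Assumed message layout, based on the examples in this document.
PROCESS_RE = re.compile(r"%SYS-3-CPUHOG: .*?[Pp]rocess = ([^,]+),")


def triage(message: str, oir_in_progress: bool = False, flash_access: bool = False) -> str:
    """Map a CPUHOG message plus operator-supplied context to this section's advice."""
    m = PROCESS_RE.search(message)
    if m is None:
        return "not a CPUHOG message"
    process = m.group(1).strip()
    if process == "Boot Load":
        return "cosmetic: boot loader held the CPU during boot"
    if oir_in_progress:
        return "expected during OIR: confirm the inserted card comes up"
    if flash_access:
        return "check the Flash device; contact TAC if the message recurs"
    if process == "CEF IPC Background":
        return "check for very short-mask routing updates or a recent clear of BGP/routes"
    return f'probable software bug: search Bug Toolkit for "CPUHOG, {process}"'


line = ("%SYS-3-CPUHOG: Task ran for 2148 msec (20/13), Process = IP Input, "
        "PC = 3199482 -Traceback= 314B5E6 319948A")
print(triage(line))  # probable software bug: search Bug Toolkit for "CPUHOG, IP Input"
```

The order of the checks is a design choice: the benign, well-understood cases are ruled out first, and only then is a software bug assumed.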
The first step to troubleshoot this sort of error message is to look for a known bug. You can use the Bug Toolkit to find a bug that matches the error. On the Bug Toolkit page, click Launch Bug Toolkit, and select Search for Cisco IOS-related bugs. To narrow your search, select your Cisco IOS software version under number 1. Under number 3, perform a keyword search for "CPUHOG, <process>", where <process> is the process named in the message, such as Virtual Exec or IP Input.
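If a log contains CPUHOG messages from several processes, a short script can count them per process and produce one "CPUHOG, <process>" search string for each, most frequent first. This Python sketch assumes the message layout shown in this document. Apart from the IP Input message taken from this document, the sample log lines are made up for illustration.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumed message layout, based on the examples in this document.
PROCESS_RE = re.compile(r"%SYS-3-CPUHOG: .*?[Pp]rocess = ([^,]+),")


def bug_search_keywords(log_text: str) -> List[Tuple[str, int]]:
    """Return (keyword, count) pairs, one per process, most frequent first."""
    counts = Counter(name.strip() for name in PROCESS_RE.findall(log_text))
    return [(f"CPUHOG, {proc}", n) for proc, n in counts.most_common()]


sample_log = "\n".join([
    # taken from this document
    "%SYS-3-CPUHOG: Task ran for 2148 msec (20/13), Process = IP Input, PC = 3199482",
    # made-up illustrative line
    "%SYS-3-CPUHOG: Task ran for 2500 msec (5/2), Process = Virtual Exec, PC = 1234ABC",
    "%SYS-3-CPUHOG: Task ran for 2148 msec (20/13), Process = IP Input, PC = 3199482",
])
print(bug_search_keywords(sample_log))
# [('CPUHOG, IP Input', 2), ('CPUHOG, Virtual Exec', 1)]
```

Each keyword can then be entered in the Bug Toolkit keyword search together with the Cisco IOS software version.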
Upgrading to the latest Cisco IOS Software image in your release train picks up the fixes for all CPUHOG bugs that have been resolved in that train.
If you still need assistance after you follow the troubleshooting steps above and want to open a service request with the Cisco TAC, be sure to include the following information:
Note: Do not manually reload or power-cycle the router before you collect the above information, unless this is required to troubleshoot a line card crash on the Cisco 12000 Series Internet Router. A reload can erase information that is needed to determine the root cause of the problem.
Revision | Publish Date | Comments
---|---|---
1.0 | 24-Jun-2008 | Initial Release