Table 1. Feature History Table
Feature Name: Virtual Output Queue Watchdog
Release Information: Release 7.3.6
Feature Description:
You can now detect stuck Virtual Output Queues (VOQs) on a line card or fabric card within a minute. If the router detects
a stuck queue on a line card, it automatically shuts down the line card to prevent potential issues. If the router detects
a stuck queue on a fabric card, it triggers a hard reset on the Network Processing Unit (NPU). This ensures the continuous
movement of traffic queues, which is crucial for enforcing QoS policies, even when hardware issues disrupt the VOQ and impede
the flow of traffic.
The feature is enabled by default and can be disabled using the command hw-module voq-watchdog feature disable.
The feature is supported only in Cisco 8000 Series Routers with Cisco Silicon One Q200 ASICs.
The feature introduces these changes:
CLI:
Virtual Output Queue (VOQ) is a queuing mechanism that addresses head-of-line blocking: a situation in which a packet at the
front of a queue cannot move forward because its destination is busy, delaying all the packets behind it, even those destined
for available output ports. VOQ avoids this delay, and the packet loss it can cause, by maintaining at each input a dedicated
queue for every output port, so a busy output stalls only its own queue. This mechanism ensures that packets are transmitted
in the correct order and maintains quality of service (QoS) by managing congestion.
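The difference between a single shared queue and per-output queues can be illustrated with a toy model. This is a minimal sketch for intuition only, not a representation of the router's hardware scheduler; the function names and the port labels (out1, out2, out3) are hypothetical.

```python
from collections import deque


def fifo_round(packets, busy_outputs):
    """Single FIFO: only the head packet is eligible to be sent.

    packets is a list of destination output ports in arrival order.
    Returns the destinations delivered before the queue blocks.
    """
    queue = deque(packets)
    delivered = []
    # Head-of-line blocking: stop as soon as the head packet's output is busy.
    while queue and queue[0] not in busy_outputs:
        delivered.append(queue.popleft())
    return delivered


def voq_round(packets, busy_outputs):
    """One queue per output port: a busy output stalls only its own queue."""
    voqs = {}
    for dest in packets:
        voqs.setdefault(dest, deque()).append(dest)
    delivered = []
    for dest, queue in voqs.items():
        if dest in busy_outputs:
            continue  # only the VOQ for the busy output waits
        delivered.extend(queue)
    return delivered


arrivals = ["out1", "out2", "out3", "out2"]
busy = {"out1"}
print(fifo_round(arrivals, busy))  # [] : everything waits behind the out1 packet
print(voq_round(arrivals, busy))   # ['out2', 'out2', 'out3']
```

With the single FIFO, the packet for the busy port out1 holds back every packet behind it. With per-output queues, only the out1 queue waits while the packets for out2 and out3 are delivered.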
A VOQ becomes stuck when it fails to transmit traffic for a certain period even though the queue is not empty. This can happen
in a few scenarios. For example, when the peer pauses a port or traffic class (output queue, OQ) using Priority Flow Control
(PFC), all the VOQs sending traffic to that OQ are also paused. In another scenario, a port's higher-priority traffic class
(TC) consumes all the port bandwidth, so lower-priority TCs run out of credits and their corresponding VOQs become stuck.
A stuck VOQ stops forwarding traffic, which can lead to traffic loss.
The VOQ watchdog identifies VOQs that have not sent packets for over a minute and takes corrective action. It detects stuck
VOQs in both line cards (LCs) and fabric cards (FCs). The feature is enabled by default and can be disabled using the command
hw-module voq-watchdog feature disable.
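The stuck condition described above (a non-empty queue that has not sent packets for over a minute) can be expressed as a simple predicate. The following is an illustrative sketch, not the NPU driver's implementation. The field names rdptr, wrptr, and nochangesec are borrowed from the watchdog syslog message, but their interpretation here (read pointer, write pointer, seconds without progress) and the 60-second threshold are assumptions.

```python
from dataclasses import dataclass

# Assumption: "over a minute" in the text is modeled as more than 60 seconds.
STUCK_THRESHOLD_SEC = 60


@dataclass
class VoqSample:
    rdptr: int        # assumed meaning: read pointer of the queue
    wrptr: int        # assumed meaning: write pointer of the queue
    nochangesec: int  # assumed meaning: seconds without dequeue progress


def is_stuck(sample, threshold=STUCK_THRESHOLD_SEC):
    """Return True when the queue holds data but has made no progress."""
    non_empty = sample.rdptr != sample.wrptr
    return non_empty and sample.nochangesec > threshold


# Values taken from the sample syslog message (rdptr:1728 wrptr:1735 nochangesec:64).
print(is_stuck(VoqSample(rdptr=1728, wrptr=1735, nochangesec=64)))  # True
```

An empty queue (equal pointers) is never reported as stuck under this model, regardless of how long it has been idle.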
VOQ Watchdog Behavior on Line Cards
By default, the VOQ watchdog feature is enabled on the line cards. The router regularly checks the line cards for packets
stuck in VOQs. If the router detects any such packets, it raises a syslog notification and shuts down the line card.
The router displays the following messages when it detects a stuck VOQ.
Syslog Notification - Stuck VOQ; Action: Line card Shutdown
LC/0/0/CPU0:Feb 22 09:16:56.090 UTC: npu_drvr[203]: %FABRIC-NPU_DRVR-3-VOQ_HARDWARE_WATCHDOG :
[7127] : npu[1]: hardware_watchdog.voq_error: VOQ slc:2 voqnum:19955 isinhbm:1 smscntxtnum:3 hbmcntxtnum:14
isstuck:1 nochangesec:64 rdptr:1728 wrptr:1735 avblcrdts:-16668 is_fabric:0
Stuck VOQ; Action: Line card Shutdown
LC/0/0/CPU0:Jan 30 15:10:57.299 UTC: npu_drvr[241]: %PKT_INFRA-FM-2-FAULT_CRITICAL :
ALARM_CRITICAL :VOQ WATCHDOG Alarm :DECLARE :: Shutdown card due to voq watchdog error on ASIC 1.
Because line cards shut down automatically when stuck VOQs are detected, you cannot examine the affected card to identify the
root cause of the problem. To keep the line card running, disable the shutdown action using the
hw-module voq-watchdog cardshut disable command. The router then sends syslog notifications when it detects stuck VOQs
without shutting down the line card.
After disabling the shutdown action on the line card, the router displays the following messages when it detects a stuck VOQ.
Syslog Notification - Stuck VOQ; Action: None
LC/0/0/CPU0:Feb 22 09:16:56.090 UTC: npu_drvr[203]: %FABRIC-NPU_DRVR-3-VOQ_HARDWARE_WATCHDOG :
[7127] : npu[1]: hardware_watchdog.voq_error: VOQ slc:2 voqnum:19955 isinhbm:1 smscntxtnum:3 hbmcntxtnum:14
isstuck:1 nochangesec:64 rdptr:1728 wrptr:1735 avblcrdts:-16668 is_fabric:0
Stuck VOQ; Action: None
LC/0/0/CPU0:Feb 22 09:16:56.090 UTC: npu_drvr[203]: %FABRIC-NPU_DRVR-3-VOQ_HARDWARE_WATCHDOG :
[7127] : npu[1]: hardware_watchdog.voq_error: VOQ Watchdog Action Handling Skipped Due to User Configuration
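When collecting these notifications for troubleshooting, it can help to extract the key:value fields from the voq_error message. The following is a minimal parsing sketch based only on the sample messages shown above; the function name parse_voq_error is hypothetical, and the message layout on other releases may differ.

```python
import re

# Matches fields such as "voqnum:19955" or "avblcrdts:-16668".
FIELD_RE = re.compile(r"(\w+):(-?\d+)")
MARKER = "hardware_watchdog.voq_error:"


def parse_voq_error(message):
    """Return the integer fields that follow the voq_error marker.

    Returns None when the marker is absent, and an empty dict when the
    message carries no key:value fields (for example, the "Action
    Handling Skipped" notification).
    """
    _, found, tail = message.partition(MARKER)
    if not found:
        return None
    return {key: int(value) for key, value in FIELD_RE.findall(tail)}


sample = (
    "LC/0/0/CPU0:Feb 22 09:16:56.090 UTC: npu_drvr[203]: "
    "%FABRIC-NPU_DRVR-3-VOQ_HARDWARE_WATCHDOG : [7127] : npu[1]: "
    "hardware_watchdog.voq_error: VOQ slc:2 voqnum:19955 isinhbm:1 "
    "smscntxtnum:3 hbmcntxtnum:14 isstuck:1 nochangesec:64 rdptr:1728 "
    "wrptr:1735 avblcrdts:-16668 is_fabric:0"
)
fields = parse_voq_error(sample)
print(fields["voqnum"], fields["nochangesec"], fields["is_fabric"])  # 19955 64 0
```

The is_fabric field distinguishes line card VOQs (0 in the sample) from fabric VOQs, so it is a natural key for grouping parsed notifications.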
VOQ Watchdog Behavior on Fabric Cards
By default, the VOQ watchdog feature is enabled on the fabric cards. The router regularly checks the fabric cards for any
packets stuck in VOQs. If such packets are detected, the router raises a syslog notification and hard resets the fabric element
(FE) device. After five hard resets, the fabric card undergoes a graceful reload.
The router logs the following syslog notification upon detecting a stuck VOQ on an FC.
Syslog: RP/0/RP0/CPU0:Feb 22 09:16:47.721 UTC: npu_drvr[335]: %FABRIC-NPU_DRVR-3-ASIC_ERROR_ACTION :
[7912] : npu[6]: HARD_RESET needed for hardware_watchdog.voq_error
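The watchdog actions described for line cards and fabric cards can be summarized as a small decision function. This is a sketch for reasoning about the documented behavior, not the router's logic. The function name, parameters, and returned labels are hypothetical, and two points are assumptions: that the cardshut setting affects only line cards, and that the graceful reload occurs on the detection that follows the fifth hard reset.

```python
# Assumption: the fabric card reloads on the detection after five hard resets.
MAX_HARD_RESETS = 5


def watchdog_action(is_fabric_card, feature_enabled=True,
                    cardshut_enabled=True, prior_hard_resets=0):
    """Return a label for the action taken when a stuck VOQ is detected."""
    if not feature_enabled:
        # hw-module voq-watchdog feature disable: the watchdog is off.
        return "none"
    if is_fabric_card:
        if prior_hard_resets >= MAX_HARD_RESETS:
            return "syslog + graceful reload of fabric card"
        return "syslog + hard reset of fabric element"
    if cardshut_enabled:
        return "syslog + line card shutdown"
    # hw-module voq-watchdog cardshut disable: notify only.
    return "syslog only"


print(watchdog_action(is_fabric_card=False))
print(watchdog_action(is_fabric_card=False, cardshut_enabled=False))
print(watchdog_action(is_fabric_card=True, prior_hard_resets=5))
```

The three calls print the default line card action, the notify-only action after the shutdown is disabled, and the fabric card escalation after repeated hard resets.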