Table 2. Feature History Table
Feature Name | Release Information | Feature Description
Virtual Output Queue Watchdog | Release 24.2.1 | We ensure the continuous movement of traffic queues, which is crucial for enforcing QoS policies, even when hardware issues disrupt the Virtual Output Queue (VOQ) and impede the flow of traffic. With this feature, if the router detects a stuck queue on a line card, it shuts down the line card, and if it detects a stuck queue on a fabric card, the router triggers a hard reset on the NPU. A queue is considered stuck only when there is no transmission for one minute. The feature is disabled by default and can be enabled using the command hw-module voq-watchdog feature enable. The feature is supported only on Cisco 8000 Series Routers (Modular) with Cisco Silicon One Q100 or Q200 ASICs. The feature introduces these changes: CLI:
Virtual Output Queue (VOQ) is a mechanism that addresses head-of-line blocking: a packet at the front of a queue cannot move
forward because its destination is busy, which delays all the packets behind it, even those destined for available output ports.
VOQ prevents this packet delay or loss by maintaining a dedicated queue for each pairing of input port and output port. This mechanism ensures
that packets are transmitted in the correct order and maintains quality of service (QoS) by managing congestion.
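The blocking situation described above (commonly called head-of-line blocking) can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: each packet is represented by the name of its destination output port, and the function names `fifo_delivered` and `voq_delivered` are hypothetical and not part of any router software.

```python
from collections import deque

def fifo_delivered(packets, busy_outputs):
    """Single shared FIFO: only the head packet may leave,
    so a head packet bound for a busy output stalls everything behind it."""
    queue = deque(packets)
    delivered = []
    while queue and queue[0] not in busy_outputs:
        delivered.append(queue.popleft())
    return delivered

def voq_delivered(packets, busy_outputs):
    """One queue per output port: a busy output stalls only its own queue."""
    voqs = {}
    for out in packets:
        voqs.setdefault(out, deque()).append(out)
    delivered = []
    for out, q in voqs.items():
        if out not in busy_outputs:
            delivered.extend(q)  # order within each queue is preserved
    return delivered

# Destination output port of each packet, in arrival order (illustrative data).
packets = ["A", "B", "C", "B"]
busy = {"A"}

print(fifo_delivered(packets, busy))  # [] : the head packet for A blocks B and C
print(voq_delivered(packets, busy))   # ['B', 'B', 'C'] : only the queue for A waits
```

With a single FIFO nothing is delivered while output A is busy, whereas the per-output queues let traffic for B and C proceed.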
A VOQ can become stuck when it fails to transmit traffic for a certain period even though the queue is not empty. There are a few
scenarios in which this can happen. For example, when a port or traffic class (Output Queue, OQ) is paused by Priority Flow
Control (PFC) from the peer, all the VOQs sending traffic to this OQ are also paused. Another scenario is when a port's higher priority
traffic class (TC) consumes all the port bandwidth, causing lower priority TCs to run out of credits and their corresponding
VOQs to become stuck. In either case, the affected VOQs stop forwarding, potentially leading to traffic loss.
VOQ watchdog actively identifies and resolves issues in VOQs that have not sent packets for one minute. It detects stuck VOQs
on both line cards (LCs) and fabric cards (FCs). The feature is disabled by default and can be enabled using the hw-module voq-watchdog feature enable command.
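The detection rule described above (a non-empty queue that has not transmitted for one minute) can be sketched as a polling check. This is a minimal illustration, not the router's implementation: the `VoqState` fields and the `poll` function are hypothetical, and treating an unchanged read pointer as "no transmission" is an assumption made for the example.

```python
from dataclasses import dataclass

STUCK_THRESHOLD_SEC = 60  # the text states one minute without transmission

@dataclass
class VoqState:
    depth: int          # packets currently queued (hypothetical field)
    read_ptr: int       # position of the next packet to transmit (hypothetical field)
    last_change: float  # time, in seconds, at which read_ptr last advanced

def poll(voq, new_read_ptr, new_depth, now):
    """Update the tracked state and return True if the VOQ looks stuck."""
    if new_read_ptr != voq.read_ptr:
        # The queue transmitted something since the last poll.
        voq.read_ptr = new_read_ptr
        voq.last_change = now
    voq.depth = new_depth
    if voq.depth == 0:
        # An empty queue is idle, not stuck; restart the timer.
        voq.last_change = now
        return False
    return now - voq.last_change >= STUCK_THRESHOLD_SEC

voq = VoqState(depth=7, read_ptr=1728, last_change=0.0)
print(poll(voq, 1728, 7, 30.0))  # False: only 30 seconds without progress
print(poll(voq, 1728, 7, 64.0))  # True: non-empty and unchanged for 64 seconds
print(poll(voq, 1729, 6, 70.0))  # False: the read pointer advanced
```

The empty-queue branch matters: without it, an idle queue that simply has no traffic would be reported as stuck.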
VOQ Watchdog Behavior on Line Cards
By default, the VOQ watchdog feature is disabled on the line cards; as a result, stuck VOQs are not detected, and there are
no interrupts or syslog notifications.
However, after you enable this feature using the hw-module voq-watchdog feature enable command, the router regularly checks the line cards for packets stuck in VOQs. If the router detects any such packets, it
raises a syslog notification and shuts down the line card.
The router displays the following messages when it detects a stuck VOQ.
Syslog Notification - Stuck VOQ; Action: Line card Shutdown
LC/0/0/CPU0:Feb 22 09:16:56.090 UTC: npu_drvr[203]: %FABRIC-NPU_DRVR-3-VOQ_HARDWARE_WATCHDOG :
[7127] : npu[1]: hardware_watchdog.voq_error: VOQ slc:2 voqnum:19955 isinhbm:1 smscntxtnum:3 hbmcntxtnum:14
isstuck:1 nochangesec:64 rdptr:1728 wrptr:1735 avblcrdts:-16668 is_fabric:0
Stuck VOQ; Action: Line card Shutdown
LC/0/0/CPU0:Jan 30 15:10:57.299 UTC: npu_drvr[241]: %PKT_INFRA-FM-2-FAULT_CRITICAL :
ALARM_CRITICAL :VOQ WATCHDOG Alarm :DECLARE :: Shutdown card due to voq watchdog error on ASIC 1.
Line cards automatically shut down when stuck VOQs are detected, which prevents you from identifying the root cause of the problem on the affected card.
To keep the line cards up, disable the shutdown action using the hw-module voq-watchdog cardshut disable command. The router then sends syslog notifications when it detects stuck VOQs
without shutting down the line card.
After disabling the shutdown action on the line card, the router displays the following messages when it detects a stuck VOQ.
Syslog Notification - Stuck VOQ; Action: None
LC/0/0/CPU0:Feb 22 09:16:56.090 UTC: npu_drvr[203]: %FABRIC-NPU_DRVR-3-VOQ_HARDWARE_WATCHDOG :
[7127] : npu[1]: hardware_watchdog.voq_error: VOQ slc:2 voqnum:19955 isinhbm:1 smscntxtnum:3 hbmcntxtnum:14
isstuck:1 nochangesec:64 rdptr:1728 wrptr:1735 avblcrdts:-16668 is_fabric:0
Stuck VOQ; Action: None
LC/0/0/CPU0:Feb 22 09:16:56.090 UTC: npu_drvr[203]: %FABRIC-NPU_DRVR-3-VOQ_HARDWARE_WATCHDOG :
[7127] : npu[1]: hardware_watchdog.voq_error: VOQ Watchdog Action Handling Skipped Due to User Configuration
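When collecting these notifications for troubleshooting, it can help to pull the `name:value` fields out of the stuck-VOQ message. The following is a minimal sketch, assuming the message arrives as a single line (the sample above is wrapped across lines) and that every field is an integer. The function name `parse_voq_error` is hypothetical, and the meaning of individual fields is not defined here.

```python
import re

# The stuck-VOQ sample from this section, joined into one line.
SAMPLE = (
    "LC/0/0/CPU0:Feb 22 09:16:56.090 UTC: npu_drvr[203]: "
    "%FABRIC-NPU_DRVR-3-VOQ_HARDWARE_WATCHDOG : [7127] : npu[1]: "
    "hardware_watchdog.voq_error: VOQ slc:2 voqnum:19955 isinhbm:1 "
    "smscntxtnum:3 hbmcntxtnum:14 isstuck:1 nochangesec:64 rdptr:1728 "
    "wrptr:1735 avblcrdts:-16668 is_fabric:0"
)

MARKER = "hardware_watchdog.voq_error: VOQ "
FIELD_RE = re.compile(r"\b([a-z_]+):(-?\d+)\b")

def parse_voq_error(line):
    """Return the name:value fields as a dict of ints, or None if there are none."""
    _, found, tail = line.partition(MARKER)
    if not found:
        return None
    fields = {name: int(value) for name, value in FIELD_RE.findall(tail)}
    return fields or None

fields = parse_voq_error(SAMPLE)
print(fields["voqnum"], fields["nochangesec"], fields["is_fabric"])  # 19955 64 0
```

The "Action Handling Skipped" message shares the same prefix but carries no fields, so the sketch returns None for it.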
VOQ Watchdog Behavior on Fabric Cards
By default, the VOQ watchdog feature is disabled on the fabric cards; as a result, stuck VOQs are not detected, and there
are no interrupts or syslog notifications.
After you enable the feature using the hw-module voq-watchdog feature enable command, the router regularly checks the fabric cards for any packets stuck in VOQs. If such packets are detected, the router
raises a syslog notification and hard resets the fabric element (FE) device. After five hard resets, the fabric card undergoes
a graceful reload.
The router logs the following syslog notification upon detecting a stuck VOQ on an FC.
Syslog: RP/0/RP0/CPU0:Feb 22 09:16:47.721 UTC: npu_drvr[335]: %FABRIC-NPU_DRVR-3-ASIC_ERROR_ACTION :
[7912] : npu[6]: HARD_RESET needed for hardware_watchdog.voq_error
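The line card and fabric card behaviors described in this section can be summarized as a small decision function. This is an illustrative sketch only: the function name, parameter names, and returned strings are hypothetical, and it assumes that "after five hard resets" means the next detection following the fifth reset triggers the graceful reload.

```python
FC_HARD_RESET_LIMIT = 5  # the text states a graceful reload after five hard resets

def watchdog_action(card_type, feature_enabled=True,
                    cardshut_enabled=True, prior_hard_resets=0):
    """Return the action taken when a stuck VOQ is present on a card."""
    if not feature_enabled:
        # Watchdog disabled (the default): no detection, no syslog.
        return "none"
    if card_type == "LC":
        # Line card: shut down unless the shutdown action was disabled.
        return "syslog+shutdown" if cardshut_enabled else "syslog-only"
    if card_type == "FC":
        # Fabric card: hard reset, escalating to a graceful reload (assumed threshold).
        if prior_hard_resets >= FC_HARD_RESET_LIMIT:
            return "graceful-reload"
        return "syslog+hard-reset"
    raise ValueError(f"unknown card type: {card_type!r}")

print(watchdog_action("LC"))                          # syslog+shutdown
print(watchdog_action("LC", cardshut_enabled=False))  # syslog-only
print(watchdog_action("FC", prior_hard_resets=5))     # graceful-reload
```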