Graceful Handling of Out of Resource Situations

Table 1. Feature History Table

| Feature Name | Release Information | Description |
|---|---|---|
| Extending Graceful Handling of Out of Resource (OOR) Situations for SR-TE traffic | Release 7.3.3 | This release extends the graceful handling of dropped traffic in an out-of-resource situation to SR-TE traffic. Modified command: |
| Extending Graceful Handling of Out of Resource (OOR) Situations for GRE and MPLS traffic | Release 7.3.2 | This release extends the graceful handling of dropped traffic in an out-of-resource situation to GRE and MPLS-enabled traffic. Modified command: |
| Graceful Handling of Out of Resource (OOR) Situations | Release 7.3.1 | This feature enables you to resend any traffic that was dropped during an OOR situation. This enables better monitoring and management of failed traffic. Commands introduced and modified are: |

Graceful Handling of Out of Resource Situations Overview

An OOR situation occurs when the network cannot handle a traffic overload, which can lead to traffic loss. Graceful handling of OOR situations means that the router recovers without losing any unaffected traffic. Recovery of the unaffected traffic occurs when the OOR situation is cleared.

When a router reaches the OOR state, you release traffic for a few prefixes to reduce the utilization of hardware and SDK resources. You can release the traffic with the help of a traffic generator. As the utilization of hardware and SDK resources drops, the router comes out of the OOR state. After the router is out of the OOR state, you can use the traffic generator to reinject the traffic that you released, in a controlled way. This lets you control the monitoring and resending of failed traffic and handle OOR situations gracefully.

The OOR State field in the output of the show controllers npu resources command changes when the router approaches or reaches an OOR situation due to heavy traffic or extreme utilization of hardware and SDK resources. The OOR State changes from Green to Yellow, and finally to Red. When the OOR State reaches Red, the router generates a syslog notification for the end user.

The OOR states signify the following:

  • Green: Favorable utilization of hardware and SDK resources

  • Yellow: Router is advancing toward the OOR state

  • Red: Router has reached the OOR state

You can configure the threshold value at which a router reaches the OOR Red or Yellow states by using the oor hw command.

The default values for OOR states are as follows:

  • The Yellow state occurs when 80% of the router's hardware and SDK resources are utilized.

  • The Red state occurs when 95% of the router's hardware and SDK resources are utilized.

For more information, see oor hw command in the chapter Graceful Handling of OOR Situations Commands of System Monitoring Command Reference for Cisco 8000 Series Routers.
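The relationship between utilization and OOR state can be illustrated with a short Python sketch. It is only a model of the thresholds described above, not router code: the function name oor_state is hypothetical, and the sketch assumes that a state is entered as soon as utilization is equal to or greater than its threshold.

```python
def oor_state(utilization_pct: float, yellow: float = 80.0, red: float = 95.0) -> str:
    """Map a resource utilization percentage to an OOR state name.

    Defaults mirror the documented default thresholds (Yellow 80%, Red 95%).
    Assumption: a state is entered when utilization is >= its threshold.
    """
    if not 0 <= yellow < red <= 100:
        raise ValueError("thresholds must satisfy 0 <= yellow < red <= 100")
    if utilization_pct >= red:
        return "Red"
    if utilization_pct >= yellow:
        return "Yellow"
    return "Green"


if __name__ == "__main__":
    # Default thresholds
    for pct in (42.0, 80.0, 94.9, 95.0):
        print(f"{pct:5.1f}% -> {oor_state(pct)}")
    # Custom thresholds, as you might set with the oor hw command
    print(f" 90.0% -> {oor_state(90.0, yellow=85, red=96)}")
```

Passing custom values for yellow and red models the effect of changing the thresholds with the oor hw command.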

You can use the show controllers npu resources command to view the status of utilization of hardware and Software Development Kit (SDK) resources:

Table 2. NPU Resources per Traffic Type

| Traffic Type | NPU Resource |
|---|---|
| IPv4/IPv6 | lpmtcam, centralem, stage1lbgroup, stage1lbmember, stage2lbgroup, stage2lbmember |
| MPLS | egresslargeencap, centralem |
| GRE | egresslargeencap, sipidxtbl, myipv4tbl, tunneltermination |
| SR-TE | counterbank, egresslargeencap, egresssmallencap, stage1lbgroup, stage1lbmember, stage2lbgroup, stage2lbmember |

Restrictions

Graceful handling of OOR situations is only supported for IPv4, IPv6, MPLS, SR-TE, and GRE traffic.

Configuration

To configure OOR limits, use the oor hw command.

Router(config)#oor hw threshold red 96
Router(config)#oor hw threshold yellow 85
Router(config)#commit

Verification

To verify the OOR state of a router, use the show logging | inc OOR command.

Router# show logging | inc OOR
Wed Jan 6 23:36:34.138 EST
LC/0/0/CPU0:Jan 6 23:01:09.609 EST: npu_drvr[278]: %PLATFORM-OFA-4-OOR_YELLOW : NPU 1, Table nhgroup, Resource stage2_lb_group
LC/0/0/CPU0:Jan 6 23:01:29.655 EST: npu_drvr[278]: %PLATFORM-OFA-4-OOR_YELLOW : NPU 1, Table nhgroup, Resource stage2_lb_member
LC/0/0/CPU0:Jan 6 23:01:38.938 EST: npu_drvr[278]: %PLATFORM-OFA-1-OOR_RED : NPU 3, Table nhgroup, Resource stage2_lb_group
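If you collect these messages off-box, a small script can extract the state, NPU, table, and resource from each line. The following Python sketch is one possible approach. The regular expression is inferred only from the sample messages in this section, so treat the pattern, the OorEvent type, and the parse_oor_line function as assumptions rather than a documented message format.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample syslog lines above (an assumption,
# not a documented grammar for the message).
OOR_RE = re.compile(
    r"%PLATFORM-OFA-(?P<severity>\d)-OOR_(?P<state>GREEN|YELLOW|RED)\s*:\s*"
    r"NPU (?P<npu>\d+), Table (?P<table>\w+), Resource (?P<resource>\w+)"
)


class OorEvent(NamedTuple):
    severity: int   # syslog severity digit from the message mnemonic
    state: str      # GREEN, YELLOW, or RED
    npu: int
    table: str
    resource: str


def parse_oor_line(line: str) -> Optional[OorEvent]:
    """Return an OorEvent for an OOR syslog line, or None if it does not match."""
    m = OOR_RE.search(line)
    if m is None:
        return None
    return OorEvent(int(m["severity"]), m["state"], int(m["npu"]),
                    m["table"], m["resource"])


if __name__ == "__main__":
    sample = ("LC/0/0/CPU0:Jan 6 23:01:38.938 EST: npu_drvr[278]: "
              "%PLATFORM-OFA-1-OOR_RED : NPU 3, Table nhgroup, "
              "Resource stage2_lb_group")
    print(parse_oor_line(sample))
```

Lines that do not contain an OOR message, such as the timestamp line at the top of the command output, return None and can be skipped.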

To verify the NPU resource utilization of a router, use the show controllers npu resources command. The following example shows the output for the lpmtcam resource:

Router# show controllers npu resources lpmtcam location 0/0/CPU0
Thu Dec 17 11:43:06.931 EST
HW Resource Information
    Name                            : lpm_tcam
    Asic Type                       : Pacific
 
NPU-0
OOR Summary
        Estimated Max Entries    : 100
        Red Threshold            : 95 % >>> shows the threshold for OOR Status Red
        Yellow Threshold         : 80 % >>> shows the threshold for OOR Status Yellow
        OOR State                : Red >>> shows that the OOR status is Red
        OOR State Change Time    : 2020.Dec.17 09:53:02 EST >>> shows the time at which OOR status changed to Red
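For scripted checks, the OOR Summary fields can be read into a dictionary. The following Python sketch assumes the "Name : value" layout and the NPU-n section headers shown in the sample output above. The function name parse_npu_resources and the "header" section key are hypothetical. The sketch also drops anything after ">>>", because those annotations are explanatory comments in this document and not part of the command output.

```python
import re

# A field line looks like "Red Threshold            : 95 %".
# Requiring whitespace on both sides of the colon avoids matching timestamps.
FIELD_RE = re.compile(r"^\s*(?P<key>.+?)\s+:\s+(?P<value>.*)$")
NPU_RE = re.compile(r"^\s*(NPU-\d+)\s*$")


def parse_npu_resources(text: str) -> dict:
    """Group 'key : value' fields by section ('header', 'NPU-0', 'NPU-1', ...)."""
    sections = {"header": {}}
    current = "header"
    for raw in text.splitlines():
        # Drop documentation annotations such as ">>> shows the threshold ..."
        line = raw.split(">>>", 1)[0].rstrip()
        npu = NPU_RE.match(line)
        if npu:
            current = npu.group(1)
            sections[current] = {}
            continue
        field = FIELD_RE.match(line)
        if field:
            sections[current][field["key"]] = field["value"].strip()
    return sections


if __name__ == "__main__":
    sample = """\
HW Resource Information
    Name                            : lpm_tcam
    Asic Type                       : Pacific

NPU-0
OOR Summary
        Estimated Max Entries    : 100
        Red Threshold            : 95 %
        Yellow Threshold         : 80 %
        OOR State                : Red
        OOR State Change Time    : 2020.Dec.17 09:53:02 EST
"""
    for section, fields in parse_npu_resources(sample).items():
        print(section, fields)
```

A monitoring script could then compare the OOR State value of each NPU section against Green and raise an alert otherwise.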

Note


IP BGP ECMP over BVI uses the stage2lbgroup and stage2lbmember NPU resources. You can use the following commands to monitor the Total In-Use values for resource utilization. The nhgroup value in the command outputs does not indicate the hardware resource usage. Refer to the Total In-Use value to get the current hardware resource usage.

  • show controllers npu resources stage2lbgroup location <location>

    For example:
    Router# show controllers npu resources stage2lbgroup location all

  • show controllers npu resources stage2lbmember location <location>

    For example:
    Router# show controllers npu resources stage2lbmember location all

Graceful Handling of Out of Resource for Protection Group

Table 3. Feature History Table

| Feature Name | Release Information | Description |
|---|---|---|
| Graceful Handling of Out of Resource (OOR) for Protection Group | Release 7.5.4 | This feature adds a new feedback path from the Hardware Abstraction Layer to Forwarding Information Base, to enable improved Graceful Recovery. This feature helps the system recover by re-programming routes when hardware resources become available. The following commands are modified: |

Data structure memories in the NPU are limited and can reach OOR situations due to either network state changes or user configuration changes.

With Cisco IOS XR Release 7.5.4, you can improve the handling of OOR situations caused by changes in the network state, which is distributed by control protocols. A new feedback path between Open Forwarding Abstraction (OFA) and Forwarding Information Base (FIB) provides for Graceful Recovery.

OFA or SDK async errors and OOR failures are now reported back to FIB. The FIB is a subsystem responsible for updating the NPU on the L3 network state, such as routes and adjacencies. This process is governed by the fib_mgr utility in XR on the router.

The following stages summarize OOR handling for Protection Groups:

  • Accounting - Provides a snapshot of the hardware resource utilization. Accounting happens in the SDK, and the Open Forwarding Abstraction provides hooks to read the usage level for CLI or telemetry.

  • Monitoring - Checks and tracks the utilization based on the following threshold levels:

    • Green: Resource utilization is normal and within limit. No action from the user is required.

    • Yellow: Resource utilization has reached a warning point where users are made aware.

    • Red: Resource utilization has reached a critical point where traffic loss could occur.

  • Reporting - Generates syslogs when resource utilization crosses an OOR threshold.

    When the threshold level changes, OFA generates a syslog, which can be accessed using event-driven telemetry. Syslogs are generated to notify or alert users about the OOR state change for each resource table. The syslog messages are generated when the resource utilization level crosses a threshold in either direction, above or below. For example, a Yellow notification is issued when the resource utilization crosses the warning threshold. A Green notification is generated when the resource utilization drops back below the threshold.
    
    LC/0/0/CPU0:Dec 16 15:53:20.692 EST: npu_drvr[212]: %PLATFORM-OFA-4-OOR_YELLOW : NPU 2, Table ip6rte, Resource central_em  
    LC/0/0/CPU0:Dec 16 15:53:36.586 EST: npu_drvr[212]: %PLATFORM-OFA-1-OOR_RED : NPU 1, Table ip6rte, Resource central_em  
    LC/0/0/CPU0:Dec 16 16:37:15.688 EST: npu_drvr[212]: %PLATFORM-OFA-4-OOR_YELLOW : NPU 2, Table iprte, Resource lpm_tcam  
    LC/0/0/CPU0:Dec 16 16:37:16.186 EST: npu_drvr[212]: %PLATFORM-OFA-5-OOR_GREEN : NPU 3, Table iprte, Resource lpm_tcam  
    LC/0/0/CPU0:Dec 16 16:37:17.273 EST: npu_drvr[212]: %PLATFORM-OFA-4-OOR_YELLOW : NPU 2, Table iprte, Resource lpm_tcam  
    LC/0/0/CPU0:Dec 16 16:37:19.760 EST: npu_drvr[212]: %PLATFORM-OFA-5-OOR_GREEN : NPU 0, Table iprte, Resource lpm_tcam 
    
    The following is an example for an OOR Protection Group error syslog:
    
    RP/0/RP0/CPU0:Aug 12 23:26:50.286 UTC: npu_drvr[361]: %ROUTING-FIB-4-PROTGRP_OOR : 
    Protected Path programming failure:  Protection Group OOR trans_id 280775408 NPU 0 nhgroup 2491381947832520  

    The following is an example for an OOR Protection Group alarm syslog:
    
    npu_drvr[389]: %PLATFORM-OFA-4-OOR_YELLOW : NPU 0, Table nhgroup, Resource protection_group
    npu_drvr[389]: %PLATFORM-OFA-1-OOR_RED : NPU 0, Table nhgroup, Resource protection_group
    %PLATFORM-OFA-5-OOR_GREEN : NPU 0, Table nhgroup, Resource protection_group
     
  • Graceful-Recovery - Tracks the routes, labels, and encapsulations that are not programmed due to OOR. The fib_mgr (producer) of routes tracks and lists the routes that are in OOR Yellow or Red state, and retries when resource availability changes to Green state.
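
The Graceful-Recovery stage can be pictured as a retry queue that is drained when resources free up. The following Python sketch is a toy model only and does not represent the fib_mgr implementation. The RetryQueue class, its method names, and the fixed-capacity table are all hypothetical. It assumes that programming fails only when the table is full, that a state is entered when utilization is equal to or greater than its threshold, and that queued routes are retried, oldest first, while the state is Green.

```python
from collections import deque


class RetryQueue:
    """Toy model of graceful recovery: queue failed routes, retry on Green."""

    def __init__(self, capacity: int, yellow_pct: float = 80.0, red_pct: float = 95.0):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity      # hypothetical hardware table size
        self.yellow_pct = yellow_pct
        self.red_pct = red_pct
        self.programmed = set()       # routes present in the hardware table
        self.pending = deque()        # routes that failed programming due to OOR

    def state(self) -> str:
        pct = 100.0 * len(self.programmed) / self.capacity
        if pct >= self.red_pct:
            return "Red"
        if pct >= self.yellow_pct:
            return "Yellow"
        return "Green"

    def program(self, route: str) -> bool:
        """Try to program a route; queue it for retry if the table is full."""
        if len(self.programmed) >= self.capacity:
            self.pending.append(route)
            return False
        self.programmed.add(route)
        return True

    def withdraw(self, route: str) -> list:
        """Remove a route, then retry queued routes if the state is Green."""
        self.programmed.discard(route)
        return self.retry()

    def retry(self) -> list:
        """Re-program queued routes, oldest first, while the state stays Green."""
        retried = []
        while self.pending and self.state() == "Green":
            route = self.pending.popleft()
            self.programmed.add(route)
            retried.append(route)
        return retried


if __name__ == "__main__":
    q = RetryQueue(capacity=10)
    for i in range(12):               # the last two routes do not fit
        q.program(f"route-{i}")
    print(q.state(), list(q.pending))
    for i in range(3):                # free resources until the state is Green
        print(q.withdraw(f"route-{i}"), q.state())
```

In this model, retries stop as soon as re-programming pushes utilization back to Yellow, so the remaining routes wait for the next transition to Green.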

Configuring Graceful Handling of OOR for Protection Group

For information on configuring the OOR for Protection Groups, see the configuration steps in the Graceful Handling of Out of Resource Situations Overview section. The default values are Yellow (80%) and Red (95%). The initial state is Green.

For information on the entries that are queued in the FIB OOR retry queue based on the object queue ID, see the show cef object-queue location command.

For information on the async response error statistics that are sent through the out-of-band async channel from OFA npu_drvr to FIB Mgr, see the show ofa transport async stats client fib command.