GTPU Error Indication Enhancement

Feature Description

This enhancement provides a solution to avoid GTPU Path Failure when a burst of GTPU Error Indications occurs. It applies only to the SGSN.

Consider the following scenario:

  1. Following a kernel crash and a hardware failure (fabric corruption) in a Demux card, the SGSN is unable to respond to Echo Requests from the GGSN. As a result, the GGSN detects a Path Failure and cleans up a large number of sessions.
  2. However, the sessions remain active at the SGSN on the PSC3 cards where the Session Manager is running. The SGSN sends uplink data for these sessions, which triggers a flood of GTPU Error Indications (~6 to ~9 million) from the GGSN to the SGSN.
  3. Simultaneously, a Demux card migration is triggered in the SGSN to recover from the kernel crash and hardware failure. After the migration completes, the SGSN restarts the Path Management Echo Requests. The GGSN, however, had already started sending Echo Requests as soon as the new sessions were set up at the GGSN. This difference in when the two ends of the path restart Echo Requests delays the detection of path failure between the SGSN and GGSN if Echo Responses are not received for any reason.
  4. Once the Demux card has recovered at the SGSN, the following are observed:
    • The flood of GTPU Error Indication messages further results in packet drops at the SGSN.
    • Echo Requests cause another path failure at the GGSN.
    • Echo Responses cause a delayed path failure on the SGSN, as well as loss of GTPU Error Indications at the SGSN.
  5. This delay in Path Failure detection results in another flood of GTPU Error Indications in response to SGSN uplink data for active sessions that were already cleaned up at the GGSN (those created after the first path failure). This flood of GTPU Error Indications results in additional packet drops at the SGSN. The cycle of cleaning up sessions and setting up new sessions continues until the SGSN is restarted.

The issue is resolved by creating an additional midplane socket, with its own flows, for GTPU Error Indications, so that a flood of GTPU Error Indication packets has no impact on Path Management. GTPU Echo Request/Response messages continue to be received at the existing midplane sockets. The separate path for GTPU Error Indications prevents Path Management issues towards the GGSN or the RNC and avoids unwanted detection of path failures. This enhancement requires new flows to be installed at the NPU.
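The effect of separating the two receive paths can be illustrated with a minimal sketch. This is not the SGSN implementation: the BoundedQueue class, the dispatch and make_packet helpers, and the queue capacities are hypothetical stand-ins for the midplane sockets. Only the GTPv1-U header layout (message type in the second octet) and the message type values 1, 2 and 26 are taken from the GTP-U specification.

```python
from collections import deque

# GTPv1-U message type values (3GPP TS 29.281).
ECHO_REQUEST = 1
ECHO_RESPONSE = 2
ERROR_INDICATION = 26


def gtpu_message_type(packet: bytes) -> int:
    """Return the message type stored in the second octet of a GTPv1-U header."""
    if len(packet) < 8:
        raise ValueError("packet shorter than the mandatory 8-byte GTP-U header")
    version = packet[0] >> 5  # top three bits of the first octet
    if version != 1:
        raise ValueError(f"unsupported GTP version {version}")
    return packet[1]


class BoundedQueue:
    """Hypothetical stand-in for a socket receive buffer that drops when full."""

    def __init__(self, capacity: int):
        self.items = deque()
        self.capacity = capacity
        self.dropped = 0

    def put(self, packet: bytes) -> bool:
        if len(self.items) >= self.capacity:
            self.dropped += 1
            return False
        self.items.append(packet)
        return True


def dispatch(packet: bytes, path_mgmt_q: BoundedQueue, error_ind_q: BoundedQueue) -> bool:
    """Steer Error Indications and Echo messages to separate queues."""
    msg_type = gtpu_message_type(packet)
    if msg_type == ERROR_INDICATION:
        return error_ind_q.put(packet)
    if msg_type in (ECHO_REQUEST, ECHO_RESPONSE):
        return path_mgmt_q.put(packet)
    return False  # other message types are outside the scope of this sketch


def make_packet(msg_type: int) -> bytes:
    """Build a bare 8-byte header (version 1, PT=1, length 0, TEID 0) for the demo.

    Real Error Indications carry additional fields and information elements.
    """
    return bytes([0x30, msg_type, 0, 0, 0, 0, 0, 0])


# Demo: a burst of Error Indications overflows only its own queue.
path_q = BoundedQueue(capacity=4)
err_q = BoundedQueue(capacity=4)
for _ in range(1000):
    dispatch(make_packet(ERROR_INDICATION), path_q, err_q)
echo_accepted = dispatch(make_packet(ECHO_REQUEST), path_q, err_q)
```

In this sketch the Echo Request is still accepted after the burst, because the Error Indications can only fill and overflow their own queue. With a single shared queue, the same burst would have filled the buffer and the Echo Request would have been dropped.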

The following existing statistics help in observing packet loss and dropped GTPU Error Indication packets:

show sgtpu statistics

    Total Error Ind Rcvd: 0
    Rcvd from GGSN: 0
    Rcvd from RNC: 0
    Rcvd from GGSN through RNC: 0
    Rcvd from RNC through GGSN: 0
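To spot a burst, these counters can be sampled twice and compared. The following is a minimal sketch that assumes the command output has already been captured as text and that each counter appears as a "name: value" line, as shown above. The function names and the sample values are hypothetical and are not real measurements.

```python
import re


def parse_error_ind_counters(output: str) -> dict:
    """Collect every 'name: integer' line from captured command output."""
    counters = {}
    for line in output.splitlines():
        match = re.match(r"^\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


def counter_deltas(before: dict, after: dict) -> dict:
    """Return how much each counter grew between two samples."""
    return {name: after[name] - before.get(name, 0) for name in after}


# Hypothetical samples, for illustration only.
first_sample = """
Total Error Ind Rcvd: 120
Rcvd from GGSN: 100
Rcvd from RNC: 20
Rcvd from GGSN through RNC: 0
Rcvd from RNC through GGSN: 0
"""
second_sample = """
Total Error Ind Rcvd: 5120
Rcvd from GGSN: 5100
Rcvd from RNC: 20
Rcvd from GGSN through RNC: 0
Rcvd from RNC through GGSN: 0
"""

deltas = counter_deltas(
    parse_error_ind_counters(first_sample),
    parse_error_ind_counters(second_sample),
)
```

A large delta in "Rcvd from GGSN" between two closely spaced samples would point to the kind of Error Indication burst described in the scenario above.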

The following show commands are useful for verifying the NPU-related statistics:

  • To check the flow ID range associated with sgtpcmgr, use the following command:

    For ASR 5500: show npumgr flow range summary

  • To check whether the flow corresponding to GTPU Error Indication is installed, use the following command:

    For ASR 5500: show npumgr flow statistics