Introduction
This document describes how to identify and troubleshoot performance issues on Enterprise Routing Platforms caused by NAT bottlenecks.
Background Information
High utilization and performance issues on the Cisco Quantum Flow Processor (QFP) can be observed on a Cisco router when a mix of NAT and non-NAT traffic flows is present on the same interface. This can also lead to other performance issues such as interface errors or slowness.
Note: The QFP is located on the Embedded Services Processor (ESP) and it is in charge of the data plane and packet processing for all the inbound and outbound traffic flows.
Symptoms
To identify this behavior, validate and confirm these symptoms on the router:
1. High QFP Load alerts. These alerts appear when the load exceeds the threshold of 80%.
Note: You can also run the show platform hardware qfp active utilization summary command in order to reveal the load on the QFP and the traffic rates.
2. Interface errors. Packets might be dropped due to backpressure if QFP utilization is high. In such cases, Overruns and Output Drops are commonly observed on the interfaces. To display this information, run the show interfaces command.
3. In some scenarios, users can complain of slowness on the network.
4. A Packet Trace capture with the Feature Invocation Array (FIA) trace option shows that the NAT feature consumes more resources than expected. In the example below, the lapsed time for the IPV4_NAT_INPUT_FIA feature is significantly larger than the lapsed time for the other features. This usually indicates that the QFP takes more time to process this feature and, as a result, spends more of its resources on NAT.
5. High volume of non-NATed traffic on a NATed interface. The non-NATed traffic consumes a large amount of resources and causes QFP utilization spikes. This behavior can be validated by checking the number of misses, as shown in the commands below.
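To make the "number of misses" check in symptom 5 easier to reason about, the following is a minimal Python sketch that turns a pair of gatekeeper counters into a miss ratio. The function name, the idea of pairing a hit counter with a miss counter, and the sample values are assumptions made for illustration; they are not taken from any router output.

```python
def gatekeeper_miss_ratio(hits: int, misses: int) -> float:
    """Return the fraction of gatekeeper lookups that were misses.

    Both counters are assumed to be non-negative integers read manually
    from the router; this helper does not talk to a device.
    """
    total = hits + misses
    if total == 0:
        # No lookups recorded yet, so there is nothing to report.
        return 0.0
    return misses / total


# Hypothetical counter values, for illustration only.
ratio = gatekeeper_miss_ratio(hits=1_000, misses=9_000)
print(f"miss ratio: {ratio:.0%}")  # miss ratio: 90%
```

A ratio that stays high while QFP load is elevated is consistent with the symptom described above, but what counts as "high" depends on the traffic mix and is not defined by this sketch.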
Workaround/Fix for high QFP due to non-NATed traffic
Solution 1
For this type of issue, Cisco usually recommends redirecting the non-NATed traffic from the NATed interface to a different interface, either on the same chassis or on another router. If no other interfaces are available, try to at least reduce this type of traffic on the affected interface.
Solution 2
Another workaround is to increase the cache size of the NAT Gatekeeper feature in order to reduce the number of gatekeeper misses. This feature was first introduced in software version 12.2(33)XND. Its purpose is to reduce the amount of resources consumed by non-NATed flows in order to prevent excessive utilization of the CPU and the Quantum Flow Processor (QFP).
The example below shows how to adjust the Gatekeeper size on a Cisco router. The recommendation is to start from 64K. Note that the value must be a power of 2; otherwise, it is automatically set to the next lower size.
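The power-of-2 rule can be illustrated with a short Python sketch. It does not configure a router; it only predicts which size would take effect for a requested value, assuming that "next lower size" means the largest power of 2 that does not exceed the request and that 64K means 65536. The function name is hypothetical.

```python
def effective_gatekeeper_size(requested: int) -> int:
    """Round a requested size down to the nearest power of 2.

    Assumption: a value that is not a power of 2 is lowered to the
    largest power of 2 that is less than or equal to it.
    """
    if requested < 1:
        raise ValueError("size must be a positive integer")
    # bit_length() gives the position of the highest set bit, so shifting
    # 1 by (bit_length - 1) keeps only that bit.
    return 1 << (requested.bit_length() - 1)


print(effective_gatekeeper_size(65536))   # 65536 (already a power of 2)
print(effective_gatekeeper_size(70000))   # 65536
print(effective_gatekeeper_size(131072))  # 131072
```

Checking the requested value this way before applying it avoids ending up with a smaller cache than intended.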