Nexus 7000 Troubleshoot Address Resolution Protocol (ARP) Storm Without Inband Capture

Updated: September 11, 2016

Document ID: 200652


Introduction

This document describes how to troubleshoot an ARP storm on Nexus 7000 when no excessive ARP traffic is seen inband.

Background

An ARP storm is a common denial-of-service (DoS) attack seen in data center environments.

The common switch logic for handling ARP packets is that these two kinds of packet:

ARP packets with a broadcast destination Media Access Control (MAC) address
ARP packets with a unicast destination MAC address that belongs to the switch

are processed by the ARP process in software if the Switch Virtual Interface (SVI) is up in the receiving VLAN.
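The rule above can be written as a small predicate. This is only an illustrative sketch of the decision as described here, not switch code: the names `ArpFrame` and `goes_to_software_arp` and the MAC address are hypothetical.

```python
from dataclasses import dataclass

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


@dataclass(frozen=True)
class ArpFrame:
    dst_mac: str  # destination MAC of the frame that carries the ARP packet
    vlan: int     # VLAN on which the frame was received


def goes_to_software_arp(frame, switch_macs, vlans_with_svi_up):
    """Return True if the frame would be handed to the software ARP process."""
    dst = frame.dst_mac.lower()
    # Condition 1: broadcast destination, or unicast to a MAC owned by the switch.
    for_switch = dst == BROADCAST_MAC or dst in switch_macs
    # Condition 2: the SVI for the receiving VLAN is up.
    return for_switch and frame.vlan in vlans_with_svi_up


# Hypothetical values: one switch MAC, SVI up only in VLAN 10.
switch_macs = {"00:00:5e:00:53:01"}
svi_up = {10}
print(goes_to_software_arp(ArpFrame(BROADCAST_MAC, 10), switch_macs, svi_up))
print(goes_to_software_arp(ArpFrame(BROADCAST_MAC, 20), switch_macs, svi_up))
```

The second call models a broadcast ARP request in a VLAN whose SVI is not up, which the software ARP process never sees.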

By this logic, if one or more malicious hosts keep sending ARP requests in a VLAN for which the switch is the gateway, the requests are processed in software and can overwhelm the switch. On some older Cisco switch models and software versions, the ARP process drives CPU usage to a high level and the system becomes too busy to handle other control plane traffic. The common way to trace such an attack is to run an inband capture to identify the source MAC of the ARP storm.

In a data center where Nexus 7000 acts as the aggregation gateway, Control Plane Policing (CoPP) on Nexus 7000 Series Switches reduces this impact. You can still run an inband capture (see Ethanalyzer on Nexus 7000 Troubleshooting Guide) to identify the source MAC of the ARP storm, because CoPP only slows down the ARP storm that rushes to the CPU and does not eliminate it.

Now consider a scenario where:

The SVI is down
No excessive ARP packets are punted to the CPU
There is no high CPU usage due to the ARP process

The switch, however, still sees ARP-related problems; for example, the ARP entry for a directly connected host is incomplete. Could this be caused by an ARP storm?

The answer is yes on Nexus 7000.

Root Cause

In the Nexus 7000 line card design, to support ARP packet processing in CoPP, an ARP request drives a special logical interface (LIF) and is then rate limited by CoPP in the forwarding engine (FE). This happens whether or not an SVI is up for the VLAN.

Hence, although the final forwarding decision made by the FE is not to send the ARP request to the inband CPU (when no SVI is up for the VLAN), the CoPP counter is still updated. As a result, CoPP becomes saturated with excessive ARP requests and drops legitimate ARP requests and replies. In this scenario, you do not see excessive inband ARP packets, but the switch is still affected by the ARP storm.

An enhancement bug, CSCub47533, has been filed for this day-one CoPP behavior.
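The starvation effect can be illustrated with a generic single-rate token bucket. This is a sketch under stated assumptions, not the forwarding engine's actual policer: it assumes 680 kbps means 680,000 bits per second, that "bc 250 ms" means a burst of 250 ms worth of traffic at that rate, and it uses made-up traffic (a 64-byte storm at 100,000 frames per second plus ten legitimate frames). The function names are hypothetical.

```python
def police(events, cir_bytes_per_sec, burst_bytes):
    """Run (time_sec, size_bytes, label) events, sorted by time, through a
    single-rate token bucket and count conformed/violated frames per label."""
    tokens = float(burst_bytes)  # the bucket starts full
    last_time = 0.0
    stats = {}
    for time_sec, size_bytes, label in events:
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(burst_bytes, tokens + (time_sec - last_time) * cir_bytes_per_sec)
        last_time = time_sec
        counts = stats.setdefault(label, {"conformed": 0, "violated": 0})
        if tokens >= size_bytes:
            tokens -= size_bytes
            counts["conformed"] += 1
        else:
            counts["violated"] += 1
    return stats


def build_events():
    # Illustrative traffic only: storm frames every 10 microseconds for one second,
    # and one legitimate ARP frame every 100 ms, all sharing the same policer.
    storm = [(i * 1e-5, 64, "storm") for i in range(100_000)]
    legit = [(0.05 + k * 0.1 + 5e-6, 64, "legit") for k in range(10)]
    return sorted(storm + legit)


CIR_BYTES_PER_SEC = 680_000 // 8               # assumed: kbps = 1000 bits per second
BURST_BYTES = CIR_BYTES_PER_SEC * 250 // 1000  # assumed meaning of "bc 250 ms"

stats = police(build_events(), CIR_BYTES_PER_SEC, BURST_BYTES)
print(stats)
```

Because the policer does not distinguish storm frames from legitimate ones, the legitimate frames in this model compete for the same tokens and most of them are counted as violated, even though none of the storm traffic would reach the CPU.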

Solution

There are a few options to identify the source of the ARP storm in this scenario. One effective option is:

First, identify which module the ARP storm comes from:

N7K# sh policy-map interface control-plane class copp-system-p-class-normal
Control Plane
 service-policy input copp-system-p-policy-strict

 class-map copp-system-p-class-normal (match-any)
   match access-group name copp-system-p-acl-mac-dot1x
   match exception ip multicast directly-connected-sources
   match exception ipv6 multicast directly-connected-sources
   match protocol arp
   set cos 1
   police cir 680 kbps bc 250 ms
     conform action: transmit
     violate action: drop
   module 3:
     conformed 4820928 bytes,
       5-min offered rate 0 bytes/sec
       peak rate 104 bytes/sec at Thu Aug 25 08:12:12 2016
     violated 9730978848 bytes,
       5-min violate rate 6983650 bytes/sec
       peak rate 7632238 bytes/sec at Thu Aug 25 00:43:33 2016
   module 4:
     conformed 4379136 bytes,
       5-min offered rate 0 bytes/sec
       peak rate 38 bytes/sec at Wed Aug 24 07:12:09 2016
     violated 0 bytes,
       5-min violate rate 0 bytes/sec
       peak rate 0 bytes/sec
   ...
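When the output covers many modules, the per-module counters can be scanned with a short script. The sketch below assumes the text layout shown in the output above; the layout may differ between software releases, and the function names are hypothetical.

```python
import re

# Per-module excerpt copied from the output above.
SAMPLE_OUTPUT = """\
   module 3:
     conformed 4820928 bytes,
       5-min offered rate 0 bytes/sec
       peak rate 104 bytes/sec at Thu Aug 25 08:12:12 2016
     violated 9730978848 bytes,
       5-min violate rate 6983650 bytes/sec
       peak rate 7632238 bytes/sec at Thu Aug 25 00:43:33 2016
   module 4:
     conformed 4379136 bytes,
       5-min offered rate 0 bytes/sec
       peak rate 38 bytes/sec at Wed Aug 24 07:12:09 2016
     violated 0 bytes,
       5-min violate rate 0 bytes/sec
       peak rate 0 bytes/sec
"""

MODULE_RE = re.compile(r"^\s*module\s+(\d+):")
COUNTER_RE = re.compile(r"^\s*(conformed|violated)\s+(\d+)\s+bytes")
VIOLATE_RATE_RE = re.compile(r"^\s*5-min violate rate\s+(\d+)\s+bytes/sec")


def parse_copp_modules(text):
    """Return {module: {"conformed", "violated", "violate_rate"}} from the CLI text."""
    modules = {}
    current = None
    for line in text.splitlines():
        match = MODULE_RE.match(line)
        if match:
            current = modules.setdefault(
                int(match.group(1)),
                {"conformed": 0, "violated": 0, "violate_rate": 0},
            )
            continue
        if current is None:
            continue  # skip lines that appear before the first "module N:" header
        match = COUNTER_RE.match(line)
        if match:
            current[match.group(1)] = int(match.group(2))
            continue
        match = VIOLATE_RATE_RE.match(line)
        if match:
            current["violate_rate"] = int(match.group(1))
    return modules


def modules_with_violations(modules):
    """List the modules whose violated byte counter is non-zero."""
    return sorted(mod for mod, counters in modules.items() if counters["violated"] > 0)


modules = parse_copp_modules(SAMPLE_OUTPUT)
print(modules_with_violations(modules))
```

For the excerpt above, only module 3 is reported. As a rough sense of scale, 680 kbps is 85,000 bytes/sec if kbps is taken as 1000 bits per second, so the 5-min violate rate of 6,983,650 bytes/sec on module 3 is roughly 82 times the policed rate.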

Second, use the ELAM procedure to capture ARP packets that hit the module with violations (module 3 in the output above). You might need to repeat the capture several times, but while a storm is in progress, a capture is much more likely to catch a violating ARP packet than a legitimate one. Identify the source MAC and VLAN from the ELAM capture.
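Since the capture might need to be repeated, it can help to tally what each capture showed. The sketch below assumes the source MAC and VLAN have been read by hand from each ELAM capture; it does not parse ELAM output, and the addresses, VLAN number, and function name are hypothetical.

```python
from collections import Counter

# Hypothetical (source MAC, VLAN) pairs noted from repeated ELAM captures.
observations = [
    ("00:00:5e:00:53:aa", 100),
    ("00:00:5e:00:53:aa", 100),
    ("00:00:5e:00:53:0b", 100),
    ("00:00:5e:00:53:aa", 100),
]


def rank_sources(pairs):
    """Return ((mac, vlan), count) tuples, most frequently captured first."""
    return Counter(pairs).most_common()


for (mac, vlan), hits in rank_sources(observations):
    print(f"{mac} vlan {vlan}: seen in {hits} capture(s)")
```

A source that dominates the tally across captures is the most likely origin of the storm.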

Contributed by Cisco Engineers

Jun Wang
Cisco TAC Engineer


This Document Applies to These Products

Nexus 7000 Series Switches
