Introduction
This document describes next steps for remediation of ACI fault F3274: fabric-encap-mismatch.
Background Information
This fault checks for encap VLAN VNID mismatches between VPC peer nodes, because a mismatch between VPC peers is always a problem.
For non-VPC nodes, whether a VNID mismatch for a given VLAN can result in datapath or loop issues depends on how you intend to reuse VLANs throughout the ACI fabric. Non-VPC node VNID validation is outside the scope of fault F3274.
"Code" : "F3274",
"Description" : "VNID mismatch between peers detected for encap vlans (<vlanId>).",
"Dn" : "topology/pod-1/node-<leafNodeId>/sys/vpc/inst/dom-<domainId>/if-<ifId>/fault-F3274",
Intersight Connected ACI Fabrics
This fault is actively monitored as part of Proactive ACI Engagements.
If you have an Intersight connected ACI fabric, a Service Request was generated on your behalf to indicate that instances of this fault were found within your Intersight Connected ACI fabric.
Quick Start Video
Video: Address ACI Fault Code F3274: fabric-encap-mismatch
Quick Start to Address Fault
- Copy the ACI Pre-Upgrade Validation Script onto an APIC within the ACI Fabric where this fault was flagged
- Run the script
- Look for the "Overlapping VLAN Pools" check results to identify which EPGs were found to have multiple domains related to distinct but overlapped VLAN Pools
- Based on the output, plan for an outage window [1] to address the multiple domains related to distinct but overlapped VLAN pools config on each identified EPG
- At the time of the outage window, update the Access Policies associated with the overlapped VLAN pools config on the highlighted EPGs. The corrected config can be achieved by any of these approaches:
  - Approach 1: Each identified EPG has Domains that each relate to a distinct VLAN Pool with a distinct set of VLANs, with overlaps removed
  - Approach 2: Each identified EPG has Domains that have converged to VLAN Pools containing only non-overlapped VLANs
  - Approach 3: Each identified EPG has its associated Domains converged to a single Domain with all required VLANs
- If Access Policy correction results in a switch no longer having a reference to a given VLAN Pool, then the VLAN is automatically redeployed with a new Fabric Encap from the remaining VLAN pools. A brief outage occurs when the VLAN is reprogrammed. Otherwise, the VLAN must be manually redeployed [2] to allocate a new VXLAN ID.
[1] A brief outage occurs when the VLAN is redeployed.
[2] A VLAN is reprogrammed when the VLAN declaration config is redeployed: a static port binding, VMM domain assignment, or AEP EPG binding. This results in an outage until the VLAN has completed redeployment. If a large number of bindings need to be reprogrammed on a given leaf node, all VLANs can be reprogrammed with a clean reload of the switches in question after Access Policies correction. A clean reload is performed when you issue "acidiag touch clean" prior to a reload.
ACI Pre-upgrade Validation Script Example
When the ACI Pre-Upgrade Validation Script is run on an APIC, identified EPGs are flagged under the "Overlapping VLAN Pools" check:
Example Output:
[Check 29/36] Overlapping VLAN Pools... FAIL - OUTAGE WARNING!!
Tenant  AP   EPG     VLAN Pool (Domain) 1  VLAN Pool (Domain) 2
------  --   ---     --------------------  --------------------
MY_T    AP1  EPG1-1  VLAN_POOL_1 (DOM_1)   VLAN_POOL_2 (DOM_2)
Reference Document: "Overlapping VLAN Pool" from Pre-Upgrade Check Lists
[Check 30/37] VNID Mismatch... FAIL - OUTAGE WARNING!!
EPG                            Access Encap  Node ID  Fabric Encap
---                            ------------  -------  ------------
uni/tn-MY_T/ap-AP1/epg-EPG1-1  vlan-768      101      vxlan-8660
uni/tn-MY_T/ap-AP1/epg-EPG1-1  vlan-768      103      vxlan-8492
Recommended Action: Remove any domains with overlapping VLAN Pools from above EPGs, then redeploy VLAN
Reference Document: "Overlapping VLAN Pool" from Pre-Upgrade Check Lists
Given the example output, EPG EPG1-1 must have both DOM_1 and DOM_2 domains evaluated to identify which VLAN blocks within VLAN_POOL_1 and VLAN_POOL_2 contain overlap and why.
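The block comparison described above can be sketched in a few lines of Python. This is only an illustration, not part of the ACI Pre-Upgrade Validation Script: the function name `overlapping_vlans` and the block ranges assigned to VLAN_POOL_1 and VLAN_POOL_2 are hypothetical, and the real ranges must be read from the VLAN pool configuration on the APIC.

```python
from typing import List, Tuple

# A VLAN block is modeled as an inclusive (start, end) range of VLAN IDs.
VlanBlock = Tuple[int, int]


def overlapping_vlans(pool_a: List[VlanBlock], pool_b: List[VlanBlock]) -> List[VlanBlock]:
    """Return the inclusive VLAN ranges that appear in both pools, sorted."""
    overlaps = []
    for a_start, a_end in pool_a:
        for b_start, b_end in pool_b:
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo <= hi:  # the two blocks share at least one VLAN ID
                overlaps.append((lo, hi))
    return sorted(overlaps)


# Hypothetical block definitions, chosen so that vlan-768 falls in the overlap.
vlan_pool_1 = [(700, 799)]
vlan_pool_2 = [(750, 850), (900, 910)]

print(overlapping_vlans(vlan_pool_1, vlan_pool_2))  # [(750, 799)]
```

Any range returned by such a comparison is a candidate for removal from one of the pools, or for convergence into a single pool, per the approaches listed in the Quick Start section.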
Additional Details
This condition is documented in the Overlapping VLAN Pool section of the Cisco APIC Installation and ACI Upgrade and Downgrade Guide.
Because the condition behind this fault can lead to datapath issues post-upgrade, the ACI Pre-Upgrade Validation Script, which is available on GitHub, already includes logic to identify overlapped VLAN pools.
Issues Induced by fabric-encap-mismatch
An ACI fabric with fabric-encap-mismatches derived from an overlap of VLAN blocks can result in:
These issues might not manifest until after an upgrade or clean reload of the affected switches. Leaf switches fetch policy from the APICs after an upgrade or clean reload and may or may not apply the same VLAN ID from the same pool that was used previously. As a result, the VLAN ID can get mapped to a different VXLAN VNID than on other switch nodes. Remediation of this issue removes the uncertainty involved in a VLAN reprogram event.
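To make the mismatch condition concrete, the following Python sketch groups deployed VLANs by EPG and access encap and reports any group whose nodes disagree on the Fabric Encap. It is a minimal illustration under stated assumptions: the function name `find_vnid_mismatches` and the tuple layout are hypothetical, and the rows are copied by hand from the "VNID Mismatch" example output earlier in this document rather than collected from a fabric.

```python
from collections import defaultdict


def find_vnid_mismatches(rows):
    """Group rows by (EPG, access encap) and keep groups with more than one VNID.

    rows: iterable of (epg_dn, access_encap, node_id, fabric_encap) tuples.
    Returns {(epg_dn, access_encap): {node_id: fabric_encap}}.
    """
    deployed = defaultdict(dict)
    for epg_dn, access_encap, node_id, fabric_encap in rows:
        deployed[(epg_dn, access_encap)][node_id] = fabric_encap
    return {
        key: nodes
        for key, nodes in deployed.items()
        if len(set(nodes.values())) > 1  # nodes disagree on the VNID
    }


# Rows transcribed from the example script output in this document.
rows = [
    ("uni/tn-MY_T/ap-AP1/epg-EPG1-1", "vlan-768", 101, "vxlan-8660"),
    ("uni/tn-MY_T/ap-AP1/epg-EPG1-1", "vlan-768", 103, "vxlan-8492"),
]

for (epg_dn, access_encap), nodes in find_vnid_mismatches(rows).items():
    print(epg_dn, access_encap, nodes)
```

With the example rows, the single EPG and encap pair is reported because nodes 101 and 103 map vlan-768 to different VXLAN VNIDs.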
Future Prevention
It is critical to ensure that there are no overlapped VLAN pools in your fabric unless it is an intentional design choice to reuse VLANs for distinct customers. This type of design requires additional config considerations not outlined in this document. If unsure, consider the "Enforce EPG VLAN Validation" setting within the APIC GUI. Available with release 3.2(6) and later, this setting prevents the most common problematic configuration: two domains with overlapped VLAN pools associated to the same EPG.
Related Information
These documents contain additional information on overlapped VLAN pools, why it's an issue, and how this scenario occurs: