Guidelines and Limitations for VXLAN BGP EVPN
VXLAN BGP EVPN has the following guidelines and limitations:
-
Routing between VXLAN VLANs and non-VXLAN VLANs, and Layer 3 interfaces, is not supported on Cisco Nexus 3100-V platform switches. Hence, Cisco Nexus 3100-V platform switches cannot be a border leaf VTEP in a VXLAN EVPN setup.
-
You can configure EVPN over segment routing or MPLS. See the Cisco Nexus 3000 Series NX-OS Label Switching Configuration Guide for more information.
-
You can use MPLS tunnel encapsulation by using the encapsulation mpls command. You can configure the label allocation mode for the EVPN address family. See the Cisco Nexus 3000 Series NX-OS Label Switching Configuration Guide for more information.
-
In a VXLAN EVPN setup that has a 2K VNI scale configuration, control plane downtime can exceed 200 seconds. To avoid a BGP flap, configure the graceful restart time to 300 seconds.
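A minimal sketch of the graceful restart timer change (the AS number 65000 is a placeholder):

```
router bgp 65000
  graceful-restart restart-time 300
```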
-
SVI and sub-interfaces as core links are not supported in multisite EVPN.
-
In a VXLAN EVPN setup, border leaves must use unique route distinguishers, preferably assigned with the auto rd command. Using the same route distinguisher on different border leaves is not supported.
-
ARP suppression is only supported for a VNI if the VTEP hosts the first-hop gateway (distributed anycast gateway) for this VNI. The VTEP and the SVI for this VLAN must be properly configured for distributed anycast gateway operation; for example, the global anycast gateway MAC address must be configured and the anycast gateway feature with the virtual IP address must be enabled on the SVI.
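The distributed anycast gateway prerequisites above can be sketched as follows (the VLAN number, VRF name, MAC address, and IP addressing are placeholders):

```
feature fabric forwarding
fabric forwarding anycast-gateway-mac 2020.0000.00aa

interface Vlan100
  no shutdown
  vrf member TENANT-A
  ip address 10.1.100.1/24
  fabric forwarding mode anycast-gateway
```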
-
Routing is not supported when Layer 3 EVPN is configured on Broadcom ASIC-based Cisco Nexus 3000 platform switches and those switches are added to a topology with Layer 2 EVPN. When SVI and Layer 3 EVPN are configured on such switches with the anycast gateway, and ARP requests are sent from a Layer 2 EVPN device (for example, another Broadcom ASIC-based Cisco Nexus 3000 platform switch), the switches cannot be used as a gateway for ARP requests received on the network ports.
-
The show commands with the internal keyword are not supported.
-
DHCP snooping (Dynamic Host Configuration Protocol snooping) is not supported on VXLAN VLANs.
-
SPAN TX for VXLAN encapsulated traffic is not supported for the Layer 3 uplink interface.
-
RACLs are not supported on Layer 3 uplinks for VXLAN traffic. Egress VACL support is not available for decapsulated packets in the network to access direction on the inner payload.
As a best practice, use PACLs/VACLs for the access to the network direction.
-
QoS classification is not supported for VXLAN traffic in the network to access direction on the Layer 3 uplink interface.
-
The QoS buffer-boost feature is not applicable for VXLAN traffic.
-
VTEP does not support Layer 3 subinterface uplinks that carry VXLAN encapsulated traffic.
-
Layer 3 interface uplinks that carry VXLAN encapsulated traffic do not support subinterfaces for non-VXLAN encapsulated traffic.
-
Non-VXLAN sub-interface VLANs cannot be shared with VXLAN VLANs.
-
Subinterfaces on 40G (ALE) uplink ports are not supported on VXLAN VTEPs.
-
Point-to-multipoint Layer 3 and SVI uplinks are not supported. Because both uplink types can be enabled only point-to-point, they cannot span more than two switches.
-
For EBGP, it is recommended to use a single overlay EBGP EVPN session between loopbacks.
-
Bind NVE to a loopback address that is separate from other loopback addresses that are required by Layer 3 protocols. A best practice is to use a dedicated loopback address for VXLAN.
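For example (a sketch; interface numbers and addresses are placeholders), a loopback dedicated to the NVE source, separate from the loopback used by Layer 3 protocols:

```
interface loopback0
  description Routing protocol router-id
  ip address 10.0.0.1/32

interface loopback1
  description Dedicated VTEP source
  ip address 10.0.1.1/32

interface nve1
  source-interface loopback1
```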
-
VXLAN BGP EVPN does not support an NVE interface in a non-default VRF.
-
It is recommended to configure a single BGP session over the loopback for an overlay BGP session.
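A sketch of a single overlay EBGP EVPN session between loopbacks (the AS numbers, neighbor address, multihop count, and update-source interface are placeholders):

```
router bgp 65001
  router-id 10.0.0.1
  neighbor 10.0.0.21 remote-as 65000
    update-source loopback0
    ebgp-multihop 5
    address-family l2vpn evpn
      send-community extended
```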
-
The VXLAN UDP port number is used for VXLAN encapsulation. For Cisco NX-OS, the UDP port number is 4789. It complies with IETF standards and is not configurable.
-
VXLAN supports In Service Software Upgrade (ISSU).
-
VTEPs connected to FEX host interface ports are not supported.
-
Resilient hashing (port-channel load-balancing resiliency) and VXLAN configurations are not compatible with VTEPs using ALE uplink ports.
Note
Resilient hashing is disabled by default.
Note
For information about VXLAN BGP EVPN scalability, see the Verified Scalability Guide for your platform.
Notes for EVPN Convergence
The following are notes about EVPN Convergence (7.0(3)I3(1) and later):
-
As a best practice, the NVE source loopback should be dedicated to NVE so that NVE can bring the loopback up or down as needed.
-
When vPC has been configured, the loopback stays down until the MCT link comes up.
Note
When the vpc feature is enabled and no vPC is configured, the NVE source loopback is in the "shutdown" state after an upgrade. In this case, removing the vpc feature restores the interface to the "up" state.
-
The NVE underlay (through the source loopback) is kept down until the overlay has converged.
-
When MCT comes up, the source loopback is kept down for an amount of time that is configurable. This approach prevents north-south traffic from coming in until the overlay has converged.
-
When MCT goes down, NVE is kept up for 30 seconds in the event that there is still south-north traffic from vPC legs that have not yet gone down.
-
BGP ignores routes from the vPC peer, which reduces the number of routes in BGP.
Considerations for VXLAN BGP EVPN Deployment
-
A loopback address is required when using the source-interface config command. The loopback address represents the local VTEP IP.
-
During boot-up of a switch (7.0(3)I2(2) and later), you can use the source-interface hold-down-time hold-down-time command to suppress advertisement of the NVE loopback address until the overlay has converged. The range for the hold-down-time is 0 to 2147483647 seconds. The default is 300 seconds.
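For example (interface numbers are placeholders), suppressing the NVE loopback advertisement for 300 seconds after boot:

```
interface nve1
  source-interface loopback1
  source-interface hold-down-time 300
```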
-
To establish IP multicast routing in the core, IP multicast, PIM, and RP configurations are required.
-
VTEP to VTEP unicast reachability can be configured through any IGP/BGP protocol.
-
If the anycast gateway feature is enabled for a specific VNI, then the anycast gateway feature must be enabled on all VTEPs that have that VNI configured. Having the anycast gateway feature configured on only some of the VTEPs enabled for a specific VNI is not supported.
-
When changing the primary or secondary IP address of the NVE source interface, you must shut down the NVE interface before changing the IP address.
-
As a best practice, the RP for the multicast group should be configured only on the spine layer. Use the anycast RP for RP load balancing and redundancy.
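A sketch of anycast RP on the spines (all addresses are placeholders; 10.254.254.1 is the shared anycast RP address and 10.254.254.11 and 10.254.254.12 are the individual spine RP addresses):

```
ip pim rp-address 10.254.254.1 group-list 239.0.0.0/8
ip pim anycast-rp 10.254.254.1 10.254.254.11
ip pim anycast-rp 10.254.254.1 10.254.254.12
```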
-
Every tenant VRF needs a VRF overlay VLAN and SVI for VXLAN routing.
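For example (the VLAN number, VNI, and VRF name are placeholders), a VRF overlay VLAN and SVI for VXLAN routing:

```
vlan 2501
  vn-segment 50001

vrf context TENANT-A
  vni 50001

interface Vlan2501
  no shutdown
  vrf member TENANT-A
  ip forward
```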
-
When configuring ARP suppression with BGP-EVPN, use the hardware access-list tcam region arp-ether size double-wide command to accommodate ARP in this region. (You must decrease the size of an existing TCAM region before using this command.)
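A sketch of the TCAM carving (the region freed and the sizes are placeholders; choose a region that is unused in your deployment):

```
hardware access-list tcam region racl 0
hardware access-list tcam region arp-ether 256 double-wide
```

A reload is typically required for TCAM region changes to take effect.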
VPC Considerations for VXLAN BGP EVPN Deployment
-
The loopback address used by NVE needs to be configured to have a primary IP address and a secondary IP address.
The secondary IP address is used for all VXLAN traffic, including multicast and unicast encapsulated traffic.
-
Each VPC peer needs to have separate BGP sessions to the spine.
-
VPC peers must have identical configurations.
-
Consistent VLAN to VN-segment mapping.
-
Consistent NVE1 binding to the same loopback interface.
-
Using the same secondary IP address.
-
Using different primary IP addresses.
-
Consistent VNI to group mapping.
-
The VRF overlay VLAN should be a member of the peer-link port-channel.
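The primary/secondary addressing requirement above can be sketched as follows (all addresses are placeholders):

```
! vPC peer 1
interface loopback1
  ip address 10.0.1.11/32
  ip address 10.0.1.100/32 secondary

! vPC peer 2
interface loopback1
  ip address 10.0.1.12/32
  ip address 10.0.1.100/32 secondary
```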
-
For multicast, the VPC node that receives the (S, G) join from the RP (rendezvous point) becomes the DF (designated forwarder). On the DF node, encap routes are installed for multicast.
Decap routes are installed based on the election of a decapper from between the VPC primary node and the VPC secondary node. The winner of the decap election is the node with the least cost to the RP. However, if the cost to the RP is the same for both nodes, the VPC primary node is elected.
The winner of the decap election has the decap mroute installed. The other node does not have a decap route installed.
-
On a VPC device, BUM traffic (broadcast, unknown-unicast, and multicast traffic) from hosts is replicated on the peer-link. A copy is made of every native packet and each native packet is sent across the peer-link to service orphan-ports connected to the peer VPC switch.
To prevent traffic loops in VXLAN networks, native packets ingressing the peer-link cannot be sent to an uplink. However, if the peer switch is the encapper, the copied packet traverses the peer-link and is sent to the uplink.
Note
Each copied packet is sent on a special internal VLAN (VLAN 4041).
-
When the peer-link is shut, the loopback interface used by NVE on the vPC secondary is brought down and its status is Admin Shut. This is done so that the route to the loopback is withdrawn on the upstream and the upstream can divert all traffic to the vPC primary.
Note
Orphans connected to the VPC secondary will experience loss of traffic for the period that the peer-link is shut. This is similar to Layer 2 orphans in a VPC secondary of a traditional VPC setup.
-
When peer-link is no-shut, the NVE loopback address is brought up again and the route is advertised upstream, attracting traffic.
-
For VPC, the loopback interface has 2 IP addresses: the primary IP address and the secondary IP address.
The primary IP address is unique and is used by Layer 3 protocols.
The secondary IP address on the loopback is necessary because the NVE interface uses it for the VTEP IP address. The secondary IP address must be the same on both vPC peers.
-
The VPC peer-gateway feature must be enabled on both peers.
As a best practice, use the peer-switch, peer-gateway, ip arp synchronize, and ipv6 nd synchronize configurations for improved convergence in vPC topologies.
In addition, increase the STP hello timer to 4 seconds to avoid unnecessary TCN generation when vPC role changes occur.
The following is an example (best practice) of a VPC configuration:
switch# sh ru vpc
version 6.1(2)I3(1)
feature vpc
vpc domain 2
  peer-switch
  peer-keepalive destination 172.29.206.65 source 172.29.206.64
  peer-gateway
  ipv6 nd synchronize
  ip arp synchronize
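The STP hello timer recommendation above can be sketched as follows (the VLAN range is a placeholder):

```
spanning-tree vlan 1-3967 hello-time 4
```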
-
On a VPC pair, shutting down NVE or NVE loopback on one of the VPC nodes is not a supported configuration. This means that traffic failover on one-side NVE shut or one-side loopback shut is not supported.
-
Redundant anycast RPs configured in the network for multicast load-balancing and RP redundancy are supported on VPC VTEP topologies.
-
Enabling the vpc peer-gateway configuration is mandatory. For peer-gateway functionality, at least one backup routing SVI must be enabled across the peer-link and configured with PIM. This provides a backup routing path if the VTEP loses complete connectivity to the spine; remote peer reachability is then rerouted over the peer-link.
The following is an example of SVI with PIM enabled:
switch# sh ru int vlan 2

interface Vlan2
  description special_svi_over_peer-link
  no shutdown
  ip address 30.2.1.1/30
  ip pim sparse-mode
Note
The SVI must be configured on both VPC peers and requires PIM to be enabled.
-
As a best practice when changing the secondary IP address of an anycast VPC VTEP, the NVE interfaces on both the VPC primary and the VPC secondary should be shut before the IP changes are made.
-
To provide redundancy and failover of VXLAN traffic when a VTEP loses all of its uplinks to the spine, it is recommended to run a Layer 3 link or an SVI link over the peer-link between VPC peers.
-
If DHCP Relay is required in VRF for DHCP clients or if loopback in VRF is required for reachability test on a VPC pair, it is necessary to create a backup SVI per VRF with PIM enabled.
switch# sh ru int vlan 20

interface Vlan20
  description backup routing svi for VRF Green
  vrf member GREEN
  no shutdown
  ip address 30.2.10.1/30
Network Considerations for VXLAN Deployments
-
MTU Size in the Transport Network
Due to the MAC-to-UDP encapsulation, VXLAN introduces 50-byte overhead to the original frames. Therefore, the maximum transmission unit (MTU) in the transport network needs to be increased by 50 bytes. If the overlays use a 1500-byte MTU, the transport network needs to be configured to accommodate 1550-byte packets at a minimum. Jumbo-frame support in the transport network is required if the overlay applications tend to use larger frame sizes than 1500 bytes.
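For example (the interface number and MTU value are placeholders), setting a jumbo MTU on the transport-facing Layer 3 uplink accommodates the 50-byte VXLAN overhead:

```
interface Ethernet1/49
  mtu 9216
```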
-
ECMP and LACP Hashing Algorithms in the Transport Network
As described in a previous section, Cisco Nexus 3000 Series switches introduce a level of entropy in the source UDP port for ECMP and LACP hashing in the transport network. To take advantage of this implementation, the transport network should use an ECMP or LACP hashing algorithm that takes the UDP source port as an input for hashing, which achieves the best load-sharing results for VXLAN encapsulated traffic.
-
Multicast Group Scaling
The VXLAN implementation on Cisco Nexus 3000 Series Switches uses multicast tunnels for broadcast, unknown unicast, and multicast traffic forwarding. Ideally, one VXLAN segment mapping to one IP multicast group is the way to provide the optimal multicast forwarding. It is possible, however, to have multiple VXLAN segments share a single IP multicast group in the core network. VXLAN can support up to 16 million logical Layer 2 segments, using the 24-bit VNID field in the header. With one-to-one mapping between VXLAN segments and IP multicast groups, an increase in the number of VXLAN segments causes a parallel increase in the required multicast address space and the amount of forwarding states on the core network devices. At some point, multicast scalability in the transport network can become a concern. In this case, mapping multiple VXLAN segments to a single multicast group can help conserve multicast control plane resources on the core devices and achieve the desired VXLAN scalability. However, this mapping comes at the cost of suboptimal multicast forwarding. Packets forwarded to the multicast group for one tenant are now sent to the VTEPs of other tenants that are sharing the same multicast group. This causes inefficient utilization of multicast data plane resources. Therefore, this solution is a trade-off between control plane scalability and data plane efficiency.
Despite the suboptimal multicast replication and forwarding, having multiple-tenant VXLAN networks to share a multicast group does not bring any implications to the Layer 2 isolation between the tenant networks. After receiving an encapsulated packet from the multicast group, a VTEP checks and validates the VNID in the VXLAN header of the packet. The VTEP discards the packet if the VNID is unknown to it. Only when the VNID matches one of the VTEP’s local VXLAN VNIDs, does it forward the packet to that VXLAN segment. Other tenant networks will not receive the packet. Thus, the segregation between VXLAN segments is not compromised.
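A sketch of multiple VXLAN segments sharing one multicast group (the VNIs, group address, and loopback number are placeholders):

```
interface nve1
  source-interface loopback1
  member vni 50100
    mcast-group 239.1.1.1
  member vni 50101
    mcast-group 239.1.1.1
```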
Considerations for the Transport Network
The following are considerations for the configuration of the transport network:
-
On the VTEP device:
-
Enable and configure IP multicast.*
-
Create and configure a loopback interface with a /32 IP address.
(For vPC VTEPs, you must configure primary and secondary /32 IP addresses.)
-
Enable IP multicast on the loopback interface.*
-
Advertise the loopback interface /32 addresses through the routing protocol (or static routes) used in the transport network.
-
Enable IP multicast on the uplink outgoing physical interface.*
-
Throughout the transport network:
-
Enable and configure IP multicast.*
-
When using SVI uplinks with VXLAN enabled on Cisco Nexus 9200 and 9300-EX platform switches, use the system nve infra-vlans command to specify the VLANs that are used for uplink SVI. Failing to specify the VLANs results in traffic loss.
Note
-
The system nve infra-vlans command specifies VLANs used by all SVI interfaces for uplink and vPC peer-links in VXLAN as infra-VLANs.
-
You should not configure combinations of infra-VLANs that are 512 apart, for example, 2 and 514, or 10 and 522.
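For example (the VLAN list is a placeholder), specifying two uplink SVI VLANs as infra-VLANs:

```
system nve infra-vlans 10,20
```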
Note
* Not required for static ingress replication or BGP EVPN ingress replication.
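The VTEP-side transport items above can be sketched together as follows (the protocol choice, interface numbers, and addresses are placeholders; the PIM lines are not required for static or BGP EVPN ingress replication):

```
feature ospf
feature pim

router ospf UNDERLAY

ip pim rp-address 10.254.254.1 group-list 239.0.0.0/8

interface loopback1
  ip address 10.0.1.11/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode

interface Ethernet1/49
  no switchport
  ip address 192.168.1.1/30
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
```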
BGP EVPN Considerations for VXLAN Deployment
Commands for BGP EVPN
The following commands support the BGP EVPN VXLAN control plane.

Command | Description
---|---
member vni range [associate-vrf] | Associates VXLAN VNIs (Virtual Network Identifiers) with the NVE interface. The associate-vrf keyword identifies and separates the processing of VNIs that are associated with a VRF and used for routing.
show nve vni, show nve vni summary | Displays information that determines whether the VNI is configured for peer and host learning through the control plane or the data plane.
show bgp l2vpn evpn, show bgp l2vpn evpn summary | Displays the Layer 2 VPN EVPN address family.
host-reachability protocol bgp | Specifies BGP as the mechanism for host reachability advertisement.
suppress-arp | Suppresses ARP under the Layer 2 VNI.
fabric forwarding anycast-gateway-mac | Configures the anycast gateway MAC address of the switch.
vrf context | Creates the VRF and enters VRF configuration mode.
nv overlay evpn | Enables or disables Ethernet VPN (EVPN).
router bgp | Configures the Border Gateway Protocol (BGP).
suppress mac-route | Suppresses the BGP MAC route so that BGP sends only the MAC/IP route for a host. Under NVE, the MAC updates for all VNIs are suppressed.