The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product. Learn more about how Cisco is using Inclusive Language.
This document describes how to troubleshoot the most common issues with the Border Gateway Protocol (BGP) and provides basic solutions and guidelines.
There are no specific prerequisites for this document. Basic knowledge of the BGP protocol is useful; you can refer to the BGP Configuration Guide for more information.
This document is not restricted to specific software and hardware versions, but the commands apply to Cisco IOS® and Cisco IOS® XE.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
This document is a basic guide to troubleshoot the most common issues in Border Gateway Protocol (BGP). It gives corrective actions, useful commands and debugs to detect the root cause of the problems, and best practices to avoid potential issues. Keep in mind that not all possible variables and scenarios can be covered, and a deeper analysis by Cisco TAC could be required.
Use this topology diagram as a reference for the outputs provided in this document.
If a BGP session is down and does not come up, issue the show ip bgp all summary command.
Here you can find the current status of the session:
R2#show ip bgp all summary
For address family: IPv4 Unicast
BGP router identifier 198.51.100.2, local AS number 65537
BGP table version is 19, main routing table version 19
18 network entries using 4464 bytes of memory
18 path entries using 2448 bytes of memory
1/1 BGP path/bestpath attribute entries using 296 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 7208 total bytes of memory
BGP activity 18/0 prefixes, 18/0 paths, scan interval 60 secs
18 networks peaked at 11:21:00 Jun 30 2022 CST (00:01:35.450 ago)

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.23.3       4        65537       6       5       19    0    0 00:01:34       18
198.51.100.1    4        65536       0       0        1    0    0 never    Idle
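When many sessions exist, the neighbor table can be scanned automatically. In the State/PfxRcd column, a number (prefixes received) means the session is Established, and anything else is a state name such as Idle or Active. The following is a minimal Python sketch under stated assumptions. It assumes IPv4 neighbors with one row per neighbor, so long addresses that wrap onto two lines are not handled. The function names are hypothetical.

```python
import re

# Neighbor table copied from the R2 output above.
SUMMARY = """\
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.23.3       4        65537       6       5       19    0    0 00:01:34       18
198.51.100.1    4        65536       0       0        1    0    0 never    Idle
"""

def neighbor_states(summary: str) -> dict[str, str]:
    """Map each neighbor address to the last column (State/PfxRcd) of its row."""
    states = {}
    for line in summary.splitlines():
        fields = line.split()
        # Neighbor rows start with an IPv4 address; the header row is skipped.
        if fields and re.fullmatch(r"\d+\.\d+\.\d+\.\d+", fields[0]):
            states[fields[0]] = fields[-1]
    return states

def down_neighbors(summary: str) -> list[str]:
    """A numeric last column means Established; a state name means the session is down."""
    return [peer for peer, state in neighbor_states(summary).items() if not state.isdigit()]

print(down_neighbors(SUMMARY))  # ['198.51.100.1']
```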
The first requirement is IP connectivity between both peers, so that the TCP session on port 179 can be established, whether or not the peers are directly connected. A simple ping is useful for this. If peering is established between loopback interfaces, ping from loopback to loopback. If the ping is sent without the loopback specified as the source interface, the IP address of the outgoing physical interface is used as the packet's source address instead of the router's loopback address.
If ping is not successful, check among other causes that there is a route to the peer address. The show ip route peer_IP_address command can be used for this.
If ping is successful, check the log messages with the show logging command:
%BGP-3-NOTIFICATION: sent to neighbor 198.51.100.1 passive 2/2 (peer in wrong AS) 2 bytes 1B39
Check the BGP configuration on both ends and correct the AS numbers or the peer IP address.
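The "2/2" in the notification is an error code and subcode pair. The sketch below decodes such log lines against the error codes and OPEN subcodes defined in RFC 4271 (plus subcode 7 from RFC 5492). It falls back to the text IOS printed when a subcode is not in the table. The sketch assumes the log line format shown above. It also assumes that the data bytes of a "peer in wrong AS" notification carry a 2-byte AS number. That second point is an interpretation, not something this log confirms.

```python
import re

# Error codes (RFC 4271, section 4.5) and OPEN message subcodes (RFC 4271 section 6.2, RFC 5492).
ERROR_CODES = {1: "Message Header Error", 2: "OPEN Message Error", 3: "UPDATE Message Error",
               4: "Hold Timer Expired", 5: "Finite State Machine Error", 6: "Cease"}
OPEN_SUBCODES = {1: "Unsupported Version Number", 2: "Bad Peer AS", 3: "Bad BGP Identifier",
                 4: "Unsupported Optional Parameter", 6: "Unacceptable Hold Time",
                 7: "Unsupported Capability"}

def decode_notification(log_line: str) -> dict:
    """Extract code, subcode and hex data from a %BGP-3-NOTIFICATION log line."""
    m = re.search(r"(\d+)/(\d+) \(([^)]*)\) (\d+) bytes ?([0-9A-Fa-f]*)", log_line)
    if m is None:
        raise ValueError("not a BGP notification log line")
    code, subcode = int(m.group(1)), int(m.group(2))
    subcodes = OPEN_SUBCODES if code == 2 else {}
    return {
        "code": ERROR_CODES.get(code, f"unknown ({code})"),
        "subcode": subcodes.get(subcode, m.group(3)),  # fall back to the text IOS printed
        "data": bytes.fromhex(m.group(5)),
    }

log = "%BGP-3-NOTIFICATION: sent to neighbor 198.51.100.1 passive 2/2 (peer in wrong AS) 2 bytes 1B39"
info = decode_notification(log)
print(info["code"], "/", info["subcode"])   # OPEN Message Error / Bad Peer AS
print(int.from_bytes(info["data"], "big"))  # 6969, if the data is a 2-byte AS number
```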
%BGP-3-NOTIFICATION: sent to neighbor 198.51.100.1 passive 2/3 (BGP identifier wrong) 4 bytes 0A0A0A0A
Check the BGP identifier on both ends with show ip bgp all summary and correct the duplicate. The router ID can be set manually with the bgp router-id X.X.X.X command under router bgp configuration. As a best practice, set the router ID manually to a unique value.
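The 4 data bytes of the "BGP identifier wrong" notification (0A0A0A0A) appear to be the router ID in hexadecimal. The sketch below converts them to dotted-quad form. It also groups a set of router IDs to spot duplicates. The router names and IDs in the example are hypothetical, for example as collected from show ip bgp all summary on each device.

```python
import ipaddress

def identifier_from_data(hex_data: str) -> ipaddress.IPv4Address:
    """Convert 4 notification data bytes (most significant first) to an IPv4 router ID."""
    return ipaddress.IPv4Address(bytes.fromhex(hex_data))

def duplicate_router_ids(router_ids: dict[str, str]) -> dict[str, list[str]]:
    """Group routers by router ID and keep only the IDs used by more than one router."""
    by_id: dict[str, list[str]] = {}
    for router, rid in router_ids.items():
        by_id.setdefault(rid, []).append(router)
    return {rid: routers for rid, routers in by_id.items() if len(routers) > 1}

print(identifier_from_data("0A0A0A0A"))  # 10.10.10.10
# Hypothetical inventory of router IDs:
print(duplicate_router_ids({"R1": "10.10.10.10", "R2": "10.10.10.10", "R3": "10.3.3.3"}))
```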
Most iBGP sessions are configured between loopback interfaces reachable through an IGP. The loopback interface must be explicitly defined as the source with the neighbor ip-address update-source interface-id command.
For eBGP peers, directly connected interfaces are usually used for peering. Cisco IOS/Cisco IOS XE checks that a single-hop eBGP neighbor is on a directly connected network; if it is not, the router does not even try to establish the session. If eBGP peering is configured from loopback to loopback on directly connected routers, this check can be disabled for a specific neighbor on both ends with neighbor ip-address disable-connected-check.
However, if there are multiple hops between the eBGP peers, ensure that neighbor ip-address ebgp-multihop [hop-count] is configured with the correct hop count so that the session can be established.
Without ebgp-multihop, the default TTL for eBGP sessions is 1, while iBGP sessions use a TTL of 255. If ebgp-multihop is configured without a hop count, a TTL of 255 is used.
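The TTL defaults and the connected check described above can be modeled in a few lines. In this sketch, "hops" means the number of IP links between the peers: 1 when directly connected, 2 with one transit router. Each transit router decrements the TTL, so the packet arrives only if the TTL is at least the hop count. The 198.51.100.0/30 link subnet and the 10.2.2.2 loopback are hypothetical values for this example, and the function names are illustrative only.

```python
import ipaddress
from typing import Optional

def session_ttl(ebgp: bool, ebgp_multihop: bool = False, hop_count: Optional[int] = None) -> int:
    """TTL used toward a BGP neighbor, following the defaults described above."""
    if not ebgp:
        return 255                                   # iBGP sessions
    if not ebgp_multihop:
        return 1                                     # single-hop eBGP default
    return 255 if hop_count is None else hop_count   # ebgp-multihop without a value

def reaches_peer(ttl: int, hops: int) -> bool:
    """Every transit router decrements the TTL, so the packet survives only if ttl >= hops."""
    return ttl >= hops

def connected_check_passes(neighbor: str, connected: list[str], disabled: bool = False) -> bool:
    """Single-hop eBGP: the neighbor must be in a directly connected subnet,
    unless disable-connected-check is configured for that neighbor."""
    if disabled:
        return True
    addr = ipaddress.ip_address(neighbor)
    return any(addr in ipaddress.ip_network(net) for net in connected)

link = ["198.51.100.0/30"]                                        # hypothetical R1-R2 link
print(connected_check_passes("10.2.2.2", link))                  # False: loopback peer
print(connected_check_passes("10.2.2.2", link, disabled=True))   # True
print(reaches_peer(session_ttl(ebgp=True), hops=2))               # False: needs ebgp-multihop
```

The loopback-to-loopback case on directly connected routers still counts as one hop here, because the packet is delivered locally on the far router. That is why disabling the connected check is enough there, while peers separated by a transit router need a larger TTL.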
A useful action to test port 179 is a manual telnet from one peer to the other:
R1#telnet 198.51.100.2 179
Trying 198.51.100.2, 179 ... Open

[Connection to 198.51.100.2 closed by foreign host]
Either Open followed by a closed connection, or Connection refused by remote host, indicates that packets reach the remote end. In that case, ensure there are no problems with the control plane at the far end. If the result is Destination unreachable instead, check for any firewall or access list that can block TCP port 179 or BGP packets, and for any packet loss on the path.
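For a similar reachability test from a management host rather than from the router, a TCP connect attempt to port 179 classifies the result in roughly the same way. The following sketch makes an assumption: a host sits in the same path as the peer. Its source address differs from the router's, so ACLs or control-plane policies can treat it differently, and the result only approximates the router-to-router telnet test.

```python
import socket

def probe_bgp_port(host: str, port: int = 179, timeout: float = 3.0) -> str:
    """Rough host-side analogue of 'telnet <peer> 179' (not run on the router)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"                # handshake completed: packets reach the far end
    except ConnectionRefusedError:
        return "refused"                 # RST received: far end reachable, port not accepting
    except socket.timeout:
        return "timeout"                 # silently dropped: firewall, ACL or loss on the path
    except OSError as exc:
        return f"unreachable ({exc})"    # for example, an ICMP unreachable reported by the OS

# Example (address from this document's topology):
# print(probe_bgp_port("198.51.100.2"))
```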
In case of an authentication problem, you can see messages like these:
%TCP-6-BADAUTH: Invalid MD5 digest from 198.51.100.1(179) to 198.51.100.2(20062) tableid - 0
%TCP-6-BADAUTH: No MD5 digest from 198.51.100.1(179) to 198.51.100.2(20062) tableid - 0
Check the authentication method, password, and related configuration. To troubleshoot further, refer to MD5 Authentication Between BGP Peers Configuration Example.
If the TCP session does not come up, you can use these commands to isolate the problem:
show tcp brief all
show control-plane host open-ports
debug ip tcp transactions
If the session flaps (goes up and down), check the show logging output. These are some common scenarios.
%BGP-5-ADJCHANGE: neighbor 198.51.100.2 Down Interface flap
As the message indicates, the session went down because the interface went down. Look for physical issues on the port or SFP, the cable, or any disconnection.
%BGP-3-NOTIFICATION: sent to neighbor 198.51.100.2 4/0 (hold time expired) 0 bytes
This is a very common situation. It means that the router did not receive or process a keepalive or update message before the hold timer expired, so the device sends a notification message and closes the session. The most common reasons for this issue are listed here:
Interface problems or packet loss on the link: show interface can be used for this purpose.
Keepalives not sent or not received: debug bgp [vrf name] ipv4 unicast keepalives is useful.
High CPU utilization: show processes cpu [sorted|history] is useful to identify the problem. Based on the platform, you can find the next step to troubleshoot with the CPU Reference document.
MTU issues: the maximum data segment size used by the session is shown by show ip bgp neighbors ip_address. A ping test to the neighbor with the DF bit set can show whether that size is valid along the path:
ping 198.51.100.2 size max_seg_size df
If MTU issues are found, review the configuration carefully to ensure that the MTU values are consistent throughout the network.
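The arithmetic behind this MTU test is simple to sketch. The sketch assumes that the ping size value is the full IP datagram size, as the "Datagram size" prompt of extended ping suggests. It also assumes 20-byte IPv4 and TCP headers with no options. Under those assumptions, a full-sized BGP segment occupies the maximum data segment size plus 40 bytes on the wire. TCP options, such as the MD5 signature option, add to that. The helper names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options unless passed explicitly

def datagram_size_for_segment(max_seg_size: int, tcp_options: int = 0) -> int:
    """IP datagram size of a full-sized BGP segment for a given max data segment size."""
    return max_seg_size + TCP_HEADER + tcp_options + IPV4_HEADER

def fits_path_mtu(max_seg_size: int, path_mtu: int, tcp_options: int = 0) -> bool:
    """True if a full-sized segment can cross the path without fragmentation."""
    return datagram_size_for_segment(max_seg_size, tcp_options) <= path_mtu

print(datagram_size_for_segment(1460))  # 1500
print(fits_path_mtu(8960, 1500))        # False: jumbo-sized segments over a 1500-byte link
```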
Note: For more information on MTU, refer to BGP Neighbor Flaps with MTU Troubleshooting.
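The hold-time-expired notification follows timer rules from RFC 4271. Each side uses the smaller of its configured hold time and the one received in the peer's OPEN message. A hold time of 0 disables keepalives. Any keepalive or update received restarts the hold timer. The sketch below assumes keepalives are sent every third of the negotiated hold time, the ratio suggested by RFC 4271. The Cisco IOS defaults are a 60-second keepalive and a 180-second hold time. The peer value of 90 seconds in the example is hypothetical.

```python
from typing import Optional

def negotiated_hold_time(local_hold: int, peer_hold: int) -> int:
    """RFC 4271: the session uses the smaller of the local and the received hold time."""
    for value in (local_hold, peer_hold):
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")
    return min(local_hold, peer_hold)

def keepalive_interval(hold_time: int) -> Optional[float]:
    """Assumption: keepalives every third of the hold time; 0 disables keepalives."""
    return None if hold_time == 0 else hold_time / 3

def hold_timer_expired(seconds_since_last_message: float, hold_time: int) -> bool:
    """Without a KEEPALIVE or UPDATE for longer than the hold time, the router sends
    NOTIFICATION 4/0 (hold time expired) and closes the session."""
    return hold_time != 0 and seconds_since_last_message > hold_time

hold = negotiated_hold_time(180, 90)   # IOS default vs. a hypothetical peer setting
print(hold, keepalive_interval(hold))  # 90 30.0
```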
%BGP-5-ADJCHANGE: neighbor 198.51.100.2 passive Down AFI/SAFI not supported
%BGP-3-NOTIFICATION: received from neighbor 198.51.100.2 active 2/8 (no supported AFI/SAFI) 3 bytes 000000
The Address Family Identifier (AFI) is part of the capability extension added by Multiprotocol BGP (MP-BGP). It identifies a specific network protocol, such as IPv4 or IPv6. A Subsequent Address Family Identifier (SAFI), such as unicast or multicast, adds further granularity. MP-BGP achieves this separation through the BGP path attributes (PAs) MP_REACH_NLRI and MP_UNREACH_NLRI. These attributes are carried inside BGP update messages and carry network reachability information for the different address families.
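The 3 data bytes in the "no supported AFI/SAFI" notification can be decoded by hand. This sketch assumes the bytes are a 2-byte AFI followed by a 1-byte SAFI. It maps only a few well-known values from the IANA registries; any other value is printed as a number to be looked up. In this document's log the data is 000000, which decodes to AFI 0 and SAFI 0.

```python
# A few values from the IANA AFI and SAFI registries (not a complete list).
AFI = {1: "IPv4", 2: "IPv6", 25: "L2VPN"}
SAFI = {1: "unicast", 2: "multicast", 4: "MPLS labels", 70: "EVPN", 128: "MPLS-labeled VPN"}

def decode_afi_safi(hex_data: str) -> tuple[str, str]:
    """Assumes 3 bytes: a 2-byte AFI (big-endian) followed by a 1-byte SAFI."""
    raw = bytes.fromhex(hex_data)
    if len(raw) != 3:
        raise ValueError("expected 3 bytes: 2-byte AFI + 1-byte SAFI")
    afi = int.from_bytes(raw[:2], "big")
    safi = raw[2]
    return AFI.get(afi, f"AFI {afi}"), SAFI.get(safi, f"SAFI {safi}")

print(decode_afi_safi("000000"))  # ('AFI 0', 'SAFI 0')
print(decode_afi_safi("000180"))  # ('IPv4', 'MPLS-labeled VPN')
```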
The message gives you the AFI/SAFI numbers, as registered by IANA. If needed, capability negotiation can be disabled with neighbor ip-address dont-capability-negotiate on both ends. For further information, refer to Unsupported Capabilities Cause BGP Peer Malfunction.
For a better explanation of how BGP works and how it selects the best path, refer to BGP Best Path Selection Algorithm.
For a route to be installed into the routing table, its next hop needs to be reachable. Otherwise, even if the prefix is in the BGP Loc-RIB table, it does not get into the RIB. By default on Cisco IOS/Cisco IOS XE, iBGP does not change the next hop attribute and leaves the AS_PATH unchanged. eBGP, by contrast, rewrites the next hop and prepends its own AS number to the AS_PATH, which is used for loop avoidance.
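The default propagation rules can be modeled as a function that takes a path and returns what the receiving peer sees. This is a simplified sketch under stated assumptions. It models only the next hop and AS_PATH and ignores every other attribute. The Path type and advertise() function are hypothetical. The example follows this document's topology: R1 (AS 65536) sends 192.0.2.1/32 to R2 over eBGP, and R2 passes it to R3 over iBGP.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Path:
    prefix: str
    next_hop: str
    as_path: tuple[int, ...]

def advertise(path: Path, session: str, local_as: int, local_address: str,
              next_hop_self: bool = False) -> Path:
    """eBGP: set next hop to the local address and prepend the local AS.
    iBGP: keep next hop and AS_PATH unless next-hop-self is configured."""
    if session == "ebgp":
        return replace(path, next_hop=local_address, as_path=(local_as,) + path.as_path)
    if session == "ibgp":
        return replace(path, next_hop=local_address) if next_hop_self else path
    raise ValueError("session must be 'ebgp' or 'ibgp'")

origin = Path("192.0.2.1/32", "0.0.0.0", ())                    # locally originated on R1
at_r2 = advertise(origin, "ebgp", 65536, "198.51.100.1")        # R1 -> R2
at_r3 = advertise(at_r2, "ibgp", 65537, "10.0.23.2")            # R2 -> R3, default
print(at_r3.next_hop, at_r3.as_path)                            # 198.51.100.1 (65536,)
fixed = advertise(at_r2, "ibgp", 65537, "10.0.23.2", next_hop_self=True)
print(fixed.next_hop)                                           # 10.0.23.2
```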
You can check the next hop with show ip bgp [prefix].
The output shows the next hop, followed by the word inaccessible if it cannot be resolved. In this example, the prefix is announced by R1 to R2 through eBGP and learned by R3 from R2 through iBGP.
R3#show ip bgp 192.0.2.1
BGP routing table entry for 192.0.2.1/32, version 0
Paths: (1 available, no best path)
  Not advertised to any peer
  Refresh Epoch 1
  65536
    198.51.100.1 (inaccessible) from 10.0.23.2 (10.2.2.2)
      Origin incomplete, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0
      Updated on Jul 1 2022 13:44:19 CST
In the output, the next hop is the outgoing interface of R1, which is not known by R3. To fix this, you can either make the next hop reachable through the IGP or a static route, or use the
neighbor ip-address next-hop-self
command on the iBGP peer so that it sets the next hop to its own directly connected address. In the diagram example, this configuration is needed on R2, for the neighbor toward R3 (neighbor 10.0.23.3 next-hop-self).
As a result, after a clear ip bgp 10.0.23.2 soft, the next hop changes to the directly connected (reachable) interface and the prefix is installed.
R3#show ip bgp 192.0.2.1
BGP routing table entry for 192.0.2.1/32, version 24
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  Refresh Epoch 1
  65536
    10.0.23.2 from 10.0.23.2 (10.2.2.2)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0
      Updated on Jul 1 2022 13:46:53 CST
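Whether a next hop is "inaccessible" comes down to whether the routing table has a route that covers it. The sketch below does a longest-prefix match of the BGP next hop against a list of RIB prefixes. R3's RIB contents in the example are hypothetical: only the R2-R3 link and R2's loopback, with no route toward 198.51.100.1. The sketch does not model recursive lookups or any platform rules about which routes may resolve a next hop.

```python
import ipaddress
from typing import Optional

def resolve_next_hop(next_hop: str, rib: list[str]) -> Optional[str]:
    """Return the most specific RIB prefix covering the next hop, or None if unreachable."""
    addr = ipaddress.ip_address(next_hop)
    matches = [ipaddress.ip_network(p) for p in rib if addr in ipaddress.ip_network(p)]
    if not matches:
        return None  # shown as (inaccessible): the path cannot become best
    return str(max(matches, key=lambda net: net.prefixlen))

r3_rib = ["10.0.23.0/24", "10.2.2.2/32"]         # hypothetical RIB on R3
print(resolve_next_hop("198.51.100.1", r3_rib))  # None
print(resolve_next_hop("10.0.23.2", r3_rib))     # 10.0.23.0/24
```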
A RIB failure happens when a route cannot be installed into the global RIB. A common reason is that the same prefix is already in the RIB from another routing protocol with a lower administrative distance. The exact reason for a RIB failure is shown with the show ip bgp rib-failure command.
Note: You can identify and correct such issues as explained in Understand BGP RIB-failure and The Command bgp suppress-inactive.
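The administrative distance comparison behind a RIB failure can be sketched with the Cisco IOS default distances. This is a simplified model. It assumes default distances with no distance tuning, and it ignores ties. When several sources offer the same prefix, the lowest distance is installed. The BGP best path is a RIB failure when another source wins.

```python
# Cisco IOS default administrative distances.
ADMIN_DISTANCE = {"connected": 0, "static": 1, "ebgp": 20, "eigrp": 90, "ospf": 110,
                  "isis": 115, "rip": 120, "eigrp-external": 170, "ibgp": 200}

def rib_winner(sources: list[str]) -> str:
    """Among sources offering the same prefix, the lowest administrative distance wins."""
    return min(sources, key=ADMIN_DISTANCE.__getitem__)

def bgp_rib_failure(bgp_source: str, other_sources: list[str]) -> bool:
    """True when another routing source, not BGP, is installed for the prefix."""
    return rib_winner([bgp_source, *other_sources]) != bgp_source

print(bgp_rib_failure("ibgp", ["ospf"]))    # True: OSPF (110) beats iBGP (200)
print(bgp_rib_failure("ebgp", ["ospf"]))    # False: eBGP (20) beats OSPF (110)
print(bgp_rib_failure("ebgp", ["static"]))  # True: static (1) beats eBGP (20)
```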
The most common case is an IGP route being preferred over the eBGP route in a mutual redistribution scenario. When an IGP route is redistributed into BGP, it is considered locally generated by BGP and gets a weight of 32768 by default. All prefixes received from a BGP peer get a weight of 0 by default. Therefore, when both paths for the same prefix are compared, the BGP best path selection process prefers the path with the higher weight, which is the locally redistributed one. This is why the IGP route stays installed in the RIB.
The solution for this problem is to set a higher weight for all routes received from the BGP peer, under router bgp configuration:
neighbor ip-address weight 40000
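Because weight is compared before the later best path attributes, the effect of this command can be illustrated by comparing weight alone. This is a partial sketch that assumes both paths are valid with reachable next hops. It models only the weight step; ties would fall through to later steps that are not modeled here. The Candidate type is hypothetical.

```python
from dataclasses import dataclass

LOCALLY_ORIGINATED_WEIGHT = 32768  # default for routes redistributed into BGP
RECEIVED_WEIGHT = 0                # default for paths learned from a BGP peer

@dataclass
class Candidate:
    source: str
    weight: int

def best_by_weight(paths: list[Candidate]) -> Candidate:
    """Weight step only: the highest weight wins (ties are not resolved here)."""
    return max(paths, key=lambda p: p.weight)

redistributed = Candidate("redistributed from IGP", LOCALLY_ORIGINATED_WEIGHT)
ebgp_default = Candidate("eBGP peer", RECEIVED_WEIGHT)
ebgp_weighted = Candidate("eBGP peer", 40000)  # after neighbor ip-address weight 40000

print(best_by_weight([redistributed, ebgp_default]).source)   # redistributed from IGP
print(best_by_weight([redistributed, ebgp_weighted]).source)  # eBGP peer
```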
Note: For a detailed explanation, refer to Understand the Importance of BGP Weight Path Attribute in Network Failover Scenarios.
A slow peer is a peer that cannot keep up with the rate at which the sender generates update messages. There are many reasons for a peer to exhibit this problem: high CPU on one of the peers, excess traffic or traffic loss on a link, or lack of bandwidth, among others.
Note: To help identify and correct slow peers issues, refer to Use the BGP "Slow Peer" Feature to Resolve Slow Peer Issues.
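A peer that cannot keep up typically leaves messages queued toward it. One rough way to spot this is to poll the OutQ column of show ip bgp summary a few times. This sketch is a hypothetical heuristic, not the IOS slow-peer detection algorithm described in the referenced document. It flags a peer whose OutQ never drains to zero and keeps growing across polls. The sample values are made up.

```python
def looks_slow(outq_samples: list[int], min_samples: int = 3) -> bool:
    """Heuristic: the OutQ never drains to 0 and grows from the first poll to the last."""
    if len(outq_samples) < min_samples:
        return False
    never_drains = all(q > 0 for q in outq_samples)
    non_decreasing = all(b >= a for a, b in zip(outq_samples, outq_samples[1:]))
    return never_drains and non_decreasing and outq_samples[-1] > outq_samples[0]

print(looks_slow([10, 40, 120, 300]))  # True: backlog keeps building
print(looks_slow([0, 25, 0, 3]))       # False: queue drains between polls
```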
BGP uses memory assigned to the Cisco IOS process to maintain network prefixes, best paths, policies, and all related configuration. Memory usage per process is shown with the show processes memory sorted command:
R1#show processes memory sorted
Processor Pool Total: 2121414332 Used:  255911152 Free: 1865503180
 reserve P Pool Total:     102404 Used:         88 Free:     102316
 lsmpi_io Pool Total:    3149400 Used:    3148568 Free:        832

 PID TTY  Allocated      Freed    Holding    Getbufs    Retbufs Process
   0   0  266231616   81418808  160053760          0          0 *Init*
 662   0   34427640      51720   34751920          0          0 SBC main process
  85   0    9463568          0    8982224          0          0 IOSD ipc task
   0   0   34864888   25213216    8513400    8616279          0 *Dead*
 504   0     696632          0     738576          0          0 QOS_MODULE_MAIN
 518   0     940000       8616     613760          0          0 BGP Router
 228   0     856064     345488     510080          0          0 mDNS
  82   0     547096     118360     417520          0          0 SAMsgThread
   0   0          0          0     395408          0          0 *MallocLite*
The Processor Pool line shows the total, used, and free memory: around 2.1 GB in total, of which about 256 MB is used in the example. Next, look at the Holding column to identify the process that holds most of the memory. Then check how many BGP sessions you have, how many routes are received, and the configuration used.
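The Holding column can be extracted and compared with the used memory of the Processor Pool programmatically. This is a small sketch that works on a few rows copied from the output above. It assumes the column layout shown there: PID, TTY, Allocated, Freed, Holding, Getbufs, Retbufs, and then the process name, which can contain spaces. The function names are hypothetical.

```python
MEMORY_ROWS = """\
   0   0  266231616   81418808  160053760          0          0 *Init*
 662   0   34427640      51720   34751920          0          0 SBC main process
 518   0     940000       8616     613760          0          0 BGP Router
 228   0     856064     345488     510080          0          0 mDNS
"""
PROCESSOR_POOL_USED = 255911152  # "Used" value from the Processor Pool line

def holdings(rows: str) -> list[tuple[str, int]]:
    """Return (process name, Holding bytes) pairs, largest first."""
    parsed = []
    for line in rows.splitlines():
        fields = line.split(None, 7)  # the 8th field keeps the full process name
        if len(fields) == 8 and fields[0].isdigit():
            parsed.append((fields[7], int(fields[4])))
    return sorted(parsed, key=lambda item: item[1], reverse=True)

def percent_of_used(process: str, rows: str, used: int = PROCESSOR_POOL_USED) -> float:
    """Share of the used processor memory held by one process."""
    return 100.0 * dict(holdings(rows)).get(process, 0) / used

print(holdings(MEMORY_ROWS)[0])                              # ('*Init*', 160053760)
print(f"{percent_of_used('BGP Router', MEMORY_ROWS):.2f}%")  # 0.24%
```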
Common steps to reduce the memory held by BGP:
Note: For further information on how to optimize BGP refer to Configure BGP Routers for Optimal Performance and Reduced Memory Consumption.
Routers use several processes for BGP to operate. To verify whether a BGP process is the cause of high CPU utilization, use the show processes cpu sorted command.
R3#show processes cpu sorted
CPU utilization for five seconds: 0%/0%; one minute: 0%; five minutes: 0%
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
 163          36        1463         24  0.07%  0.00%  0.00%   0 ADJ background
  62          28         132        212  0.07%  0.00%  0.00%   0 Exec
   2          39         294        132  0.00%  0.00%  0.00%   0 Load Meter
   1           0           4          0  0.00%  0.00%  0.00%   0 Chunk Manager
   3          27        1429         18  0.00%  0.00%  0.00%   0 BGP Scheduler
   4           0           1          0  0.00%  0.00%  0.00%   0 RO Notify Timers
  63           4          61         65  0.00%  0.00%  0.00%   0 BGP I/O
  83         924          26      35538  0.00%  0.03%  0.04%   0 BGP Scanner
  96         142       11651         12  0.00%  0.00%  0.00%   0 Tunnel BGP
   7           0           1          0  0.00%  0.00%  0.00%   0 DiscardQ Backgro
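To focus on the BGP processes in this output, the rows can be filtered and ranked by their 5-minute average. This is a minimal sketch that assumes the nine-column layout shown above. It keeps only process names that start with "BGP", so "Tunnel BGP" is deliberately excluded; adjust the filter if that process matters on your platform. The function name is hypothetical.

```python
CPU_ROWS = """\
 163          36        1463         24  0.07%  0.00%  0.00%   0 ADJ background
   3          27        1429         18  0.00%  0.00%  0.00%   0 BGP Scheduler
  63           4          61         65  0.00%  0.00%  0.00%   0 BGP I/O
  83         924          26      35538  0.00%  0.03%  0.04%   0 BGP Scanner
  96         142       11651         12  0.00%  0.00%  0.00%   0 Tunnel BGP
"""

def bgp_processes(rows: str) -> list[tuple[str, float, float, float]]:
    """Return (process, 5Sec, 1Min, 5Min) for BGP processes, highest 5Min first."""
    result = []
    for line in rows.splitlines():
        # PID Runtime Invoked uSecs 5Sec 1Min 5Min TTY Process (name may contain spaces)
        fields = line.split(None, 8)
        if len(fields) == 9 and fields[8].startswith("BGP"):
            five_sec, one_min, five_min = (float(f.rstrip("%")) for f in fields[4:7])
            result.append((fields[8], five_sec, one_min, five_min))
    return sorted(result, key=lambda row: row[3], reverse=True)

print(bgp_processes(CPU_ROWS)[0])  # ('BGP Scanner', 0.0, 0.03, 0.04)
```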
Here are the common processes, causes, and general steps to overcome high CPU utilization due to BGP:
Note: For further information on how to troubleshoot these two processes, refer to Troubleshoot High CPU Caused by the BGP Scanner or Router Process.
Revision | Publish Date | Comments |
---|---|---|
3.0 | 25-Sep-2023 | Updated IOS XE (removed dash) and added trademark, SEO and formatting. |
2.0 | 21-Feb-2023 | Recertification. |
1.0 | 04-Aug-2022 | Initial Release |