Introduction
This document describes how to troubleshoot an eBGP (External Border Gateway Protocol) session that is stuck in the active state because of stale LPTS (Local Packet Transport Services) entries.
Contributed by William Xu, Cisco TAC Engineer.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
Components Used
The information in this document is based on ASR9000 (Aggregation Services Router) platforms.
The information in this document was created from devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any commands.
Problem
When you configure eBGP, the session can remain stuck in the active state indefinitely if:
- There is no update-source command configured
- There is a topology change which causes traffic to take a different path
These symptoms appear when this issue occurs:
- IP addresses are reachable
- Both BGP peers remain stuck in the active state
- Packet capture shows that the routers send many TCP resets
- The show tcp trace error command reports this error for BGP sessions:
Feb 18 09:32:15.393 tcp/error 0/RSP0/CPU0 t9 Lpts set the drop flag for 179 -> 5368, drop packet (pak 0xb1cf80f3) and send a RST
In summary, the root cause is that LPTS entries are not updated when routing and forwarding change, so they remain stale after the topology changes.
BGP has since been enhanced to address this, as described in the Enhancement in XR Release section. The two scenarios that follow describe the issue in more detail.
Note: iBGP (Internal Border Gateway Protocol) normally does not hit this issue because update-source is normally configured for iBGP sessions.
Scenario 1 - Multihop eBGP with Topology Change
Build a multihop eBGP session between ASR9K-1 and ASR9K-3. The peer IP addresses are 172.123.1.1 and 172.123.2.2, which belong to physical interfaces. No update-source command is configured. With the current topology, the session stays in the active state. This is expected because both routers use the interface in subnet 172.123.3.0/24 as the egress interface, so each router sources its BGP packets from an address in that subnet (172.123.3.1 or 172.123.3.2) rather than from the address its peer is configured to expect.
Next, shut down the direct link between ASR9K-1 and ASR9K-3. The peer addresses are then reachable through ASR9K-2, which is the multihop path, so ping succeeds. The source IP addresses now match at both ends, but the BGP session still remains in the active state.
When the BGP neighbors are configured, LPTS entries are created according to the CEF (Cisco Express Forwarding) table. For ASR9K-1, IP address 172.123.2.2 is reachable via the 172.123.3.0/24 subnet, so the LPTS entries are built with local IP address 172.123.3.1. The first entry allows the BGP neighbor to connect to port 179 on local IP address 172.123.3.1. The second entry exists because ASR9K-1 itself tries to initiate a TCP session from local port 26036.
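The setup described above could look roughly like the following on ASR9K-1. This is only a sketch: the AS numbers 65001 and 65003 and the ebgp-multihop TTL of 2 are assumptions made for illustration and are not taken from the lab configuration. The point is that no update-source line appears under the neighbor.

```
! ASR9K-1 (AS numbers and TTL value are hypothetical)
router bgp 65001
 neighbor 172.123.2.2
  remote-as 65003
  ebgp-multihop 2
  ! no update-source here, so the source address follows the egress interface
  address-family ipv4 unicast
  !
 !
!
```

ASR9K-3 would mirror this with neighbor 172.123.1.1 and the AS numbers swapped.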
ASR9K-1:
========
ASR9K-1#show lpts ifib entry brief | inc "BGP"
...
BGP4 default TCP any 0/RSP1/CPU0 172.123.3.1,179 172.123.2.2
BGP4 default TCP any 0/RSP1/CPU0 172.123.3.1,26036 172.123.2.2,179
ASR9K-3 shows the equivalent entries.
ASR9K-3:
========
ASR9K-3#show lpts ifib entry brief | inc "BGP"
...
BGP4 default TCP any 0/RSP1/CPU0 172.123.3.2,11126 172.123.1.1,179
BGP4 default TCP any 0/RSP1/CPU0 172.123.3.2,179 172.123.1.1
When the link between ASR9K-1 and ASR9K-3 goes down, the peers are reachable via the ASR9K-2 path with a new local source IP address. However, the topology change does not trigger an LPTS update, so the original entry for port 179 keeps the original local IP address. As a result, the router does not accept ingress TCP connection requests to the new local IP address, and the BGP session at both ends remains stuck in the active state.
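One way to confirm this condition on ASR9K-1 is to compare the current forwarding path toward the peer with the addresses held in LPTS. The commands below are a sketch of that check. The show cef step is an assumption added here and was not part of the original lab session, and no output is shown because it depends on the device.

```
! Current egress path toward the peer, after the direct link is down
show cef 172.123.2.2
! LPTS entries: a stale entry still pairs port 179 with the old local address 172.123.3.1
show lpts ifib entry brief | inc "BGP"
! TCP drops caused by the LPTS drop flag
show tcp trace error
```

If the local address in the port 179 entry does not belong to the interface that CEF now uses to reach the peer, the entry is stale.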
Scenario 2 - eBGP with Update Source Address Change
Deploy an eBGP session between ASR9K-1 and ASR9K-3, whose interface IP addresses are 172.123.3.1 and 172.123.3.2. Under a new addressing plan, these addresses change to 172.123.3.111 and 172.123.3.222. If you configure eBGP first and then update the IP addresses on the interfaces, the eBGP session is stuck in the active state.
The cause is the same as in Scenario 1. When you configure the eBGP session, the LPTS entries are generated from the address of the local egress interface at that time.
ASR9K-1:
========
ASR9K-1#show lpts ifib entry brief | inc "BGP"
...
BGP4 default TCP any 0/RSP1/CPU0 172.123.3.1,179 172.123.3.222
BGP4 default TCP any 0/RSP1/CPU0 172.123.3.1,24067 172.123.3.222,179
ASR9K-3:
========
ASR9K-3#show lpts ifib entry brief | inc "BGP"
...
BGP4 default TCP any 0/RSP1/CPU0 172.123.3.2,45091 172.123.3.111,179
BGP4 default TCP any 0/RSP1/CPU0 172.123.3.2,179 172.123.3.111
Although the local IP addresses are changed later, the LPTS entries are not updated. Incoming TCP connection requests to the new addresses are dropped, and the session remains stuck in the active state indefinitely.
Solution
To solve this issue, you need to trigger an LPTS update. Use one of these options:
- Shut and no shut the BGP neighbors
- Reconfigure the BGP neighbors
- Restart the BGP process (process restart bgp)
- Configure update-source at both ends, which also prevents the issue
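As a rough sketch, the options above could be applied on ASR9K-1 as follows. The AS number 65001 and the interface name GigabitEthernet0/0/0/0 are hypothetical placeholders and must be replaced with the values from your own configuration. Each option is independent, so only one is needed.

```
! Option: shut and no shut the neighbor (two separate commits)
configure
 router bgp 65001
  neighbor 172.123.2.2
   shutdown
 commit
 router bgp 65001
  neighbor 172.123.2.2
   no shutdown
 commit
end

! Option: restart the BGP process from exec mode
process restart bgp

! Option: pin the source address (interface name is hypothetical)
configure
 router bgp 65001
  neighbor 172.123.2.2
   update-source GigabitEthernet0/0/0/0
 commit
end
```

With update-source, the interface named should be the one that owns the address the remote peer is configured to expect, and the same change is needed on the other router.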
Enhancement in XR Release
Two enhancements in IOS XR address this issue.
CSCuz51103 - BGP session stuck in active
This enhancement was introduced in XR release 6.1.1. From this release, when BGP tries to re-establish the session, LPTS updates its entries with the new local IP address. How long the update takes depends on the hold time configured at both ends, so you might need to wait some time before the session comes up.
Even with this enhancement, a BGP session can still be stuck in the active state if passive mode is configured. In passive mode, BGP does not try to re-establish the session, so the local IP address is not checked and the LPTS entries are not updated.
A further enhancement, introduced in XR release 6.2.1, addresses this situation.
CSCvb15128 - BGP session stuck in active while router has Passive BGP mode configured
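For reference, passive mode on an IOS XR neighbor is typically configured with session-open-mode, as sketched below. This assumes that the passive mode mentioned above refers to this command, and the AS number 65001 is a hypothetical placeholder.

```
! The neighbor accepts incoming TCP connections but never initiates one
router bgp 65001
 neighbor 172.123.2.2
  session-open-mode passive-only
 !
!
```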
Related Information