Introduction
This document describes how to troubleshoot PPPoE sessions that do not terminate after a subscription change in Cisco Policy Suite (CPS) over the RADIUS protocol.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
- Linux
- CPS
- RADIUS protocol
Cisco recommends that you have privileged access:
- root access to CPS CLI
- "qns-svn" user access to CPS GUIs (Policy Builder and Control Center)
Components Used
The information in this document is based on these software and hardware versions:
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Background Information
CPS is designed to work in an Authentication, Authorization, and Accounting (AAA) server/client model to support Point-to-Point Protocol over Ethernet (PPPoE) subscribers. CPS interacts with ASR9K or ASR1K devices to manage PPPoE sessions.
Problem
PPPoE sessions don't disconnect and reconnect after a new subscription selection in CPS via a Simple Object Access Protocol (SOAP) Application Programming Interface (API) request from an external provisioning system.
CPS generates the Change of Authorization (COA) request and sends it to the ASR9K device, but the ASR9K does not respond, so the request times out and CPS logs a "Timeout: No Response from RADIUS Server" error.
Here is the sample error message:
dc1-lb01 dc1-lb01 2021-09-28 21:26:13,331 [pool-2-thread-1] ERROR c.b.p.r.jms.PolicyActionJmsReceiver - Error executing RemoteAction. Returning Error Message response
com.broadhop.exception.BroadhopException: Timeout: No Response from RADIUS Server
at com.broadhop.radius.impl.actions.AsynchCoARequest.execute(AsynchCoARequest.java:213) ~[com.broadhop.radius.service_13.0.1.r150127.jar:na]
at com.broadhop.utilities.policy.remote.RemoteActionStub.execute(RemoteActionStub.java:62) ~[com.broadhop.utility_13.0.0.release.jar:na]
at com.broadhop.policy.remote.jms.PolicyActionJmsReceiver$RemoteActionExecutor.run(PolicyActionJmsReceiver.java:98) ~[com.broadhop.policy.remote.jms_13.0.0.release.jar:na]
at com.broadhop.utilities.policy.async.PolicyRemoteAsyncActionRunnable.run(PolicyRemoteAsyncActionRunnable.java:24) [com.broadhop.utility_13.0.0.release.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_72]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_72]
at com.broadhop.utilities.policy.async.AsyncPolicyActionExecutionManager$GenericThead.run(AsyncPolicyActionExecutionManager.java:301) [com.broadhop.utility_13.0.0.release.jar:na]
Caused by: net.jradius.exception.TimeoutException: Timeout: No Response from RADIUS Server
at net.jradius.client.RadiusClientTransport.sendReceive(RadiusClientTransport.java:112) ~[na:na]
at net.jradius.client.RadiusClient.changeOfAuth(RadiusClient.java:383) ~[na:na]
at com.broadhop.radius.impl.actions.AsynchCoARequest.execute(AsynchCoARequest.java:205) ~[com.broadhop.radius.service_13.0.1.r150127.jar:na]
... 6 common frames omitted
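To check how often this error occurs, the log can be scanned for ERROR entries that are followed by the timeout exception. The Python sketch below is only an illustration: the function name find_coa_timeouts is hypothetical, and the line format is assumed from the sample message above rather than taken from a CPS specification.

```python
import re

# Timestamp, thread, and level of a log entry, modelled on the sample
# message above (assumed format, not an official CPS log specification).
ENTRY_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>[^\]]+)\] (?P<level>[A-Z]+) "
)
TIMEOUT_MARKER = "Timeout: No Response from RADIUS Server"


def find_coa_timeouts(log_text):
    """Return the timestamps of ERROR entries that carry the COA timeout."""
    hits = []
    current_ts = None
    for line in log_text.splitlines():
        match = ENTRY_RE.search(line)
        if match:
            # A new log entry starts; track it only if it is an ERROR.
            current_ts = match.group("ts") if match.group("level") == "ERROR" else None
        elif current_ts and TIMEOUT_MARKER in line:
            hits.append(current_ts)
            current_ts = None  # count each entry once, ignore "Caused by" repeats
    return hits
```

Pass the contents of the relevant log file to find_coa_timeouts; each returned timestamp marks one COA request that received no response.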
Issue Reproduction Steps
Step 1. Initiate PPPoE sessions from ASR9K or ASR1K devices, and ensure that you see those sessions in CPS via Control Center.
Step 2. Initiate a SOAP API request to update the Subscription of Services associated with the subscriber.
Step 3. CPS starts COA requests towards the ASR9K or ASR1K. You can observe that CPS retries the request and sends duplicates of the same COA.
Note: The peer device (ASR9K) does not acknowledge the first packet, so the internal logic in CPS triggers the retry mechanism and sends duplicate requests.
Step 4. CPS drops all other session update actions because there is no response to the first session COA request or its retries.
As a result, the PPPoE session is still active on the ASR9K, and the ASR9K sends no session disconnect request to CPS for the session refresh. CPS expects an Accounting Stop request from the ASR9K in response to the COA request.
Main Points to Note About COA and Its Retries
- CPS initiates COA requests for all sessions of a particular subscriber that are active in its database.
- If CPS doesn't receive an ACK or NAK for a particular COA request, it initiates a retry mechanism based on the configuration in Policy Builder.
- The number of retries and the interval between retries are configurable.
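The retry behaviour described in these points can be sketched in Python as follows. This is a minimal illustration, not the CPS implementation: send_coa_with_retries, send_once, retries, and interval_s are hypothetical names, and the real values come from the Policy Builder configuration.

```python
import time


def send_coa_with_retries(send_once, retries, interval_s, sleep=time.sleep):
    """Send a COA request and resend it until a reply arrives or retries run out.

    send_once  -- hypothetical callable; returns "ACK", "NAK", or None on timeout
    retries    -- number of resends allowed after the first attempt
    interval_s -- pause between attempts, in seconds
    Returns a (reply, attempts) tuple; reply is None if the peer never answered.
    """
    attempts = 0
    for attempt in range(1 + retries):
        if attempt > 0:
            sleep(interval_s)  # wait before sending the duplicate request
        attempts += 1
        reply = send_once()
        if reply is not None:
            return reply, attempts
    # Every attempt timed out: the "No Response from RADIUS Server" case.
    return None, attempts


# A peer that silently drops every request, as the ASR9K does in this problem.
# The no-op sleep keeps the demonstration instant.
print(send_coa_with_retries(lambda: None, retries=2, interval_s=3,
                            sleep=lambda s: None))  # (None, 3)
```

With two retries configured, a silent peer receives three copies of the same COA before CPS gives up, which matches the duplicate requests seen in the reproduction steps.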
Sample Retry Configuration
Solution
To resolve this issue, extend the analysis to the ASR9K and determine why it does not respond to the COA request and its retries from CPS.
You can see in the sniffer traces that the Load Balancer (LB01) of CPS sources COA from <IP-1> and routes the packets over eth1, which is the default route.
The other Load Balancer (LB02) sources COA from <IP-2>, and it takes a specific route via eth2.
The ASR9K has an Access Control List (ACL) that accepts the COA only if it comes from <IP-2>, not from <IP-1>.
Therefore, correct the route table on LB01 of CPS so that it sends the COA with the proper source IP, <IP-2>, via a specific route.
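The interaction between the route table and the ACL can be modelled with a short Python sketch. It assumes simple longest-prefix route selection and an ACL that filters on source IP only. All addresses, prefixes, and names (pick_route, coa_accepted) are hypothetical placeholders from documentation ranges, not values from this case.

```python
import ipaddress


def pick_route(routes, destination):
    """Longest-prefix match: return the most specific route that covers destination."""
    dest = ipaddress.ip_address(destination)
    matching = [r for r in routes if dest in ipaddress.ip_network(r["prefix"])]
    return max(matching, key=lambda r: ipaddress.ip_network(r["prefix"]).prefixlen)


def coa_accepted(routes, destination, acl_allowed_sources):
    """Return (accepted, route): whether the peer ACL permits the chosen source IP."""
    route = pick_route(routes, destination)
    return route["src"] in acl_allowed_sources, route


# Placeholder values standing in for the ASR9K address, <IP-1>, and <IP-2>.
ASR9K = "198.51.100.10"
IP_1, IP_2 = "192.0.2.1", "203.0.113.1"

# LB01 before the fix: only the default route via eth1, sourced from <IP-1>.
default_only = [{"prefix": "0.0.0.0/0", "dev": "eth1", "src": IP_1}]
# LB01 after the fix: a specific route via eth2, sourced from <IP-2>.
corrected = default_only + [{"prefix": "198.51.100.0/24", "dev": "eth2", "src": IP_2}]

print(coa_accepted(default_only, ASR9K, {IP_2})[0])  # False
print(coa_accepted(corrected, ASR9K, {IP_2})[0])     # True
```

Before the correction, the COA falls through to the default route and carries a source address that the ACL rejects. After the specific route is added, the same COA leaves with the permitted source address.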
After the necessary correction to the CPS LB route table, the RADIUS transaction for a subscription change completes successfully end to end.