Introduction
This document describes a specific scenario in which Gateway GPRS Support Node (GGSN) Call Data Records (G-CDRs) get stuck due to an incorrect Access Point Name (APN) configuration. This results in incorrect billing for subscribers, because the Charging Gateway Function (CGF) receives backdated CDRs that were held in the GGSN. This issue is reported on the Cisco Aggregation Services Router (ASR) 5x00 Series.
Problem
For various reasons (most commonly misconfiguration), the CDRs for some APNs go to the default GTPP group. The default group has no CGF servers configured, so these requests get stuck.
For example:
apn blackberry.net.40413pre
selection-mode subscribed sent-by-ms chosen-by-sgsn
accounting-mode none
timeout idle 10800
ip access-group ECS in
ip access-group ECS out
ip address pool name blackberry
credit-control-group GY_LIVE_PRE
active-charging rulebase test_prepaid
exit
apn blackberry.net.40443pre
selection-mode subscribed sent-by-ms chosen-by-sgsn
accounting-mode none
timeout idle 10800
ip access-group ECS in
ip access-group ECS out
ip address pool name blackberry
credit-control-group GY_LIVE_PRE
active-charging rulebase test_prepaid
exit
apn blackberry.net.40446pre
selection-mode subscribed sent-by-ms chosen-by-sgsn
accounting-mode none
timeout idle 10800
ip access-group ECS in
ip access-group ECS out
ip address pool name blackberry
credit-control-group GY_LIVE_PRE
active-charging rulebase test_prepaid
exit
apn blackberry.net.40484pre
selection-mode subscribed sent-by-ms chosen-by-sgsn
accounting-mode none
timeout idle 10800
ip access-group ECS in
ip access-group ECS out
ip address pool name blackberry
credit-control-group GY_LIVE_PRE
active-charging rulebase test_prepaid
exit
apn blackberry.net.40486pre
selection-mode subscribed sent-by-ms chosen-by-sgsn
accounting-mode none
timeout idle 10800
ip access-group ECS in
ip access-group ECS out
ip address pool name blackberry
credit-control-group GY_LIVE_PRE
active-charging rulebase test_prepaid
exit
aaa group default
#exit
gtpp group default
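In this example, none of the APNs reference a GTPP group, and the default group at the end has no CGF server configured, so any G-CDRs generated for these APNs have nowhere to go. A minimal sketch of a corrected setup is shown here; the group name CGF and the CGF address 172.16.10.11 are taken from the Troubleshoot commands later in this document, the charging-agent address 192.0.2.1 is only a placeholder, and the exact command syntax and the choice of accounting-mode depend on the deployment and the StarOS release:
context gaggsnctx
 gtpp group CGF
  gtpp charging-agent address 192.0.2.1
  gtpp server 172.16.10.11 priority 1
  gtpp max-cdrs 255 wait-time 60
 exit
 apn blackberry.net.40413pre
  gtpp group CGF accounting-context gaggsnctx
 exit
The key point is that every APN that generates CDRs must resolve to a GTPP group that actually has a CGF server; otherwise its CDRs fall into the default group and block the archive queue.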
Troubleshoot
In the show support details output, check the output of this command:
******** show session subsystem data-info verbose *******
647274 Total gtpp acct requests 1 Current gtpp acct requests
0 Total gtpp acct cancelled 0 Total gtpp acct purged
0 Total gtpp sec acct requests 0 Total gtpp sec acct purged
248 Total null acct requests 0 Current null acct requests
2482018515 Total aaa acct sessions 265064 Current aaa acct sessions
14529031 Total aaa acct archived 6518761 Current aaa acct archived
265064 Current recovery archives 259073 Current valid recovery records
1108 Total aaa sockets opened 932 Current aaa sockets opened
The Current aaa acct archived counter shows that more than 6 million CDRs are stuck across all of the aaamgr instances, which prevents new CDRs from being processed and transferred to the CGF in streaming mode.
Once the per-aaamgr limit is reached, CDRs are purged, which results in lost CDRs and lost revenue for the customer.
Out of the 6 million archived CDRs, you can already see some CDRs being purged:
******** show session subsystem data-info verbose *******
1228764750 Total gtpp charg 6534523 Current gtpp charg
1221919009 Total gtpp charg success 311218 Total gtpp charg failure
0 Total gtpp charg cancelled 311218 Total gtpp charg purged
0 Total gtpp sec charg 0 Total gtpp sec charg purged
0 Total prepaid online requests 0 Current prepaid online requests
0 Total prepaid online success 0 Current prepaid online failure
0 Total prepaid online retried 0 Total prepaid online cancelled
0 Current prepaid online purged
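If you need to see whether the archived CDRs are concentrated in particular aaamgr instances, the per-facility form of the same command can be checked as well (a sketch; the exact output format varies by StarOS release):
show session subsystem facility aaamgr all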
Here is a checklist of CLI commands commonly used to debug CDR-related issues; a short note on how to check the default group follows the list.
- show gtpp accounting servers
- show gtpp accounting servers group name <CGF>
- show gtpp counters all
- show gtpp counters cgf-address 172.16.10.11
- show gtpp counters cgf-address 172.16.10.11 gcdrs
- show gtpp counters group name CGF
- show gtpp counters group name CGF gcdrs
- show gtpp group all
- show gtpp group name CGF
- show gtpp statistics
- show gtpp statistics cgf-address 172.16.10.11
- show gtpp statistics group name CGF
- show gtpp storage-server streaming file counters all
- show gtpp storage-server streaming file counters group name CGF
- show gtpp storage-server streaming file statistics
- show gtpp storage-server streaming file statistics group name CGF
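Because the stuck CDRs in this scenario sit in the default GTPP group, it can also help to run the group-scoped variants of these commands against the default group itself (a sketch; the group name default matches the configuration shown earlier):
- show gtpp counters group name default
- show gtpp statistics group name default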
Solution
This is the Method of Procedure (MOP) used to clean up the CDRs that belong to the default group in the aaaproxy process. A consolidated sketch of the CLI sequence used in Steps 1 through 7 follows the note at the end of the procedure.
Step 1. Note down the number of archived CDRs: show gtpp counters all.
Step 2. Configure the storage-server mode of the default group to local in the gaggsnctx context:
config
 context gaggsnctx
  gtpp group default
   gtpp storage-server mode local
Step 3. Kill aaaproxy with this command in hidden mode: task kill facility aaaproxy all. (The task kill causes the local mode to be applied to the default group.)
Step 4. Come out of hidden mode.
Step 5. Check that the counters in the show gtpp storage-server local file statistics output are increasing.
Step 6. Run show gtpp counters all every 30 seconds. The archived count should come down to zero within a span of 5 minutes.
Step 7. Revert the mode to remote:
config
 context gaggsnctx
  gtpp group default
   gtpp storage-server mode remote
Step 8. Check that the archived counter (show gtpp counters all) is not increasing and that the show gtpp storage-server local file statistics output is no longer increasing.
Step 9. Take the show support details (SSD) output and send it back for verification in order to make sure that the configuration is intact and all of the steps were followed.
Note: After completion of the activity, remove the CDR files from the hard disk if you know the procedure. If not, engage a TAC engineer for this activity on another day.
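For reference, here is the CLI sequence from Steps 1 through 7 put together (the commands are taken from the steps above; the task kill command is available only in hidden mode):
show gtpp counters all
config
 context gaggsnctx
  gtpp group default
   gtpp storage-server mode local
end
task kill facility aaaproxy all
show gtpp storage-server local file statistics
show gtpp counters all
config
 context gaggsnctx
  gtpp group default
   gtpp storage-server mode remote
end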
If aaaproxy does not recover after 1 minute, refer to this recovery procedure.
Procedure to recover aaaproxy
a. Issue this command in order to identify the parent controller (sessctrl) of the aaaproxy task:
show task table | grep aaaproxy
          task                          parent
cpu  facility   inst  pid    pri   facility   inst  pid
---- ---------- ----  -----  ----  ---------  ----  -----
4/0  aaaproxy   1     6721   0     sessctrl   0     10565
b. Execute this command and look for the instance of sessctrl on the active System Management Card (SMC):
show task table | grep sessctrl
          task                          parent
cpu  facility   inst  pid    pri   facility   inst  pid
---- ---------- ----  -----  ----  ---------  ----  -----
8/0  sessctrl   0     10565  -4    sitparent  80    2812
c. Issue the sessctrl instance kill command:
task kill facility sessctrl instance <instance-number>
d. After execution of the command, wait 30 seconds and issue these commands in order to check the state of sessctrl and aaaproxy. A consolidated sketch of the full recovery sequence follows.
1. show task table | grep "8/0 sessctrl"
2. show task resources | grep aaaproxy
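Put together, the recovery sequence looks like this (the instance number, PIDs, and CPU values come from the sample outputs above and differ on other systems):
show task table | grep aaaproxy
show task table | grep sessctrl
task kill facility sessctrl instance 0
show task table | grep "8/0 sessctrl"
show task resources | grep aaaproxy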
Technical explanation
For various reasons (most commonly misconfiguration), the CDRs for some APNs go to the default GTPP group. The default group has no CGF servers configured, so those requests get stuck. For the APNs that do have a valid GTPP group configured, CDRs should not remain archived, but they can still enter the archive queue.
From the archive queue, only five requests can be processed at a time. If all five requests at the head of the queue belong to the misconfigured APNs, those five requests are never freed, which blocks all of the CDRs behind them in the queue. This means that the CDRs generated in a given month are stuck there and processed late.
The ASR 5x00 has an upper limit on how many CDRs can be archived. Once the limit is crossed, the archived CDRs are purged. This makes way for the valid CDRs generated in that month, and they are released.
For example:
If the five requests at the head of the queue belong to the misconfigured APNs and the rest of the requests belong to valid APNs with a correct configuration, then the five head requests are never freed because no server is configured for them, and the queue is stuck forever since only five CDRs are processed at a time. However, if one of those requests is purged, then four requests at the head of the queue belong to the misconfigured APN and the fifth belongs to a valid APN. Now, when five requests are processed, the four are still stuck but the fifth one goes through. In this way, old CDRs are sent to the CGF late; for example, the CGF processes December CDRs in January because the GGSN released them late.
Why are CDRs for the correctly configured group sent to the archive queue? The maximum packet that can be transmitted over User Datagram Protocol (UDP) is 64 KB, including the header. Because gtpp max-cdrs 255 wait-time 60 is configured, there is a chance that the 64 KB buffer is full before the maximum of 255 CDRs is reached. The system checks whether a new CDR fits into the 64 KB buffer; if it does not, the system puts the CDR back into the archive queue. A CDR put back into the archive queue can then be stuck for a month, until the CDRs for the invalid group are purged. With a correct configuration, the archive queue would never contain CDRs for APNs that have no servers, and this issue would never be seen, because even a CDR that enters the archive queue would be processed.
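As a rough illustration only (the per-CDR size is an assumed figure, not a measured one): if an encoded G-CDR averages about 350 bytes, then 255 CDRs need roughly 255 x 350 = 89,250 bytes, which is more than the 65,535-byte (64 KB) UDP limit. The send buffer therefore fills at roughly 65,535 / 350, or about 187 CDRs, and the remaining CDRs of that batch are put back into the archive queue.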
Logic
When you change gtpp storage-server mode to local and kill aaaproxy, the stuck CDRs are pushed to the local hard disk, which avoids the purging of CDRs once the per-aaamgr limits are reached. Once all of the CDRs are written to the local hard disk, you can change back to remote mode, which is the default.
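After you revert to remote mode, the counters from the Troubleshoot checklist can be used to confirm that new CDRs stream to the CGF again (the commands are listed earlier in this document):
show gtpp counters all
show gtpp storage-server streaming file statistics
show gtpp statistics group name CGF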