vDRA Advanced Tuning

The configuration values in this document are based on the type and size of the DRA deployment.

Deployments are categorized into two sizes:

  • Small Deployment - 2 Directors and 4 Workers

  • Large Deployment - 8 Directors and 10 Workers


Note


Recommended values are based on:

  • Local Latency:

    • 20 ms for small deployments

    • 25 ms for large deployments

  • Remote Latency:

    • 55 ms for small deployments

    • 50 ms for large deployments

If the latency is higher than these values, the recommended values for all the configurations in this document need to be re-characterized.


Threading Configuration

Thread pool configurations apply to the Java processes that run in the diameter-endpoint-s1xx container on Director VMs and in the binding-s1xx container on Worker VMs in DRA.

For more information on Threading Configuration, refer to the CPS vDRA Configuration Guide.

Table 1. Thread Pool Names

Thread Pool Name

Description

bindings

This thread pool is used for IMSI_APN/MSISDN_APN AAR lookup (MOG) and binding audit updates.

bindings.lookup

This thread pool is used only if appsharding.reader thread pool is overloaded. This thread pool is applicable only for AAR VoLTE calls.

bindings.delete

This thread pool is used to query the session database and enqueue the session and all its bindings for deletion.

bindings.response

This thread pool is used to receive the database storage response and forward it as the bind store response. It is applicable for CCR-I requests.

msgtimeouts

This thread pool is used to process timed out requests.

qprocessor

This thread pool is used to send routed messages outside on the specific TCP connection.

receivers

Not Applicable

senders

Not Applicable

qprocessor.priority

This thread pool is used to send only priority routed messages outside on the specific TCP connection.

localcpPublishers

This thread pool is used to publish messages to local control plane.

globalcpPublishers

This thread pool is used to publish messages to global control plane.

cpSubscriberWorker

This Worker thread pool is used to process all the incoming control plane messages from local sites.

globalCPSubscriberWorker

This Worker thread pool is used to process all the incoming control plane messages from remote sites.

pcrfQuery

This thread pool is used for PCRF session queries used only for Rx AAR VoLTE calls.

pcrf.restapi

This thread pool is used to send requests to and receive responses from PCRF endpoints. It is used only for Rx AAR VoLTE calls.

Note

Scale By CPU Core is not applicable for this thread pool.

stackStopHandler

This thread pool is used to gracefully disconnect diameter peers on process shutdown.

appsharding.reader

This thread pool is specific to Rx AAR VoLTE IPv6 binding lookup and processing. This thread pool is created for each shard.

The following table lists the recommended values when the TPS of a single DRA installation is above 200 K.


Important


The highlighted values must be explicitly configured.


Table 2. Recommended Values

Thread Pool Name           Default   Default      Recommended   Recommended
                           Threads   Queue Size   Threads       Queue Size
bindings                   30        1000         130           30000
bindings.lookup            10        1000         30            5000
bindings.delete            10        1000         100           50000
bindings.response          10        1000         10            1000
msgtimeouts                3         1000         3             1000
qprocessor                 30        5000         150           25000
localcpPublishers          15        2000         15            2000
globalcpPublishers         40        3000         40            3000
cpSubscriberWorker         15        10000        15            10000
globalCPSubscriberWorker   15        10000        30            20000
pcrfQuery                  20        400          20            400
pcrf.restapi               10        2000         10            2000
stackStopHandler           50        0            50            0
appsharding.reader         1         1000         2             2000
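
As a reading aid for Table 2, the following Python sketch lists the thread pools whose recommended values differ from the defaults. The values are copied from the table. The helper name pools_differing_from_default is hypothetical and is not part of DRA, and the sketch does not claim that these are exactly the highlighted values.

```python
# (default, recommended) pairs of (threads, queue size), copied from Table 2.
THREAD_POOLS = {
    "bindings": ((30, 1000), (130, 30000)),
    "bindings.lookup": ((10, 1000), (30, 5000)),
    "bindings.delete": ((10, 1000), (100, 50000)),
    "bindings.response": ((10, 1000), (10, 1000)),
    "msgtimeouts": ((3, 1000), (3, 1000)),
    "qprocessor": ((30, 5000), (150, 25000)),
    "localcpPublishers": ((15, 2000), (15, 2000)),
    "globalcpPublishers": ((40, 3000), (40, 3000)),
    "cpSubscriberWorker": ((15, 10000), (15, 10000)),
    "globalCPSubscriberWorker": ((15, 10000), (30, 20000)),
    "pcrfQuery": ((20, 400), (20, 400)),
    "pcrf.restapi": ((10, 2000), (10, 2000)),
    "stackStopHandler": ((50, 0), (50, 0)),
    "appsharding.reader": ((1, 1000), (2, 2000)),
}


def pools_differing_from_default(pools):
    """Hypothetical helper: keep pools whose recommendation differs from the default."""
    return {name: rec for name, (default, rec) in pools.items() if rec != default}


# Print one line per pool that would need a non-default value.
for name, (threads, queue_size) in pools_differing_from_default(THREAD_POOLS).items():
    print(f"{name}: threads={threads}, queue size={queue_size}")
```

With the values in Table 2, six pools differ from their defaults: bindings, bindings.lookup, bindings.delete, qprocessor, globalCPSubscriberWorker, and appsharding.reader.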

Database Capacity


Note


Rated database capacity has been arrived at after performing capacity tests.


The following table lists the database capacity for small and large deployments.

Table 3. Database Capacity

Deployment         Cluster        No. of Primary Shards   Network Latency   Database Capacity   Database Capacity    85% Threshold
                                  (dual DB enabled)       (in ms)           (in k)              Per Shard            Value
Small Deployment   Session_IPv6   8                       20                30                  30000/8 = 3750       3200
                   IMSI_MSISDN    4                       55                15                  15000/4 = 3750       3200
Large Deployment   Session_IPv6   48                      25                180                 180000/48 = 3750     3200
                   IMSI_MSISDN    48                      50                145                 145000/48 = 3020     2600
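
The per-shard and threshold columns of Table 3 can be reproduced with simple arithmetic. The Python sketch below divides the cluster capacity by the number of primary shards and takes 85% of the result. Rounding the threshold to the nearest 100 is an assumption; it happens to reproduce the table values, but the source does not state the rounding rule. Both function names are hypothetical.

```python
def per_shard_capacity(capacity_k, primary_shards):
    """Cluster capacity (given in thousands) divided evenly across primary shards."""
    return capacity_k * 1000 / primary_shards


def threshold_85(per_shard, step=100):
    """85% of the per-shard capacity, rounded to the nearest `step`.

    Assumption: rounding to the nearest 100 matches Table 3; the rule
    itself is not stated in the document.
    """
    return int(round(per_shard * 0.85 / step) * step)


# (deployment, cluster, capacity in k, primary shards), copied from Table 3.
ROWS = [
    ("Small", "Session_IPv6", 30, 8),
    ("Small", "IMSI_MSISDN", 15, 4),
    ("Large", "Session_IPv6", 180, 48),
    ("Large", "IMSI_MSISDN", 145, 48),
]

for deployment, cluster, capacity_k, shards in ROWS:
    per_shard = per_shard_capacity(capacity_k, shards)
    # int() truncates, which matches the 3020 shown in the table for 145000/48.
    print(deployment, cluster, int(per_shard), threshold_85(per_shard))
```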

The following test call models were used to determine the capacity for each cluster:

  1. CCR-I, CCR-T, and AAR call model for Session-IPv6 database cluster capacity.

  2. CCR-I and CCR-T call model for IMSI-APN/MSISDN-APN database cluster capacity.

  3. Database capacity is based on the network latency listed in Table 3. If the network latency increases, the database capacity needs to be re-characterized.

Database Alert Expressions

Database alerts are alerts that can be configured to trigger when the load on the database exceeds a certain threshold.

IMSI_MSISDN Cluster

alert rule DRA_IMSI_MSISDN_DB_TPS_EXCEEDED

expression "sum(rate(mongo_operation_total{state='primary',type='mongo',op=~'update|query|delete',cluster='IMSI_MSISDN'}[5m])) > (2500 * sum (mongo_node_state_primary {cluster='IMSI_MSISDN',type='mongo'}))"

event-host-label instance

message "{{ $labels.instance }} Persistence DB TPS exceeded , current value is {{ $value }} !"

snmp-severity critical

snmp-clear-message "{{ $labels.instance }} Persistence DB TPS in control, current value is {{ $value }} !"

Session_IPv6 Cluster

alert rule DRA_SESS_IPV6_DB_TPS_EXCEEDED

expression "sum(rate(mongo_operation_total{state='primary',type='mongo',op=~'update|query|delete',cluster=~'SES_IPV6_.*'}[5m])) > (3200 * sum(mongo_node_state_primary{cluster=~'SES_IPV6_.*',type='mongo'}))"

event-host-label instance

message "{{ $labels.instance }} Persistence DB TPS exceeded , current value is {{ $value }} !"

snmp-severity critical

snmp-clear-message "{{ $labels.instance }} Persistence DB TPS in control, current value is {{ $value }} !"
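
Both expressions compare the summed operation rate of a cluster against a per-primary threshold multiplied by the number of primary nodes. The Python sketch below restates that comparison outside of the alerting system. It assumes that sum(mongo_node_state_primary) yields the count of primary nodes in the cluster; the function name and the sample rates are illustrative only.

```python
def db_tps_alert_fires(total_ops_per_sec, primary_count, per_primary_threshold):
    """Mirror of the alert condition: total rate > threshold * number of primaries.

    Assumption: primary_count corresponds to sum(mongo_node_state_primary)
    in the alert expression.
    """
    return total_ops_per_sec > per_primary_threshold * primary_count


# Session_IPv6 cluster of a small deployment: 8 primary shards, threshold 3200.
limit = 3200 * 8  # 25600 operations per second across the cluster
print(db_tps_alert_fires(25600, 8, 3200))  # exactly at the limit: no alert
print(db_tps_alert_fires(26000, 8, 3200))  # above the limit: alert fires
```

Because the threshold scales with the number of primaries, the same rule can be used unchanged when shards are added to a cluster.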

Database Connection Settings

The following configurations vary based on deployment size (Small or Large) and should be configured accordingly. The Java processes on the Worker VMs connect to the databases and create two database connection pools, one for each type of database operation:

  • Create/Update, and Delete

  • Read

For more information on configuration syntax and examples, refer to the CPS vDRA Operations Guide.

binding db-connection-settings

The recommended values listed in the table are for write operations (Create/Update, Delete):


Important


The highlighted values must be explicitly configured.


Table 4. Recommended Value for Write Operations

Deployment         Bindings     connection-per-host   connection-per-host
                                Default Values        Recommended Values
Small Deployment   drasession   10                    14
                   ipv6         10                    10
                   imsiapn      10                    35
                   msisdnapn    10                    10
                   ipv4         10                    10
Large Deployment   drasession   10                    10
                   ipv6         10                    10
                   imsiapn      10                    20
                   msisdnapn    10                    20
                   ipv4         10                    2

binding db-read-connection-settings

The recommended values listed in the table are for read operations:


Important


The highlighted values must be explicitly configured.


Table 5. Recommended Value for Read Operations

Deployment         Bindings     connections-per-host-for-read   connections-per-host-for-read
                                Default Values                  Recommended Values
Small Deployment   drasession   5                               5
                   ipv6         5                               5
                   imsiapn      5                               3
                   msisdnapn    5                               5
                   ipv4         5                               5
Large Deployment   drasession   5                               20
                   ipv6         5                               20
                   imsiapn      5                               10
                   msisdnapn    5                               10
                   ipv4         5                               1
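
To make it easier to see which connection settings change with the deployment size, the following Python sketch stores the recommended values from Tables 4 and 5 and reports the entries that differ from the defaults (10 for connection-per-host, 5 for connections-per-host-for-read). The data structure and the function name non_default_settings are hypothetical; the sketch does not generate CLI syntax, which is described in the CPS vDRA Operations Guide.

```python
# Default values from Tables 4 and 5.
DEFAULTS = {"connection-per-host": 10, "connections-per-host-for-read": 5}

# Recommended values per deployment size, copied from Tables 4 and 5.
RECOMMENDED = {
    "small": {
        "connection-per-host": {
            "drasession": 14, "ipv6": 10, "imsiapn": 35, "msisdnapn": 10, "ipv4": 10,
        },
        "connections-per-host-for-read": {
            "drasession": 5, "ipv6": 5, "imsiapn": 3, "msisdnapn": 5, "ipv4": 5,
        },
    },
    "large": {
        "connection-per-host": {
            "drasession": 10, "ipv6": 10, "imsiapn": 20, "msisdnapn": 20, "ipv4": 2,
        },
        "connections-per-host-for-read": {
            "drasession": 20, "ipv6": 20, "imsiapn": 10, "msisdnapn": 10, "ipv4": 1,
        },
    },
}


def non_default_settings(deployment):
    """Return (setting, binding, value) tuples that differ from the default."""
    result = []
    for setting, per_binding in RECOMMENDED[deployment].items():
        for binding, value in per_binding.items():
            if value != DEFAULTS[setting]:
                result.append((setting, binding, value))
    return result


for deployment in ("small", "large"):
    for setting, binding, value in non_default_settings(deployment):
        print(f"{deployment}: {binding} {setting} = {value}")
```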

binding cluster-binding-dbs imsiapn-msisdnapn

The following configuration is applicable only for large deployments. With this configuration, the IMSI-APN and MSISDN-APN binding databases use the same connection pool for database transactions. This helps reduce the overall thread count in the Worker VM.

Table 6. Recommended Values

Deployment         Recommended Value
Small Deployment   Disable
Large Deployment   Enable

Audit Rate Limiter

Database audit is an important function in DRA. A housekeeping process runs continuously to ensure that the database does not fill up with unnecessary entries and that expired entries are cleared. Use the following parameters based on the deployment size and the rated capacity of the system.

Table 7. Audit Rate Limiter Recommended Values

Deployment         Rate Limiter   Stale Session Expiry Count   Binding DB Read Preference
Small Deployment   100            6                            Nearest
Large Deployment   51             6                            Nearest

Control Plane Tuning Configuration

Director nodes periodically advertise the status of all their peer connections over the local and global control planes. All DRA nodes in the network use these peer status messages to keep their view of the peer topology up to date. The peer topology is used to route messages to the appropriate Director across different sites. When a large number of peers are connected across different sites, the load on the control plane increases. This increase in load can delay the processing of peer status updates.

The following configurations are recommended to handle the increase in control plane traffic.

Table 8. Control Plane Traffic Tuning Commands

Command: control-plane timers peer-status-update-interval <time-in-ms> peer-expiration-duration <duration-in-ms>
  Parameters:         time-in-ms, duration-in-ms
  Default Values:     2000, 10000
  Recommended Value:  4000, 12000
  Note: For a change to the peer expiration duration to take effect, restart the application on both Director and Worker nodes.

Command: control-plane remote-peer-policy mated-system id <system-id>
  Parameters:         system-id (mated pair system ID)
  Default Values:     NA
  Recommended Value:  Mated Pair System ID

Command: control-plane remote-peer-policy global accept diameter-applications [Application Type]
  Parameters:         Application Type
  Default Values:     All
  Recommended Value:  Rx

Command: control-plane ipc-endpoint update-interval <time-in-milliseconds>
  Parameters:         time-in-milliseconds
  Default Values:     100
  Recommended Value:  1000
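
The effect of the timer values in Table 8 can be estimated with a short calculation. The Python sketch below assumes that each interval is the period between two consecutive updates, so a longer interval means proportionally fewer control plane messages, and that a peer expires once no status update has arrived within the expiration duration. Both interpretations are assumptions made for illustration, and the function names are hypothetical.

```python
def update_rate_reduction(default_interval_ms, recommended_interval_ms):
    """Factor by which the number of periodic updates drops (assumed: one update per interval)."""
    return recommended_interval_ms / default_interval_ms


def updates_per_expiration(update_interval_ms, expiration_duration_ms):
    """Number of update intervals that fit into one expiration duration."""
    return expiration_duration_ms // update_interval_ms


# peer-status-update-interval: 2000 ms by default, 4000 ms recommended.
print(update_rate_reduction(2000, 4000))    # 2.0, half as many status updates
# ipc-endpoint update-interval: 100 ms by default, 1000 ms recommended.
print(update_rate_reduction(100, 1000))     # 10.0, a tenth of the updates

# Update intervals per peer-expiration-duration.
print(updates_per_expiration(2000, 10000))  # 5 with the default values
print(updates_per_expiration(4000, 12000))  # 3 with the recommended values
```

Under these assumptions, the recommended values trade a lower control plane load for a smaller number of status updates received before a silent peer expires.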

IPC Queue Send Thread Tuning Configuration

DRA maintains an IPC message queue that stores all internal messages exchanged among Directors and Workers. A set of IPC queue processor threads processes these messages. Under slow network conditions that affect one peer or a subset of peers, the IPC queue processor threads that send messages to those peers are slowed down, which can eventually degrade the rate at which IPC messages are processed. This slowdown can cause message drops and 3002 errors, not only for the peers with the degraded network but also for other peers.

The following configurations are recommended to handle peers on a slow network.

Table 9. IPC Queue - send thread Tuning Commands

Large

  Command: dra ipc-send-thread-limit <thread-limit> lock-sla-timeout <time-in-ms> message-throttle-duration <duration-in-ms> timeout-sample-to-throttle <max-samples>

  Parameter        Default Value   Recommended Value
  thread-limit     50              50 (1/3rd of IPC threads)
  time-in-ms       200             250
  duration-in-ms   30000           30000
  max-samples      150             150

Small

  Command: dra ipc-send-thread-limit <thread-limit> lock-sla-timeout <time-in-ms> message-throttle-duration <duration-in-ms> timeout-sample-to-throttle <max-samples>

  Parameter        Default Value   Recommended Value
  thread-limit     50              10 (1/3rd of IPC threads)
  time-in-ms       200             50
  duration-in-ms   30000           30000
  max-samples      150             30


Note


  • To disable the SLA timeout, configure lock-sla-timeout to 2000 ms. Disabling the SLA timeout leads to IPC message drops on a slow network.

  • To disable throttling, configure message-throttle-duration to 0.

    To disable the thread limit (thread throttling), configure it to the same value as the number of IPC processor threads. Increasing timeout-sample-to-throttle reduces the chance of throttling the peer.


IPC Queue Send Thread Priority Tuning Configuration

The enhanced slow peer handling logic applies to both priority and non-priority messages. Once a slow peer is identified, the application is notified, and further messages towards that peer are throttled for the configured duration (message-throttle-duration). Once the configured time has elapsed, the peer is marked as a normal peer again and all messages towards that peer are processed.
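
The behaviour described above can be pictured with a small state machine. The following Python sketch is only an illustration under stated assumptions and is not the DRA implementation: it assumes that a send taking longer than lock-sla-timeout counts as one timeout sample, that a peer is throttled once timeout-sample-to-throttle samples have accumulated, and that the sample counter resets when throttling starts. The class name SlowPeerThrottle and its methods are hypothetical.

```python
import time


class SlowPeerThrottle:
    """Illustrative slow-peer tracker (hypothetical, not the DRA implementation)."""

    def __init__(self, lock_sla_timeout_ms, message_throttle_duration_ms,
                 timeout_sample_to_throttle, clock=time.monotonic):
        self.lock_sla_timeout_ms = lock_sla_timeout_ms
        self.message_throttle_duration_ms = message_throttle_duration_ms
        self.timeout_sample_to_throttle = timeout_sample_to_throttle
        self._clock = clock            # returns seconds; injectable for testing
        self._timeout_samples = {}     # peer -> sends that exceeded the SLA
        self._throttled_until = {}     # peer -> clock value when throttling ends

    def record_send(self, peer, elapsed_ms):
        """Record how long a send to `peer` took and throttle the peer if needed."""
        if self.message_throttle_duration_ms == 0:
            return  # a duration of 0 disables throttling
        if elapsed_ms <= self.lock_sla_timeout_ms:
            return  # the send met the SLA; nothing to count
        samples = self._timeout_samples.get(peer, 0) + 1
        if samples >= self.timeout_sample_to_throttle:
            # Slow peer identified: throttle it for the configured duration.
            self._throttled_until[peer] = (
                self._clock() + self.message_throttle_duration_ms / 1000.0
            )
            samples = 0  # assumption: counting starts over after throttling
        self._timeout_samples[peer] = samples

    def is_throttled(self, peer):
        """Return True while messages towards `peer` should be throttled."""
        until = self._throttled_until.get(peer)
        if until is None:
            return False
        if self._clock() >= until:
            # The configured time has elapsed: the peer is normal again.
            del self._throttled_until[peer]
            return False
        return True
```

A caller would invoke record_send after each send attempt and consult is_throttled before sending the next message to the same peer. Passing a fake clock makes the time-based behaviour easy to test.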

The following configurations are recommended to handle the slow network peers with extra optional parameters.

Table 10. IPC Queue - send thread priority Tuning Commands

Large

  Command: dra ipc-send-thread-priority limit <thread-limit> priority lock-sla-timeout <time-in-ms> priority message-throttle-duration <duration-in-ms> priority timeout-sample-to-throttle <max-samples>

  Parameter        Default Value   Recommended Value
  thread-limit     5               5 (1/2 of IPC threads)
  time-in-ms       200             250
  duration-in-ms   30000           30000
  max-samples      150             150

Small

  Command: dra ipc-send-thread-priority limit <thread-limit> priority lock-sla-timeout <time-in-ms> priority message-throttle-duration <duration-in-ms> priority timeout-sample-to-throttle <max-samples>

  Parameter        Default Value   Recommended Value
  thread-limit     2               2 (1/3rd of IPC threads)
  time-in-ms       200             50
  duration-in-ms   30000           30000
  max-samples      150             30