System Usage of MGW Keepalive Parameters


Revised: July 2010, OL-23033-01

Introduction

This document explains how the Cisco BTS 10200 Softswitch determines the connectivity status between itself and a media gateway (MGW). The BTS 10200 executes a keepalive (KA) process that includes the transmission of audit-endpoint (AUEP) messages to MGCP, TGCP, and NCS based MGWs.

This document also describes a special set of provisionable parameters that you can adjust if there are network bandwidth or reliability issues, or if a MGW is slow in responding to commands from the Call Agent.

Provisionable Parameters

The following tokens are involved in the KA process:

In the mgw-profile table:

keepalive-method (default = AUEP)

mgcp-keepalive-retries (default = 3)

mgcp-max1-retries (default = 2)

mgcp-max2-retries (default = 3)

mgcp-keepalive-interval (default = 60 seconds)

mgcp-max-keepalive-interval (default = 600 seconds)

mgcp-t-tran (default = 400 milliseconds)

term-seize-unreach (default = N)

target-disconnect-timer (default = 60 seconds)


Note The default value of target-disconnect-timer is 60 seconds for fresh installations of Release 6.0 software. For systems being upgraded to Release 6.0, the default is 20 seconds. For details, see the "MGCP-MAX2-RETRIES, MGCP-T-MAX and TARGET-DISCONNECT-TIMER" section.


keepalive-fail-release-timer (default = 36,000 seconds)

In the call-agent table:

mgw-monitoring-enabled (default=Y)

In the ca-config table:

mgcp-t-max (default = 20 seconds)

mgcp-t-hist (default = 30 seconds)

mgcp-rto-max (default = 4 seconds)

mgcp-high-and-wet-interval (default = 5 minutes)

mgcp-high-and-wet-retries (default = 3)

mgcp-high-and-dry-interval (default = 60 minutes)

mgcp-max-keepalive-auep (default = 20,000)

The system behavior described in this document assumes that all of the tokens in the above list are set to their default values. This has the following effect:

Enabling the KA process—Using the default values of mgw-monitoring-enabled (Y) and keepalive-method (AUEP) enables the sending of AUEP messages from the BTS 10200 to the MGW.


Note We recommend that you keep these tokens set to their default values (which enables the KA process) unless you have some other method of determining MGW connectivity status.


Controlling the impact of unreachable status—If you set term-seize-unreach to Y, the BTS 10200 attempts to set up calls to the MGW even if it has declared the MGW unreachable. This is useful if the MGW is able to receive calls, but the BTS 10200 is not scheduled to send an AUEP message to the MGW for several more minutes or hours. If you leave term-seize-unreach at its default value of N, the BTS 10200 does not attempt to set up calls if it has declared the MGW unreachable.


Caution For mgw-profile tables applicable to TGWs, Cisco strongly recommends leaving the term-seize-unreach parameter at its default value (N) if you have included multiple TGWs in a single trunk group. Otherwise, the BTS 10200 might repeatedly attempt to route calls to an unreachable TGW, even when reachable TGWs are available in the same trunk group.

Tuning the KA process—The other mgw-profile tokens in the above list have numerical values, and the default values typically work well for most systems. The interaction among these tokens is described in this document. We do not recommend that you modify these values unless you experience problems with bandwidth, reliability, or response times in your network.


Caution Before changing any of these numerical values from their factory default settings, thoroughly read and understand the contents of this document. If you have questions, contact your Cisco account team or Cisco TAC.

Definitions and Additional Parameters

This section defines some of the terms used to describe the KA process.

KA procedure—A series of transmissions and retransmissions of AUEP messages used to determine the MGCP connectivity status between the BTS 10200 and the MGW. If there is a loss of connectivity, the BTS 10200 retries the series of AUEP transmissions up to a specified number of retries (defined by mgcp-keepalive-retries) before declaring the MGW to be in down status.


Tip The status mgw command can display the status of a MGW as working status or down status; the status mgw_tab command can display the status of a MGW as REACHABLE or UNREACHABLE. Working status is equivalent to REACHABLE, and down status is equivalent to UNREACHABLE.


KA attempt (AUEP transaction)—The transmission of an AUEP message, with a defined number of retransmissions of the same AUEP message, until an ACK message is received or the defined number of retransmissions has been reached. This can include retransmissions to additional IP addresses if provisioned to do so. A KA attempt is defined as successful if an ACK is received from the MGW. The KA attempt is defined as failed if no ACK is received after the defined number of AUEP retransmissions and within a defined waiting period (target-disconnect-timer).

KA retries—Subsequent KA attempts, sent only if the initial KA attempt fails.

MGW in working status—A KA attempt to this MGW is successful (an ACK is received).

MGW in down status—After a defined number of KA retry failures, the BTS 10200 declares this MGW to be in down status, and continues to perform the KA procedure.


Note To learn more about the MGW operational states reported by the BTS 10200, see the MGW Status Command in the BTS 10200 Operations and Maintenance Guide.


The system uses the parameters mgcp-max1-retries and mgcp-max2-retries to limit the number of retransmissions of the same AUEP. You can adjust the parameters mgcp-max1-retries and mgcp-max2-retries, if necessary, to improve response if there are network bandwidth or reliability issues, or if a MGW is slow in responding to commands from the Call Agent. These parameters are based on the description in RFC 3435, Section 4.3, and are defined as follows:

mgcp-max1-retries = Number of AUEP retransmissions sent to a single IP address before selecting the next IP address for this MGW listed in the DNS server (default = 2).

mgcp-max2-retries = Number of AUEP retransmissions sent to the last IP address for this MGW listed in the DNS server (default = 3).
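The two definitions above can be sketched as a small planning function. This is an illustrative sketch only, not BTS 10200 code: the function name and the address values are hypothetical, and it assumes that a MGW with a single address is treated as the "last" address (so mgcp-max2-retries applies). The limit of four addresses comes from the note for Figure B-2 later in this document.

```python
MAX_ADDRESSES = 4  # per the note for Figure B-2: at most four IP addresses are tried


def retransmission_plan(addresses, max1_retries=2, max2_retries=3):
    """Return (address, retransmission_limit) pairs in the order they are tried.

    Assumption: every address except the last is limited by mgcp-max1-retries,
    and the last address is limited by mgcp-max2-retries.
    """
    tried = list(addresses)[:MAX_ADDRESSES]
    plan = []
    for index, address in enumerate(tried):
        is_last = index == len(tried) - 1
        plan.append((address, max2_retries if is_last else max1_retries))
    return plan


if __name__ == "__main__":
    # Hypothetical addresses returned by DNS for one MGW.
    for address, limit in retransmission_plan(["10.0.0.1", "10.0.0.2"]):
        print(address, "up to", limit, "retransmissions")
```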

The following parameters are also related to MGW connection status, but are not discussed in detail in this appendix. For additional details, see the Cisco BTS 10200 Softswitch CLI Database.

Long transaction (applicable to NCS endpoints)—The mgcp-t-longtran field in the mgw-profile table specifies the initial MGCP transaction timeout (in seconds) after receiving a provisional response (return code 100) from the MGW. The range is 1-10 (default = 5).

Return code action—The endpoint action (ep-action) field in the Media Gateway Control Protocol Return Code Action (mgcp-retcode-action) table specifies what action to take when an MGCP message is received from a MGW.

mgcp-max-keepalive-auep (default = 20,000)—This field specifies the maximum number of MGWs to be AUEP pinged in any 10-second interval.

Querying Status of MGWs and Subscribers With Tabular Display

You can use the following commands to query for unreachable MGWs or unreachable subscriber terminations. The system displays the status of the matching MGWs or subscribers in tabular form.

All Records with oper_state=UNREACHABLE

status mgw_tab oper_state=UNREACHABLE;

For this command:

For the oper_state, you can enter REACHABLE, UNREACHABLE, or UNKNOWN to define your query.

A maximum of 1,000 records can be displayed at one time.

Limited Range of Records with oper_state=UNREACHABLE

status mgw_tab oper_state=UNREACHABLE; limit=1000; start_row=0;

For this command:

You can specify the maximum number of rows to display (limit) and the first row that you want displayed (start_row).

Set the maximum number of records that you want to be displayed (limit) to any number 0-1,000.

The start_row value can be set to 0-N, where N is the maximum number of records generated by the query.
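When a query matches more than 1,000 MGWs, the limit and start_row parameters can be combined to page through the results. The following sketch only builds the command strings; the helper name and the total record count are hypothetical, and it assumes that start_row is a zero-based offset into the query results.

```python
VALID_MGW_OPER_STATES = {"REACHABLE", "UNREACHABLE", "UNKNOWN"}
MAX_DISPLAY_RECORDS = 1000  # maximum number of records displayed at one time


def mgw_tab_page_commands(total_records, oper_state="UNREACHABLE", limit=MAX_DISPLAY_RECORDS):
    """Build one 'status mgw_tab' command per page of results."""
    if oper_state not in VALID_MGW_OPER_STATES:
        raise ValueError(f"unsupported oper_state: {oper_state}")
    # The CLI accepts limit=0, but a page size of zero cannot be paged through.
    if not 1 <= limit <= MAX_DISPLAY_RECORDS:
        raise ValueError("limit must be between 1 and 1000 for paging")
    return [
        f"status mgw_tab oper_state={oper_state}; limit={limit}; start_row={start};"
        for start in range(0, total_records, limit)
    ]


if __name__ == "__main__":
    # Hypothetical case: 2,500 unreachable MGWs need three pages.
    for command in mgw_tab_page_commands(2500):
        print(command)
```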

Unreachable Subscribers

Use the following commands to query subscriber terminations by operational state (for example, unreachable) or by call state.

status subscriber_termination_tab oper_state=UNREACH; limit=1000; start_row=0;

status subscriber_termination_tab call_state=BUSY; limit=1000; start_row=0;

For this command:

You must enter either the oper_state or the call_state. You cannot enter both.

For the oper_state, you can enter UNKNOWN, ACTIVE, MTRANS, DOWN, FAULTY, UNREACH, or OFF_NORMAL to define your query. OFF_NORMAL is the equivalent of UNREACH, FAULTY, or DOWN.

For the gateway call_state, you can enter IDLE, BUSY, or CTRANS_BUSY.

Narrow the command with the limit parameter if the results are greater than 1,000 records, which is the maximum number of records the system can display at one time.

The start_row value can be set to 0-N, where N is the maximum number of records that exist as a result of the query.
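The rules in the list above can be expressed as a small validation helper. This is a sketch under stated assumptions, not part of the BTS 10200 CLI: the function names are hypothetical, and the code only builds and checks command strings locally.

```python
OPER_STATES = {"UNKNOWN", "ACTIVE", "MTRANS", "DOWN", "FAULTY", "UNREACH", "OFF_NORMAL"}
CALL_STATES = {"IDLE", "BUSY", "CTRANS_BUSY"}
OFF_NORMAL_STATES = {"UNREACH", "FAULTY", "DOWN"}  # OFF_NORMAL covers these three


def subscriber_query(oper_state=None, call_state=None, limit=1000, start_row=0):
    """Build a subscriber_termination_tab query with exactly one state filter."""
    if (oper_state is None) == (call_state is None):
        raise ValueError("enter either oper_state or call_state, not both")
    if oper_state is not None:
        if oper_state not in OPER_STATES:
            raise ValueError(f"unsupported oper_state: {oper_state}")
        key, value = "oper_state", oper_state
    else:
        if call_state not in CALL_STATES:
            raise ValueError(f"unsupported call_state: {call_state}")
        key, value = "call_state", call_state
    return f"status subscriber_termination_tab {key}={value}; limit={limit}; start_row={start_row};"


def matches_oper_state(query_state, termination_state):
    """Return True if a termination in termination_state satisfies the query."""
    if query_state == "OFF_NORMAL":
        return termination_state in OFF_NORMAL_STATES
    return query_state == termination_state


if __name__ == "__main__":
    print(subscriber_query(oper_state="OFF_NORMAL"))
```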

Output to File

You can send the output of these commands to a file, and there is no restriction on the number of records that can be sent. You do this by adding output=<filename>; output_type=xxx to the command, where xxx can be CSV or XML. CSV = comma-separated values. The following are examples.

status mgw_tab oper_state=UNREACHABLE; limit=1000; start_row=0; output=<filename>; output_type=CSV;

status subscriber_termination_tab call_state=BUSY; limit=1000; start_row=0; output=<filename>; output_type=CSV;
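If these commands are generated by a script, the file-output parameters can be appended programmatically. The following is a minimal sketch; the helper name and the file name used in the example are hypothetical.

```python
OUTPUT_TYPES = {"CSV", "XML"}


def with_file_output(command, filename, output_type="CSV"):
    """Append output and output_type parameters to a status command string."""
    if output_type not in OUTPUT_TYPES:
        raise ValueError("output_type must be CSV or XML")
    return f"{command.rstrip()} output={filename}; output_type={output_type};"


if __name__ == "__main__":
    base = "status mgw_tab oper_state=UNREACHABLE; limit=1000; start_row=0;"
    # "unreachable_mgws.csv" is a placeholder file name.
    print(with_file_output(base, "unreachable_mgws.csv"))
```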

Examples of Successful MGCP Message Transmissions

This section illustrates several scenarios with successful MGCP message transmissions.


Note This section describes how the system sends AUEP messages. This discussion can be applied to the sending of any MGCP messages (including AUEP).


Initial Transmission Waiting Period (mgcp-t-tran)

The initial transmission waiting period (the period that the system waits after sending an initial AUEP transmission before repeating it) is equal to the greater of the following:

The average response time between the sending of an MGCP message, and receiving a response.

A specified lower limit, provisioned as mgcp-t-tran (default 400 milliseconds) in the mgw-profile table.

In a typical network, the average response time is much less than 400 milliseconds. Therefore, in this section, the drawings show the initial waiting period as mgcp-t-tran.
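The rule above reduces to taking the larger of two values. A minimal sketch, with a hypothetical function name and hypothetical measured response times:

```python
def initial_waiting_period_ms(average_response_ms, mgcp_t_tran_ms=400):
    """Return the wait after the initial transmission, in milliseconds.

    The wait is the greater of the average response time and the
    provisioned lower limit mgcp-t-tran.
    """
    return max(average_response_ms, mgcp_t_tran_ms)


if __name__ == "__main__":
    print(initial_waiting_period_ms(50))   # typical network: the lower limit applies
    print(initial_waiting_period_ms(650))  # slow MGW: the average response time applies
```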


Note The drawings in this section are not to scale.


Scenarios With AUEP Message Retransmissions and ACK Received

Figure B-1 is applicable when one IP address is provisioned for the MGW in the DNS server. It illustrates the scenario in which the initial AUEP transmission does not receive an ACK message, but an ACK is received in response to retransmissions.

Figure B-1 One IP Address in DNS, ACK Received after Retransmissions

Note for Figure B-1

See the additional information about mgcp-t-tran in the "Initial Transmission Waiting Period (mgcp-t-tran)" section.

Figure B-2 is applicable when two IP addresses are provisioned for the MGW in the DNS server. It illustrates the scenario in which the initial AUEP transmission does not receive an ACK message, but an ACK is received in response to retransmissions.

Figure B-2 Two IP Addresses in DNS, ACK Received after Retransmissions

Note for Figure B-2

The system attempts up to four different IP addresses for any single MGW listed in the DNS server. (This maximum number of IP addresses is not provisionable.)

Scenarios with AUEP Message Retransmissions and No ACK

This section explains how the system handles MGCP message retransmissions and takes action when no ACK is received from the MGW.

MGCP-RTO-MAX

Figure B-3 shows how the system repeats the same AUEP message if an ACK is not received. The period between subsequent retransmissions increases by a factor of two, but is limited to a maximum of mgcp-rto-max.


Tip Terminology: The first transmission of an AUEP message is called the initial transmission. The second transmission of the same AUEP message is called the first retransmission.
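The doubling behavior with an upper limit can be sketched as follows. This is an illustration of the timing rule only; the function name is hypothetical, and it assumes the first wait equals mgcp-t-tran (that is, the average response time is below that limit).

```python
def retransmission_waits_ms(retransmissions, t_tran_ms=400, rto_max_ms=4000):
    """Return the wait (in ms) that precedes each retransmission.

    Each wait is double the previous one, capped at mgcp-rto-max.
    """
    waits = []
    wait = t_tran_ms
    for _ in range(retransmissions):
        waits.append(min(wait, rto_max_ms))
        wait *= 2
    return waits


if __name__ == "__main__":
    # With the defaults, the fifth wait would be 6400 ms but is capped at 4000 ms.
    print(retransmission_waits_ms(5))
```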


Figure B-3 Example of Retransmission Timing (Upper Limit = mgcp-rto-max)

Note for Figure B-3

See the additional information about mgcp-t-tran in the "Initial Transmission Waiting Period (mgcp-t-tran)" section.

MGCP-MAX2-RETRIES, MGCP-T-MAX and TARGET-DISCONNECT-TIMER

The BTS 10200 limits retransmissions of the AUEP message to mgcp-max2-retries or a total duration of mgcp-t-max (default = 20 seconds), whichever occurs first.

If the BTS 10200 does not receive an ACK response from the MGW before the expiration of target-disconnect-timer (default value 60 seconds), the BTS 10200 abandons the transaction and takes one of the following additional actions:

If keepalive functionality is enabled on the system (mgw-monitoring-enabled=Y and keepalive-method=AUEP as described in the "Provisionable Parameters" section), the system expedites the KA process (as described in the "Keepalive Process" section) without immediately declaring the termination to be unreachable. However, if the KA process also fails, the system declares the MGW and all associated terminations to be unreachable. The operational status of each of the terminations is marked as UNREACH.

If keepalive functionality is disabled on the system, there will be no expedited keepalive process, and the system marks the operational status of the affected terminations as UNREACH.
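The two outcomes above depend only on whether keepalive is enabled. A minimal decision sketch follows; the function name and the returned labels are hypothetical, and any keepalive-method value other than AUEP is treated here as "keepalive disabled", which is an assumption.

```python
def action_after_abandoned_transaction(mgw_monitoring_enabled="Y", keepalive_method="AUEP"):
    """Return the follow-up action once target-disconnect-timer expires without an ACK."""
    keepalive_enabled = mgw_monitoring_enabled == "Y" and keepalive_method == "AUEP"
    if keepalive_enabled:
        # Terminations are not marked UNREACH yet; the KA process decides.
        return "EXPEDITE_KA_PROCESS"
    # No expedited keepalive: affected terminations are marked immediately.
    return "MARK_TERMINATIONS_UNREACH"


if __name__ == "__main__":
    print(action_after_abandoned_transaction())
    print(action_after_abandoned_transaction(mgw_monitoring_enabled="N"))
```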

Figure B-4 shows how the system handles AUEP retransmissions and takes action if no ACK is received. In this example, the mgcp-max2-retries retransmissions are completed before the expiration of mgcp-t-max. In general, the system stops retransmissions when the number of retransmissions reaches mgcp-max2-retries or at the expiration of mgcp-t-max, whichever occurs first. The system continues to wait for an ACK until the target-disconnect-timer expires. When that timer expires, the KA attempt ends.
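As a worked example of the two stop conditions, the following sketch computes when each AUEP transmission is sent for a MGW with a single IP address. The function name is hypothetical, the first wait is assumed to equal mgcp-t-tran, and a retransmission that would fall after mgcp-t-max is assumed not to be sent.

```python
def auep_send_times_ms(t_tran_ms=400, rto_max_ms=4000, max2_retries=3, t_max_ms=20000):
    """Return send times in ms, measured from the initial transmission (time 0)."""
    times = [0]  # initial transmission
    wait = t_tran_ms
    elapsed = 0
    for _ in range(max2_retries):
        elapsed += min(wait, rto_max_ms)  # doubling wait, capped at mgcp-rto-max
        if elapsed > t_max_ms:
            break  # mgcp-t-max expired before this retransmission
        times.append(elapsed)
        wait *= 2
    return times


if __name__ == "__main__":
    # Defaults: three retransmissions finish well before mgcp-t-max (20 s).
    # The system then waits for an ACK until target-disconnect-timer expires.
    print(auep_send_times_ms())
```

With the default values, the retransmission count is the limit that is reached first; mgcp-t-max becomes the limiting factor only when mgcp-max2-retries is raised or mgcp-t-max is lowered.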

Figure B-4 AUEP Retransmissions and No ACK Received

Note for Figure B-4

For fresh installations of Release 6.0 and later software, the default value for target-disconnect-timer is 60 seconds, as shown in Figure B-4. However, if you have upgraded to Release 6.0 from a previous release, the default value was automatically set to 20 seconds during the upgrade process. If you have upgraded, and you previously customized your keepalive interval, you can adjust the value of target-disconnect-timer as needed to obtain the desired keepalive interval.

Keepalive Process

This section describes the keepalive (KA) process. The Cisco BTS 10200 performs KA attempts after periods of inactivity to determine whether the status of a MGW should be considered working or down. (A period of inactivity means a time period in which no MGCP message of any kind is received from the MGW.)


Note The following sections refer to two types of MGWs, residential gateways (RGWs) and trunking gateways (TGWs). A MGW is identified as either a RGW or TGW according to the provisioning of the type token in the mgw table (type=rgw or type=tgw).


The following scenarios are covered in this section:

Scenario 1—MGW Reachable

Scenario 2—MGW Unreachable

Scenario 3—MGW Previously Reachable but MGCP Message Fails

Scenario 4—MGW Previously Unreachable but MGCP Message Succeeds

Scenario 1—MGW Reachable

This scenario is shown in Figure B-5. The MGW is reachable (an ACK was received during a previous KA attempt), but no MGCP messages have been exchanged between the BTS 10200 and the MGW for a time interval equal to the provisioned value of mgcp-keepalive-interval (default 60 seconds). The BTS 10200 starts a new KA attempt.


Note The legend in this drawing explains how successful keepalive attempts are illustrated throughout this document.


Figure B-5 KA Attempts—MGW Reachable

Notes for Figure B-5

See the note regarding the default value of target-disconnect-timer below Figure B-4.

For detailed examples of successful KA attempts, see Figure B-1 and Figure B-2.

The system takes the following actions in the scenario shown in Figure B-5:

For an RGW—If KA attempts are successful, and stable calls exist, the system continues this pattern of KA attempts. If all the stable calls are finished and there are no more stable calls on the RGW, the system changes the waiting period between KA attempts from mgcp-keepalive-interval to mgcp-max-keepalive-interval (default = 600 seconds).

For a TGW—If KA attempts are successful, the system continues this pattern of KA attempts, regardless of whether there are stable calls on the TGW.

If a KA attempt is unsuccessful, the system changes the KA pattern to that shown in Scenario 2—MGW Unreachable.
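For a reachable MGW, the choice of waiting period described above can be sketched as follows. The function name is hypothetical, and the sketch covers only the reachable case shown in Figure B-5.

```python
def wait_between_ka_attempts_s(mgw_type, has_stable_calls,
                               keepalive_interval_s=60, max_keepalive_interval_s=600):
    """Return the idle period (seconds) before the next KA attempt to a reachable MGW."""
    if mgw_type not in ("rgw", "tgw"):
        raise ValueError("mgw_type must be 'rgw' or 'tgw'")
    if mgw_type == "rgw" and not has_stable_calls:
        # RGW with no stable calls: use the longer mgcp-max-keepalive-interval.
        return max_keepalive_interval_s
    # TGW (with or without stable calls), or RGW with stable calls.
    return keepalive_interval_s


if __name__ == "__main__":
    print(wait_between_ka_attempts_s("rgw", has_stable_calls=False))
    print(wait_between_ka_attempts_s("tgw", has_stable_calls=False))
```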

Scenario 2—MGW Unreachable

In this scenario, the BTS 10200 performs KA attempts but does not receive an ACK response from the MGW. After the BTS 10200 attempts the number of KA retries provisioned for the parameter mgcp-keepalive-retries, it declares the MGW to be unreachable. The BTS 10200 continues to perform KA attempts; the pattern of KA attempts differs between TGWs and RGWs as described in this section.
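The declaration logic can be sketched as a simple failure counter. This is an illustrative model, not the BTS 10200 implementation: the class name is hypothetical, the MGW is assumed to start in REACHABLE status, and the sketch assumes that the initial failed KA attempt is followed by mgcp-keepalive-retries failed retries before the MGW is declared unreachable.

```python
class KeepaliveTracker:
    """Track consecutive failed KA attempts for one MGW."""

    def __init__(self, keepalive_retries=3):
        self.keepalive_retries = keepalive_retries
        self.failed_attempts = 0
        self.status = "REACHABLE"  # assumed starting status

    def record_attempt(self, ack_received):
        """Update and return the status after one KA attempt."""
        if ack_received:
            self.failed_attempts = 0
            self.status = "REACHABLE"
        else:
            self.failed_attempts += 1
            # Assumption: one initial attempt plus keepalive_retries retries must fail.
            if self.failed_attempts > self.keepalive_retries:
                self.status = "UNREACHABLE"
        return self.status


if __name__ == "__main__":
    tracker = KeepaliveTracker()
    for attempt in range(1, 5):
        print(attempt, tracker.record_attempt(ack_received=False))
```

KA attempts continue after the MGW is declared unreachable, so a later successful attempt returns the tracker to REACHABLE.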

KA Retries for TGW

The system continues to repeat closely-spaced KA attempts to the TGW, even after the number of attempts exceeds mgcp-keepalive-retries. This is shown in Figure B-6.


Note The legend in this drawing explains how unsuccessful keepalive attempts are illustrated throughout this document.


Figure B-6 KA Attempts—TGW Unreachable

Note for Figure B-6

See the note regarding the default value of target-disconnect-timer below Figure B-4.

KA Retries for RGW

The system continues to repeat KA attempts to the RGW. After the number of attempts exceeds mgcp-keepalive-retries, the system waits for a provisioned amount of time between successive attempts. This is shown in Figure B-7.

Figure B-7 KA Attempts—RGW Unreachable

Note for Figure B-7

See the note regarding the default value of target-disconnect-timer below Figure B-4.

Scenario 3—MGW Previously Reachable but MGCP Message Fails

In this scenario, the MGW was previously reachable (an ACK was received during a previous KA attempt). However, when the BTS 10200 sends an MGCP message to the MGW, the message fails; that is, the BTS 10200 does not receive a reply from the MGW within the allowed timeout period. Subsequent KA attempts fail. The pattern of KA attempts following the failed MGCP message differs between TGWs and RGWs as described below.

KA Retries for TGW

The system performs KA attempts to the TGW as shown in Figure B-8.

Figure B-8 KA Attempts—TGW Previously Reachable but MGCP Message Fails

KA Retries for RGW

The system performs KA attempts to the RGW as shown in Figure B-9.

Figure B-9 KA Attempts—RGW Previously Reachable but MGCP Message Fails

Scenario 4—MGW Previously Unreachable but MGCP Message Succeeds

In this scenario, the MGW was previously unreachable (an ACK was not received during previous KA attempts). However, an MGCP message exchange then succeeds in one of the following ways:

The BTS 10200 receives a valid MGCP message from the remote MGW.

The BTS 10200 sends an MGCP message to the MGW and the message succeeds; that is, the BTS 10200 receives a reply from the MGW within the allowed timeout period.

Subsequent KA attempts succeed, and the system declares the MGW to be reachable. The pattern of KA attempts following the successful MGCP message differs between TGWs and RGWs as described below.

KA Retries for TGW

The system performs KA attempts to the TGW as shown in Figure B-10.

Figure B-10 KA Attempts—TGW Previously Unreachable but MGCP Message Succeeds

KA Retries for RGW

The system performs KA attempts to the RGW as shown in Figure B-11.

Figure B-11 KA Attempts—RGW Previously Unreachable but MGCP Message Succeeds

Note for Figure B-11

For detailed examples of successful KA attempts, see Figure B-1 and Figure B-2.

Events and Alarms Related to the KA Process

Typical alarms for the KA process include:

SIGNALING (36)—Trunk locally blocked (applicable to CAS, ISDN, and SS7 trunks).

SIGNALING (79)—Trunking Gateway unreachable.

SIGNALING (171)—Residential Gateway unreachable.

Typical informational events for the KA process include:

SIGNALING (152)—Termination transient error received. This event is applicable when the keepalive functionality is enabled.

SIGNALING (76)—Timeout on remote instance. This event is applicable when the keepalive functionality is disabled.

You can check the status of a MGW or subscriber termination with the status commands listed in the "Managing External Resources" chapter in the Cisco BTS 10200 Softswitch Operations and Maintenance Guide, or with the tabular status commands listed in the "Querying Status of MGWs and Subscribers With Tabular Display" section. To query events and alarms, see the "Managing Events and Alarms" section in the Cisco BTS 10200 Softswitch Troubleshooting Guide.