SIP registration failover for Cisco Jabber applies if you deploy Expressway with Mobile and Remote Access (MRA).
Expressway X12.7 and later versions build on the existing failover capabilities for clustered Expressways with several MRA failover
updates that substantially improve the failover time for Cisco Jabber clients that connect over MRA. The updates include
adaptive routing, STUN keepalive support, and improved error reporting.
Note
|
The registration failover feature uses STUN messages sent between Unified CM and Expressway-C. These messages use the same
SIP connections that carry SIP signaling. To prevent these STUN messages from being filtered or removed,
disable SIP inspection on any firewall or Application Layer Gateway (ALG) device between Unified CM and Expressway-C.
|
These new capabilities will allow Jabber clients to support MRA High Availability (failover) for voice and video.
Unified CM must be able to resolve the automatically added Expressway-C hostname
If Unified CM cannot resolve this hostname, it does not respond to the STUN keepalives that Expressway-C sends on the MRA SIP session.
Expressway-C nodes are automatically added to Unified CM (under
) through the AXL API with the Expressway-C hostname (not the FQDN) when Unified CM is configured on Expressway-C for the MRA
solution.
Every 30 seconds, Expressway-C sends an MRA SIP session keepalive to Unified CM.
Before responding to a received keepalive, Unified CM tries to resolve the hostname of Expressway-C. If Unified CM cannot
resolve the hostname through DNS, it does not respond to the STUN keepalive requests, which causes the MRA SIP registration to flap.
If Unified CM and Expressway-C are in different domains, make sure that Unified CM can resolve the hostname of Expressway-C.
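The resolution requirement above can be spot-checked with a short script. The following is a minimal sketch, not a Cisco tool: the function name `check_resolution`, the `domain` parameter, and the example hostnames are all hypothetical, and the script only reflects what Unified CM sees if it runs on a host that uses the same DNS servers as Unified CM.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def check_resolution(
    hostnames: Iterable[str],
    domain: Optional[str] = None,
    resolver: Callable[[str], object] = socket.gethostbyname,
) -> Dict[str, bool]:
    """Report which Expressway-C names resolve through DNS.

    If `domain` is given, it is appended to bare hostnames (names without
    a dot), mirroring the manual hostname-to-FQDN change an administrator
    would make. `resolver` is injectable so the logic can be tested
    without touching real DNS.
    """
    results: Dict[str, bool] = {}
    for host in hostnames:
        name = f"{host}.{domain}" if domain and "." not in host else host
        try:
            resolver(name)
            results[name] = True
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = False
    return results


if __name__ == "__main__":
    # Hypothetical node names; replace with your own Expressway-C peers.
    for name, ok in check_resolution(["exwy-c1", "exwy-c2"], "example.com").items():
        print(f"{name}: {'resolves' if ok else 'DOES NOT RESOLVE'}")
```

Any name reported as unresolvable is a candidate for the keepalive failure described above.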
Adaptive routing
Adaptive routing updates in Expressway X12.7 and later versions allow Expressway to alter the routing path dynamically. If
a node failure is detected, packets are rerouted to a peer node that is up and running. For example, assume that a remote
Jabber client sends a SIP REGISTER that is intended to be routed through a specific Expressway-E (EXWY-E1), Expressway-C (EXWY-C1),
and Unified CM (CUCM1) combination, but the designated Expressway-C node is either down or in maintenance mode. In this
case, the message is rerouted to a peer Expressway-C node (EXWY-C2) and then on to the intended Unified CM destination. After
the registration, Cisco Jabber also updates its routing table so that future SIP messages use the registration path.
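The rerouting decision in this example can be modeled in a few lines. This is an illustrative sketch only: the types `Node` and `NodeState` and the functions `select_expressway_c` and `route_register` are hypothetical names, and the real Expressway selection logic is not documented here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class NodeState(Enum):
    UP = "up"
    DOWN = "down"
    MAINTENANCE = "maintenance"


@dataclass
class Node:
    name: str
    state: NodeState = NodeState.UP


def select_expressway_c(designated: Node, peers: List[Node]) -> Optional[Node]:
    """Use the designated Expressway-C if it is up, otherwise the first peer that is up."""
    if designated.state is NodeState.UP:
        return designated
    for peer in peers:
        if peer.state is NodeState.UP:
            return peer
    return None  # no Expressway-C node is available


def route_register(
    e_node: str, designated_c: Node, c_peers: List[Node], cucm: str
) -> Optional[List[str]]:
    """Return the path a SIP REGISTER would take, or None if it cannot be routed."""
    c_node = select_expressway_c(designated_c, c_peers)
    if c_node is None:
        return None
    return [e_node, c_node.name, cucm]


if __name__ == "__main__":
    c1 = Node("EXWY-C1", NodeState.MAINTENANCE)
    c2 = Node("EXWY-C2")
    print(route_register("EXWY-E1", c1, [c2], "CUCM1"))
```

With EXWY-C1 in maintenance mode, the returned path goes through EXWY-C2 and still ends at CUCM1, matching the scenario described above.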
STUN keepalive support
In addition to adaptive routing, Expressway X12.7 and later versions support the use of STUN keepalives by MRA-connected Jabber
clients. Remote Jabber clients send STUN keepalives into the enterprise network through Expressway-E to learn of connection issues
ahead of time. As a result, if a node in the registration path fails, Jabber learns of the failure after receiving the
STUN response and can select a different route path for future SIP messages.
Settings
The STUN keepalive setting is configurable in the Advanced section of the
page. See Advanced Settings.
Field | Description
STUN keepalive | Enable STUN keepalive for Unified CM High Availability. Default: On
Requirements
No specific configuration is required, provided that the necessary cluster or backup nodes exist. However, you
must be running at least the following releases:
Routing Feature | Minimum Releases Required
Adaptive routing | Expressway X12.7; Cisco Jabber 12.9 MR; Cisco Webex App
STUN keepalives | Expressway X12.7; Cisco Unified Communications Manager 14; Cisco Jabber 12.9 MR; Cisco Webex App
Note
|
-
The client (Jabber) sends a STUN keepalive every 30 seconds. If the client does not receive a response within 3 seconds,
it initiates failover.
-
When Expressway is configured with a different domain from Unified CM, the Unified CM administrator must manually update the
Expressway-C hostname entry to an FQDN by appending the relevant system domain of Expressway-C.
|
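The client-side timing in the note above can be expressed as a small model. This is a simplified sketch, not Jabber's implementation: the constant and function names are hypothetical, and the worst-case figure assumes a failure occurs immediately after a keepalive was answered, so that detection waits for the next 30-second keepalive plus the 3-second timeout.

```python
from typing import Optional

KEEPALIVE_INTERVAL_S = 30.0  # from the note: keepalive sent every 30 seconds
RESPONSE_TIMEOUT_S = 3.0     # from the note: failover if no response within 3 seconds


def should_fail_over(sent_at: float, response_at: Optional[float], now: float) -> bool:
    """Decide whether a keepalive sent at `sent_at` should trigger failover.

    `response_at` is the time the response arrived, or None if no response
    has arrived yet. All times are in seconds on the same clock.
    """
    if response_at is not None:
        # A response that arrives after the timeout counts as a failure.
        return response_at - sent_at > RESPONSE_TIMEOUT_S
    # No response yet: fail over only once the timeout has elapsed.
    return now - sent_at > RESPONSE_TIMEOUT_S


def worst_case_detection_s() -> float:
    """Longest time to notice a failure under the simplified model above."""
    return KEEPALIVE_INTERVAL_S + RESPONSE_TIMEOUT_S


if __name__ == "__main__":
    print(should_fail_over(sent_at=0.0, response_at=None, now=3.5))
    print(worst_case_detection_s())
```

Under these assumptions a failure is detected within about 33 seconds, after which the client still has to re-register over a different path.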
Load Balance After Node Recovery
With MRA-HA, whenever one or more nodes fail, their load is shifted to the other available nodes
in the cluster. The following sections describe how load is rebalanced after the failed nodes become active in the cluster again.
Load Balance Expressway-C nodes
From release X14.1, Expressway-C node load balancing is achieved by using adaptive routing on the Expressway-E nodes.
After an Expressway-C node fails, its traffic and registrations are handled by the other nodes in the cluster. When the failed
node recovers and becomes active, new registrations can go through it, but the existing load is not moved back to it.
To rebalance the Expressway-C cluster in that scenario, an adaptive routing (AR) mechanism is used on Expressway-E.
A keepalive mechanism runs between the Expressway-E nodes and the Expressway-C nodes in a mesh architecture. In each keepalive
message, Expressway-C sends its resource usage and active registration count to Expressway-E. Expressway-E then evaluates the active
registrations across all Expressway-C nodes and, if it identifies an unbalanced load, triggers load balancing.
Load balancing is achieved by adaptively routing REGISTER messages (new or refresh) to the least loaded node. This applies only
to clients that support adaptive routing. Once the load is balanced, Expressway-E stops the process. This ensures that
no node is idle and that the load is balanced.
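The evaluation step on Expressway-E can be sketched as follows. This is an assumption-heavy illustration: the documentation does not state how an unbalanced load is defined, so `IMBALANCE_THRESHOLD`, `is_unbalanced`, and `pick_target` are hypothetical names and the threshold value is invented for the example.

```python
from typing import Dict, Optional

# Hypothetical threshold: a cluster counts as unbalanced when the least loaded
# node holds less than this fraction of its fair share. The real criterion
# used by Expressway-E is not documented here.
IMBALANCE_THRESHOLD = 0.5


def is_unbalanced(
    registrations: Dict[str, int], threshold: float = IMBALANCE_THRESHOLD
) -> bool:
    """Check the registration counts reported by Expressway-C nodes in keepalives."""
    total = sum(registrations.values())
    if total == 0 or len(registrations) < 2:
        return False
    fair_share = total / len(registrations)
    return min(registrations.values()) < fair_share * threshold


def pick_target(registrations: Dict[str, int]) -> Optional[str]:
    """Return the least loaded node to receive REGISTER messages, or None if balanced."""
    if not is_unbalanced(registrations):
        return None
    return min(registrations, key=registrations.get)


if __name__ == "__main__":
    # EXWY-C1 has just recovered and holds no registrations.
    print(pick_target({"EXWY-C1": 0, "EXWY-C2": 100}))
    # Load is close to even, so no rerouting is triggered.
    print(pick_target({"EXWY-C1": 45, "EXWY-C2": 55}))
```

Returning None once the counts are close enough corresponds to Expressway-E stopping the process after the load is balanced.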
Load Balance Expressway-E nodes
Each Expressway-E node maintains a count of the total number of registrations on all nodes in the cluster. Whenever the cluster
is imbalanced, a node with a high number of registrations responds to REGISTER messages with a warning header in the
200 response, indicating that the load is imbalanced.
Note
|
The load is not shared equally or in a fixed ratio. The mechanism only tries to avoid a 0-100 split, in which one node carries no load.
|
Benefits with all software requirements
When all three components - clients, Expressway, Unified CM - are running updated software with MRA registration failover
capabilities, the following benefits apply:
-
No user action required for failover
-
Faster failover times - down to 30-60 seconds from the previous standard of 120 seconds
-
Route path updates dynamically to handle server failures
-
More routes are available to reach the intended destination
-
Remote Jabber clients can learn of server failures via STUN keepalives and adjust routing ahead of time
Adaptive routing benefit without Unified CM upgrade
Even without new Unified CM software (but with new Expressway and Jabber software), this feature has the benefit of allowing
Jabber clients to detect path failures.
Note
|
Detecting a failure this way takes more than 2 minutes, and in some scenarios Expressway may flag a Unified CM server as inactive
when the server is actually just idle or lightly used at the time.
|