This document describes how Expressway clusters are designed to extend the resilience and capacity of an Expressway installation.
Capacity. An Expressway cluster can increase the capacity of an Expressway deployment by a maximum factor of four compared with a single Expressway. Expressway peers in a cluster share bandwidth usage as well as routing, zone, FindMe, and other configuration.
Resilience. An Expressway cluster can provide redundancy while an Expressway is in maintenance mode, or if it becomes inaccessible because of a network or power outage or another reason. Endpoints can register to any of the peers in a cluster. If endpoints lose connection to their initial peer, they can re-register to another peer in the cluster.
An Expressway can be part of a cluster of up to six Expressways. When you create a cluster, you nominate one peer as the primary, whose configuration is replicated to the other peers. Every Expressway peer in the cluster must have the same routing capabilities; if any Expressway can route a call to a destination, it is assumed that all Expressway peers in that cluster can route a call to that destination.
There is no capacity gain after four peers. So in a six-peer cluster for example, the fifth and sixth Expressways do not add extra call capacity to the cluster. Resilience is improved with the extra peers, but not capacity.
All other license keys must be identical on each peer.
Note: If the Expressway-E uses a single Network Interface Controller (NIC), it must use its public IP address. If the Expressway-E uses dual NICs, the internal interface must be used to build the cluster.
Note: You must create a cluster of one (primary) peer first, and restart the primary, before you add other peers. You can add more peers after you have established a cluster of one.
Configuration primary: 1
Cluster IP version: Choose IPv4 or IPv6 to match your network address scheme.
TLS verification mode: Options are Permissive (default) or Enforce.
Permissive means that the peers do not validate each other's certificates when the intra-cluster Transport Layer Security (TLS) connections are established.
Enforce is more secure, but requires that each peer has a valid certificate and that the Certificate Authority (CA) is trusted by all other peers.
Peer 1 address: Enter the address of this Expressway (the primary peer). If TLS verification mode is set to Enforce, then you must enter a Fully Qualified Domain Name (FQDN) that matches the subject Common Name (CN) or a Subject Alternative Name (SAN) on this peer's certificate.
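If you plan to use Enforce, you can confirm in advance that each peer's certificate carries the required names. The following is a minimal check with OpenSSL, assuming the peer's certificate has been exported locally as expe01.pem (the file name is illustrative):

openssl x509 -in expe01.pem -noout -subject
openssl x509 -in expe01.pem -noout -text | grep -A1 "Subject Alternative Name"

The FQDN that you enter in the Peer N address field must appear as the subject CN or as one of the SAN entries shown by these commands.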
To add an additional peer, follow these steps:
Caution: Before you proceed, verify that your certificate SANs contain the FQDNs that are entered in the Peer N address fields. You must see green status messages for clustering and certificates next to each address field before you continue.
Caution: A warning is displayed if any certificate is invalid. Invalid certificates prevent the cluster from working properly in enforced TLS verification mode.
Note: You can do this process even if the current primary peer is not accessible.
Note: While this process is performed, ignore any alarms on Expressway that report Cluster primary mismatch or Cluster replication error.
Note: While this procedure is performed, communications between peers are temporarily impacted. This means you can expect to see alarms that persist until the changes are complete and the cluster agrees on the new addresses.
For secure deployments like Mobile and Remote Access (MRA), each Expressway-E peer must have a certificate with a SAN that contains its public FQDN. The FQDN is mapped in the public DNS to the Expressway-E's public IP address.
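Before you form the cluster, you can confirm that public DNS resolves each peer's FQDN to the expected address. A quick check with standard DNS tools, using an illustrative FQDN:

nslookup expe01.example.com
dig +short expe01.example.com A

The returned address must be the public IP address that external clients use to reach that Expressway-E.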
Note: If you simply want to cluster Cisco Expressway-E peers and you don't need TLS verification between them, then you can form the cluster with the nodes' private IP addresses. You don't need cluster Address Mapping.
Cluster Address Mappings are FQDN:IP pairs which are shared around the cluster, one pair for each peer. The peers consult the Mapping Table before they query DNS and, if they find a match, they do not query DNS.
If you choose to enforce TLS, the peers must also read the names from the SAN field of each other's certificates, and check each name against the FQDN side of the mapping.
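For illustration only, a two-peer mapping table could look like this (the FQDNs and addresses are hypothetical); each public FQDN is paired with the IP address that the peers use to reach each other, typically the private address:

expe01.example.com  ->  10.0.10.14   (Peer 1)
expe02.example.com  ->  10.0.10.15   (Peer 2)

With these entries in place, the peers resolve each other's FQDNs from the mapping table and do not send those queries to DNS.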
It is strongly recommended that you enter the mappings on the primary peer. Address mappings replicate dynamically through the cluster. To configure address mappings, follow this procedure:
Caution: Do not try to use the public DNS to map the peers' public FQDNs to their private IP addresses; this can break external connectivity.
If you want the Expressway-E peers in a cluster to verify each other's identities with certificates, you could allow them to use DNS to resolve cluster peer FQDNs to their public IP addresses. This is a perfectly acceptable way to form a cluster if the Expressway-E nodes have:
If you clear all the peer address fields from the clustering page and save the configuration, by default the Expressway performs a factory reset on itself the next time it is restarted. This means that all configuration is deleted, except the basic network configuration of the Local Area Network 1 (LAN1) interface. The deletion includes any configuration performed between the time you clear the fields and the next restart.
Tip: If you need to avoid the factory reset, restore the cluster peer address fields. Replace the original peer addresses in the same order, and then save the configuration to clear the banner.
The factory reset is automatically triggered when the peer restarts, to remove sensitive data and cluster configuration. The reset clears all configuration except the following basic network information:
Note: If you use the dual NIC option, be aware that any LAN2 configuration is removed completely by the reset.
Note: From version X12.6 the factory reset removes the server certificate, associated private key, and CA trust store settings from the peer. In earlier Expressway software versions these settings are preserved.
A factory reset can fail. This can happen if the Expressway is a fresh Open Virtualization Appliance (OVA) installation that has not been upgraded.
To fix this, follow any of these options:
Note: Make sure to take proper backups before an upgrade or certificate change, and whenever a factory reset warning is present.
If a restart of the cluster or of any peer is needed, follow these steps:
Note: You might need to wait about 5 minutes after you make any cluster changes before the Expressway peers report a successful status.
Cluster error alarms are shown in the format: Cluster replication error: (details) manual synchronization of configuration is required. Some examples are:
If a subordinate Expressway reports the alarm mentioned, follow this procedure:
Note: Make sure to take proper backups before an upgrade or certificate change, and whenever a factory reset warning is present.
If the issue persists, it could be related to the per-peer encryption key. This usually occurs when peers are upgraded in the wrong order, which leaves the subordinate peers out of synchronization with the primary. If xcommand forceconfigupdate does not work, follow this procedure:
The replication alarm clears after the primary peer has upgraded and rebooted. This normally happens within ten minutes after reboot, but could be up to twenty minutes after reboot.
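For reference, the xcommand forceconfigupdate command mentioned above is entered from the subordinate peer's command line. A minimal session sketch, assuming administrator SSH access and a hypothetical peer address:

ssh admin@10.0.10.15
xcommand forceconfigupdate

The command makes the subordinate peer overwrite its local configuration with the configuration replicated from the primary.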
Invalid clustering configuration: H.323 mode must be turned On - clustering uses H.323 communications between peers.
To clear this alarm, ensure that H.323 mode is set to On; navigate to Configuration > Protocols > H.323.
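Alternatively, you can check and change the same setting from the CLI, assuming administrator SSH access to the peer (verify the exact syntax against the command reference for your software version). To read the current value:

xConfiguration H323 Mode

To turn H.323 mode on:

xConfiguration H323 Mode: On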
Expressway database failure: Please contact your Cisco support representative.
To troubleshoot this kind of alarm, follow this procedure:
A second method is possible if the database does not recover:
Note: Make sure to take proper backups before an upgrade or certificate change, and whenever a factory reset warning is present.
Caution: clusterdb_destroy_and_purge_data.sh is as dangerous as it sounds; use this option only as a last resort.
Note: The following information applies to version X14 onwards.
Failed to update key file alarms are raised on Expressways in a single-node scenario.
Follow this procedure to troubleshoot this kind of alarm:
Failed to update key file alarms are raised on Expressways in a clustered scenario.
Follow this procedure to troubleshoot this kind of alarm:
As with any other issue on Expressway, you can enable diagnostic logging, including TCP dumps (packet captures), to investigate cluster problems.
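After you download and extract the diagnostic log bundle, you can isolate the cluster replication entries quickly. A minimal example with grep, where the directory name is illustrative and the exact file names vary by release:

cd diagnostic_log_expc01
grep -r "developer.replication" .
grep -r "Cluster replication error" .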
In a normal state, database synchronization on the primary (master) node is shown in the logs as the following output:
2020-07-21T15:16:50.321-05:00 expc01 replication: UTCTime="2020-07-21 20:16:50,321" Module="developer.replication" Level="INFO" CodeLocation="clusterconfigurationsynchroniser(270)" Detail="Starting synchronisation"
2020-07-21T15:16:50.330-05:00 expc01 replication: UTCTime="2020-07-21 20:16:50,330" Module="developer.replication" Level="INFO" CodeLocation="clusterconfigurationutils(750)" AlternateIPAddresses="[u'(10.15.13.15 expc01)', u'(10.15.13.16 expc02)']" ConfigurationMasterIndex="0" LocalPeerIndex="0"
2020-07-21T15:16:50.433-05:00 expc01 replication: UTCTime="2020-07-21 20:16:50,433" Module="developer.replication" Level="INFO" CodeLocation="clusterconfigurationsynchroniser(257)" Detail="This peer is the cluster master, local configuration has already been replicated to the other peers"
2020-07-21T15:16:50.437-05:00 expc01 replication: UTCTime="2020-07-21 20:16:50,437" Module="developer.replication" Level="INFO" CodeLocation="clusterconfigurationsynchroniser(336)" Detail="Synchronisation completed successfully"
From the peer node's perspective, it is shown as the following output:
2020-07-21T15:16:46.900-05:00 expc02 replication: UTCTime="2020-07-21 20:16:46,899" Module="developer.replication" Level="INFO" CodeLocation="clusterconfigurationsynchroniser(270)" Detail="Starting synchronisation"
2020-07-21T15:16:46.908-05:00 expc02 replication: UTCTime="2020-07-21 20:16:46,908" Module="developer.replication" Level="INFO" CodeLocation="clusterconfigurationutils(750)" AlternateIPAddresses="[u'(10.15.13.15 expc01)', u'(10.15.13.16 expc02)']" ConfigurationMasterIndex="0" LocalPeerIndex="1"
2020-07-21T15:16:46.947-05:00 expc02 replication: UTCTime="2020-07-21 20:16:46,946" Module="developer.replication" Level="INFO" CodeLocation="clusterconfigurationsynchroniser(254)" Detail="This peer is not the cluster master, local configuration is already up to date"
2020-07-21T15:16:46.950-05:00 expc02 replication: UTCTime="2020-07-21 20:16:46,950" Module="developer.replication" Level="INFO" CodeLocation="clusterconfigurationsynchroniser(336)" Detail="Synchronisation completed successfully"
A peer disconnection is shown in the following output:
2020-08-12T14:57:43.353-05:00 expc01 UTCTime="2020-08-12 19:57:43,353" Module="developer.clusterdb.cdb" Level="INFO" Node="clusterdb@expc01.apolo.local" PID="<0.159.0>" Detail="Processed mnesia_down event from accessible node" Node="clusterdb@expc02.apolo.local"
2020-08-12T14:57:43.354-05:00 expc01 UTCTime="2020-08-12 19:57:43,353" Module="developer.clusterdb.cdb" Level="ERROR" Node="clusterdb@expc01.apolo.local" PID="<0.159.0>" Detail="Inconsistent Database" Context="from mnesia system - mnesia down" Node="clusterdb@expc02.apolo.local"
2020-08-12T14:57:43.354-05:00 expc01 UTCTime="2020-08-12 19:57:43,354" Module="developer.clusterdb.cdb" Level="INFO" Node="clusterdb@expc01.apolo.local" PID="<0.159.0>" Detail="Connecting database on mnesia running_partitioned_network event" Node="clusterdb@expc02.apolo.local"
2020-08-12T14:57:43.354-05:00 expc01 UTCTime="2020-08-12 19:57:43,354" Module="developer.clusterdb.cdb" Level="INFO" Node="clusterdb@expc01.apolo.local" PID="<0.14215.425>" Detail="Ready to perform node connection transaction" Node="clusterdb@expc02.apolo.local"
2020-08-12T14:57:43.354-05:00 expc01 UTCTime="2020-08-12 19:57:43,354" Module="developer.clusterdb.cdb" Level="INFO" Node="clusterdb@expc01.apolo.local" PID="<0.14215.425>" Detail="Running node connection transaction" Node="clusterdb@expc02.apolo.local"
2020-08-12T14:57:43.354-05:00 expc01 UTCTime="2020-08-12 19:57:43,354" Module="developer.clusterdb.synchronise" Level="WARN" Node="clusterdb@expc01.apolo.local" PID="<0.14215.425>" Detail="Failed connecting to node" Node="clusterdb@expc02.apolo.local" Reason="{ badrpc, { EXIT, { aborted, { noproc, { gen_server, call, [ kernel_safe_sup, { start_child, { dets_sup, { dets_sup, start_link, }, permanent, 1000, supervisor, [ dets_sup ] } }, infinity ] } } } } }"
2020-08-12T14:57:43.524-05:00 expc01 alarm: Level="WARN" Event="Alarm Raised" Id="20006" UUID="0f96695e-d954-4f6f-85c1-2ef1eae6f764" Severity="warning" Detail="Cluster database communication failure: The database is unable to replicate with one or more of the cluster peers" UTCTime="2020-08-12 19:57:43,524"
2020-08-12T14:57:43.771-05:00 expc01 alarm: Level="WARN" Event="Alarm Raised" Id="20004" UUID="3bca6888-f622-11df-93be-07cc953d7b99" Severity="warning" Detail="Cluster communication failure: The system is unable to communicate with one or more of the cluster peers" UTCTime="2020-08-12 19:57:43,771"
2020-08-12T14:57:53.872-05:00 expc01 tvcs: UTCTime="2020-08-12 19:57:53,871" Module="network.h323" Level="INFO": Action="Sent" Dst-ip="10.15.13.16" Dst-port="1719" Detail="Sending RAS SCI SeqNum=52319 Retransmit=True"
2020-08-12T14:57:54.872-05:00 expc01 tvcs: UTCTime="2020-08-12 19:57:54,871" Module="network.h323" Level="INFO": Action="Sent" Dst-ip="10.15.13.16" Dst-port="1719" Detail="Sending RAS LRQ SeqNum=52320 Retransmit=True"
2020-08-12T14:57:56.872-05:00 expc01 tvcs: UTCTime="2020-08-12 19:57:56,871" Module="network.h323" Level="INFO": Action="Sent" Dst-ip="10.15.13.16" Dst-port="1719" Detail="Sending RAS LRQ SeqNum=52320 Retransmit=True"
2020-08-12T14:57:57.871-05:00 expc01 tvcs: UTCTime="2020-08-12 19:57:57,871" Module="network.h323" Level="INFO": Action="Sent" Dst-ip="10.15.13.16" Dst-port="1719" Detail="Sending RAS SCI SeqNum=52319 Retransmit=True"
2020-08-12T14:57:58.871-05:00 expc01 tvcs: Event="External Server Communications Failure" Reason="gatekeeper timed out" Service="NeighbourGatekeeper" Detail="name:10.15.13.16:1719" Level="1" UTCTime="2020-08-12 19:57:58,871"
2020-08-12T14:57:58.871-05:00 expc01 tvcs: UTCTime="2020-08-12 19:57:58,871" Module="network.h323" Level="INFO": Action="Sent" Dst-ip="10.15.13.16" Dst-port="1719" Detail="Sending RAS LRQ SeqNum=52320 Timeout=True"
2020-08-12T14:57:59.601-05:00 expc01 UTCTime="2020-08-12 19:57:59,601" Module="developer.clusterdb.peernameresolver" Level="INFO" Node="clusterdb@expc01.apolo.local" PID="<0.145.0>" Detail="Triggering forced peer update of peers which failed DNS and queueing next run" Queue-Time-ms="300000"
2020-08-12T14:58:01.871-05:00 expc01 tvcs: UTCTime="2020-08-12 19:58:01,871" Module="network.h323" Level="INFO": Action="Sent" Dst-ip="10.15.13.16" Dst-port="1719" Detail="Sending RAS SCI SeqNum=52319 Timeout=True"
A change of the TLS verification mode to Enforce on the primary (master) node is shown in the following output:
2020-08-12T15:13:24.970-05:00 expc01 UTCTime="2020-08-12 20:13:24,969" Module="developer.cdbtable.cdb.clusterConfiguration" Level="DEBUG" Node="clusterdb@expc01.apolo.local" PID="<0.345.0>" Detail="Inserting into table" TableName="clusterConfiguration"
2020-08-12T15:13:24.976-05:00 expc01 UTCTime="2020-08-12 20:13:24,975" Event="System Configuration Changed" Node="clusterdb@expc01.apolo.local" PID="<0.345.0>" Detail="xconfiguration clusterConfiguration tls_verify - changed from: Permissive to: Enforcing"
2020-08-12T15:13:24.976-05:00 expc01 httpd[15060]: web: Event="System Configuration Changed" Detail="configuration/cluster/tls_verify - changed from: 'Permissive' to: 'Enforcing'" Src-ip="10.15.13.30" Src-port="53155" User="admin" Level="1" UTCTime="2020-08-12 20:13:24"
2020-08-12T15:13:24.979-05:00 expc01 management: UTCTime="2020-08-12 20:13:24,978" Module="developer.management.databasemanager" Level="INFO" CodeLocation="databasemanager(312)" Detail="Cluster configuration change detected"
2020-08-12T15:13:24.980-05:00 expc01 UTCTime="2020-08-12 20:13:24,980" Module="developer.cdbtable.cdb.clusterConfiguration" Level="DEBUG" Node="clusterdb@expc01.apolo.local" PID="<0.345.0>" Detail="Inserting into table" TableName="clusterConfiguration"
2020-08-12T15:13:24.986-05:00 expc01 management: UTCTime="2020-08-12 20:13:24,986" Module="developer.management.databasemanager" Level="INFO" CodeLocation="databasemanager(405)" Detail="TLS Verify change status" Startup="False" New="True"
2020-08-12T15:13:25.022-05:00 expc01 UTCTime="2020-08-12 20:13:25,022" Event="System Configuration Changed" Node="clusterdb@expc01.apolo.local" PID="<0.557.0>" Detail="xconfiguration alternatesConfiguration - Changed"
2020-08-12T15:13:25.022-05:00 expc01 UTCTime="2020-08-12 20:13:25,022" Module="developer.clusterdb.peernameresolver" Level="INFO" Node="clusterdb@expc01.apolo.local" PID="<0.145.0>" Detail="Notifying databasemanager (Management Framework)"
2020-08-12T15:13:25.022-05:00 expc01 UTCTime="2020-08-12 20:13:25,022" Module="developer.clusterdb.alternatesmanager" Level="INFO" Node="clusterdb@expc01.apolo.local" PID="<0.142.0>" Detail="alternate peer changed info recieved"
2020-08-12T15:13:25.031-05:00 expc01 UTCTime="2020-08-12 20:13:25,031" Event="System Configuration Changed" Node="clusterdb@expc01.apolo.local" PID="<0.557.0>" Detail="xconfiguration alternatesConfiguration - Changed"
2020-08-12T15:13:25.192-05:00 expc01 management: UTCTime="2020-08-12 20:13:25,192" Module="developer.diagnostics.alarmmanager" Level="INFO" CodeLocation="alarmmanager(173)" Detail="Raising alarm" UUID="e2b8e3d1-b731-4d7d-b606-4682a8f0c2e6" Parameters="null"
2020-08-12T15:13:25.195-05:00 expc01 management: Level="WARN" Event="Alarm Raised" Id="20007" UUID="e2b8e3d1-b731-4d7d-b606-4682a8f0c2e6" Severity="warning" Detail="Restart required: Cluster configuration has been changed, however a restart is required for this to take effect" UTCTime="2020-08-12 20:13:25,194"
From the peer node's perspective, it is shown in the following output:
2020-08-12T15:13:24.976-05:00 expc02 UTCTime="2020-08-12 20:13:24,976" Event="System Configuration Changed" Node="clusterdb@expc02.apolo.local" PID="<0.390.0>" Detail="xconfiguration clusterConfiguration tls_verify - changed from: Permissive to: Enforcing"
2020-08-12T15:13:24.979-05:00 expc02 management: UTCTime="2020-08-12 20:13:24,978" Module="developer.management.databasemanager" Level="INFO" CodeLocation="databasemanager(312)" Detail="Cluster configuration change detected"
2020-08-12T15:13:24.982-05:00 expc02 management: UTCTime="2020-08-12 20:13:24,982" Module="developer.management.databasemanager" Level="INFO" CodeLocation="databasemanager(405)" Detail="TLS Verify change status" Startup="False" New="True"
2020-08-12T15:13:25.040-05:00 expc02 UTCTime="2020-08-12 20:13:25,040" Module="developer.clusterdb.peernameresolver" Level="INFO" Node="clusterdb@expc02.apolo.local" PID="<0.136.0>" Detail="Notifying databasemanager (Management Framework)"
2020-08-12T15:13:25.040-05:00 expc02 UTCTime="2020-08-12 20:13:25,040" Module="developer.clusterdb.alternatesmanager" Level="INFO" Node="clusterdb@expc02.apolo.local" PID="<0.143.0>" Detail="alternate peer changed info recieved"
2020-08-12T15:13:25.041-05:00 expc02 UTCTime="2020-08-12 20:13:25,041" Event="System Configuration Changed" Node="clusterdb@expc02.apolo.local" PID="<0.543.0>" Detail="xconfiguration alternatesConfiguration - Changed"
2020-08-12T15:13:25.042-05:00 expc02 UTCTime="2020-08-12 20:13:25,042" Event="System Configuration Changed" Node="clusterdb@expc02.apolo.local" PID="<0.543.0>" Detail="xconfiguration alternatesConfiguration - Changed"
2020-08-12T15:13:25.046-05:00 expc02 UTCTime="2020-08-12 20:13:25,046" Module="developer.clusterdb.alternatesmanager" Level="INFO" Node="clusterdb@expc02.apolo.local" PID="<0.143.0>" Detail="alternate peer changed info recieved"
2020-08-12T15:13:25.047-05:00 expc02 UTCTime="2020-08-12 20:13:25,046" Module="developer.clusterdb.peernameresolver" Level="INFO" Node="clusterdb@expc02.apolo.local" PID="<0.136.0>" Detail="Notifying databasemanager (Management Framework)"
2020-08-12T15:13:25.047-05:00 expc02 UTCTime="2020-08-12 20:13:25,047" Event="System Configuration Changed" Node="clusterdb@expc02.apolo.local" PID="<0.543.0>" Detail="xconfiguration alternatesConfiguration - Changed"
2020-08-12T15:13:25.049-05:00 expc02 UTCTime="2020-08-12 20:13:25,049" Event="System Configuration Changed" Node="clusterdb@expc02.apolo.local" PID="<0.543.0>" Detail="xconfiguration alternatesConfiguration - Changed"
2020-08-12T15:13:25.136-05:00 expc02 management: UTCTime="2020-08-12 20:13:25,136" Module="developer.diagnostics.alarmmanager" Level="INFO" CodeLocation="alarmmanager(173)" Detail="Raising alarm" UUID="e2b8e3d1-b731-4d7d-b606-4682a8f0c2e6" Parameters="null"
2020-08-12T15:13:25.139-05:00 expc02 management: Level="WARN" Event="Alarm Raised" Id="20007" UUID="e2b8e3d1-b731-4d7d-b606-4682a8f0c2e6" Severity="warning" Detail="Restart required: Cluster configuration has been changed, however a restart is required for this to take effect" UTCTime="2020-08-12 20:13:25,139"
These videos could be useful:
How to Create and Add a Peer to an Expressway Cluster
Removing a Peer from an Expressway Cluster
Fixing Expressway Replication Error "Peer's Configuration Conflicts With Primary"
Expressway Cluster Restart Procedure
How to Upgrade an Expressway Cluster
Generating CSR for MRA/Clustered Expressways
Revision | Publish Date | Comments
---|---|---
1.0 | 02-Jul-2021 | Initial Release