The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product.
This document describes configuration and deployment guidelines, as well as troubleshooting tips, for adding Mobility Services Engine (MSE) High Availability (HA) to a Cisco Unified Wireless LAN (WLAN) that runs Context Aware Services and/or Adaptive Wireless Intrusion Prevention System (AwIPS). The purpose of this document is to explain the guidelines for MSE HA and to provide HA deployment scenarios for MSE.
Note: This document does not provide configuration details for the MSE and associated components that do not pertain to MSE HA. This information is provided in other documents, and references are provided. Adaptive wIPS configuration is also not covered in this document.
The MSE is a platform that can run multiple related services. Because these services provide high-level functionality, HA is critical in order to keep them available.
With HA enabled, every active MSE is backed up by another inactive instance. MSE HA introduces the health monitor, which configures, manages, and monitors the HA setup. A heartbeat is maintained between the primary and secondary MSE. The health monitor is responsible for setting up database and file replication and for monitoring the application. When the primary MSE fails and the secondary takes over, the virtual IP address of the primary MSE is switched to the secondary transparently.
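The heartbeat and virtual IP takeover described above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the health monitor's actual implementation: the `HAPair` class, its fields, and the rule "fail over once no heartbeat has been seen for longer than the failover wait time" are hypothetical. The 10-second default only mirrors the "Failover wait time (seconds): 10" value that appears in the gethainfo output in this document.

```python
from dataclasses import dataclass


@dataclass
class HAPair:
    """Hypothetical model of one primary/secondary pair sharing a virtual IP."""
    virtual_ip: str
    primary_hm_ip: str
    secondary_hm_ip: str
    failover_wait_s: float = 10.0  # assumed heartbeat-loss threshold (seconds)
    automatic: bool = True         # automatic vs. manual failover
    last_heartbeat: float = 0.0
    active: str = "primary"        # node that currently answers on the virtual IP

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the primary at time `now` (seconds)."""
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        """Fail over automatically if the heartbeat has been missing too long."""
        heartbeat_lost = now - self.last_heartbeat > self.failover_wait_s
        if heartbeat_lost and self.automatic and self.active == "primary":
            self.active = "secondary"
        return self.active

    def switchover(self) -> None:
        """Manual failover, for example triggered by an administrator."""
        self.active = "secondary"

    def owner_hm_ip(self) -> str:
        """Health monitor IP of the node that currently owns the virtual IP."""
        return self.primary_hm_ip if self.active == "primary" else self.secondary_hm_ip


pair = HAPair("10.10.10.11", "10.10.10.12", "10.10.10.13")
pair.heartbeat(now=0.0)
print(pair.check(now=5.0))   # primary: heartbeat is still fresh
print(pair.check(now=15.0))  # secondary: no heartbeat for more than 10 s
print(pair.owner_hm_ip())    # 10.10.10.13
```

Clients keep using 10.10.10.11 throughout; only the node behind that address changes, which is what "switched transparently" means in this model. With `automatic=False`, the pair stays on the primary until `switchover()` is called.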
The setup in Figure 1 shows a typical Cisco WLAN deployment that includes a Cisco MSE enabled for HA.
HA support is available on the MSE-3310, MSE-3350/3355, MSE-3365, and the MSE Virtual Appliance on ESXi.
Figure 1. MSE Deployment in HA
These points summarize the MSE HA architecture:
MSE Virtual Appliance supports only 1:1 HA
One secondary MSE can support up to two primary MSEs. See the HA pairing matrix (Figures 2 and 3).
HA supports both Network Connected and Direct Connected setups
Only MSE Layer-2 redundancy is supported. Both the health monitor IP and virtual IP must be on the same subnet and accessible from the Network Control System (NCS). Layer-3 redundancy is not supported.
Health monitor IP and virtual IP must be different
You can use either manual or automatic failover
You can use either manual or automatic failback
Both the primary and secondary MSE must be on the same software version
Every active primary MSE is backed up by another inactive instance. The secondary MSE becomes active only after the failover procedure is initiated.
The failover procedure can be manual or automatic.
There is one software and database instance for each registered primary MSE.
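The rules in the list above can be checked before pairing with a short Python sketch. Only the rules come from this document (at most two primaries per secondary, 1:1 for the VA, distinct health monitor and virtual IPs on one subnet, identical software versions); the `validate_pairing` function and its dictionary layout are a hypothetical input format, not an NCS or MSE interface.

```python
import ipaddress


def validate_pairing(secondary: dict, primaries: list) -> list:
    """Return a list of rule violations; an empty list means none were found."""
    errors = []
    if len(primaries) > 2:
        errors.append("a secondary MSE can back up at most two primary MSEs")
    if secondary["model"] == "VA" and len(primaries) > 1:
        errors.append("the MSE Virtual Appliance supports only 1:1 HA")
    for p in primaries:
        name = p["name"]
        subnet = ipaddress.ip_network(p["subnet"])
        hm_ip = ipaddress.ip_address(p["hm_ip"])
        vip = ipaddress.ip_address(p["virtual_ip"])
        if hm_ip == vip:
            errors.append(f"{name}: health monitor IP and virtual IP must differ")
        if hm_ip not in subnet or vip not in subnet:
            errors.append(f"{name}: health monitor IP and virtual IP must share one subnet")
        if p["version"] != secondary["version"]:
            errors.append(f"{name}: software version differs from the secondary")
    return errors


secondary = {"model": "VA", "version": "7.2.103.0"}
primary = {"name": "mse1", "subnet": "10.10.10.0/24", "hm_ip": "10.10.10.12",
           "virtual_ip": "10.10.10.11", "version": "7.2.103.0"}
print(validate_pairing(secondary, [primary]))  # []
```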
Figure 2. MSE HA Support Pairing Matrix
The baseline of this matrix is that the secondary instance must always have equal or higher specifications than the primary, whether they are appliances or virtual machines.
The MSE-3365 can only be paired with another MSE-3365. No other combination is tested or supported.
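The baseline above can be expressed as a small Python check. The `Spec` fields (CPU cores, memory, disk) are a hypothetical way to represent "specifications", and reading "equal or higher" as "every resource is at least as large" is an assumption; the sizes in the usage lines are placeholders, not published MSE sizing. The MSE-3365 rule is taken directly from the text.

```python
from typing import NamedTuple


class Spec(NamedTuple):
    """Hypothetical summary of an appliance or VM."""
    model: str
    cpu_cores: int
    memory_gb: int
    disk_gb: int


def can_back_up(secondary: Spec, primary: Spec) -> bool:
    # The MSE-3365 is only tested/supported with another MSE-3365.
    if "3365" in (secondary.model, primary.model):
        return secondary.model == primary.model == "3365"
    # Otherwise every resource on the secondary must match or exceed the primary.
    return (secondary.cpu_cores >= primary.cpu_cores
            and secondary.memory_gb >= primary.memory_gb
            and secondary.disk_gb >= primary.disk_gb)


small_vm = Spec("VA", cpu_cores=8, memory_gb=16, disk_gb=500)   # placeholder sizes
large_vm = Spec("VA", cpu_cores=16, memory_gb=24, disk_gb=500)  # placeholder sizes
print(can_back_up(large_vm, small_vm))  # True
print(can_back_up(small_vm, large_vm))  # False
```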
Figure 3. MSE HA N:1 Pairing Matrix
This example shows the HA configuration for the MSE Virtual Appliance (VA), as shown in Figure 4. In this scenario, these settings are configured:
Primary MSE VA:
Virtual IP – [10.10.10.11] Health Monitor interface (Eth0) – [10.10.10.12]
Secondary MSE VA:
Virtual IP – [None] Health Monitor interface (Eth0) – [10.10.10.13]
Note: Each VA requires an activation license (L-MSE-7.0-K9). The license is also required for the HA configuration of the VA.
Figure 4. MSE Virtual Appliance in HA
Refer to Cisco documentation on MSE Virtual Appliance for more information.
Here are the general steps:
Complete the VA installation for MSE and verify that all the network settings are correct, as shown in the image.
Set up the parameters with the Setup Wizard at first login, as shown in the image.
Enter the required entries (host name, domain, etc.). Enter YES at the step to Configure High Availability.
Enter this information, as shown in the images:
Select Role – [1 for Primary].
Health Monitor interface – [eth0]*
*Network settings mapped to Network Adapter 1
Select direct connect interface [none] as shown in the image.
Enter this information, as shown in the image:
Virtual IP address – [10.10.10.11]
Network Mask – [255.255.255.0]
Start MSE in recovery mode – [No]
Enter this information, as shown in the image:
Configure Eth0 - [Yes]
Enter Eth0 IP address– [10.10.10.12]
Network Mask – [255.255.255.0]
Default Gateway – [10.10.10.1]
The second Ethernet interface (Eth1) is not used.
Configure eth1 interface - [skip] as shown in the image.
Continue through the Setup Wizard as shown in the images.
It is critical to enable the NTP server in order to synchronize the clock.
The preferred time zone is UTC.
This summarizes the MSE VA Primary setup:
-------BEGIN--------
Role=1, Health Monitor Interface=eth0, Direct connect interface=none
Virtual IP Address=10.10.10.11, Virtual IP Netmask=255.255.255.0
Eth0 IP address=10.10.10.12, Eth0 network mask=255.0.0.0
Default Gateway=10.10.10.1
-------END--------
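Before you confirm the setup, the key=value summary above can be sanity-checked with a few lines of Python. The parsing rules (skip the BEGIN/END markers, split items on commas, split each item on the first "=") are an assumption based on this one sample, not a documented format; `parse_summary` and `vip_subnet_contains_eth0` are hypothetical helper names.

```python
import ipaddress

SUMMARY = """\
-------BEGIN--------
Role=1, Health Monitor Interface=eth0, Direct connect interface=none
Virtual IP Address=10.10.10.11, Virtual IP Netmask=255.255.255.0
Eth0 IP address=10.10.10.12, Eth0 network mask=255.0.0.0
Default Gateway=10.10.10.1
-------END--------"""


def parse_summary(text: str) -> dict:
    """Collect key=value pairs from the setup summary."""
    settings = {}
    for line in text.splitlines():
        if line.startswith("-------"):
            continue  # skip the BEGIN/END markers
        for item in line.split(","):
            if "=" in item:
                key, value = item.split("=", 1)
                settings[key.strip()] = value.strip()
    return settings


def vip_subnet_contains_eth0(settings: dict) -> bool:
    """True if the eth0 (health monitor) address falls inside the virtual IP subnet."""
    vip_net = ipaddress.ip_network(
        f"{settings['Virtual IP Address']}/{settings['Virtual IP Netmask']}",
        strict=False)
    return ipaddress.ip_address(settings["Eth0 IP address"]) in vip_net


cfg = parse_summary(SUMMARY)
print(cfg["Role"], vip_subnet_contains_eth0(cfg))  # 1 True
```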
Enter yes to confirm that all setup information is correct as shown in the image.
A reboot is recommended after setup as shown in the image.
After a reboot, start the MSE services with either the /etc/init.d/msed start command or the service msed start command, as shown in the image.
After all services have started, confirm that MSE services are working properly with the getserverinfo command.
Operation status must show Up as shown in the image.
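When several services are installed, a short Python filter can flag any service that is enabled but not Up in getserverinfo output. The line-oriented parsing is an assumption based on the getserverinfo sample lines reproduced in this document; `service_status` and `enabled_but_down` are hypothetical helper names, and the text is passed in as a string rather than by running the command.

```python
SAMPLE = """\
Service Name: Context Aware Service
Service Version: 8.0.1.79
Admin Status: Disabled
Operation Status: Down
Service Name: WIPS
Service Version: 3.0.8155.0
Admin Status: Enabled
Operation Status: Up
"""


def service_status(text: str) -> dict:
    """Map each service name to its Admin Status and Operation Status."""
    services, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Service Name:"):
            current = line.split(":", 1)[1].strip()
            services[current] = {}
        elif current and line.startswith(("Admin Status:", "Operation Status:")):
            key, value = line.split(":", 1)
            services[current][key] = value.strip()
    return services


def enabled_but_down(text: str) -> list:
    """Names of services that are administratively enabled but not Up."""
    return [name for name, s in service_status(text).items()
            if s.get("Admin Status") == "Enabled" and s.get("Operation Status") != "Up"]


print(enabled_but_down(SAMPLE))  # []
```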
These steps are part of the setup for the secondary MSE VA:
After a new installation, the initial login starts the Setup Wizard. Enter this information, as shown in the image:
Configure High Availability – [Yes]
Select role – [2] which indicates Secondary
Health Monitor Interface – [eth0], the same as the Primary
Enter the information as shown in the image:
Direct Connection – [None]
IP address eth0 – [10.10.10.13]
Network mask – [255.255.255.0]
Default Gateway – [10.10.10.1]
Configure eth1 interface – [Skip] as shown in the image.
Set the Time Zone - [UTC] as shown in the image.
Enable NTP server as shown in the image.
Complete the remaining steps of the Setup Wizard and confirm the setup information in order to save the configuration as shown in the image.
Reboot and start the services as in the previous steps for the Primary MSE, as shown in the image.
The next steps show how to add the Primary and Secondary MSE VA to the NCS. Perform the normal process of adding an MSE to the NCS. See the configuration guide for help.
From the NCS, navigate to Systems > Mobility Services and choose Mobility Services Engines as shown in the image.
From the drop-down menu, choose Add Mobility Services Engine. Then, click Go as shown in the image.
Follow the NCS configuration wizard for MSE. In this document's scenario, the values are:
Enter Device Name – e.g. [MSE1]
IP address – [10.10.10.12]
Username and Password (per initial setup)
Click Next as shown in the image.
Add all available licenses, then click Next as shown in the image.
Select MSE services, then click Next as shown in the image.
Enable Tracking parameters, then click Next as shown in the image.
Assigning maps and synchronizing MSE services is optional. Click Done in order to complete the addition of the MSE to the NCS, as shown in the images.
The next screenshot shows that the Primary MSE VA has been added. Now, complete these steps in order to add the Secondary MSE VA:
Locate the Secondary Server column, and click the link to configure as shown in the image.
Add the Secondary MSE VA with the configuration in this scenario:
Secondary Device Name – [mse2]
Secondary IP Address – [10.10.10.13]
Secondary Password* – [default or from setup script]
Failover Type* – [Automatic or Manual]
Failback Type*
Long Failover Wait*
Click Save.
*Click the information icon or refer to MSE documentation, if required.
Click OK when the NCS prompts you to pair the two MSEs, as shown in the image.
The NCS takes a few seconds to create the configuration, as shown in the image.
The NCS prompts you if the Secondary MSE VA requires an activation license (L-MSE-7.0-K9), as shown in the image.
Click OK and locate the license file in order to activate the Secondary, as shown in the image.
Once the Secondary MSE VA has been activated, click Save to complete the configuration as shown in the image.
Navigate to NCS > Mobility Services > Mobility Services Engine.
The NCS displays this screen where the Secondary MSE appears in the column for Secondary Server:
In order to view the HA status, navigate to NCS > Services > High Availability as shown in the image.
The HA status page shows the current status and events for the MSE pair, as shown in the image.
It can take a few minutes for the initial synchronization and data replication to be set up. The NCS shows a progress percentage until the HA pair is fully active, as shown in the image.
MSE software release 7.2 introduces a new HA-related command, gethainfo. This output shows the command on the Primary and the Secondary:
[root@mse1 ~]#gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Primary
Health Monitor IP Address: 10.10.10.12
Virtual IP Address: 10.10.10.11
Version: 7.2.103.0
UDI: AIR-MSE-VA-K9:V01:mse1
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.10.10.13
Virtual IP Address: 10.10.10.11
Version: 7.2.103.0
UDI: AIR-MSE-VA-K9:V01:mse2_666f2046-5699-11e1-b1b1-0050568901d9
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: PRIMARY_ACTIVE

[root@mse2 ~]#gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Secondary
Health Monitor IP Address: 10.10.10.13
Virtual IP Address: Not Applicable for a secondary
Version: 7.2.103.0
UDI: AIR-MSE-VA-K9:V01:mse2
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.10.10.12
Virtual IP Address: 10.10.10.11
Version: 7.2.103.0
UDI: AIR-MSE-VA-K9:V01:mse1_d5972642-5696-11e1-bd0c-0050568901d6
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: SECONDARY_ACTIVE
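For scripted monitoring, the gethainfo output above can be split into its base section and one section per peer. This Python sketch assumes the layout shown in this document, including the quirk that peer sections print "Health Monitor IP Address" without a colon. The sample is an abridged copy of the Primary output above, and `parse_gethainfo`, `pair_is_healthy`, and `EXPECTED_STATE` are hypothetical names, not MSE tools.

```python
SAMPLE = """\
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Primary
Health Monitor IP Address: 10.10.10.12
Virtual IP Address: 10.10.10.11
UDI: AIR-MSE-VA-K9:V01:mse1
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.10.10.13
Failover type: Manual
Failback type: Manual
Direct connect used: No
Heartbeat status: Up
Current state: PRIMARY_ACTIVE
"""

# Steady states seen in this document for a healthy pair.
EXPECTED_STATE = {"Primary": "PRIMARY_ACTIVE", "Secondary": "SECONDARY_ACTIVE"}


def parse_gethainfo(text: str):
    """Split gethainfo output into the base section and one dict per peer."""
    base, peers, section = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Peer configuration#:"):
            section = {}
            peers.append(section)
            continue
        if line.startswith("Health Monitor IP Address") and ":" not in line:
            # Peer sections print this field without a colon.
            key, value = "Health Monitor IP Address", line.rsplit(" ", 1)[1]
        elif ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
        else:
            continue  # banners and separator lines
        (section if section is not None else base)[key] = value
    return base, peers


def pair_is_healthy(text: str) -> bool:
    """True if every peer reports a live heartbeat and the expected steady state."""
    base, peers = parse_gethainfo(text)
    expected = EXPECTED_STATE.get(base.get("Server role"))
    return bool(peers) and all(
        p.get("Heartbeat status") == "Up" and p.get("Current state") == expected
        for p in peers)


print(pair_is_healthy(SAMPLE))  # True
```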
Network Connected MSE HA uses the network, whereas the Direct Connect configuration uses a direct cable connection between the Primary and Secondary MSE servers. This can help reduce heartbeat response times, data replication latency, and failure detection times. In this scenario, a primary physical MSE connects to a secondary MSE on interface eth1, as shown in Figure 5. Eth1 is used for the direct connection, and each eth1 interface requires its own IP address.
Figure 5: MSE HA with direct connect
Set up the Primary MSE.
Summary of configuration from setup script:
-------BEGIN--------
Host name=mse3355-1
Role=1 [Primary]
Health Monitor Interface=eth0
Direct connect interface=eth1
Virtual IP Address=10.10.10.14
Virtual IP Netmask=255.255.255.0
Eth1 IP address=1.1.1.1
Eth1 network mask=255.0.0.0
Default Gateway =10.10.10.1
-------END--------
Set up the Secondary MSE.
Summary of configuration from setup script:
-------BEGIN--------
Host name=mse3355-2
Role=2 [Secondary]
Health Monitor Interface=eth0
Direct connect interface=eth1
Eth0 IP Address 10.10.10.16
Eth0 network mask=255.255.255.0
Default Gateway=10.10.10.1
Eth1 IP address=1.1.1.2, Eth1 network mask=255.0.0.0
-------END--------
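Because the two eth1 interfaces are cabled back to back, each needs its own address on a shared subnet. A minimal Python check of the values in the two summaries above, using the standard `ipaddress` module; `direct_link_ok` is a hypothetical helper name.

```python
import ipaddress


def direct_link_ok(primary_eth1: str, secondary_eth1: str) -> bool:
    """True if both eth1 addresses sit in one subnet and are not identical."""
    p = ipaddress.ip_interface(primary_eth1)
    s = ipaddress.ip_interface(secondary_eth1)
    return p.network == s.network and p.ip != s.ip


print(direct_link_ok("1.1.1.1/255.0.0.0", "1.1.1.2/255.0.0.0"))  # True
```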
Add the Primary MSE to the NCS as shown in the image (see the previous example, or refer to the configuration guide).
In order to set up the Secondary MSE, navigate to the NCS and click the link to configure the Secondary Server.
Enter Secondary Device Name - [mse3355-2]
Secondary IP address – [10.10.10.16]
Complete remaining parameters and click Save as shown in the image.
Click OK in order to confirm the pairing of the two MSEs, as shown in the image.
The NCS takes a moment to add the Secondary Server configuration as shown in the image.
When the configuration completes, make any required changes to the HA parameters and click Save, as shown in the image.
View the HA status for real-time progress of the new MSE HA pair as shown in the image.
Navigate to NCS > Services > Mobility Services > Mobility Services Engines and confirm that the direct-connect MSE HA pair is added to the NCS, as shown in the image.
From the console, you can also confirm the setup with the gethainfo command.
Here is the output from the Primary and the Secondary:
[root@mse3355-1 ~]#gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Primary
Health Monitor IP Address: 10.10.10.15
Virtual IP Address: 10.10.10.14
Version: 7.2.103.0
UDI: AIR-MSE-3355-K9:V01:KQ37xx
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.10.10.16
Virtual IP Address: 10.10.10.14
Version: 7.2.103.0
UDI: AIR-MSE-3355-K9:V01:KQ45xx
Failover type: Automatic
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: Yes
Heartbeat status: Up
Current state: PRIMARY_ACTIVE

[root@mse3355-2 ~]#gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Secondary
Health Monitor IP Address: 10.10.10.16
Virtual IP Address: Not Applicable for a secondary
Version: 7.2.103.0
UDI: AIR-MSE-3355-K9:V01:KQ45xx
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.10.10.15
Virtual IP Address: 10.10.10.14
Version: 7.2.103.0
UDI: AIR-MSE-3355-K9:V01:KQ37xx
Failover type: Automatic
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: Yes
Heartbeat status: Up
Current state: SECONDARY_ACTIVE
Based on the pairing matrix, the maximum HA configuration is 2:1. This is reserved for the MSE-3355, which, in secondary mode, can support an MSE-3310 and an MSE-3350. Direct connect is not applicable in this scenario.
Configure each of these MSEs in order to demonstrate the 2:1 HA scenario:
MSE-3310 (Primary1)
Server role: Primary
Health Monitor IP Address (Eth0): 10.10.10.17
Virtual IP Address: 10.10.10.18
Eth1 – Not Applicable

MSE-3350 (Primary2)
Server role: Primary
Health Monitor IP Address: 10.10.10.22
Virtual IP Address: 10.10.10.21
Eth1 – Not Applicable

MSE-3355 (Secondary)
Server role: Secondary
Health Monitor IP Address: 10.10.10.16
Virtual IP Address: Not Applicable for a secondary
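With three MSEs sharing one subnet, an address typo is easy to make. This Python sketch checks the plan above for reused addresses across all health monitor and virtual IPs; the `plan` dictionary layout and `duplicate_addresses` are hypothetical, and only the addresses come from this document.

```python
from collections import Counter

plan = {
    "Primary1 (MSE-3310)": {"hm_ip": "10.10.10.17", "virtual_ip": "10.10.10.18"},
    "Primary2 (MSE-3350)": {"hm_ip": "10.10.10.22", "virtual_ip": "10.10.10.21"},
    "Secondary (MSE-3355)": {"hm_ip": "10.10.10.16", "virtual_ip": None},  # no VIP on a secondary
}


def duplicate_addresses(plan: dict) -> list:
    """Addresses that appear more than once across the whole 2:1 plan."""
    addrs = [a for node in plan.values()
             for a in (node["hm_ip"], node["virtual_ip"]) if a]
    return sorted(a for a, n in Counter(addrs).items() if n > 1)


print(duplicate_addresses(plan))  # []
```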
After all MSEs are configured, add Primary1 and Primary2 to the NCS as shown in the image.
Click the link to configure the Secondary Server (as in the previous examples), starting with either of the Primary MSEs, as shown in the image.
Enter the parameters for the Secondary MSE:
Secondary Device Name: for example, [mse-3355-2]
Secondary IP address – [10.10.10.16]
Complete the remaining parameters.
Click Save as shown in the image.
Wait a brief moment for the first secondary entry to be configured as shown in the image.
Confirm that the Secondary Server is added for the first Primary MSE as shown in the image.
Repeat the previous steps (configure the Secondary Server, enter its parameters, and save) for the second Primary MSE, as shown in the image.
Finalize with HA parameters for the second Primary MSE as shown in the image.
Save the settings as shown in the image.
Check the progress status for each of the Primary MSEs, as shown in the image.
Confirm that both Primary1 and Primary2 MSEs are set up with a Secondary MSE as shown in the image.
Navigate to NCS > Services > Mobility Services and choose High Availability, as shown in the image.
Note that the 2:1 pairing is confirmed, with the MSE-3355 as the secondary for both the MSE-3310 and the MSE-3350, as shown in the image.
Here is sample gethainfo output from the console of the secondary MSE-3355, which lists both primary peers:
[root@mse3355-2 ~]#gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Secondary
Health Monitor IP Address: 10.10.10.16
Virtual IP Address: Not Applicable for a secondary
Version: 7.2.103.0
UDI: AIR-MSE-3355-K9:V01:KQ45xx
Number of paired peers: 2
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.10.10.22
Virtual IP Address: 10.10.10.21
Version: 7.2.103.0
UDI: AIR-MSE-3350-K9:V01:MXQ839xx
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: SECONDARY_ACTIVE
----------------------------
Peer configuration#: 2
----------------------------
Health Monitor IP Address 10.10.10.17
Virtual IP Address: 10.10.10.18
Version: 7.2.103.0
UDI: AIR-MSE-3310-K9:V01:FTX140xx
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos4
Instance database port: 1525
Dataguard configuration name: dg_mse4
Primary database alias: mseop4s
Direct connect used: No
Heartbeat status: Up
Current state: SECONDARY_ACTIVE
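In the 2:1 output above, the secondary keeps one database instance per registered primary (mseos3 on port 1524 and mseos4 on port 1525). This Python sketch, run against an abridged copy of that output, checks that the number of peer sections matches "Number of paired peers" and that no instance name or port is shared. The helper names are hypothetical, and this parser skips the colon-less "Health Monitor IP Address" peer line on purpose.

```python
SAMPLE = """\
Server role: Secondary
Number of paired peers: 2
Peer configuration#: 1
Virtual IP Address: 10.10.10.21
Instance database name: mseos3
Instance database port: 1524
Heartbeat status: Up
Current state: SECONDARY_ACTIVE
Peer configuration#: 2
Virtual IP Address: 10.10.10.18
Instance database name: mseos4
Instance database port: 1525
Heartbeat status: Up
Current state: SECONDARY_ACTIVE
"""


def paired_peer_count(text: str) -> int:
    """Value of the 'Number of paired peers' line, or 0 if it is missing."""
    for raw in text.splitlines():
        if raw.strip().startswith("Number of paired peers:"):
            return int(raw.split(":", 1)[1])
    return 0


def peer_sections(text: str) -> list:
    """One dict of 'key: value' fields per 'Peer configuration#' block."""
    peers = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Peer configuration#:"):
            peers.append({})
        elif peers and ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
            peers[-1][key] = value
    return peers


def instances_are_distinct(peers: list) -> bool:
    """True if no two peers share a database instance name or port."""
    names = [p["Instance database name"] for p in peers]
    ports = [p["Instance database port"] for p in peers]
    return len(set(names)) == len(names) and len(set(ports)) == len(ports)


peers = peer_sections(SAMPLE)
for p in peers:
    print(p["Virtual IP Address"], p["Instance database name"], p["Current state"])
print(len(peers) == paired_peer_count(SAMPLE), instances_are_distinct(peers))
```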
Final validation for HA in the NCS shows the status as fully Active for both the MSE-3310 and MSE-3350 as shown in the images.
There is currently no verification procedure available for this configuration.
This section provides information you can use in order to troubleshoot your configuration.
When you add the Secondary MSE, you might see a prompt as shown in the image.
It is possible that there was an issue during the setup script.
Run the getserverinfo command in order to check for proper network settings.
It is also possible that the services have not started. Run the /etc/init.d/msed start command.
Run through the setup script again if required (/mse/setup/setup.sh) and save at the end.
The VA for MSE also requires an activation license (L-MSE-7.0-K9). Otherwise, the NCS prompts for it when you add the Secondary MSE VA. Obtain and add the activation license for the MSE VA as shown in the image.
If you switch the HA role on the MSE, ensure that the services are fully stopped first: stop the services with the /etc/init.d/msed stop command, then run the setup script again (/mse/setup/setup.sh), as shown in the image.
Run the gethainfo command in order to get HA information on the MSE. This provides useful information for troubleshooting or monitoring HA status and changes.
[root@mse3355-2 ~]#gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Secondary
Health Monitor IP Address: 10.10.10.16
Virtual IP Address: Not Applicable for a secondary
Version: 7.2.103.0
UDI: AIR-MSE-3355-K9:V01:KQ45xx
Number of paired peers: 2
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.10.10.22
Virtual IP Address: 10.10.10.21
Version: 7.2.103.0
UDI: AIR-MSE-3350-K9:V01:MXQ839xx
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: SECONDARY_ACTIVE
----------------------------
Peer configuration#: 2
----------------------------
Health Monitor IP Address 10.10.10.17
Virtual IP Address: 10.10.10.18
Version: 7.2.103.0
UDI: AIR-MSE-3310-K9:V01:FTX140xx
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos4
Instance database port: 1525
Dataguard configuration name: dg_mse4
Primary database alias: mseop4s
Direct connect used: No
Heartbeat status: Up
Current state: SECONDARY_ACTIVE
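To monitor HA changes rather than a single snapshot, two gethainfo captures can be compared field by field. This Python sketch keys each field by its section (0 for the base configuration, 1..n for peers) so that multi-peer output does not collide. The helper names are hypothetical, and the AFTER sample reuses the Heartbeat Down / FAILOVER_ACTIVE combination that the secondary reports after a failover elsewhere in this document.

```python
def snapshot(text: str) -> dict:
    """Map (section, key) -> value; section 0 is the base config, 1..n are peers."""
    result, section = {}, 0
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Peer configuration#:"):
            section += 1
        elif ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
            result[(section, key)] = value
    return result


def ha_changes(before: str, after: str) -> dict:
    """Fields whose value differs between two gethainfo captures."""
    old, new = snapshot(before), snapshot(after)
    return {k: (old.get(k), new.get(k))
            for k in old.keys() | new.keys() if old.get(k) != new.get(k)}


BEFORE = """\
Server role: Secondary
Peer configuration#: 1
Heartbeat status: Up
Current state: SECONDARY_ACTIVE
"""
AFTER = """\
Server role: Secondary
Peer configuration#: 1
Heartbeat status: Down
Current state: FAILOVER_ACTIVE
"""

for (section, key), (old, new) in sorted(ha_changes(BEFORE, AFTER).items()):
    print(f"peer {section} {key}: {old} -> {new}")
```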
In addition, the NCS HA View is a useful management tool that gives visibility into the HA setup for MSE, as shown in the image.
This section describes manual failover and failback only, which give you better control.
Once MSE HA is configured and running, Prime shows the state as in these images:
Here are the getserverinfo and gethainfo outputs of the primary MSE:
[root@NicoMSE ~]# getserverinfo
Health Monitor is running
Retrieving MSE Services status.
MSE services are up, getting the status
-------------
Server Config
-------------
Product name: Cisco Mobility Service Engine
Version: 8.0.110.0
Health Monitor Ip Address: 10.48.39.238
High Availability Role: 1
Hw Version: V01
Hw Product Identifier: AIR-MSE-VA-K9
Hw Serial Number: NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
HTTPS: null
Legacy Port: 8001
Log Modules: -1
Log Level: INFO
Days to keep events: 2
Session timeout in mins: 30
DB backup in days: 2
-------------
Services
-------------
Service Name: Context Aware Service
Service Version: 8.0.1.79
Admin Status: Disabled
Operation Status: Down
Service Name: WIPS
Service Version: 3.0.8155.0
Admin Status: Enabled
Operation Status: Up
Service Name: Mobile Concierge Service
Service Version: 5.0.1.23
Admin Status: Disabled
Operation Status: Down
Service Name: CMX Analytics
Service Version: 3.0.1.68
Admin Status: Disabled
Operation Status: Down
Service Name: CMX Connect & Engage
Service Version: 1.0.0.29
Admin Status: Disabled
Operation Status: Down
Service Name: HTTP Proxy Service
Service Version: 1.0.0.1
Admin Status: Disabled
Operation Status: Down
--------------
Server Monitor
--------------
Server start time: Sun Mar 08 12:40:32 CET 2015
Server current time: Sun Mar 08 14:04:30 CET 2015
Server timezone: Europe/Brussels
Server timezone offset (mins): 60
Restarts: 1
Used Memory (MB): 197
Allocated Memory (MB): 989
Max Memory (MB): 989
DB disk size (MB): 17191
---------------
Active Sessions
---------------
Session ID: 5672
Session User ID: 1
Session IP Address: 10.48.39.238
Session start time: Sun Mar 08 12:44:54 CET 2015
Session last access time: Sun Mar 08 14:03:46 CET 2015
----------------------------
Default Trap Destinations
----------------------------
Trap Destination - 1
-----------------
IP Address: 10.48.39.225
Last Updated: Sun Mar 08 12:34:12 CET 2015

[root@NicoMSE ~]# gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Primary
Health Monitor IP Address: 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.48.39.240
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: PRIMARY_ACTIVE
And here are the same outputs for the secondary MSE:
[root@NicoMSE2 ~]# getserverinfo
Health Monitor is running
Retrieving MSE Services status.
MSE services are up and in DORMANT mode, getting the status
-------------
Server Config
-------------
Product name: Cisco Mobility Service Engine
Version: 8.0.110.0
Health Monitor Ip Address: 10.48.39.240
High Availability Role: 2
Hw Version: V01
Hw Product Identifier: AIR-MSE-VA-K9
Hw Serial Number: NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
HTTPS: null
Legacy Port: 8001
Log Modules: -1
Log Level: INFO
Days to keep events: 2
Session timeout in mins: 30
DB backup in days: 2
-------------
Services
-------------
Service Name: Context Aware Service
Service Version: 8.0.1.79
Admin Status: Disabled
Operation Status: Down
Service Name: WIPS
Service Version: 3.0.8155.0
Admin Status: Enabled
Operation Status: Up
Service Name: Mobile Concierge Service
Service Version: 5.0.1.23
Admin Status: Disabled
Operation Status: Down
Service Name: CMX Analytics
Service Version: 3.0.1.68
Admin Status: Disabled
Operation Status: Down
Service Name: CMX Connect & Engage
Service Version: 1.0.0.29
Admin Status: Disabled
Operation Status: Down
Service Name: HTTP Proxy Service
Service Version: 1.0.0.1
Admin Status: Disabled
Operation Status: Down
--------------
Server Monitor
--------------
Server start time: Sun Mar 08 12:50:04 CET 2015
Server current time: Sun Mar 08 14:04:32 CET 2015
Server timezone: Europe/Brussels
Server timezone offset (mins): 60
Restarts: null
Used Memory (MB): 188
Allocated Memory (MB): 989
Max Memory (MB): 989
DB disk size (MB): 17191

[root@NicoMSE2 ~]# gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Secondary
Health Monitor IP Address: 10.48.39.240
Virtual IP Address: Not Applicable for a secondary
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: SECONDARY_ACTIVE
In order to trigger a failover manually, go to the MSE HA configuration in Prime Infrastructure and click Switchover.
Within moments, gethainfo on both servers reports the FAILOVER_INVOKED state.
Primary gethainfo:
[root@NicoMSE ~]# gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Primary
Health Monitor IP Address: 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.48.39.240
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Down
Current state: FAILOVER_INVOKED
Secondary gethainfo:
[root@NicoMSE2 ~]# gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Secondary
Health Monitor IP Address: 10.48.39.240
Virtual IP Address: Not Applicable for a secondary
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Down
Current state: FAILOVER_INVOKED
Once the failover is complete, you see this image on Prime:
Primary gethainfo:
[root@NicoMSE ~]# gethainfo
Health Monitor is not running. Following information is from the last saved configuration
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Primary
Health Monitor IP Address: 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.48.39.240
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Last shutdown state: FAILOVER_ACTIVE
Secondary gethainfo:
[root@NicoMSE2 ~]# gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Secondary
Health Monitor IP Address: 10.48.39.240
Virtual IP Address: Not Applicable for a secondary
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Down
Current state: FAILOVER_ACTIVE
At this stage, the failover is finished and the secondary MSE is fully in charge.
Note that the services on the primary MSE stop when you perform a manual switchover, in order to simulate a real event in which the primary MSE goes down.
Before you fail back, you must bring the primary back up. When it comes back up, its state is TERMINATED. This is normal: the secondary is still in charge and shows FAILOVER_ACTIVE.
Primary gethainfo after it is brought back up:
[root@NicoMSE ~]# gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Primary
Health Monitor IP Address: 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.48.39.240
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Down
Current state: TERMINATED
When you invoke the failback from Prime Infrastructure, both nodes enter the "FAILBACK_ACTIVE" state. Unlike "FAILOVER_ACTIVE", this is not a final state.
Primary gethainfo:
[root@NicoMSE ~]# gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Primary
Health Monitor IP Address: 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.48.39.240
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Down
Current state: FAILBACK_ACTIVE
Secondary gethainfo:
[root@NicoMSE2 ~]# gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Secondary
Health Monitor IP Address: 10.48.39.240
Virtual IP Address: Not Applicable for a secondary
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Down
Current state: FAILBACK_ACTIVE
Prime Infrastructure shows the status in this image:
When the failback is done but the secondary is still busy transferring data back to the primary, the primary shows:
gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Primary
Health Monitor IP Address: 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.48.39.240
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: FAILBACK_COMPLETE
The secondary shows:
[root@NicoMSE2 ~]# gethainfo
Health Monitor is running. Retrieving HA related information
----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------
Server role: Secondary
Health Monitor IP Address: 10.48.39.240
Virtual IP Address: Not Applicable for a secondary
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Number of paired peers: 1
----------------------------
Peer configuration#: 1
----------------------------
Health Monitor IP Address 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: SECONDARY_ALONE
At this stage, Prime Infrastructure shows the status in this image:
When this completes, both nodes return to their original states, PRIMARY_ACTIVE and SECONDARY_ACTIVE. The HA status in Prime Infrastructure then looks the same as for a new deployment.
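The walkthrough above can be summarized as an ordered list of (primary, secondary) state pairs. Polled `gethainfo` results can then be checked against that list. This is a minimal Python sketch. It assumes each node's state has already been read from its `gethainfo` output and that None stands for "Health Monitor is not running". It models only the manual failover and failback sequence described here, not automatic mode. The names and phase labels are hypothetical.

```python
from typing import List, Optional, Tuple

# (primary state, secondary state), in the order this walkthrough goes through
# a manual failover followed by a manual failback. None = health monitor not
# running (primary down). Labels are informal summaries, not MSE output.
Snapshot = Tuple[Optional[str], str]

EXPECTED_CYCLE: List[Tuple[Snapshot, str]] = [
    (("PRIMARY_ACTIVE", "SECONDARY_ACTIVE"), "normal operation"),
    (("FAILOVER_INVOKED", "FAILOVER_INVOKED"), "failover in progress"),
    ((None, "FAILOVER_ACTIVE"), "failover complete, primary down"),
    (("TERMINATED", "FAILOVER_ACTIVE"), "primary back up, ready for failback"),
    (("FAILBACK_ACTIVE", "FAILBACK_ACTIVE"), "failback in progress"),
    (("FAILBACK_COMPLETE", "SECONDARY_ALONE"), "primary in charge, still loading data"),
    (("PRIMARY_ACTIVE", "SECONDARY_ACTIVE"), "back to normal operation"),
]


def phase(primary: Optional[str], secondary: str) -> str:
    """Label for a (primary, secondary) pair, or a warning if it is not listed."""
    for snapshot, label in EXPECTED_CYCLE:
        if snapshot == (primary, secondary):
            return label  # first match wins for the repeated normal-operation pair
    return "unexpected combination"


def ready_for_failback(primary: Optional[str], secondary: str) -> bool:
    """Primary back up (TERMINATED) while the secondary is still in charge."""
    return (primary, secondary) == ("TERMINATED", "FAILOVER_ACTIVE")


def follows_cycle(observed: List[Snapshot]) -> bool:
    """True if polled snapshots occur in the expected order.

    Polling can miss short-lived states or see the same state twice, so gaps
    and repeats are allowed; moving back to an earlier step is not.
    """
    idx = 0
    for snapshot in observed:
        # Advance to the next expected entry that matches this snapshot.
        while idx < len(EXPECTED_CYCLE) and EXPECTED_CYCLE[idx][0] != snapshot:
            idx += 1
        if idx == len(EXPECTED_CYCLE):
            return False
    return True
```

Gaps are tolerated because a poll can miss short-lived states such as FAILOVER_INVOKED. Moving back to an earlier step, however, indicates that something other than the documented sequence happened.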
State | Description |
PRIMARY_ACTIVE | State of the primary MSE when it is the node in charge and everything is working normally |
SECONDARY_ACTIVE | State of the secondary MSE when it is up but not in charge (the primary still is), ready to take over when needed |
FAILOVER_INVOKED | Shown on both nodes while the failover takes place, that is, while the secondary MSE starts its services and loads the database of the primary MSE |
FAILOVER_ACTIVE | Final state of a failover. The secondary MSE is considered up and running, and the primary MSE is down |
TERMINATED | State of an MSE node that has come back up with its services running after being down, but that is not the node in charge. For example, this is the state of the primary when its services are restarted while PI still gives control to the secondary MSE. It can also indicate that the HA link is down (for example, if one of the MSEs is rebooting or is not reachable by ping) |
FAILBACK_ACTIVE | Unlike FAILOVER_ACTIVE, this is not the final stage. It means that the failback was invoked and is in progress: the database is being copied from the secondary back to the primary |
FAILBACK_COMPLETE | State of the primary node when it is back in charge but still busy loading the database from the secondary MSE |
SECONDARY_ALONE | State of the secondary MSE when the failback is done and the primary is in charge but still loading data |
GRACEFUL_SHUTDOWN | With automatic failover/failback, the state triggered when you manually reboot or stop the services on the other MSE. The node does not take over, because the downtime was caused manually |
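For a monitoring script, the state table can be turned into a lookup that indicates whether to wait or to investigate. The three categories below are an editorial grouping of the table rows and are not something the MSE reports. The names `Kind`, `HA_STATES`, and `classify` are hypothetical.

```python
from enum import Enum


class Kind(Enum):
    STABLE = "stable"            # resting state; nothing is in progress
    IN_PROGRESS = "in progress"  # a failover/failback is still running; wait
    ATTENTION = "attention"      # node not in charge, or peer stopped/unreachable


# Editorial grouping of the HA states in the table above; this is not a
# classification defined by the MSE software.
HA_STATES = {
    "PRIMARY_ACTIVE": Kind.STABLE,
    "SECONDARY_ACTIVE": Kind.STABLE,
    "FAILOVER_ACTIVE": Kind.STABLE,          # final state of a failover
    "FAILOVER_INVOKED": Kind.IN_PROGRESS,
    "FAILBACK_ACTIVE": Kind.IN_PROGRESS,     # not final, unlike FAILOVER_ACTIVE
    "FAILBACK_COMPLETE": Kind.IN_PROGRESS,   # primary still loading the database
    "SECONDARY_ALONE": Kind.IN_PROGRESS,
    "TERMINATED": Kind.ATTENTION,
    "GRACEFUL_SHUTDOWN": Kind.ATTENTION,
}


def classify(state: str) -> Kind:
    """Return the Kind of an HA state string such as 'FAILBACK_ACTIVE'."""
    key = state.strip().upper()
    if key not in HA_STATES:
        raise ValueError(f"unknown MSE HA state: {state!r}")
    return HA_STATES[key]
```

An unknown state raises an error instead of defaulting to a category. A state this grouping does not cover should be looked up, not silently treated as healthy.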
HA-related logs are saved in the /opt/mse/logs/hm directory, and health-monitor*.log is the main log file.
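To read these logs quickly, the following Python sketch picks the most recently modified health-monitor*.log in that directory and returns its last lines. It assumes Python 3 is available wherever you run it, for example after you copy the directory off the appliance. It also assumes read access to /opt/mse/logs/hm (typically as root) and that the files are plain text. The function names are hypothetical.

```python
import glob
import os
from collections import deque
from typing import List, Optional

# Directory named in this document; pass another path (e.g. a copied
# directory) to the functions below.
HM_LOG_DIR = "/opt/mse/logs/hm"


def latest_hm_log(log_dir: str = HM_LOG_DIR) -> Optional[str]:
    """Return the most recently modified health-monitor*.log, or None."""
    candidates = glob.glob(os.path.join(log_dir, "health-monitor*.log"))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def tail(path: str, n: int = 50) -> List[str]:
    """Return the last n lines of a text file, streaming it line by line."""
    with open(path, "r", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


if __name__ == "__main__":
    path = latest_hm_log()
    if path is None:
        print("no health-monitor*.log found in", HM_LOG_DIR)
    else:
        print("\n".join(tail(path, 20)))
```

A `deque` with `maxlen` keeps only the last n lines in memory, so large log files are not loaded whole.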
Issue: Both the primary and secondary are active (split-brain condition)
1. Shut down the virtual IP (VIP) interface on the secondary (normally eth0:1):
ifconfig eth0:1 down
2. Restart the services on the secondary MSE:
service msed stop
service msed start
3. From Prime Infrastructure, verify that the secondary has started to sync back with the primary.
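The split-brain steps can be wrapped in a small script with a dry-run default, so the commands can be reviewed before they run. This Python sketch assumes three things: it runs as root on the secondary MSE, Python 3.8 or later is available, and the VIP interface is eth0:1 as in step 1. The function names are hypothetical. Step 3 stays manual because it is done in Prime Infrastructure.

```python
import shlex
import subprocess
from typing import List


def split_brain_commands(vip_iface: str = "eth0:1") -> List[List[str]]:
    """Steps 1 and 2 of the split-brain procedure, in order, for the secondary MSE."""
    return [
        ["ifconfig", vip_iface, "down"],  # step 1: take the VIP down on the secondary
        ["service", "msed", "stop"],      # step 2: restart the MSE services
        ["service", "msed", "start"],
    ]


def run_plan(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print (dry run) or execute each command; stop at the first failure."""
    done = []
    for cmd in commands:
        line = shlex.join(cmd)
        if dry_run:
            print("would run:", line)
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        done.append(line)
    return done


if __name__ == "__main__":
    run_plan(split_brain_commands())  # dry run; pass dry_run=False to execute
    # Step 3 stays manual: confirm in Prime Infrastructure that the secondary
    # is syncing with the primary again.
```

With `check=True`, the first failing command raises an exception. If taking down the VIP fails, the services are therefore not restarted.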
Issue: HA synchronization of the secondary with the primary is stuck at X% for a long time
1. Stop the service on the secondary:
service msed stop
2. Remove the /opt/mse/health-monitor/resources/config/advance-c
3. If HA still cannot be established, the secondary may be in an inconsistent state. In that case, remove everything under the /opt/data directory on the secondary:
rm -rf /opt/data/*
4. Restart the secondary, then add it to the primary from Prime Infrastructure to initiate HA again.
Issue: Unable to delete the secondary server from PI after it becomes unreachable
1. Stop the service on the Primary.
2. Remove the /opt/mse/health-monitor/resources/config/advance-c
3. Restart the service on the Primary.
4. Delete the Primary MSE from PI and re-add it.