Configuration and Deployment Guide for MSE Software Release 8.0 High Availability


Updated: August 16, 2018

Document ID: 200058


Introduction

This document describes configuration and deployment guidelines, as well as troubleshooting tips, for adding Mobility Services Engine (MSE) High Availability (HA) to a Cisco Unified Wireless LAN (WLAN) that runs Context Aware Services and/or the Adaptive Wireless Intrusion Prevention System (aWIPS). The purpose of this document is to explain the guidelines for MSE HA and to provide HA deployment scenarios for MSE.

Note: This document does not provide configuration details for the MSE and associated components that do not pertain to MSE HA. This information is provided in other documents, and references are provided. Adaptive wIPS configuration is also not covered in this document.

Background Information

The MSE is a platform that is capable of running multiple related services. Because these services provide high-level functionality, HA is critical in order to maintain the highest level of service availability.

With HA enabled, every active MSE is backed up by another inactive instance. MSE HA introduces the health monitor, which configures, manages, and monitors the HA setup. A heartbeat is maintained between the primary and secondary MSE. The health monitor is responsible for setting up database and file replication and for monitoring the application. When the primary MSE fails and the secondary takes over, the virtual IP address of the primary MSE is switched transparently to the secondary.

Figure 1 shows a typical Cisco WLAN deployment that includes a Cisco MSE enabled for HA.

HA support is available on the MSE-3310, MSE-3350/3355, and MSE-3365 appliances, and on the MSE Virtual Appliance on VMware ESXi.

Figure 1. MSE Deployment in HA

Guidelines and Limitations

These guidelines and limitations apply to the MSE HA architecture:

MSE Virtual Appliance supports only 1:1 HA
One secondary MSE can support up to two primary MSEs. See the HA pairing matrices (Figures 2 and 3).
HA supports both network-connected and direct-connected deployments
Only MSE Layer-2 redundancy is supported: both the health monitor IP and the virtual IP must be on the same subnet and accessible from the Network Control System (NCS). Layer-3 redundancy is not supported.
Health monitor IP and virtual IP must be different
You can use either manual or automatic failover
You can use either manual or automatic failback
Both the primary and secondary MSE must be on the same software version
Every active primary MSE is backed up by another inactive instance. The secondary MSE becomes active only after the failover procedure (manual or automatic) is initiated.
There is one software and database instance for each registered primary MSE.
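You can check the addressing and version rules above before you run the setup wizard. The following Python sketch is illustrative only. check_ha_pair is a hypothetical helper, not a Cisco tool, and it encodes just three of the guidelines: same subnet, distinct health monitor and virtual IPs, and matching software versions. Applying the same-subnet rule to the secondary's health monitor IP is an assumption based on the Layer-2-only limitation.

```python
import ipaddress


def check_ha_pair(primary_hm_ip: str, virtual_ip: str, netmask: str,
                  secondary_hm_ip: str, primary_version: str,
                  secondary_version: str) -> list[str]:
    """Return guideline violations for one HA pair (empty list = no issues found)."""
    problems: list[str] = []
    # Subnet of the primary health monitor interface (the Layer-2 domain of the pair).
    subnet = ipaddress.ip_network(f"{primary_hm_ip}/{netmask}", strict=False)
    if primary_hm_ip == virtual_ip:
        problems.append("health monitor IP and virtual IP must be different")
    # Assumption: Layer-2-only HA means the secondary must share this subnet too.
    for label, addr in (("virtual IP", virtual_ip),
                        ("secondary health monitor IP", secondary_hm_ip)):
        if ipaddress.ip_address(addr) not in subnet:
            problems.append(f"{label} {addr} is outside {subnet}; Layer-3 HA is not supported")
    if primary_version != secondary_version:
        problems.append("primary and secondary must run the same software version")
    return problems


if __name__ == "__main__":
    # Values from the MSE VA scenario in this document.
    print(check_ha_pair("10.10.10.12", "10.10.10.11", "255.255.255.0",
                        "10.10.10.13", "7.2.103.0", "7.2.103.0"))  # prints []
```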

Figure 2. MSE HA Support Pairing Matrix

The baseline of this matrix is that the secondary instance must always have equal or higher specifications than the primary, whether they are appliances or virtual machines.

The MSE-3365 can only be paired with another MSE-3365. No other combination is tested or supported.
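The pairing rules stated in the text can be checked mechanically. The following Python sketch uses a hypothetical check_pairings helper with illustrative model labels such as "MSE-3365" and "VA". It encodes only the rules written in this document: MSE-3365 pairs only with MSE-3365, the VA supports only 1:1 HA, and a secondary supports at most two primaries. It does not encode the full matrix in Figures 2 and 3 or the "equal or higher specifications" baseline.

```python
from collections import Counter

# (primary name, primary model, secondary name, secondary model)
Pairing = tuple[str, str, str, str]


def check_pairings(pairs: list[Pairing]) -> list[str]:
    """Return violations of the stated pairing rules (empty list = none found)."""
    problems: list[str] = []
    primaries_per_secondary = Counter(sec for _, _, sec, _ in pairs)
    for prim, prim_model, sec, sec_model in pairs:
        # The MSE-3365 can only be paired with another MSE-3365.
        if (prim_model == "MSE-3365") != (sec_model == "MSE-3365"):
            problems.append(f"{prim}/{sec}: MSE-3365 pairs only with MSE-3365")
        # The MSE Virtual Appliance supports only 1:1 HA.
        if "VA" in (prim_model, sec_model) and primaries_per_secondary[sec] > 1:
            problems.append(f"{prim}/{sec}: a VA supports only 1:1 HA")
    for sec, count in primaries_per_secondary.items():
        # One secondary MSE can support up to two primary MSEs.
        if count > 2:
            problems.append(f"{sec}: {count} primaries exceed the 2:1 maximum")
    return problems


if __name__ == "__main__":
    # The 2:1 physical-appliance scenario described later in this document.
    print(check_pairings([
        ("mse3310", "MSE-3310", "mse3355-2", "MSE-3355"),
        ("mse3350", "MSE-3350", "mse3355-2", "MSE-3355"),
    ]))  # prints []
```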

Figure 3. MSE HA N:1 Pairing Matrix

HA Configuration Scenario for MSE Virtual Appliance (Network Connected)

This example shows the HA configuration for the MSE Virtual Appliance (VA), as illustrated in Figure 4. For this scenario, these settings are configured:

Primary MSE VA:

Virtual IP – [10.10.10.11]

Health Monitor interface (Eth0) – [10.10.10.12]

Secondary MSE VA:

Virtual IP – [None]

Health Monitor interface (Eth0) – [10.10.10.13]

Note: Each VA requires an activation license (L-MSE-7.0-K9). This license is required for HA configuration of the VA.

Figure 4. MSE Virtual Appliance in HA

Refer to Cisco documentation on MSE Virtual Appliance for more information.

Here are the general steps:

Complete the VA installation for MSE and verify that all network settings are correct, as shown in the image.
Set up the parameters with the Setup Wizard at first login, as shown in the image.
Enter the required entries (host name, domain, and so on). Enter YES at the Configure High Availability step.
Enter this information as shown in the images:
- Select Role – [1 for Primary].
- Health Monitor interface – [eth0]*
  
  *Network settings are mapped to Network Adapter 1.
Select direct connect interface [none] as shown in the image.
Enter this information as shown in the image:
- Virtual IP address – [10.10.10.11]
- Network Mask – [255.255.255.0]
- Start MSE in recovery mode – [No]
Enter this information as shown in the image:
- Configure Eth0 - [Yes]
- Enter Eth0 IP address – [10.10.10.12]
- Network Mask – [255.255.255.0]
- Default Gateway – [10.10.10.1]
The second Ethernet interface (Eth1) is not used.

Configure eth1 interface - [skip] as shown in the image.

Continue through the Setup Wizard as shown in the images.

It is critical to enable the NTP server in order to synchronize the clock.

The preferred time zone is UTC.

This summarizes the MSE VA Primary setup:

-------BEGIN--------
Role=1, Health Monitor Interface=eth0, Direct connect interface=none
Virtual IP Address=10.10.10.11, Virtual IP Netmask=255.255.255.0
Eth0 IP address=10.10.10.12, Eth0 network mask=255.255.255.0
Default Gateway=10.10.10.1
-------END--------

Enter yes to confirm that all setup information is correct as shown in the image.
A reboot is recommended after setup as shown in the image.
After a reboot, start the MSE services with the /etc/init.d/msed start or the service msed start commands as shown in the image.
After all services have started, confirm that MSE services are working properly with the getserverinfo command.

Operation status must show Up as shown in the image.
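The setup summary printed by the wizard uses a simple key=value layout. As an illustration only, the following Python sketch parses such a summary, assuming the layout shown above. The name parse_setup_summary is hypothetical and is not an MSE utility. After parsing, the sketch checks that the virtual IP falls inside the Eth0 subnet, in line with the Layer-2-only guideline.

```python
import ipaddress


def parse_setup_summary(text: str) -> dict[str, str]:
    """Parse key=value pairs between the BEGIN and END marker lines."""
    settings: dict[str, str] = {}
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("-------BEGIN"):
            inside = True
            continue
        if line.startswith("-------END"):
            break
        if not inside:
            continue
        # One line can hold several comma-separated key=value pairs.
        for item in line.split(","):
            key, sep, value = item.partition("=")
            if sep:
                settings[key.strip()] = value.strip()
    return settings


# Primary MSE VA values from this scenario (Eth0 mask as entered in the wizard).
SUMMARY = """-------BEGIN--------
Role=1, Health Monitor Interface=eth0, Direct connect interface=none
Virtual IP Address=10.10.10.11, Virtual IP Netmask=255.255.255.0
Eth0 IP address=10.10.10.12, Eth0 network mask=255.255.255.0
Default Gateway=10.10.10.1
-------END--------"""

if __name__ == "__main__":
    cfg = parse_setup_summary(SUMMARY)
    eth0 = ipaddress.ip_interface(
        f"{cfg['Eth0 IP address']}/{cfg['Eth0 network mask']}")
    vip = ipaddress.ip_address(cfg["Virtual IP Address"])
    # Layer-2 only: the virtual IP must sit in the same subnet as Eth0.
    print(vip in eth0.network)  # prints True
```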

Setting up the secondary MSE

These steps are part of the setup for the secondary MSE VA:

After a new installation, the initial login starts the Setup Wizard. Enter this information as shown in the image:
- Configure High Availability – [Yes]
- Select role – [2] which indicates Secondary
- Health Monitor Interface – [eth0], the same as on the Primary
Enter the information as shown in the image:
- Direct Connection – [None]
- IP address eth0 – [10.10.10.13]
- Network mask – [255.255.255.0]
- Default Gateway – [10.10.10.1]
Configure eth1 interface – [Skip] as shown in the image.
Set the Time Zone - [UTC] as shown in the image.
Enable NTP server as shown in the image.
Complete the remaining steps of the Setup Wizard and confirm the setup information in order to save the configuration as shown in the image.
Reboot and start the services as described in the previous steps for the Primary MSE, as shown in the image.

Managing the MSEs from Cisco Prime NCS (or Prime Infrastructure)

The next steps show how to add the Primary and Secondary MSE VA to the NCS. Perform the normal process of adding an MSE to the NCS. See the configuration guide for help.

From the NCS, navigate to Systems > Mobility Services and choose Mobility Services Engines as shown in the image.
From the drop-down menu, choose Add Mobility Services Engine. Then, click Go as shown in the image.
Follow the NCS configuration wizard for MSE. In this document's scenario, the values are:
- Enter Device Name – e.g. [MSE1]
- IP address – [10.10.10.12]
- Username and Password (per initial setup)
- Click Next as shown in the image.
Add all available licenses, then click Next as shown in the image.
Select MSE services, then click Next as shown in the image.
Enable Tracking parameters, then click Next as shown in the image.
It is optional to assign maps and synchronize MSE services. Click Done in order to complete the addition of the MSE to the NCS, as shown in the images.

Adding the secondary MSE to Cisco Prime NCS

The next screenshot shows that the Primary MSE VA has been added. Now, complete these steps in order to add the Secondary MSE VA:

Locate the Secondary Server column, and click the link to configure as shown in the image.
Add the Secondary MSE VA with the configuration in this scenario:
- Secondary Device Name – [mse2]
- Secondary IP Address – [10.10.10.13]
- Secondary Password* – [default or from setup script]
- Failover Type* – [Automatic, or Manual]
- Failback Type*
- Long Failover Wait*
- Click Save.
  
  *Click the information icon or refer to MSE documentation, if required.
Click OK when the NCS prompts to pair up the two MSEs as shown in the image.

The NCS takes a few seconds to create the configuration, as shown in the image.

The NCS displays a prompt if the Secondary MSE VA requires an activation license (L-MSE-7.0-K9), as shown in the image.
Click OK and locate the License File in order to activate Secondary as shown in the image.
Once the Secondary MSE VA has been activated, click Save to complete the configuration as shown in the image.
Navigate to NCS > Mobility Services > Mobility Services Engine.

The NCS displays a screen where the Secondary MSE appears in the Secondary Server column.
In order to view the HA status, navigate to NCS > Services > High Availability as shown in the image.

In the HA status, you can see the current status and events for the MSE pair, as shown in the image.

It can take a few minutes for the initial synchronization and data replication to be set up. The NCS shows the progress as a percentage until the HA pair is fully active, as shown in the image.

MSE software release 7.2 introduced a new HA-related command, gethainfo. This output shows the command on the Primary and on the Secondary:

[root@mse1 ~]#gethainfo

Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Primary
Health Monitor IP Address: 10.10.10.12
Virtual IP Address: 10.10.10.11
Version: 7.2.103.0
UDI: AIR-MSE-VA-K9:V01:mse1
Number of paired peers: 1

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.10.10.13
Virtual IP Address: 10.10.10.11
Version: 7.2.103.0
UDI: AIR-MSE-VA-K9:V01:mse2_666f2046-5699-11e1-b1b1-0050568901d9
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: PRIMARY_ACTIVE


[root@mse2 ~]#gethainfo

Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Secondary
Health Monitor IP Address: 10.10.10.13
Virtual IP Address: Not Applicable for a secondary
Version: 7.2.103.0
UDI: AIR-MSE-VA-K9:V01:mse2
Number of paired peers: 1

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.10.10.12
Virtual IP Address: 10.10.10.11
Version: 7.2.103.0
UDI: AIR-MSE-VA-K9:V01:mse1_d5972642-5696-11e1-bd0c-0050568901d6
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: SECONDARY_ACTIVE
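When you monitor several MSEs, it can help to turn gethainfo output into structured data. The following Python sketch is a minimal parser that assumes the layout shown above: "Key: value" lines, with each peer section starting at a "Peer configuration#" line. The names parse_gethainfo and peers_without_heartbeat are hypothetical. The real output prints "Health Monitor IP Address" without a colon in peer sections, so the parser handles that key separately.

```python
HM_KEY = "Health Monitor IP Address"


def parse_gethainfo(text: str) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Split gethainfo output into the local base config and one dict per peer."""
    base: dict[str, str] = {}
    peers: list[dict[str, str]] = []
    current = base
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Peer configuration#"):
            current = {}  # every following field belongs to this peer
            peers.append(current)
            continue
        if line.startswith(HM_KEY):
            # Peer sections print this key without a colon, the base section with one.
            current[HM_KEY] = line[len(HM_KEY):].lstrip(": ").strip()
            continue
        key, sep, value = line.partition(":")
        if sep and key:
            # partition() splits on the first colon only, so UDI values stay intact.
            current[key.strip()] = value.strip()
    return base, peers


def peers_without_heartbeat(peers: list[dict[str, str]]) -> list[str]:
    """Health monitor IPs of peers whose heartbeat status is not Up."""
    return [p.get(HM_KEY, "?") for p in peers if p.get("Heartbeat status") != "Up"]
```

For example, you could save the command output on the MSE with gethainfo > ha.txt, and then pass the file contents to parse_gethainfo. In a healthy steady state, as in the outputs above, peers_without_heartbeat returns an empty list.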

HA Configuration with Direct Connected

Network-connected MSE HA uses the network, whereas the direct-connect configuration uses a direct cable connection between the Primary and Secondary MSE servers. This can help reduce heartbeat response latency, data replication time, and failure detection time. In this scenario, a primary physical MSE connects to a secondary MSE on interface eth1, as shown in Figure 5. Eth1 is used for the direct connection, and an IP address is required for each interface.

Figure 5. MSE HA with Direct Connect

Set up the Primary MSE.

Summary of configuration from setup script:

-------BEGIN--------
Host name=mse3355-1
Role=1 [Primary]
Health Monitor Interface=eth0
Direct connect interface=eth1
Virtual IP Address=10.10.10.14
Virtual IP Netmask=255.255.255.0
Eth1 IP address=1.1.1.1
Eth1 network mask=255.0.0.0
Default Gateway =10.10.10.1
-------END--------

Set up the Secondary MSE.

Summary of configuration from setup script:

-------BEGIN--------
Host name=mse3355-2
Role=2 [Secondary]
Health Monitor Interface=eth0
Direct connect interface=eth1
Eth0 IP Address=10.10.10.16
Eth0 network mask=255.255.255.0
Default Gateway=10.10.10.1
Eth1 IP address=1.1.1.2
Eth1 network mask=255.0.0.0
-------END--------

Add the Primary MSE to the NCS as shown in the image (see the previous examples, or refer to the configuration guide).
In order to set up the Secondary MSE, navigate to NCS > configure Secondary Server.
1. Enter Secondary Device Name - [mse3355-2]
2. Secondary IP address – [10.10.10.16]
3. Complete remaining parameters and click Save as shown in the image.
Click OK in order to confirm the pairing of the two MSEs, as shown in the image.

The NCS takes a moment to add the Secondary Server configuration as shown in the image.
When it completes, make any required changes to the HA parameters and click Save, as shown in the image.
View the HA status for real-time progress of the new MSE HA pair as shown in the image.
Navigate to NCS > Services > Mobility Services > Mobility Services Engines and confirm that the direct-connect MSE HA pair is added to the NCS, as shown in the image.
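You can sanity-check the direct-connect addressing from the two setup summaries above (Eth1 1.1.1.1 and 1.1.1.2, mask 255.0.0.0) offline. The following Python sketch assumes that both ends of the cable should share one subnet and use distinct addresses. The document itself states only that each interface needs an IP address, and direct_link_ok is a hypothetical helper.

```python
import ipaddress


def direct_link_ok(ip_a: str, mask_a: str, ip_b: str, mask_b: str) -> bool:
    """Assumed rule: both eth1 ends on the same network, with different host addresses."""
    a = ipaddress.ip_interface(f"{ip_a}/{mask_a}")
    b = ipaddress.ip_interface(f"{ip_b}/{mask_b}")
    return a.network == b.network and a.ip != b.ip


if __name__ == "__main__":
    # Eth1 values from the primary (mse3355-1) and secondary (mse3355-2) summaries.
    print(direct_link_ok("1.1.1.1", "255.0.0.0", "1.1.1.2", "255.0.0.0"))  # prints True
```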

From the console, you can also confirm the setup with the gethainfo command.

Here is the output from the Primary and the Secondary:

[root@mse3355-1 ~]#gethainfo

Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Primary
Health Monitor IP Address: 10.10.10.15
Virtual IP Address: 10.10.10.14
Version: 7.2.103.0
UDI: AIR-MSE-3355-K9:V01:KQ37xx
Number of paired peers: 1

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.10.10.16
Virtual IP Address: 10.10.10.14
Version: 7.2.103.0
UDI: AIR-MSE-3355-K9:V01:KQ45xx
Failover type: Automatic
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: Yes
Heartbeat status: Up
Current state: PRIMARY_ACTIVE

[root@mse3355-2 ~]#gethainfo

Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Secondary
Health Monitor IP Address: 10.10.10.16
Virtual IP Address: Not Applicable for a secondary
Version: 7.2.103.0
UDI: AIR-MSE-3355-K9:V01:KQ45xx
Number of paired peers: 1

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.10.10.15
Virtual IP Address: 10.10.10.14
Version: 7.2.103.0
UDI: AIR-MSE-3355-K9:V01:KQ37xx
Failover type: Automatic
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: Yes
Heartbeat status: Up
Current state: SECONDARY_ACTIVE

HA Configuration Scenario for MSE Physical Appliance

Based on the pairing matrix, the maximum HA configuration is 2:1. This is reserved for the MSE-3355, which, in secondary mode, can support an MSE-3310 and an MSE-3350. Direct connect is not applicable in this scenario.

Configure each of these MSEs to demonstrate the 2:1 HA scenario:

MSE-3310 (Primary1)
Server role: Primary
Health Monitor IP Address (Eth0): 10.10.10.17
Virtual IP Address: 10.10.10.18
Eth1 – Not Applicable

MSE-3350 (Primary2)
Server role: Primary
Health Monitor IP Address: 10.10.10.22
Virtual IP Address: 10.10.10.21
Eth1 – Not Applicable

MSE-3355 (Secondary)
Server role: Secondary
Health Monitor IP Address: 10.10.10.16
Virtual IP Address: Not Applicable for a secondary

After all MSEs are configured, add Primary1 and Primary2 to the NCS as shown in the image.
Click to configure Secondary Server (as shown in previous examples). Start with either one of the Primary MSEs as shown in the image.
Enter the parameters for the Secondary MSE:
1. Secondary Device Name: for example, [mse-3355-2]
2. Secondary IP address – [10.10.10.16]
3. Complete the remaining parameters.
4. Click Save as shown in the image.
Wait a brief moment for the first secondary entry to be configured as shown in the image.
Confirm that the Secondary Server is added for the first Primary MSE as shown in the image.
Repeat the previous steps (configure the Secondary Server, enter its parameters, and wait for the entry to be added) for the second Primary MSE, as shown in the image.
Finalize with HA parameters for the second Primary MSE as shown in the image.
Save the settings as shown in the image.
Check the progress status for each of the Primary MSEs, as shown in the image.
Confirm that both Primary1 and Primary2 MSEs are set up with a Secondary MSE as shown in the image.

Navigate to NCS > Services > Mobility Services and choose High Availability, as shown in the image.

Note that the 2:1 configuration is confirmed, with the MSE-3355 as the secondary for both the MSE-3310 and the MSE-3350, as shown in the image.

Here is sample output of the HA setup from the console of the secondary MSE (MSE-3355) when the gethainfo command is used. It lists both primary MSEs as peers:

[root@mse3355-2 ~]#gethainfo

Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Secondary
Health Monitor IP Address: 10.10.10.16
Virtual IP Address: Not Applicable for a secondary
Version: 7.2.103.0
UDI: AIR-MSE-3355-K9:V01:KQ45xx
Number of paired peers: 2

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.10.10.22
Virtual IP Address: 10.10.10.21
Version: 7.2.103.0
UDI: AIR-MSE-3350-K9:V01:MXQ839xx
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: SECONDARY_ACTIVE


----------------------------
Peer configuration#: 2
----------------------------

Health Monitor IP Address 10.10.10.17
Virtual IP Address: 10.10.10.18
Version: 7.2.103.0
UDI: AIR-MSE-3310-K9:V01:FTX140xx
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos4
Instance database port: 1525
Dataguard configuration name: dg_mse4
Primary database alias: mseop4s
Direct connect used: No
Heartbeat status: Up
Current state: SECONDARY_ACTIVE
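The guidelines state that there is one software and database instance for each registered primary MSE. In the output above, this appears as distinct instance names, ports, and Data Guard configuration names (mseos3/1524/dg_mse3 and mseos4/1525/dg_mse4). The following Python sketch assumes that each of these fields should be unique per peer on an N:1 secondary. The input is a list of peer dictionaries keyed by the gethainfo field names, for example as produced by a parser of your own, and duplicate_instances is a hypothetical helper.

```python
from collections import Counter


def duplicate_instances(peers: list[dict[str, str]]) -> list[str]:
    """Report per-primary database fields that more than one peer shares."""
    problems: list[str] = []
    for field in ("Instance database name", "Instance database port",
                  "Dataguard configuration name"):
        counts = Counter(p.get(field) for p in peers)
        for value, n in counts.items():
            if value is not None and n > 1:
                problems.append(f"{field} {value} is shared by {n} peers")
    return problems


# Peer values copied from the MSE-3355 gethainfo output above.
peers = [
    {"Health Monitor IP Address": "10.10.10.22", "Instance database name": "mseos3",
     "Instance database port": "1524", "Dataguard configuration name": "dg_mse3"},
    {"Health Monitor IP Address": "10.10.10.17", "Instance database name": "mseos4",
     "Instance database port": "1525", "Dataguard configuration name": "dg_mse4"},
]

if __name__ == "__main__":
    print(duplicate_instances(peers))  # prints []
```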

Final validation for HA in the NCS shows the status as fully Active for both the MSE-3310 and MSE-3350 as shown in the images.

Verify

There is currently no verification procedure available for this configuration.

Basic Troubleshooting of MSE HA

This section provides information you can use in order to troubleshoot your configuration.

When you add the Secondary MSE, you might see a prompt as shown in the image.

This can indicate that an issue occurred while the setup script ran.

Run the getserverinfo command in order to check for proper network settings.
It is also possible that the services have not started. Run the /etc/init.d/msed start command.
Run through the setup script again if required (/mse/setup/setup.sh) and save at the end.

The MSE VA also requires an activation license (L-MSE-7.0-K9). Without it, the NCS displays a prompt when you add the Secondary MSE VA. Obtain and add the activation license for the MSE VA, as shown in the image.

If you switch the HA role of an MSE, make sure that the services are fully stopped first: stop them with the /etc/init.d/msed stop command, then run the setup script again (/mse/setup/setup.sh), as shown in the image.

Run the gethainfo command in order to get HA information from the MSE. This output is useful for troubleshooting and for monitoring HA status and changes.

[root@mse3355-2 ~]#gethainfo

Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Secondary
Health Monitor IP Address: 10.10.10.16
Virtual IP Address: Not Applicable for a secondary
Version: 7.2.103.0
UDI: AIR-MSE-3355-K9:V01:KQ45xx
Number of paired peers: 2

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.10.10.22
Virtual IP Address: 10.10.10.21
Version: 7.2.103.0
UDI: AIR-MSE-3350-K9:V01:MXQ839xx
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: SECONDARY_ACTIVE


----------------------------
Peer configuration#: 2
----------------------------

Health Monitor IP Address 10.10.10.17
Virtual IP Address: 10.10.10.18
Version: 7.2.103.0
UDI: AIR-MSE-3310-K9:V01:FTX140xx
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos4
Instance database port: 1525
Dataguard configuration name: dg_mse4
Primary database alias: mseop4s
Direct connect used: No
Heartbeat status: Up
Current state: SECONDARY_ACTIVE

In addition, the NCS HA view is a useful management tool that provides visibility into the MSE HA setup, as shown in the image.

Failover/Failback Scenario

This scenario covers manual failover and failback only, for better control.

Primary is Up, Secondary is Ready to Take Over

Once MSE HA is configured and running, the state in Prime Infrastructure is as shown in the images.

Here is the getserverinfo and gethainfo output of the primary MSE:

[root@NicoMSE ~]# getserverinfo
Health Monitor is running
Retrieving MSE Services status.
MSE services are up, getting the status


-------------
Server Config
-------------

Product name: Cisco Mobility Service Engine
Version: 8.0.110.0
Health Monitor Ip Address: 10.48.39.238
High Availability Role: 1
Hw Version: V01
Hw Product Identifier: AIR-MSE-VA-K9
Hw Serial Number: NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
HTTPS: null
Legacy Port: 8001
Log Modules: -1
Log Level: INFO
Days to keep events: 2
Session timeout in mins: 30
DB backup in days: 2

-------------
Services
-------------

Service Name: Context Aware Service
Service Version: 8.0.1.79
Admin Status: Disabled
Operation Status: Down

Service Name: WIPS
Service Version: 3.0.8155.0
Admin Status: Enabled
Operation Status: Up

Service Name: Mobile Concierge Service
Service Version: 5.0.1.23
Admin Status: Disabled
Operation Status: Down

Service Name: CMX Analytics
Service Version: 3.0.1.68
Admin Status: Disabled
Operation Status: Down

Service Name: CMX Connect & Engage
Service Version: 1.0.0.29
Admin Status: Disabled
Operation Status: Down

Service Name: HTTP Proxy Service
Service Version: 1.0.0.1
Admin Status: Disabled
Operation Status: Down

--------------
Server Monitor
--------------


Server start time: Sun Mar 08 12:40:32 CET 2015
Server current time: Sun Mar 08 14:04:30 CET 2015
Server timezone: Europe/Brussels
Server timezone offset (mins): 60
Restarts: 1
Used Memory (MB): 197
Allocated Memory (MB): 989
Max Memory (MB): 989
DB disk size (MB): 17191

---------------
Active Sessions
---------------

Session ID: 5672
Session User ID: 1
Session IP Address: 10.48.39.238
Session start time: Sun Mar 08 12:44:54 CET 2015
Session last access time: Sun Mar 08 14:03:46 CET 2015

----------------------------
Default Trap Destinations
----------------------------

Trap Destination - 1
-----------------
IP Address: 10.48.39.225
Last Updated: Sun Mar 08 12:34:12 CET 2015


[root@NicoMSE ~]# gethainfo
   
Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Primary
Health Monitor IP Address: 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Number of paired peers: 1

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.48.39.240
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: PRIMARY_ACTIVE

And here is the same output for the secondary MSE:

[root@NicoMSE2 ~]# getserverinfo
Health Monitor is running
Retrieving MSE Services status.
MSE services are up and in DORMANT mode, getting the status


-------------
Server Config
-------------

Product name: Cisco Mobility Service Engine
Version: 8.0.110.0
Health Monitor Ip Address: 10.48.39.240
High Availability Role: 2
Hw Version: V01
Hw Product Identifier: AIR-MSE-VA-K9
Hw Serial Number: NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
HTTPS: null
Legacy Port: 8001
Log Modules: -1
Log Level: INFO
Days to keep events: 2
Session timeout in mins: 30
DB backup in days: 2

-------------
Services
-------------

Service Name: Context Aware Service
Service Version: 8.0.1.79
Admin Status: Disabled
Operation Status: Down

Service Name: WIPS
Service Version: 3.0.8155.0
Admin Status: Enabled
Operation Status: Up

Service Name: Mobile Concierge Service
Service Version: 5.0.1.23
Admin Status: Disabled
Operation Status: Down

Service Name: CMX Analytics
Service Version: 3.0.1.68
Admin Status: Disabled
Operation Status: Down

Service Name: CMX Connect & Engage
Service Version: 1.0.0.29
Admin Status: Disabled
Operation Status: Down

Service Name: HTTP Proxy Service
Service Version: 1.0.0.1
Admin Status: Disabled
Operation Status: Down

--------------
Server Monitor
--------------


Server start time: Sun Mar 08 12:50:04 CET 2015
Server current time: Sun Mar 08 14:04:32 CET 2015
Server timezone: Europe/Brussels
Server timezone offset (mins): 60
Restarts: null
Used Memory (MB): 188
Allocated Memory (MB): 989
Max Memory (MB): 989
DB disk size (MB): 17191
[root@NicoMSE2 ~]# gethainfo
   
Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Secondary
Health Monitor IP Address: 10.48.39.240
Virtual IP Address: Not Applicable for a secondary
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Number of paired peers: 1

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: SECONDARY_ACTIVE
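In the getserverinfo output above, only WIPS is administratively enabled, and it reports Operation Status Up on both MSEs. The following Python sketch flags enabled services that are not Up. It assumes the "Service Name / Admin Status / Operation Status" layout shown above, and services_not_up is a hypothetical helper, not an MSE command.

```python
def services_not_up(getserverinfo_output: str) -> list[str]:
    """Names of services with Admin Status Enabled but Operation Status other than Up."""
    problems: list[str] = []
    name = admin = None
    for raw in getserverinfo_output.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # banner lines such as "Health Monitor is running"
        key, value = key.strip(), value.strip()
        if key == "Service Name":
            name, admin = value, None
        elif key == "Admin Status":
            admin = value
        elif key == "Operation Status" and name is not None:
            # Only administratively enabled services are expected to be Up.
            if admin == "Enabled" and value != "Up":
                problems.append(name)
    return problems
```

For example, you could capture the output with getserverinfo > info.txt and pass the file contents to services_not_up. An empty list matches the "Operation status must show Up" check earlier in this document.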

Failing Over to Secondary

To trigger a failover manually, go to the MSE HA configuration in Prime Infrastructure and click Switchover.

Shortly afterwards, gethainfo on both servers shows the FAILOVER_INVOKED state, with the heartbeat status Down.

Primary gethainfo:

[root@NicoMSE ~]# gethainfo
   
Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Primary
Health Monitor IP Address: 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Number of paired peers: 1

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.48.39.240
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Down
Current state: FAILOVER_INVOKED

Secondary gethainfo:

[root@NicoMSE2 ~]# gethainfo
   
Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Secondary
Health Monitor IP Address: 10.48.39.240
Virtual IP Address: Not Applicable for a secondary
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Number of paired peers: 1

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Down
Current state: FAILOVER_INVOKED

Once the failover is complete, Prime Infrastructure shows the state in this image.

Primary gethainfo:

[root@NicoMSE ~]# gethainfo
   
Health Monitor is not running. Following information is from the last saved configuration

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Primary
Health Monitor IP Address: 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Number of paired peers: 1

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.48.39.240
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Last shutdown state: FAILOVER_ACTIVE

Secondary gethainfo:

[root@NicoMSE2 ~]# gethainfo
   
Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Secondary
Health Monitor IP Address: 10.48.39.240
Virtual IP Address: Not Applicable for a secondary
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Number of paired peers: 1

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Down
Current state: FAILOVER_ACTIVE

At this stage, the failover is finished and the secondary MSE is fully in charge.

Note that the services on the primary MSE stop when you perform a manual switchover, in order to simulate a real failure of the primary MSE.

If you bring the primary back up, its state is "TERMINATED". This is expected: the secondary is still the node in charge and shows "FAILOVER_ACTIVE".
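Every check in this walkthrough comes down to reading the "Current state" and "Heartbeat status" lines of gethainfo. The following is a minimal sketch, not a Cisco-provided tool, that extracts those fields in a POSIX shell. It assumes the "Field: value" layout shown in the outputs above. The function names ha_field and failover_finished are hypothetical.

```shell
# ha_field FIELD: print the value of the first "FIELD: value" line on stdin.
# Assumes the gethainfo layout shown above (no leading spaces on the line).
ha_field() (
    awk -v key="$1" 'index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }'
)

# failover_finished: exit status 0 if stdin shows "Current state: FAILOVER_ACTIVE",
# the final state of a failover.
failover_finished() (
    state=$(ha_field "Current state")
    [ "$state" = "FAILOVER_ACTIVE" ]
)
```

For example, running gethainfo | failover_finished on the secondary succeeds only after the failover has completed. When the Health Monitor is not running, gethainfo prints "Last shutdown state" instead of "Current state", so ha_field "Current state" prints nothing in that case.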

Failing Back to Primary

Before failing back, you must bring the primary back up.

Its state is then "TERMINATED":

[root@NicoMSE ~]# gethainfo
   
Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Primary
Health Monitor IP Address: 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Number of paired peers: 1

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.48.39.240
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Down
Current state: TERMINATED

When you invoke the failback from Prime Infrastructure, both nodes go into the "FAILBACK_ACTIVE" state. Unlike "FAILOVER_ACTIVE", this is not a final state.

The gethainfo output on the primary:

[root@NicoMSE ~]# gethainfo
   
Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Primary
Health Monitor IP Address: 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Number of paired peers: 1

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.48.39.240
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Down
Current state: FAILBACK_ACTIVE

The gethainfo output on the secondary:

[root@NicoMSE2 ~]# gethainfo
   
Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Secondary
Health Monitor IP Address: 10.48.39.240
Virtual IP Address: Not Applicable for a secondary
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Number of paired peers: 1

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Down
Current state: FAILBACK_ACTIVE

Prime shows this image:

When the failback is done but the secondary is still busy transferring data back to the primary, the primary shows:

[root@NicoMSE ~]# gethainfo
   
Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Primary
Health Monitor IP Address: 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Number of paired peers: 1

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.48.39.240
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3s
Instance database port: 1624
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: FAILBACK_COMPLETE

The secondary shows:

[root@NicoMSE2 ~]# gethainfo
   
Health Monitor is running. Retrieving HA related information

----------------------------------------------------
Base high availability configuration for this server
----------------------------------------------------

Server role: Secondary
Health Monitor IP Address: 10.48.39.240
Virtual IP Address: Not Applicable for a secondary
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE2_1c6b1940-b6a5-11e4-b017-005056993b66
Number of paired peers: 1

----------------------------
Peer configuration#: 1
----------------------------

Health Monitor IP Address 10.48.39.238
Virtual IP Address: 10.48.39.224
Version: 8.0.110.0
UDI: AIR-MSE-VA-K9:V01:NicoMSE_b950a7c0-b68c-11e4-99d9-005056993b63
Failover type: Manual
Failback type: Manual
Failover wait time (seconds): 10
Instance database name: mseos3
Instance database port: 1524
Dataguard configuration name: dg_mse3
Primary database alias: mseop3s
Direct connect used: No
Heartbeat status: Up
Current state: SECONDARY_ALONE

At this stage, Prime Infrastructure looks as shown in this image:

When this completes, both nodes return to their original states (PRIMARY_ACTIVE on the primary and SECONDARY_ACTIVE on the secondary), and the HA status in Prime Infrastructure looks like that of a new deployment.
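A failback therefore passes through several intermediate states before it settles. The following is a minimal sketch that polls gethainfo until the node reports a given "Current state". GETHAINFO_CMD is a hypothetical override that lets you test the loop without an MSE, and the default timeout and interval are arbitrary choices, not Cisco recommendations.

```shell
# ha_field FIELD: print the value of the first "FIELD: value" line on stdin.
ha_field() (
    awk -v key="$1" 'index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }'
)

# wait_for_ha_state TARGET [TIMEOUT_S] [INTERVAL_S]
# Polls gethainfo (or $GETHAINFO_CMD) until "Current state" equals TARGET.
# The body runs in a subshell, so "exit" only leaves this function.
# Exit status: 0 when TARGET is reached, 1 on timeout.
wait_for_ha_state() (
    target=$1 timeout=${2:-1800} interval=${3:-30} elapsed=0
    while :; do
        state=$(${GETHAINFO_CMD:-gethainfo} | ha_field "Current state")
        echo "after ${elapsed}s: ${state:-no current state}"
        [ "$state" = "$target" ] && exit 0
        [ "$elapsed" -ge "$timeout" ] && exit 1
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
)
```

For example, you could run wait_for_ha_state PRIMARY_ACTIVE on the primary and wait_for_ha_state SECONDARY_ACTIVE on the secondary to wait until the pair looks like a fresh deployment again.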

HA State Matrix

State	Description
PRIMARY_ACTIVE	State of the primary MSE when it is in charge and everything is working normally
SECONDARY_ACTIVE	State of the secondary MSE when it is up but not in charge (the primary still is), ready to take over when needed
FAILOVER_INVOKED	Shown on both nodes while the failover takes place, that is, while the secondary MSE starts its services and loads the database of the primary MSE
FAILOVER_ACTIVE	Final state of a failover. The secondary MSE is considered "up and running" and the primary MSE is down
TERMINATED	State of an MSE node that has come back with its services up after being down, while it is not the node in charge (for example, the primary after its services are restarted while PI still gives control to the secondary MSE). It can also mean that the HA link is not up (for example, if one of the MSEs is rebooting or simply not pingable)
FAILBACK_ACTIVE	Unlike FAILOVER_ACTIVE, this is not the final stage of the failback. It means that the failback was invoked and is currently taking place: the database is being copied from the secondary back to the primary
FAILBACK_COMPLETE	Status of the primary node when it is back in charge but still busy loading the database from the secondary MSE
SECONDARY_ALONE	Status of the secondary MSE when the failback is done and the primary is in charge but still loading data
GRACEFUL_SHUTDOWN	State triggered when automatic failover/failback is configured and you manually reboot the other MSE or stop its services. The node does not take over, because the downtime was caused manually
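The matrix mixes stable states with states that only appear while a switchover is in progress. The following sketch groups them according to the descriptions above. The grouping is one reading of the matrix, not a Cisco API, and ha_state_kind is a hypothetical helper. It could be used, for instance, to hold off a new failover or failback while the pair is still mid-transition.

```shell
# ha_state_kind STATE: print a coarse category for an HA state from the matrix.
#   final       - a settled state (normal operation, or the end of a failover)
#   in-progress - a failover or failback is still running or loading data
#   degraded    - this node is not in charge, or the other node is down or was stopped manually
#   unknown     - not a state listed in the matrix
ha_state_kind() {
    case "$1" in
        PRIMARY_ACTIVE|SECONDARY_ACTIVE|FAILOVER_ACTIVE)
            echo final ;;
        FAILOVER_INVOKED|FAILBACK_ACTIVE|FAILBACK_COMPLETE|SECONDARY_ALONE)
            echo in-progress ;;
        TERMINATED|GRACEFUL_SHUTDOWN)
            echo degraded ;;
        *)
            echo unknown ;;
    esac
}
```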

Important Remarks and Facts about HA

Do not trigger a failback immediately after a failover, or vice versa. The databases need a good 30 minutes to stabilize.
The HA configuration file is base-ha-config.properties in /opt/mse/health-monitor/resources/config/. It is not meant to be edited manually (use setup.sh instead), but you can view it in case of doubt.
HA is not meant to be broken manually. The only clean way to do so is to delete the secondary MSE from Prime Infrastructure. Any other method (running setup.sh on the secondary to make it a primary, uninstalling, changing the IP address, and so on) breaks the database and the state machine, and you will likely have to reinstall both MSEs.
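Because base-ha-config.properties should only be viewed, a read-only lookup is enough. The following is a minimal sketch that assumes plain key=value lines, as in Java properties files. The key names inside the file are not documented here, so inspect the file first (for example with less) before you rely on any of them. HA_CONF and ha_conf_get are hypothetical names.

```shell
# Default path from the remark above; HA_CONF is a hypothetical override.
HA_CONF=${HA_CONF:-/opt/mse/health-monitor/resources/config/base-ha-config.properties}

# ha_conf_get KEY: print the value of the first "KEY=value" line. The file is only read.
ha_conf_get() {
    awk -v key="$1" '
        /^[ \t]*[#!]/ { next }                  # skip properties comment lines
        {
            n = index($0, "=")
            if (n == 0) next                    # not a key=value line
            k = substr($0, 1, n - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim spaces around the key
            if (k != key) next
            v = substr($0, n + 1)
            sub(/^[ \t]+/, "", v)               # drop spaces after "="
            print v
            exit
        }' "$HA_CONF"
}
```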

Troubleshoot HA

HA-related logs are saved under the /opt/mse/logs/hm directory, with health-monitor*.log being the primary log file.
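When you troubleshoot, a practical first step is to find the newest health-monitor log and look for state changes. The following is a minimal sketch. HM_LOG_DIR is a hypothetical override. The grep pattern matches the state names from the HA State Matrix and assumes that the log mentions them verbatim, which may not hold for every message.

```shell
# Log directory from the text above; HM_LOG_DIR is a hypothetical override.
HM_LOG_DIR=${HM_LOG_DIR:-/opt/mse/logs/hm}

# hm_latest_log: print the path of the most recently modified health-monitor*.log
hm_latest_log() {
    ls -t "$HM_LOG_DIR"/health-monitor*.log 2>/dev/null | head -n 1
}

# hm_state_lines [N]: print the last N lines (default 20) of that log that mention an HA state
hm_state_lines() {
    hm_log=$(hm_latest_log)
    [ -n "$hm_log" ] || { echo "no health-monitor log in $HM_LOG_DIR" >&2; return 1; }
    grep -E 'PRIMARY_ACTIVE|SECONDARY_ACTIVE|FAILOVER_(INVOKED|ACTIVE)|FAILBACK_(ACTIVE|COMPLETE)|SECONDARY_ALONE|TERMINATED|GRACEFUL_SHUTDOWN' "$hm_log" |
        tail -n "${1:-20}"
}
```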

Issue: Both the Primary and Secondary are active (split-brain condition)

1. Shut down the Virtual IP (VIP) interface on the Secondary. This is normally eth0:1:

ifconfig eth0:1 down

2. Restart the services on the Secondary MSE

service msed stop
service msed start

3. From Prime Infrastructure, verify that the secondary has started to synchronize with the Primary again.
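The steps above can be wrapped in a small script that runs on the Secondary MSE only. The following is a minimal sketch. It defaults to a dry run (DRY_RUN=1) that only prints the commands. VIP_IF, DRY_RUN, run and fix_split_brain are hypothetical names, and VIP_IF assumes the usual eth0:1 alias.

```shell
# Hypothetical overrides: the VIP alias and dry-run mode (1 = only print commands).
VIP_IF=${VIP_IF:-eth0:1}

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# fix_split_brain: steps 1 and 2 above, on the Secondary MSE only.
fix_split_brain() {
    run ifconfig "$VIP_IF" down || return 1   # step 1: drop the VIP on the Secondary
    run service msed stop       || return 1   # step 2: restart the MSE services
    run service msed start      || return 1
    echo "Step 3: check in Prime Infrastructure that the Secondary resynchronizes."
}
```

Review the dry-run output first, then run DRY_RUN=0 fix_split_brain to apply the steps.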

Issue: Synchronization of the secondary with the Primary for HA is stuck at X% for a long time

1. Stop the service on the secondary

service msed stop

2. Remove the /opt/mse/health-monitor/resources/config/advance-config-<IP-address-of-Primary>.properties file on the Secondary.

3. If HA still cannot be established, the secondary may be in an inconsistent state. In that case, remove everything under the data directory on the secondary:

rm -rf /opt/data/*

4. Restart the secondary, then add it to the Primary from Prime Infrastructure to initiate HA again.
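Steps 1 to 3 can be sketched as a script for the Secondary MSE. The following is a minimal sketch. PRIMARY_IP is a placeholder that you must supply. Check the exact advance-config file name with ls first, because this document does not say which of the Primary's addresses appears in it. The script defaults to a dry run, and the destructive wipe of /opt/data runs only when you opt in with WIPE_DATA=1. HM_CONFIG_DIR, DRY_RUN, WIPE_DATA, run and reset_secondary_sync are hypothetical names.

```shell
HM_CONFIG_DIR=${HM_CONFIG_DIR:-/opt/mse/health-monitor/resources/config}

# run CMD...: print the command in dry-run mode (default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# reset_secondary_sync PRIMARY_IP: steps 1-3 above, on the Secondary MSE.
reset_secondary_sync() {
    primary_ip=$1
    [ -n "$primary_ip" ] || { echo "usage: reset_secondary_sync PRIMARY_IP" >&2; return 2; }
    run service msed stop || return 1                                              # step 1
    run rm -f "$HM_CONFIG_DIR/advance-config-$primary_ip.properties" || return 1   # step 2
    if [ "${WIPE_DATA:-0}" = 1 ]; then
        run rm -rf /opt/data/* || return 1     # step 3: only if step 2 was not enough
    fi
    echo "Step 4: restart the Secondary and re-add it to the Primary from Prime Infrastructure."
}
```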

Issue: Unable to delete the Secondary server from PI after it becomes unreachable

1. Stop the service on the Primary.

2. Remove the /opt/mse/health-monitor/resources/config/advance-config-<IP-address-of-Primary>.properties file on the Primary.

3. Restart the service on the Primary.

4. Delete the Primary MSE from PI and re-add it.
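On the Primary, it helps to confirm which advance-config file actually exists before you remove it. The following is a minimal sketch that lists the IP address embedded in each advance-config-<IP>.properties file and then performs steps 1 to 3 as a dry run by default. HM_CONFIG_DIR, DRY_RUN, run, advance_config_ips and release_unreachable_secondary are hypothetical names.

```shell
HM_CONFIG_DIR=${HM_CONFIG_DIR:-/opt/mse/health-monitor/resources/config}

# advance_config_ips: print the IP embedded in each advance-config-<IP>.properties file.
advance_config_ips() {
    for f in "$HM_CONFIG_DIR"/advance-config-*.properties; do
        [ -e "$f" ] || continue                  # the glob matched nothing
        name=${f##*/advance-config-}             # strip the directory and file prefix
        echo "${name%.properties}"               # strip the extension
    done
}

# run CMD...: print the command in dry-run mode (default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# release_unreachable_secondary IP: steps 1-3 above, on the Primary MSE.
release_unreachable_secondary() {
    [ -n "$1" ] || { echo "usage: release_unreachable_secondary IP" >&2; return 2; }
    run service msed stop || return 1
    run rm -f "$HM_CONFIG_DIR/advance-config-$1.properties" || return 1
    run service msed start || return 1
    echo "Step 4: delete the Primary MSE from PI and re-add it."
}
```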

Contributed by Cisco Engineers

Nicholas Darchis
Cisco TAC Engineer
Ram Krishnamoorthy
Cisco TAC Engineer
Jimit Dalal
Cisco TAC Engineer

This Document Applies to These Products

Mobility Services Engine