Failover Configuration Guide for Cisco Digital Media Suite 5.2.x

Updated: March 13, 2015

Chapter: Monitoring and Controlling Failover

Monitoring and Controlling Failover

Revised: May 31, 2011

This section contains the following topics:

•Failover Alerts

•Monitoring Failover from Cisco DMM

•Monitoring Failover from AAI

•Forcing a Unit to Fail Over

Failover Alerts

Two additional alerts have been added to the Cisco DMM Administration > Alerts > Notification Rules page to support failover:

•Cluster node is deactivated—When configured, this alert is triggered whenever an appliance in a failover configuration goes offline.

•Cluster Node is activated—When configured, this alert is triggered whenever an appliance in a failover configuration comes online.

When an appliance in a failover configuration fails, you will receive a cluster node down notification.

If you reboot an appliance, you will receive a cluster node down notification followed by a cluster node activated notification for that appliance as it reboots into the standby state.

Figure 4-1 Failover Alerts

See Chapter 8: Events and Notifications in the User Guide for Cisco Digital Media Manager 5.2.x for information about enabling events, configuring your SNMP server, and populating your MIB browser.

http://www.cisco.com/en/US/docs/video/digital_media_systems/5_x/5_2/dmm/user/guide/admin/eventnotify.html

For more information about each type of alert, see the following topics:

•SNMP Alerts

•Syslog Alerts

•E-Mail Alerts

SNMP Alerts

The following traps pertain to appliance Up/Down events:

•.1.3.6.1.4.1.9.9.655.0.6—cluster node down

•.1.3.6.1.4.1.9.9.655.0.5—cluster node up
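As a rough illustration, a trap receiver script could translate these OIDs into the events they signal. The sketch below is not part of the product; the names `FAILOVER_TRAPS` and `classify_trap` are hypothetical, and it assumes the receiver hands over the trap OID as a dotted string, with or without the leading dot.

```python
from typing import Optional

# OIDs copied from the list above; the labels use the guide's own wording.
FAILOVER_TRAPS = {
    "1.3.6.1.4.1.9.9.655.0.6": "cluster node down",
    "1.3.6.1.4.1.9.9.655.0.5": "cluster node up",
}


def classify_trap(oid: str) -> Optional[str]:
    """Return the failover event for a trap OID, or None for any other trap."""
    # Accept both ".1.3.6..." and "1.3.6..." spellings of the same OID.
    return FAILOVER_TRAPS.get(oid.strip().lstrip("."))


if __name__ == "__main__":
    print(classify_trap(".1.3.6.1.4.1.9.9.655.0.6"))
```

Any OID that is not in the table returns `None`, so unrelated traps can be ignored by the caller.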

Syslog Alerts

The following are sample UP/DOWN syslog alerts:

05-17-2011	10:56:42	Local7.Debug	10.0.0.1	May 16 22:54:51 dmm.example.com %DMS-1-ClusterNodeDownEvent: Cluster node dmm1.example.com is DOWN[DmmCluster] [ Original severity = severityCATASTROPHIC ]

05-17-2011	10:58:11	Local7.Debug	10.194.51.45	May 16 22:56:21 dmm1.example.com %DMS-1-ClusterNodeUpEvent: Cluster node dmm1.example.com is UP[DmmCluster] [ Original severity = severityINFO ]
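If you forward these messages to a log processor, the fields can be pulled out with a regular expression. The following is a minimal sketch based only on the two sample messages above; the pattern and the function name `parse_failover_alert` are assumptions, and messages from other releases may be formatted differently.

```python
import re
from typing import Dict, Optional

# Pattern inferred from the two sample alerts above (an assumption, not a spec).
ALERT_RE = re.compile(
    r"%DMS-\d+-(?P<event>ClusterNode(?:Up|Down)Event):\s*"
    r"Cluster node (?P<node>\S+) is (?P<status>[A-Z]+)\s*"
    r"\[(?P<source>\w+)\]\s*"
    r"\[\s*Original\s+severity\s*=\s*(?P<severity>\w+)\s*\]"
)


def parse_failover_alert(message: str) -> Optional[Dict[str, str]]:
    """Extract event, node, status, source, and severity from one alert."""
    # Collapse tabs and line wraps so a wrapped message still matches.
    flat = " ".join(message.split())
    match = ALERT_RE.search(flat)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "May 16 22:54:51 dmm.example.com %DMS-1-ClusterNodeDownEvent: "
        "Cluster node dmm1.example.com is DOWN[DmmCluster] "
        "[ Original severity = severityCATASTROPHIC ]"
    )
    print(parse_failover_alert(sample))
```

The function returns `None` for any line that does not look like a failover alert, which lets it run over a mixed syslog stream.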

E-Mail Alerts

Figure 4-2 shows a typical event e-mail notification.

Figure 4-2 A Failover Node Outage Notification

The following information is sent by e-mail:

Table 4-1 Event E-Mail Notification Fields
Field	Description
Alarm Type	•ClusterNodeDownEvent—The appliance has failed or been taken offline. •ClusterNodeUpEvent—The appliance has come online and has entered the active or standby state.
Alarm Source	•DmmCluster—The alarm came from a Cisco DMM appliance. •VpCluster—The alarm came from a Cisco Show and Share appliance.
Cluster Virtual FQDN	The virtual FQDN of the appliance cluster.
Cluster Node FQDN:	The dedicated FQDN of the appliance.
Severity	•severityCATASTROPHIC—The appliance has experienced a failover event. •severityINFO—The message is an informational event (such as an UP message).
Comments:	The comment takes the form "Cluster node dedicated_fqdn is status", where status is one of the following values: •UNKNOWN—The appliance is transitioning between states. •UP—The appliance is up and in the active state. •DOWN—The appliance has failed. •STANDBY—The appliance is up and in the standby state.

Monitoring Failover from Cisco DMM

The Cisco DMM home page displays a summary status of your failover cluster.

Click View Failover Status to go to the Administration > Failover > Status page.

The Failover Status screen provides the following information:

Table 4-2 Failover Status
Field	Description
Time of last event	The time (determined by the appliance time) of the last failover event.
Server Time	The time on the appliance.
Server status	For each server (Primary and Secondary), one of the following states: •Up/Active—The appliance is operating normally and is in the active state. •Up/Standby—The appliance is operating normally and is in the standby state. •Down—The appliance experienced a failover event and is currently in a failed state. Depending upon the failure, you may be able to access the appliance AAI interface. •Unknown—The appliance is transitioning between the UP and DOWN states.
Replication Status	The percentage of data replication completed between the primary and secondary appliances. During initial activation, while the failover cluster is being configured, this value is below 100%. During normal operation, this value should remain at 100%.

What to Look For on This Page

The following conditions indicate abnormal operation and should be investigated:

•An appliance in the Down state. Use the Cluster Resource Status page to determine which resources have failed.

•An appliance in the Unknown state. This state indicates that the appliance is transitioning between UP and DOWN.

•One node down and the message "No sync in progress". There can be several causes for this. The failover cluster may be in Split Brain mode (see Split Brain Recovery, page 5-4, for information on how to confirm and recover from split brain).

The active node may have had a disk fail but not failed over. In this case, you can force a failover (see Forcing a Unit to Fail Over) and then proceed with the recovery procedure (see Recovering from a Failover, page 5-1).
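The checks in this section can be summarized as a small decision routine. The sketch below is illustrative only: the `FailoverStatus` class, its field names, and `find_abnormal_conditions` are hypothetical, and it assumes the values are copied by hand from the Failover Status page rather than read from any product API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailoverStatus:
    # Hypothetical container; values are transcribed from the status page.
    primary_state: str        # "Up/Active", "Up/Standby", "Down", or "Unknown"
    secondary_state: str      # same set of values as primary_state
    status_message: str = ""  # any message shown, e.g. "No sync in progress"


def find_abnormal_conditions(status: FailoverStatus) -> List[str]:
    """Return one finding per abnormal condition described in this section."""
    findings: List[str] = []
    states = {"Primary": status.primary_state, "Secondary": status.secondary_state}
    for server, state in states.items():
        if state == "Down":
            findings.append(f"{server} is Down: check the Cluster Resource Status page")
        elif state == "Unknown":
            findings.append(f"{server} is Unknown: transitioning between UP and DOWN")
    exactly_one_down = list(states.values()).count("Down") == 1
    if exactly_one_down and "no sync in progress" in status.status_message.lower():
        findings.append("One node down with 'No sync in progress': "
                        "possible split brain, or a disk failure without failover")
    return findings


if __name__ == "__main__":
    degraded = FailoverStatus("Down", "Up/Active", "No sync in progress")
    for finding in find_abnormal_conditions(degraded):
        print(finding)
```

An empty list means none of the conditions described in this section were found; it does not prove the cluster is healthy.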

Monitoring Failover from AAI

You can monitor the following using AAI:

•Replication Status

•Cluster Resource Status

Replication Status

The AAI replication status screen provides you with the same information that the Cisco DMM Administration > Failover > Failover Status page does. You can use this screen to track the progress of data replication.

Procedure

To access the Replication Status screen, do the following:

Step 1 Log in to AAI.

Step 2 Choose FAIL_OVER > STATUS > REPLICATION.

Cluster Resource Status

The cluster resource status screen displays the status of the monitored components and services. When determining the cause of a failover, use this screen to check the status of the monitored services.

•Services with a status of "Started" are operating normally.

•Services with a status of "Stopped" have failed.

When a service is shown as "unmanaged" or "failed", the nodes should be restarted according to the following:

•UNMANAGED FAILED - Both nodes should be restarted, starting first with the node showing unmanaged, then the other.

•FAILED - The node on which the resource is shown as Failed should be restarted.
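The restart rules above amount to a simple ordering decision. The sketch below expresses them in Python under stated assumptions: the function name `restart_order` is hypothetical, and the input is a mapping from node name to the status text you read off the screen for one resource, not output from any product interface.

```python
from typing import Dict, List


def restart_order(status_by_node: Dict[str, str]) -> List[str]:
    """Return the nodes to restart, in order, for one resource.

    status_by_node maps a node name to the status shown for the resource
    on that node, for example "Started", "FAILED", or "UNMANAGED FAILED".
    """
    # Normalize case and spacing so "unmanaged  failed" is still recognized.
    normalized = {node: " ".join(text.upper().split())
                  for node, text in status_by_node.items()}
    unmanaged = [node for node, text in normalized.items()
                 if "UNMANAGED" in text and "FAILED" in text]
    if unmanaged:
        # Restart both nodes: the one showing unmanaged first, then the other.
        others = [node for node in normalized if node not in unmanaged]
        return unmanaged + others
    # Otherwise restart only the node on which the resource is shown as failed.
    return [node for node, text in normalized.items() if "FAILED" in text]


if __name__ == "__main__":
    print(restart_order({"dmm1.example.com": "Started",
                         "dmm2.example.com": "UNMANAGED FAILED"}))
```

The function returns an empty list when no resource is shown as failed or unmanaged, so a "Stopped" status is left for you to investigate by other means.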

The fail count for each service appears in the Migration summary section at the bottom of the screen.

Procedure

To access the Cluster Resource Status screen, do the following:

Step 1 Log in to AAI.

Step 2 Choose FAIL_OVER > STATUS > CLUSTER_RESOURCE.

Step 3 Use the up and down arrow keys to scroll through the displayed information.

Forcing a Unit to Fail Over

To force a unit to fail over, do the following:

Step 1 Log in to the active appliance AAI interface. Use the virtual FQDN or IP address to ensure you are accessing the active appliance.

Step 2 Choose APPLIANCE_CONTROL > RESTART_OPTIONS > RESTART_WEB_SERVICES.

Restarting the web services on the active appliance triggers a failover to the secondary appliance. The appliance reboots to the standby state and uses the dedicated FQDN and IP address.
