Failover Configuration Guide for Cisco Digital Media Suite 5.4.x

Chapter: Monitor and Control Failover

Monitor and Control Failover

Revised: January 8, 2014

This chapter contains these sections.

•Failover Alerts

•Monitor Failover from Cisco DMM

•Monitor Failover from AAI

•Force a Unit to Fail Over

Failover Alerts

Two alerts on the Cisco DMM Administration > Alerts > Notification Rules page support failover:

•Cluster node is deactivated—When configured, this alert is triggered whenever an appliance in a failover configuration goes offline.

•Cluster Node is activated—When configured, this alert is triggered whenever an appliance in a failover configuration comes online.

When an appliance in a failover configuration fails, you will receive a cluster node down notification.

When you reboot an appliance, you will receive a cluster node down notification, followed by a cluster node activated notification for that appliance as it reboots into the standby state.

For information about enabling events, configuring your SNMP server, and populating your MIB browser, see the Events and Notifications chapter in User Guide for Cisco Digital Media Manager 5.4.x:

http://cisco.com/en/US/docs/video/digital_media_systems/5_x/5_3/dmm/user/guide/admin/eventnotify.html

For more information about each type of alert, see the following topics:

•SNMP Alerts

•Syslog Alerts

•E-Mail Alerts

SNMP Alerts

The following traps pertain to appliance Up/Down events:

•.1.3.6.1.4.1.9.9.655.0.6—cluster node down

•.1.3.6.1.4.1.9.9.655.0.5—cluster node up
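A monitoring script that receives these traps could map each OID to its event. The following Python sketch is illustrative only: the names CLUSTER_TRAPS and describe_trap are hypothetical, and it assumes that your trap receiver has already extracted the trap OID as a dotted string.

```python
# Trap OIDs are taken from the list above; all names here are hypothetical.
CLUSTER_TRAPS = {
    "1.3.6.1.4.1.9.9.655.0.6": "cluster node down",
    "1.3.6.1.4.1.9.9.655.0.5": "cluster node up",
}


def describe_trap(oid: str) -> str:
    """Return the failover event for a trap OID, or 'unrelated' if it is not listed."""
    # Receivers differ on whether they print the leading dot, so strip it.
    return CLUSTER_TRAPS.get(oid.strip().lstrip("."), "unrelated")


if __name__ == "__main__":
    print(describe_trap(".1.3.6.1.4.1.9.9.655.0.6"))
```

Normalizing the leading dot lets the same lookup work with either OID notation.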

Syslog Alerts

The following are sample UP/DOWN syslog alerts:

05-17-2011 10:56:42 Local7.Debug 10.0.0.1 May 16 22:54:51 dmm.example.com %DMS-1-ClusterNodeDownEvent: Cluster node dmm1.example.com is DOWN[DmmCluster] [ Original severity = severityCATASTROPHIC ]

05-17-2011 10:58:11 Local7.Debug 10.194.51.45 May 16 22:56:21 dmm1.example.com %DMS-1-ClusterNodeUpEvent: Cluster node dmm1.example.com is UP[DmmCluster] [ Original severity = severityINFO ]
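If you forward these messages to a log processor, the node name, status, and severity can be extracted with a regular expression. The following Python sketch is based only on the two sample messages above; other message variants may exist and would not match. The names EVENT_RE and parse_failover_event are hypothetical.

```python
import re

# Pattern derived from the two sample messages above (an assumption, not a
# documented message grammar). Whitespace is matched loosely in case the
# message was wrapped in transit.
EVENT_RE = re.compile(
    r"%DMS-\d+-(?P<event>ClusterNode(?:Up|Down)Event):\s+"
    r"Cluster node (?P<node>\S+) is (?P<status>[A-Z]+)\[(?P<source>\w+)\]\s+"
    r"\[\s*Original\s+severity\s*=\s*(?P<severity>\w+)\s*\]"
)


def parse_failover_event(line: str):
    """Return a dict of event fields for a cluster node message, or None."""
    match = EVENT_RE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "May 16 22:54:51 dmm.example.com %DMS-1-ClusterNodeDownEvent: "
        "Cluster node dmm1.example.com is DOWN[DmmCluster] "
        "[ Original severity = severityCATASTROPHIC ]"
    )
    print(parse_failover_event(sample))
```

Lines that do not contain a cluster node event return None, so the function can be applied to an unfiltered syslog stream.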

E-Mail Alerts

Figure 3-1 shows a typical event e-mail notification.

Figure 3-1 A Failover Node Outage Notification

The following information is sent by e-mail:

Table 3-1 Event E-Mail Notification Fields
Field	Description
Alarm Type	•ClusterNodeDownEvent: The appliance has failed or been taken offline. •ClusterNodeUpEvent: The appliance has come online and has entered the active or standby state.
Alarm Source	•DmmCluster: The alarm came from a Cisco DMM appliance.
Cluster Virtual FQDN	The virtual FQDN of the appliance cluster.
Cluster Node FQDN	The dedicated FQDN of the appliance.
Severity	•severityCATASTROPHIC: The appliance has experienced a failover event. •severityINFO: The message is an informational event (such as an UP message).
Comments	The comment takes the form "Cluster node dedicated_fqdn is status". The status is one of the following values: •UNKNOWN: The appliance is transitioning between states. •UP: The appliance is up and in the active state. •DOWN: The appliance has failed. •STANDBY: The appliance is up and in the standby state.
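A script that processes these e-mails could translate the Comments field into the meanings listed in the table. The following Python sketch assumes the comment text follows the form described above exactly; the names STATUS_MEANINGS, COMMENT_RE, and parse_comment are hypothetical.

```python
import re
from typing import Optional, Tuple

# Status values and meanings as listed in Table 3-1.
STATUS_MEANINGS = {
    "UNKNOWN": "transitioning between states",
    "UP": "up and in the active state",
    "DOWN": "failed",
    "STANDBY": "up and in the standby state",
}

# Assumes the form "Cluster node <dedicated_fqdn> is <status>".
COMMENT_RE = re.compile(r"Cluster node (\S+) is (UNKNOWN|UP|DOWN|STANDBY)\b")


def parse_comment(comment: str) -> Optional[Tuple[str, str]]:
    """Return (dedicated FQDN, meaning of status), or None if the form differs."""
    match = COMMENT_RE.search(comment)
    if match is None:
        return None
    fqdn, status = match.groups()
    return fqdn, STATUS_MEANINGS[status]


if __name__ == "__main__":
    print(parse_comment("Cluster node dmm1.example.com is DOWN"))
```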

Monitor Failover from Cisco DMM

The Administration Dashboard in Cisco DMM shows a summary status of your failover cluster.

Click View Failover Status to go to the Administration > Failover > Failover Status page.

The Failover Status screen provides the following information:

Table 3-2 Failover Status
Field	Description
Time of last event	The time (determined by the appliance time) of the last failover event.
Server Time	The time on the appliance.
Server status	For each server (Primary and Secondary), one of the following states: •Up/Active—The appliance is operating normally and is in the active state. •Up/Standby—The appliance is operating normally and is in the standby state. •Down—The appliance experienced a failover event and is currently in a failed state. Depending upon the failure, you may be able to access the appliance AAI interface. •Unknown—The appliance is transitioning between the UP and DOWN states.
Replication Status	The percentage of data replication completed between the primary and secondary appliances. During initial activation, while the failover cluster is being configured, this value is below 100%. During normal operation, this value should remain at 100%.

What to Look For on This Page

The following conditions indicate abnormal operation and should be investigated:

•An appliance in the Down state. Use the Cluster Resource Status page to determine which resources have failed.

•An appliance in the Unknown state. This state indicates that the appliance is transitioning between UP and DOWN.

•One node down and the message "No sync in progress." There can be several causes for this. The failover cluster may be in split-brain mode (see Recover from a Split-Brain Condition for information on how to confirm and recover from split brain).

Alternatively, the active node may have had a disk failure without failing over. In this case, you can force a failover (see Force a Unit to Fail Over) and then proceed with the recovery procedure (see Recover from a Failover).
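The checks above can be expressed as a small triage function. The following Python sketch is a hypothetical helper, not part of Cisco DMM: it assumes you supply the server states and any replication message exactly as they are displayed on the Failover Status page, and it does not retrieve them from the appliance.

```python
from typing import List


def abnormal_conditions(primary: str, secondary: str,
                        replication_message: str = "") -> List[str]:
    """Return findings for the conditions listed above; an empty list means none."""
    findings = []
    states = {
        "primary": primary.strip().lower(),
        "secondary": secondary.strip().lower(),
    }
    for name, state in states.items():
        if state == "down":
            findings.append(f"{name} is Down: check the Cluster Resource Status page")
        elif state == "unknown":
            findings.append(f"{name} is Unknown: transitioning between UP and DOWN")
    # Exactly one node down plus "No sync in progress" needs a closer look.
    down_count = sum(1 for state in states.values() if state == "down")
    if down_count == 1 and "no sync in progress" in replication_message.lower():
        findings.append(
            "one node down with 'No sync in progress': possible split brain, "
            "or a disk failure on the active node without a failover"
        )
    return findings


if __name__ == "__main__":
    for finding in abnormal_conditions("Up/Active", "Down", "No sync in progress."):
        print(finding)
```

A cluster reporting Up/Active and Up/Standby returns an empty list, which corresponds to normal operation.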

Monitor Failover from AAI

You can monitor the following using AAI:

•Replication Status

•Cluster Resource Status

Replication Status

The AAI replication status screen provides you with the same information that the Cisco DMM Administration > Failover > Failover Status page does. You can use this screen to track the progress of data replication.

Procedure

To access the Replication Status screen, do the following:

Step 1 Log in to AAI.

Step 2 Choose FAIL_OVER > STATUS > REPLICATION.

Cluster Resource Status

The cluster resource status screen displays the status of the monitored components and services. When determining the cause of a failover, use this screen to check the status of the monitored services.

•Services with a status of "Started" are operating normally.

•Services with a status of "Stopped" have failed.

When a service is shown as "unmanaged" or "failed", the nodes should be restarted according to the following:

•UNMANAGED FAILED - Both nodes should be restarted, starting first with the node showing unmanaged, then the other.

•FAILED - The node on which the resource is shown as Failed should be restarted.
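The restart guidance above can be summarized in code. The following Python sketch is a hypothetical helper: it assumes you pass in, for one resource, the status shown for each node on the screen, and it returns the nodes to restart in order. The node names in the example are placeholders.

```python
from typing import Dict, List


def nodes_to_restart(resource_status: Dict[str, str]) -> List[str]:
    """Map of node name -> displayed resource status; returns restart order."""
    normalized = {
        node: status.strip().upper() for node, status in resource_status.items()
    }
    unmanaged = [node for node, status in normalized.items() if "UNMANAGED" in status]
    if unmanaged:
        # UNMANAGED FAILED: restart both nodes, the unmanaged node first.
        others = [node for node in normalized if node not in unmanaged]
        return unmanaged + others
    # FAILED: restart only the node that shows the failed resource.
    return [node for node, status in normalized.items() if status == "FAILED"]


if __name__ == "__main__":
    # Placeholder node names for illustration only.
    print(nodes_to_restart({"node-a": "Started", "node-b": "unmanaged failed"}))
```

An empty list means that neither rule applies to the statuses provided.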

The fail count for each service appears in the Migration summary section at the bottom of the screen.

Procedure

To access the Cluster Resource Status screen, do the following:

Step 1 Log in to AAI.

Step 2 Choose FAIL_OVER > STATUS > CLUSTER_RESOURCE.

Step 3 Use the up and down arrow keys to scroll through the displayed information.

Force a Unit to Fail Over

To force a unit to fail over, do the following:

Step 1 Log in to AAI on the active appliance. Use the virtual FQDN or IP address to ensure that you are accessing the active appliance.

Step 2 Choose APPLIANCE_CONTROL > RESTART_OPTIONS > RESTART_WEB_SERVICES.

Restarting the web services on the active appliance triggers a failover to the secondary appliance. The appliance reboots to the standby state and uses the dedicated FQDN and IP address.
