The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product.
This chapter contains these sections.
•Monitor Failover from Cisco DMM
Two alerts on the Cisco DMM Administration > Alerts > Notification Rules page support failover:
•Cluster node is deactivated—When configured, this alert is triggered whenever an appliance in a failover configuration goes offline.
•Cluster Node is activated—When configured, this alert is triggered whenever an appliance in a failover configuration comes online.
When an appliance in a failover configuration fails, you will receive a cluster node down notification.
When you reboot an appliance, you will receive a cluster node down notification followed by a cluster node activated notification for that appliance as the appliance reboots into the standby state.
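The notification sequence described above suggests a simple way to tell which appliances have not come back: a node whose most recent notification is "down" is still offline. The following is a minimal sketch under that assumption. The function name and the (node, kind) event tuples are hypothetical and are not part of Cisco DMM.

```python
def nodes_still_down(events):
    """Return the nodes whose most recent notification was 'down'.

    events: iterable of (node, kind) tuples in arrival order, where kind
    is 'down' or 'up'. This event shape is an assumption made for
    illustration, not a Cisco DMM data format.
    """
    last_kind = {}
    for node, kind in events:
        # Later notifications overwrite earlier ones for the same node.
        last_kind[node] = kind
    return sorted(node for node, kind in last_kind.items() if kind == "down")


# A reboot of dmm1 (down, then up) followed by an outage of dmm2.
events = [("dmm1", "down"), ("dmm1", "up"), ("dmm2", "down")]
print(nodes_still_down(events))
```

In this sketch a rebooted appliance drops out of the result as soon as its activated notification arrives, while a failed appliance stays listed.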
For information about enabling events, configuring your SNMP server, and populating your MIB browser, see the Events and Notifications chapter in User Guide for Cisco Digital Media Manager 5.4.x.
The following traps pertain to appliance Up/Down events:
•.1.3.6.1.4.1.9.9.655.0.6—cluster node down
•.1.3.6.1.4.1.9.9.655.0.5—cluster node up
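If you post-process received traps in a script, the two OIDs above can be mapped to their events with a small lookup table. This is a sketch only: the function name is hypothetical, and how your trap receiver hands OIDs to a script depends on your SNMP tooling.

```python
# Trap OIDs and event labels exactly as listed above.
FAILOVER_TRAPS = {
    "1.3.6.1.4.1.9.9.655.0.6": "cluster node down",
    "1.3.6.1.4.1.9.9.655.0.5": "cluster node up",
}


def classify_trap(oid):
    """Return the failover event for a trap OID, or None if it is unrelated."""
    # Some tools print a leading dot and some do not; accept both forms.
    return FAILOVER_TRAPS.get(oid.strip().lstrip("."))


print(classify_trap(".1.3.6.1.4.1.9.9.655.0.6"))
```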
The following are sample UP/DOWN syslog alerts:
05-17-2011 10:56:42 Local7.Debug 10.0.0.1 May 16 22:54:51 dmm.example.com %DMS-1-ClusterNodeDownEvent: Cluster node dmm1.example.com is DOWN[DmmCluster] [ Original severity = severityCATASTROPHIC ]
05-17-2011 10:58:11 Local7.Debug 10.194.51.45 May 16 22:56:21 dmm1.example.com %DMS-1-ClusterNodeUpEvent: Cluster node dmm1.example.com is UP[DmmCluster] [ Original severity = severityINFO ]
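To act on these messages automatically, a log-processing script can extract the node name, state, and original severity. The pattern below is a sketch derived only from the two sample lines above. Other DMS message variants may not match it, and the function name is hypothetical.

```python
import re

# Pattern inferred from the two sample alerts; not an official format spec.
EVENT_RE = re.compile(
    r"%DMS-\d-ClusterNode(?:Up|Down)Event: "
    r"Cluster node (?P<node>\S+) is (?P<state>UP|DOWN)"
    r".*?Original severity = severity(?P<severity>\w+)"
)

SAMPLE_DOWN = (
    "05-17-2011 10:56:42 Local7.Debug 10.0.0.1 May 16 22:54:51 dmm.example.com "
    "%DMS-1-ClusterNodeDownEvent: Cluster node dmm1.example.com is DOWN[DmmCluster] "
    "[ Original severity = severityCATASTROPHIC ]"
)
SAMPLE_UP = (
    "05-17-2011 10:58:11 Local7.Debug 10.194.51.45 May 16 22:56:21 dmm1.example.com "
    "%DMS-1-ClusterNodeUpEvent: Cluster node dmm1.example.com is UP[DmmCluster] "
    "[ Original severity = severityINFO ]"
)


def parse_failover_syslog(line):
    """Return node, state, and severity for a failover alert, else None."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    return match.groupdict()


print(parse_failover_syslog(SAMPLE_DOWN))
print(parse_failover_syslog(SAMPLE_UP))
```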
Figure 3-1 shows a typical event e-mail notification.
Figure 3-1 A Failover Node Outage Notification
The following information is sent by e-mail:
The Administration Dashboard in Cisco DMM shows a summary status of your failover cluster.
Click View Failover Status to go to the Administration > Failover > Failover Status page.
The Failover Status screen provides the following information:
What to Look For on This Page
The following conditions indicate abnormal operation and should be investigated:
•An appliance in the Down state. Use the Cluster Resource Status page to determine which resources have failed.
•An appliance in the Unknown state. This state indicates that the appliance is transitioning between UP and DOWN.
•One node down and the message "No sync in progress." This condition can have several causes. The failover cluster may be in split-brain mode (see Recover from a Split-Brain Condition for information on how to confirm and recover from split brain).
Alternatively, the active node may have had a disk fail without failing over. In this case, you can force a failover (see Force a Unit to Fail Over) and then proceed with the recovery procedure (see Recover from a Failover).
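The checks above can be expressed as a short triage routine. The following is a minimal sketch: it assumes you have already copied each appliance's state and the sync message from the Failover Status page into plain Python values. The function name, the input shapes, and the finding strings are hypothetical and are not produced by Cisco DMM.

```python
def abnormal_conditions(node_states, sync_message):
    """List conditions on the Failover Status page that need investigation.

    node_states: dict of appliance name -> state string (Up, Down, Unknown).
    sync_message: the synchronization message shown on the page.
    Both input shapes are assumptions made for illustration.
    """
    states = {name: state.strip().lower() for name, state in node_states.items()}
    down = sorted(name for name, state in states.items() if state == "down")
    unknown = sorted(name for name, state in states.items() if state == "unknown")

    findings = []
    for name in down:
        findings.append(f"{name}: Down; check the Cluster Resource Status page")
    for name in unknown:
        findings.append(f"{name}: Unknown; transitioning between UP and DOWN")
    if len(down) == 1 and "no sync in progress" in sync_message.lower():
        findings.append(
            "one node down with no sync in progress; "
            "check for split brain or a failed disk on the active node"
        )
    return findings


for finding in abnormal_conditions({"dmm1": "Up", "dmm2": "Down"}, "No sync in progress."):
    print(finding)
```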
You can monitor the following using AAI:
The AAI replication status screen provides you with the same information that the Cisco DMM Administration > Failover > Failover Status page does. You can use this screen to track the progress of data replication.
Procedure
To access the Replication Status screen, do the following:
Step 1 Log in to AAI.
Step 2 Choose FAIL_OVER > STATUS > REPLICATION.
The cluster resource status screen displays the status of the monitored components and services. When determining the cause of a failover, use this screen to check the status of the monitored services.
•Services with a status of "Started" are operating normally.
•Services with a status of "Stopped" have failed.
When a service is shown as "unmanaged" or "failed", restart the nodes as follows:
•UNMANAGED FAILED - Restart both nodes, starting with the node showing unmanaged, then the other.
•FAILED - Restart the node on which the resource is shown as Failed.
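The restart rules above can be sketched as a function that returns the nodes to restart, in order. This assumes a two-node cluster and that you have noted, for one resource, the status string shown against each node. The function name and the input format are hypothetical.

```python
def restart_order(status_by_node):
    """Return the nodes to restart, in order, for one resource.

    status_by_node: dict of node name -> status text shown for the
    resource on that node (for example "Started", "FAILED", or
    "UNMANAGED FAILED"). The input shape is an illustrative assumption.
    """
    normalized = {node: status.strip().upper() for node, status in status_by_node.items()}
    unmanaged = [node for node, status in normalized.items() if "UNMANAGED" in status]
    if unmanaged:
        # UNMANAGED FAILED: restart the unmanaged node first, then the other.
        others = [node for node in normalized if node not in unmanaged]
        return unmanaged + others
    # FAILED: restart only the node on which the resource is shown as failed.
    return [node for node, status in normalized.items() if "FAILED" in status]


print(restart_order({"dmm1": "Started", "dmm2": "unmanaged failed"}))
```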
The fail count for each service appears in the Migration summary section at the bottom of the screen.
Procedure
To access the Cluster Resource Status screen, do the following:
Step 1 Log in to AAI.
Step 2 Choose FAIL_OVER > STATUS > CLUSTER_RESOURCE.
Step 3 Use the up and down arrow keys to scroll through the displayed information.
To force a unit to fail over, do the following:
Step 1 Log in to AAI on the active appliance. Use the virtual FQDN or IP address to ensure that you are accessing the active appliance.
Step 2 Choose APPLIANCE_CONTROL > RESTART_OPTIONS > RESTART_WEB_SERVICES.
Restarting the web services on the active appliance triggers a failover to the secondary appliance. The previously active appliance then reboots into the standby state and uses its dedicated FQDN and IP address.