This section contains the following topics:
•Monitoring Failover from Cisco DMM
Two additional alerts have been added to the Cisco DMM Administration > Alerts > Notification Rules page to support failover:
•Cluster node is deactivated: When configured, this alert is triggered whenever an appliance in a failover configuration goes offline.
•Cluster node is activated: When configured, this alert is triggered whenever an appliance in a failover configuration comes online.
When an appliance in a failover configuration fails, you will receive a cluster node down notification.
If you reboot an appliance, you will receive a cluster node down notification, followed by a cluster node activated notification for that appliance as the appliance reboots into the standby state.
Figure 4-1 Failover Alerts
See Chapter 8: Events and Notifications in the User Guide for Cisco Digital Media Manager 5.2.x for information about enabling events, configuring your SNMP server, and populating your MIB browser.
Each type of alert (SNMP trap, syslog message, and e-mail notification) is described below.
The following traps pertain to appliance Up/Down events:
•.1.3.6.1.4.1.9.9.655.0.6—cluster node down
•.1.3.6.1.4.1.9.9.655.0.5—cluster node up
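If you forward these traps to a script of your own, a lookup table keyed on the trap OID is enough to label them. The following is a minimal Python sketch under that assumption: `FAILOVER_TRAPS` and `classify_trap` are hypothetical names, and how the trap OID is extracted from the incoming notification depends on your SNMP receiver and is not shown.

```python
from typing import Optional

# Trap OIDs for appliance Up/Down events, as listed above.
# Keys are stored without the leading dot so that ".1.3.6..." and
# "1.3.6..." both match.
FAILOVER_TRAPS = {
    "1.3.6.1.4.1.9.9.655.0.6": "cluster node down",
    "1.3.6.1.4.1.9.9.655.0.5": "cluster node up",
}


def classify_trap(trap_oid: str) -> Optional[str]:
    """Return the failover event name for a trap OID, or None if unrelated.

    Hypothetical helper: the caller is assumed to have already pulled the
    trap OID out of the received notification.
    """
    return FAILOVER_TRAPS.get(trap_oid.strip().lstrip("."))
```

For example, `classify_trap(".1.3.6.1.4.1.9.9.655.0.6")` returns `"cluster node down"`, and any OID not in the table returns `None`.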
The following are sample UP/DOWN syslog alerts:
05-17-2011 10:56:42 Local7.Debug 10.0.0.1 May 16 22:54:51 dmm.example.com %DMS-1-ClusterNodeDownEvent: Cluster node dmm1.example.com is DOWN[DmmCluster] [ Original severity = severityCATASTROPHIC ]
05-17-2011 10:58:11 Local7.Debug 10.194.51.45 May 16 22:56:21 dmm1.example.com %DMS-1-ClusterNodeUpEvent: Cluster node dmm1.example.com is UP[DmmCluster] [ Original severity = severityINFO ]
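If your syslog server lets you post-process messages, a regular expression can pull the node name and state out of these alerts. The sketch below is an assumption-based example: the pattern is inferred only from the two sample messages above, other Cisco DMM messages may be laid out differently, and `parse_cluster_event` and `ClusterEvent` are hypothetical names, not part of Cisco DMM.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the two sample Up/Down messages above.
CLUSTER_EVENT = re.compile(
    r"%DMS-\d-(?P<event>ClusterNode(?:Up|Down)Event): "
    r"Cluster node (?P<node>\S+) is (?P<state>UP|DOWN)"
    r".*?Original severity = (?P<severity>\w+)"
)


class ClusterEvent(NamedTuple):
    event: str     # e.g. "ClusterNodeDownEvent"
    node: str      # FQDN of the appliance that changed state
    state: str     # "UP" or "DOWN"
    severity: str  # original severity string from the message


def parse_cluster_event(line: str) -> Optional[ClusterEvent]:
    """Extract the node name and state from a cluster Up/Down syslog line."""
    m = CLUSTER_EVENT.search(line)
    if m is None:
        return None  # not a cluster Up/Down message
    return ClusterEvent(m["event"], m["node"], m["state"], m["severity"])
```

Applied to the first sample message, `parse_cluster_event` returns a `ClusterEvent` whose `node` is `dmm1.example.com` and whose `state` is `DOWN`.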
Figure 4-2 shows a typical event e-mail notification.
Figure 4-2 A Failover Node Outage Notification
The following information is sent by e-mail:
The Cisco DMM home page displays a summary status of your failover cluster.
Click View Failover Status to go to the Administration > Failover > Status page.
The Failover Status screen provides the following information:
What to Look For on This Page
The following conditions indicate abnormal operation and should be investigated:
•An appliance in the Down state. Use the Cluster Resource Status page to determine which resources have failed.
•An appliance in the Unknown state. This state indicates that the appliance is transitioning between UP and DOWN.
•One node down and the message "No sync in progress". This condition has several possible causes. The failover cluster may be in Split Brain mode (see Split Brain Recovery, page 5-4, for information on how to confirm and recover from split brain).
Alternatively, the active node may have had a disk failure but not failed over. In this case, you can force a failover (see Forcing a Unit to Fail Over) and then proceed with the recovery procedure (see Recovering from a Failover, page 5-1).
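The checklist above can be expressed as a small function if you record the Failover Status page values yourself, for example during routine checks. This is a hypothetical Python sketch, not a Cisco DMM tool: `failover_warnings` and its inputs (a mapping of appliance name to displayed state, plus the replication message text) are assumptions.

```python
from typing import Dict, List


def failover_warnings(node_states: Dict[str, str], sync_message: str) -> List[str]:
    """Flag the abnormal conditions listed above (hypothetical helper).

    node_states maps each appliance name to the state shown on the Failover
    Status page, such as "Up", "Down", or "Unknown" (matched case-insensitively).
    sync_message is the replication message shown on the same page.
    """
    warnings = []
    states = {node: s.strip().lower() for node, s in node_states.items()}
    for node, state in states.items():
        if state == "down":
            warnings.append(f"{node} is Down: check the Cluster Resource Status page")
        elif state == "unknown":
            warnings.append(f"{node} is Unknown: it may be transitioning between Up and Down")
    # Exactly one node down while replication reports no sync.
    down_count = sum(1 for s in states.values() if s == "down")
    if down_count == 1 and "no sync in progress" in sync_message.lower():
        warnings.append("One node down with no sync in progress: check for split brain "
                        "or a disk failure on the active node that did not trigger failover")
    return warnings
```

With `{"dmm1": "Up", "dmm2": "Down"}` and the message "No sync in progress", the function returns two warnings: one for the Down node and one for the possible split brain or disk failure.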
You can monitor the following using AAI:
The AAI replication status screen provides you with the same information that the Cisco DMM Administration > Failover > Failover Status page does. You can use this screen to track the progress of data replication.
Procedure
To access the Replication Status screen, do the following:
Step 1 Log into AAI.
Step 2 Choose FAIL_OVER > STATUS > REPLICATION.
The cluster resource status screen displays the status of the monitored components and services. When determining the cause of a failover, use this screen to check the status of the monitored services.
•Services with a status of "Started" are operating normally.
•Services with a status of "Stopped" have failed.
When a service is shown as "unmanaged" or "failed", the nodes should be restarted according to the following:
•UNMANAGED FAILED - Restart both nodes, starting with the node on which the resource is shown as unmanaged, and then the other node.
•FAILED - Restart the node on which the resource is shown as Failed.
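The restart rules above can be sketched as a function that returns the order in which to restart nodes for one resource. This is a minimal, hypothetical Python example: `restart_plan` is not a Cisco DMM command, and its input (a mapping of node name to the status shown for that resource) is assumed to be transcribed by hand from the Cluster Resource Status screen.

```python
from typing import Dict, List


def restart_plan(resource_status: Dict[str, str]) -> List[str]:
    """Return node names in the order they should be restarted.

    resource_status maps each node name to the status shown for a single
    resource, for example "Started", "FAILED", or "UNMANAGED FAILED".
    """
    normalized = {node: s.strip().upper() for node, s in resource_status.items()}
    unmanaged = [n for n, s in normalized.items() if "UNMANAGED" in s]
    if unmanaged:
        # UNMANAGED FAILED: restart both nodes, the unmanaged node first.
        others = [n for n in normalized if n not in unmanaged]
        return unmanaged + others
    # FAILED: restart only the node on which the resource is failed.
    return [n for n, s in normalized.items() if s == "FAILED"]
```

For example, `restart_plan({"dmm1": "UNMANAGED FAILED", "dmm2": "Started"})` returns `["dmm1", "dmm2"]`, while a resource that is Started on both nodes returns an empty list.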
The fail count for each service appears in the Migration summary section at the bottom of the screen:
Procedure
To access the Cluster Resource Status screen, do the following:
Step 1 Log into AAI.
Step 2 Choose FAIL_OVER > STATUS > CLUSTER_RESOURCE.
Step 3 Use the up and down arrow keys to scroll through the displayed information.
To force a unit to fail over, do the following:
Step 1 Log into the AAI of the active appliance. Use the virtual FQDN or IP address to ensure that you are accessing the active appliance.
Step 2 Choose APPLIANCE_CONTROL > RESTART_OPTIONS > RESTART_WEB_SERVICES.
Restarting the web services on the active appliance triggers a failover to the secondary appliance. The formerly active appliance reboots into the standby state and uses its dedicated FQDN and IP address.