About Failover


Revised: May 31, 2011

This chapter describes the Cisco Digital Media Suite (DMS) failover configuration, which allows you to configure two Cisco DMS appliances so that one will take over operation if the other one fails.

This chapter includes the following sections:

Overview

Limitations and Restrictions

Important Notes for Failover Configuration

What to Do Next

Overview

You can configure Cisco DMS appliances in a stateless, active/standby failover configuration. The failover configuration requires two identical Cisco DMS appliances connected to each other through a dedicated failover link. The health of the active unit is monitored to determine if specific failover conditions are met. When those conditions are met, failover occurs.

This section contains the following topics:

Cisco DMS Failover Terminology

Supported Failover Configurations

Failover Triggers

The Failover Process

Cisco DMS Failover Terminology

The following terms are used throughout this document for describing failover configurations:

Active appliance—the appliance that is currently responding to user requests. Always access the active appliance using the virtual IP address or virtual FQDN.

Application interface—the interface on a Cisco DMM or Cisco Show and Share appliance that users connect to. Health monitoring also occurs through this interface.

Dedicated FQDN—an FQDN that is assigned to the appliance. This FQDN remains with the appliance during a failover. The appliance is reachable through this FQDN, but it should only be used if you are trying to access the AAI interface of the standby appliance (you cannot access the GUI of an appliance in the standby state).

Users should never use the dedicated FQDN to access the Cisco DMM or Cisco Show and Share GUI on the active appliance; they should use the virtual FQDN to access the active appliance GUI.

Dedicated IP address—an IP address that is assigned to the appliance. This IP address remains with the appliance during a failover.

Primary appliance—the appliance in a failover pair that is initially put into the active state and is the source of data during the initial configuration. When adding failover to an existing Cisco DMS installation, the existing Cisco DMS appliances are the primary appliances. The virtual IP address and virtual FQDN are obtained from the primary appliances.

Replication interface—the interface that connects two appliances in a failover pair together. Health monitoring and data replication happen through this interface. You cannot access the Cisco DMM or Cisco Show and Share GUI through the replication interface.

Secondary appliance—the appliance that is initially put into the standby state. When adding failover to an existing Cisco DMS installation, the secondary appliances are the ones you add to the existing configuration.

Standby appliance—the appliance that is not actively responding to user requests. The standby appliance monitors the health of the active appliance for failover triggers. During a failover, the standby appliance becomes active and takes over the virtual IP address and virtual FQDN.

Virtual FQDN—the FQDN used by the active appliance, no matter which physical appliance is the active appliance. Users and administrators should always use the virtual FQDN to access the Cisco DMM or Cisco Show and Share appliance interface.

Virtual IP address—the IP address used by the active appliance, no matter which physical appliance is the active appliance. If the active appliance fails, the virtual IP address is used by the standby appliance as it becomes active.
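The addressing rules in these definitions can be summarized as a small lookup. This is a minimal Python sketch; the `FailoverPair` class, the `address_for` function, the purpose strings, and the example host names are hypothetical illustrations, not part of Cisco DMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverPair:
    """Hypothetical addressing for one failover pair (example values only)."""
    virtual_fqdn: str    # always follows whichever appliance is active
    primary_fqdn: str    # dedicated FQDN, stays with the primary appliance
    secondary_fqdn: str  # dedicated FQDN, stays with the secondary appliance


def address_for(pair: FailoverPair, purpose: str, standby: str = "secondary") -> str:
    """Return the FQDN to use for a purpose.

    purpose: "gui" or "aai" (both on the active appliance), "standby-aai",
             or "standby-gui".
    standby: which physical appliance is currently standby,
             "primary" or "secondary"; used only for "standby-aai".
    """
    if purpose in ("gui", "aai"):
        # The active appliance is always reached through the virtual FQDN.
        return pair.virtual_fqdn
    if purpose == "standby-aai":
        # The dedicated FQDN is used only to reach the standby appliance's AAI.
        if standby == "primary":
            return pair.primary_fqdn
        if standby == "secondary":
            return pair.secondary_fqdn
        raise ValueError(f"unknown appliance: {standby}")
    if purpose == "standby-gui":
        raise ValueError("the GUI of a standby appliance cannot be accessed")
    raise ValueError(f"unknown purpose: {purpose}")


pair = FailoverPair("dmm.example.com", "dmm-a.example.com", "dmm-b.example.com")
print(address_for(pair, "gui"))                             # dmm.example.com
print(address_for(pair, "standby-aai", standby="primary"))  # dmm-a.example.com
```

The GUI and active-appliance AAI lookups never depend on which physical appliance is active, which is the purpose of the virtual FQDN: users do not need to know which appliance currently holds the active role.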

Supported Failover Configurations

Failover is supported for Cisco Digital Signs and Cisco Show and Share implementations. See the following topics for more information about each type of failover support.

Cisco Digital Signs

Cisco Show and Share

Cisco Digital Signs

A Cisco Digital Signs implementation requires that the primary Cisco DMM appliance be paired with a secondary Cisco DMM appliance that acts as a standby appliance. The application interfaces (GigabitEthernet 1) of the appliances must be on the same subnet. The two appliances are connected on their GigabitEthernet 2 interfaces by either a crossover cable (see Figure 1-1) or a switch (see Figure 1-2). This connection is used to monitor appliance health and to replicate data between the appliances.

Figure 1-1 Digital Signs Failover with a Crossover Cable

Figure 1-2 Digital Signs Failover with a Switch

If you have a Cisco DMS installation with a Cisco Show and Share appliance, you must use the Cisco Show and Share failover configuration, even if you only want failover support for your Cisco Digital Signs implementation (see Cisco Show and Share).

For detailed information on how to configure Cisco Digital Signs failover, see Cisco Digital Signs Failover Configuration, page 2-1.

Cisco Show and Share

A Cisco Show and Share failover configuration requires the following devices:

A primary and a secondary Cisco DMM appliance. The application interfaces (GE 1) must be on the same subnet. The appliances must be connected together by a crossover cable or a switch on their replication interfaces (GE 2). The application interfaces must be on a different subnet from the replication interfaces.

A primary and a secondary Cisco Show and Share appliance. The application interfaces (GE 1) must be on the same subnet. However, these application interfaces can be on a different subnet from the application interfaces of the Cisco DMM appliances. The appliances must be connected together by a crossover cable or a switch on their replication interfaces (GE 2). The application interfaces must be on a different subnet from the replication interfaces.


Note You cannot configure failover for only the Cisco Show and Share appliance; you must configure it for both the Cisco Show and Share and the Cisco DMM appliances.
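Before cabling a pair, the subnet rules above can be checked offline. The following is a minimal sketch using Python's standard `ipaddress` module; the `check_pair` function, the CIDR-notation arguments, and the example addresses are assumptions for illustration. Run it once for the Cisco DMM pair and once for the Cisco Show and Share pair, because the two pairs are not required to share an application subnet.

```python
import ipaddress


def check_pair(app_a: str, app_b: str, rep_a: str, rep_b: str) -> list[str]:
    """Check one failover pair's interface addressing (hypothetical helper).

    Each argument is an interface address in CIDR form, e.g. "10.1.1.10/24".
    Returns a list of rule violations; an empty list means the layout passes.
    """
    app_a_if, app_b_if = ipaddress.ip_interface(app_a), ipaddress.ip_interface(app_b)
    rep_a_if, rep_b_if = ipaddress.ip_interface(rep_a), ipaddress.ip_interface(rep_b)
    problems = []
    # GE 1 (application) interfaces of the pair must share a subnet.
    if app_a_if.network != app_b_if.network:
        problems.append("application interfaces are on different subnets")
    # GE 2 (replication) interfaces of the pair must share a subnet.
    if rep_a_if.network != rep_b_if.network:
        problems.append("replication interfaces are on different subnets")
    # The replication subnet must be separate from the application subnet.
    if app_a_if.network.overlaps(rep_a_if.network):
        problems.append("replication subnet overlaps the application subnet")
    return problems


print(check_pair("10.1.1.10/24", "10.1.1.11/24",
                 "192.168.100.1/30", "192.168.100.2/30"))  # []
```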


Figure 1-3 shows an example Cisco Show and Share failover configuration.

Figure 1-3 A Cisco Show and Share Failover Configuration

For detailed information on how to configure Cisco Show and Share failover, see Cisco Show and Share Failover Configuration, page 3-1.

Failover Triggers

The following events trigger failover:

The standby device fails to receive 10 consecutive heartbeat messages from the active device.

Heartbeat messages are sent once a second, so a failover occurs after roughly 10 seconds without a heartbeat.

Manually restarting the following services using the AAI interface:

Web services (Tomcat)

Database services

Rebooting the active appliance.

Loss of power on the active appliance (either because you powered the appliance off or because of a general power failure).

Pairing the active appliances.

Restoring a backup on the active appliance.

Changing the logging level.

Regenerating a certificate.

Reaching the fail count threshold (5) for a monitored service running on the active appliance. When a service stops, the appliance automatically attempts to restart it. Each time the service fails, a fail counter increments. When the fail counter for any of the services reaches 5, failover is triggered. To clear the counters, you need to reboot the appliance. See Minor Failure Event Recovery, page 5-1 for more information.

A single disk failure on the active appliance does not cause a failover; to fail over in that case, you must force a failover by rebooting the active appliance. A multiple-disk failure on the active appliance causes a failover. See Major Failure Event Recovery, page 5-2 for more information about recovering from a disk failure.
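The health-based triggers above (missed heartbeats, the per-service fail count threshold, and disk failures) can be modeled as a single check. This is a simplified Python sketch, not the appliance's actual logic; the `ActiveHealth` fields, the `failover_reason` function, and the service key `"tomcat"` are hypothetical. Operator actions such as restarting services, rebooting, restoring a backup, changing the logging level, or regenerating a certificate trigger failover directly and are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MISSED_HEARTBEAT_LIMIT = 10  # consecutive heartbeats (sent once a second)
SERVICE_FAIL_LIMIT = 5       # per monitored service; counters clear only on reboot


@dataclass
class ActiveHealth:
    """Hypothetical snapshot of the active appliance's health."""
    missed_heartbeats: int = 0  # consecutive heartbeats not received
    service_failures: Dict[str, int] = field(default_factory=dict)
    failed_disks: int = 0


def failover_reason(health: ActiveHealth) -> Optional[str]:
    """Return the health-based reason for a failover, or None if none applies."""
    if health.missed_heartbeats >= MISSED_HEARTBEAT_LIMIT:
        return "missed heartbeats"
    for service, count in health.service_failures.items():
        if count >= SERVICE_FAIL_LIMIT:
            return f"fail count threshold reached for {service}"
    if health.failed_disks > 1:
        # A single disk failure does not trigger failover by itself;
        # rebooting the active appliance is how you would force one.
        return "multiple-disk failure"
    return None


print(failover_reason(ActiveHealth(service_failures={"tomcat": 4})))  # None
print(failover_reason(ActiveHealth(service_failures={"tomcat": 5})))
# fail count threshold reached for tomcat
```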

The Failover Process

The following events happen during failover:

1. A failover event occurs. This causes the active appliance to go into a down or unknown state, depending upon the type of failure. A "down" notification is sent.

2. The standby appliance becomes the active appliance and starts using the virtual FQDN and virtual IP address.

3. The new active appliance restarts the application services. This can take up to 3 minutes for a Cisco Show and Share appliance. An "up" notification is sent.

4. When the failed appliance is brought back online, it becomes the standby unit and begins emitting heartbeat requests.
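The four steps above can be traced as state changes on a pair of appliances. This is a minimal Python sketch of the documented sequence only; the `Appliance` and `Cluster` classes, the state strings, and the notification text are hypothetical, and the service restart in step 3 is represented by a comment rather than a delay.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Appliance:
    name: str
    state: str  # "active", "standby", "down", or "unknown"


@dataclass
class Cluster:
    a: Appliance
    b: Appliance
    virtual_owner: str  # appliance currently using the virtual IP address and FQDN


def fail_over(cluster: Cluster, failed_state: str = "down") -> List[str]:
    """Steps 1-3: the active appliance fails and the standby takes over."""
    active, standby = ((cluster.a, cluster.b) if cluster.a.state == "active"
                       else (cluster.b, cluster.a))
    if active.state != "active" or standby.state != "standby":
        raise ValueError("cluster is not in an active/standby state")
    notifications = []
    active.state = failed_state             # step 1: down or unknown
    notifications.append(f"down: {active.name}")
    standby.state = "active"                # step 2: take over the virtual address
    cluster.virtual_owner = standby.name
    # Step 3: application services restart (up to 3 minutes on Show and Share).
    notifications.append(f"up: {standby.name}")
    return notifications


def bring_back(cluster: Cluster, name: str) -> None:
    """Step 4: the recovered appliance rejoins as standby."""
    appliance = cluster.a if cluster.a.name == name else cluster.b
    appliance.state = "standby"
```

In this model the recovered appliance does not reclaim the virtual address; it simply becomes the new standby, matching step 4.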

Failover is stateless. Therefore, any users with active sessions to the appliance will need to reconnect and, if they were logged in, log in again.

If users were viewing a Cisco Show and Share video that was hosted on an external server, the video will continue to play until the user attempts to navigate the application. If users were viewing a video that was streaming from Cisco Show and Share, the video will stop playing.

If users are uploading or publishing a video when a failover occurs, the process will fail and they will need to re-upload or re-publish their video.

After a failover, users will need to wait approximately 3 minutes before they can log back into the web interface.
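Because failover is stateless, a client or script that talks to the appliance has to open a new session through the virtual FQDN. The sketch below shows one way to retry; it assumes a hypothetical `connect` callable that opens a new session and logs in again, returning `True` on success. The 180-second timeout mirrors the roughly 3-minute restart time described above, and the 10-second retry interval is an arbitrary choice.

```python
import time
from typing import Callable


def reconnect(connect: Callable[[], bool], timeout: float = 180.0,
              interval: float = 10.0,
              sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry `connect` until it succeeds or `timeout` seconds have been waited.

    `connect` is a hypothetical callable that reconnects to the virtual FQDN
    and logs in again. `sleep` is injectable so the loop can be tested quickly.
    """
    waited = 0.0
    while True:
        if connect():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


attempts = iter([False, False, True])  # fake connection: succeeds on the third try
print(reconnect(lambda: next(attempts), sleep=lambda s: None))  # True
```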

Limitations and Restrictions

The application interface of each pair of appliances must be on the same subnet (although the Cisco DMM pair and the Cisco Show and Share pair are not required to be on the same subnet).

The replication interfaces of each appliance pair must be on the same subnet, and that subnet must be different from the subnet of the application interfaces.

You must install the base license on the secondary pair of appliances before you can configure failover.

Failover activation and replication can take up to 15 hours.

During the activation phase (which takes up to 20 minutes), the Cisco DMM and Cisco Show and Share applications are not available to end users.

During the replication phase, users can view and upload videos to Cisco Show and Share, but performance may be degraded.

Do not make any configuration or administrative changes or restart services during activation and replication.

You cannot have a Cisco Show and Share appliance-only failover configuration.

You cannot access the GUI of a standby appliance. You can access the AAI interface of a standby appliance by using the dedicated IP address or dedicated FQDN. Do not make any configuration changes to the standby appliance.

Backups taken from appliances in standalone mode cannot be restored on a failover cluster. However, a backup taken from the active appliance in a failover cluster can be restored on that appliance after it is converted to standalone mode.
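The backup restrictions in this section can be expressed as a small decision table. This Python sketch is illustrative only; the mode labels and the `can_restore` function are hypothetical. Two rows are stated directly in this chapter; the other two are assumptions, marked in the comments (the cluster-to-cluster row is inferred from the advice to back up the failover cluster right after configuring it).

```python
# (where the backup was taken, where it is restored) -> allowed?
RESTORE_RULES = {
    ("standalone", "standalone"): True,       # assumption: ordinary standalone restore
    ("standalone", "failover"): False,        # stated: not allowed on a failover cluster
    ("failover-active", "failover"): True,    # assumption: inferred from backup advice
    ("failover-active", "standalone"): True,  # stated: after conversion to standalone
}


def can_restore(taken_from: str, restore_to: str) -> bool:
    """Look up whether a backup may be restored; unknown combinations are refused."""
    return RESTORE_RULES.get((taken_from, restore_to), False)


print(can_restore("standalone", "failover"))  # False
```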

Important Notes for Failover Configuration

Install external certificates on the primary pair of appliances before configuring failover. When the certificates expire, use the virtual FQDN when obtaining new certificates. Install the new certificates using the virtual FQDN to access the AAI interface.

Back up your failover cluster (using the virtual FQDN to access AAI) immediately after configuring failover. Backups taken in standalone mode cannot be restored on a failover cluster.

When using a switched connection for the replication interfaces, make sure that the latency between the active and standby devices is no more than 10 seconds. Latency greater than 10 seconds causes 10 consecutive heartbeat messages to be missed, which initiates a failover.

Restoring data on a Cisco Show and Share appliance in a failover cluster causes the Cisco Show and Share appliance to reboot, which initiates a failover. This is expected behavior. The data is written to the standby appliance during the restore, so when the standby appliance becomes active, it contains the correct data.

In a switched configuration, the switch interfaces connected to the replication interfaces must be configured for 1000 Mbps.
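The 10-second latency limit in the notes above follows from the heartbeat settings: 10 missed heartbeats at one per second is 10 seconds of silence. The sketch below treats the standby as a simple watchdog that resets on every heartbeat and fails over once silence exceeds that window. This is a simplified model under stated assumptions (the `would_fail_over` function and its inputs are hypothetical; the appliance's internal implementation is not described in this chapter). A gap of exactly 10 seconds is treated as acceptable, matching "no more than 10 seconds".

```python
from typing import List

HEARTBEAT_INTERVAL = 1.0  # seconds between heartbeats from the active appliance
MISSED_LIMIT = 10         # consecutive missed heartbeats that trigger failover


def would_fail_over(arrivals: List[float], now: float) -> bool:
    """Report whether the standby would see a silence long enough to fail over.

    arrivals: times (s) at which heartbeats reached the standby, ascending,
              with monitoring assumed to start at t = 0.
    now:      current time (s); the silence since the last heartbeat counts too.
    """
    timeout = MISSED_LIMIT * HEARTBEAT_INTERVAL  # 10 s with the documented values
    previous = 0.0
    for t in arrivals + [now]:
        if t - previous > timeout:
            return True
        previous = t
    return False


steady = [float(t) for t in range(21)]           # one heartbeat per second
print(would_fail_over(steady, now=20.5))         # False
print(would_fail_over([0, 1, 2, 13.5], now=14))  # True: 11.5 s gap after t = 2
```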

What to Do Next

To configure failover for a Cisco Digital Signs implementation, see Cisco Digital Signs Failover Configuration, page 2-1.

To configure failover for a Cisco Show and Share implementation, see Cisco Show and Share Failover Configuration, page 3-1.

To configure alerts and monitor your appliances, see Monitoring and Controlling Failover, page 4-1.

To recover from a failover event, see Recovering from a Failover, page 5-1.