About Failover in the Public Cloud
To ensure redundancy, you can deploy the ASAv in a public cloud environment in an Active/Backup high availability (HA) configuration. HA in the public cloud implements a stateless Active/Backup solution that allows for a failure of the active ASAv to trigger an automatic failover of the system to the backup ASAv.
The following list describes the primary components in the HA public cloud solution:
- Active ASAv: The ASAv in the HA pair that is set up to handle the firewall traffic for the HA peers.
- Backup ASAv: The ASAv in the HA pair that is not handling firewall traffic and takes over as the active ASAv if the active ASAv fails. It is referred to as a Backup rather than a Standby because it does not take on the identity of its peer in the event of a failover.
- HA Agent: A lightweight process that runs on the ASAv and determines the HA role (active/backup) of an ASAv, detects failures of its HA peer, and performs actions based on its HA role.
On the physical ASA and the non-public cloud virtual ASA, the system handles failover conditions using gratuitous ARP requests: the backup ASA sends out a gratuitous ARP indicating that it is now associated with the active IP and MAC addresses. Most public cloud environments do not allow broadcast traffic of this nature. For this reason, an HA configuration in the public cloud requires that ongoing connections be restarted when failover happens.
The health of the active unit is monitored by the backup unit to determine if specific failover conditions are met. If those conditions are met, failover occurs. The failover time can vary from a few seconds to over a minute depending on the responsiveness of the public cloud infrastructure.
About Active/Backup Failover
In Active/Backup failover, one unit is the active unit. It passes traffic. The backup unit does not actively pass traffic or exchange any configuration information with the active unit. Active/Backup failover lets you use a backup ASAv device to take over the functionality of a failed unit. When the active unit fails, it changes to the backup state while the backup unit changes to the active state.
Primary/Secondary Roles and Active/Backup Status
When setting up Active/Backup failover, you configure one unit to be primary and the other as secondary. At this point, the two units act as two separate devices for device and policy configuration, as well as for events, dashboards, reports, and health monitoring.
The main differences between the two units in a failover pair are related to which unit is active and which unit is backup, namely which unit actively passes traffic. Although both units are capable of passing traffic, only the active unit responds to Load Balancer probes and programs any configured routes to use it as a route destination. The backup unit's primary function is to monitor the health of the active unit. The primary unit always becomes the active unit if both units start up at the same time (and are of equal operational health).
Failover Connection
The backup ASAv monitors the health of the active ASAv using a failover connection established over TCP:
- The active ASAv acts as a connection server by opening a listen port.
- The backup ASAv connects to the active ASAv using a connect port.
- Typically the listen port and the connect port are the same, unless your configuration requires some type of network address translation between the ASAv units.
The state of the failover connection is used to detect the failure of the active ASAv. When the backup ASAv sees the failover connection come down, it considers the active ASAv as failed. Similarly, if the backup ASAv does not receive a response to a keepalive message sent to the active unit, it considers the active ASAv as failed.
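The connection roles described above can be illustrated with a minimal Python sketch. It is not the ASAv implementation: the function names, the loopback address, and the timeouts are assumptions chosen for illustration. It only shows the general pattern of one side listening, the other side connecting, and the connecting side treating a closed connection as a peer failure.

```python
import socket


def open_listen_port(host="127.0.0.1", port=0):
    """Active side: open a TCP listen port (port 0 lets the OS choose one)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server


def connect_to_active(host, port, timeout=5.0):
    """Backup side: connect to the active unit's listen port."""
    return socket.create_connection((host, port), timeout=timeout)


def active_has_failed(conn, timeout=1.0):
    """Backup side: report True if the failover connection has come down.

    A peek that returns b"" means the peer closed the connection, and a
    connection error is treated the same way. A timeout only means that
    no data arrived within the wait period, so it is not a failure here.
    """
    conn.settimeout(timeout)
    try:
        return conn.recv(1, socket.MSG_PEEK) == b""
    except socket.timeout:
        return False
    except ConnectionError:
        return True
```

If network address translation sits between the two units, the port passed to the hypothetical `connect_to_active` would differ from the one the active unit listens on, which corresponds to the listen port and connect port being different.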
Polling and Hello Messages
The backup ASAv sends Hello messages over the failover connection to the active ASAv and expects a Hello Response in return. Message timing uses a polling interval, the time period between the receipt of a Hello Response by the backup ASAv unit and the sending of the next Hello message. The receipt of the response is enforced by a receive timeout, called the hold time. If the receipt of the Hello Response times out, the active ASAv is considered to have failed.
The polling and hold time intervals are configurable parameters; see Configure Failover Criteria and Other Settings.
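To make the relationship between the polling interval and the hold time concrete, here is a small Python sketch of the backup side of the exchange. The message bytes, function names, and parameters are hypothetical, and the real wire format is not described in this document. The sketch only shows the timing logic: send a Hello, wait up to the hold time for the response, then wait one polling interval before the next Hello.

```python
import socket
import time

# Hypothetical message contents; the actual wire format is not documented here.
HELLO = b"HELLO\n"
HELLO_RESPONSE = b"HELLO-RESP\n"


def active_is_alive(conn, hold_time):
    """Send one Hello and wait up to hold_time seconds for the Hello Response."""
    conn.settimeout(hold_time)
    try:
        conn.sendall(HELLO)
        # Assumes the short response arrives in a single read.
        reply = conn.recv(len(HELLO_RESPONSE))
    except OSError:
        # Covers both a receive timeout and a closed or reset connection.
        return False
    return reply == HELLO_RESPONSE


def monitor_active(conn, poll_interval, hold_time, max_polls=None):
    """Poll the active unit until a Hello fails; return the successful poll count."""
    successes = 0
    while max_polls is None or successes < max_polls:
        if not active_is_alive(conn, hold_time):
            break  # the active unit is considered to have failed
        successes += 1
        # The polling interval runs from receipt of the response
        # to the sending of the next Hello.
        time.sleep(poll_interval)
    return successes
```

A shorter hold time detects a failed peer sooner but is more likely to declare failure during a brief network delay, which is why both values are exposed as configurable parameters.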
Active Unit Determination at Startup
The active unit is determined by the following:
- If a unit boots and detects a peer already running as active, it becomes the backup unit.
- If a unit boots and does not detect a peer, it becomes the active unit.
- If both units boot simultaneously, then the primary unit becomes the active unit, and the secondary unit becomes the backup unit.
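The three startup rules above can be written as a small decision function. This is only an illustrative Python sketch: the function name and the `peer_state` values (`None`, `"active"`, `"booting"`) are assumptions introduced here, not ASAv identifiers.

```python
from typing import Optional


def role_at_startup(is_primary: bool, peer_state: Optional[str]) -> str:
    """Return "active" or "backup" for a unit that has just booted.

    peer_state is None when no peer is detected, "active" when the peer is
    already running as active, and "booting" when both units start together.
    """
    if peer_state == "active":
        return "backup"
    if peer_state is None:
        return "active"
    if peer_state == "booting":
        # Simultaneous boot: the configured primary/secondary role decides.
        return "active" if is_primary else "backup"
    raise ValueError(f"peer state not covered by the startup rules: {peer_state!r}")
```

Note that the primary/secondary setting only matters in the simultaneous-boot case. A secondary unit that boots alone still becomes active.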
Failover Events
In Active/Backup failover, failover occurs on a unit basis. The following table shows the failover action for each failure event. For each failure event, the table shows the failover policy (failover or no failover), the action taken by the active unit, the action taken by the backup unit, and any special notes about the failover condition and actions.
Failure Event | Policy | Active Action | Backup Action | Notes |
---|---|---|---|---|
Backup unit sees a failover connection close | Failover | n/a | Become active; mark active as failed | This is the standard failover use case. |
Active unit sees a failover connection close | No failover | Mark backup as failed | n/a | Failover to an inactive unit should never occur. |
Active unit sees a TCP timeout on failover link | No failover | Mark backup as failed | No action | Failover should not occur if the active unit is not getting a response from the backup unit. |
Backup unit sees a TCP timeout on failover link | Failover | n/a | Become active; mark active as failed; try to send failover command to active unit | The backup unit assumes that the active unit is unable to continue operation and takes over. In case the active unit is still up, but fails to send a response in time, the backup unit sends the failover command to the active unit. |
Active Authentication failed | No failover | No action | No action | Because the backup unit is changing the route tables, it is the only unit that needs to be authenticated to Azure. It does not matter if the active unit is authenticated to Azure or not. |
Backup Authentication failed | No failover | Mark backup as unauthenticated | No action | Failover cannot happen if the backup unit is not authenticated to Azure. |
Active unit initiates intentional failover | Failover | Become backup | Become active | The active unit initiates failover by closing the failover link connection. The backup unit sees the connection close and becomes the active unit. |
Backup unit initiates intentional failover | Failover | Become backup | Become active | The backup unit initiates failover by sending a failover message to the active unit. When the active unit sees the message, it closes the connection and becomes the backup unit. The backup unit sees the connection close and becomes the active unit. |
Formerly active unit recovers | No failover | Become backup | Mark mate as backup | Failover should not occur unless absolutely necessary. |
Active unit sees failover message from backup unit | Failover | Become backup | Become active | Can occur if a manual failover was initiated by a user, or if the backup unit saw the TCP timeout but the active unit is able to receive messages from the backup unit. |
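For readers who want to reason about the table programmatically, for example in a test harness or monitoring script, the rows above can be encoded as a lookup table. The following Python sketch is one possible encoding. The event keys and the `Outcome` type are hypothetical names invented for this example, while the policy and action values are taken from the table.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    failover: bool       # True if the policy is "Failover"
    active_action: str   # action taken by the active unit
    backup_action: str   # action taken by the backup unit


# Hypothetical event keys; the values mirror the failover events table.
FAILOVER_EVENTS = {
    "backup_sees_connection_close": Outcome(True, "n/a", "become active; mark active as failed"),
    "active_sees_connection_close": Outcome(False, "mark backup as failed", "n/a"),
    "active_sees_tcp_timeout": Outcome(False, "mark backup as failed", "no action"),
    "backup_sees_tcp_timeout": Outcome(
        True, "n/a",
        "become active; mark active as failed; try to send failover command",
    ),
    "active_authentication_failed": Outcome(False, "no action", "no action"),
    "backup_authentication_failed": Outcome(False, "mark backup as unauthenticated", "no action"),
    "active_initiates_failover": Outcome(True, "become backup", "become active"),
    "backup_initiates_failover": Outcome(True, "become backup", "become active"),
    "formerly_active_recovers": Outcome(False, "become backup", "mark mate as backup"),
    "active_sees_failover_message": Outcome(True, "become backup", "become active"),
}


def causes_failover(event: str) -> bool:
    """Return True if the named failure event results in a failover."""
    return FAILOVER_EVENTS[event].failover
```

Encoded this way, the pattern in the table is easy to see: the events that lead to failover are those seen or initiated from the backup side, plus an intentional failover started by the active unit.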
Guidelines and Limitations
This section includes the guidelines and limitations for this feature.
ASAv Failover for High Availability in the Public Cloud
To ensure redundancy, you can deploy the ASAv in a public cloud environment in an Active/Backup high availability (HA) configuration.
- Supported only on the Microsoft Azure public cloud. When configuring the ASAv VM, the maximum supported number of vCPUs is 8 and the maximum supported memory is 64 GB RAM. See the ASAv Getting Started Guide for a comprehensive list of supported instances.
- Implements a stateless Active/Backup solution that allows for a failure of the active ASAv to trigger an automatic failover of the system to the backup ASAv.
Limitations
- Failover is on the order of seconds rather than milliseconds.
- The HA role determination and the ability to participate as an HA unit depend on TCP connectivity between HA peers and between an HA unit and the Azure infrastructure. There are several situations where an ASAv will not be able to participate as an HA unit:
  - The inability to establish a failover connection to its HA peer.
  - The inability to retrieve an authentication token from Azure.
  - The inability to authenticate with Azure.
- There is no synchronization of the configuration from the Active unit to the Backup unit. Each unit must be configured individually with similar configurations for handling failover traffic.
- Failover route-table limitations
  With respect to route-tables for HA in the public cloud:
  - You can configure a maximum of 16 route-tables.
  - Within a route-table, you can configure a maximum of 64 routes.
  In each case the system alerts you when you have reached the limit, with the recommendation to remove a route-table or route and retry.
- No ASDM support.
- No IPSec Remote Access VPN support.
  Note: See the Cisco Adaptive Security Virtual Appliance (ASAv) Quick Start Guide for information about supported VPN topologies in the public cloud.
- ASAv VM instances must be in the same availability set. If you are a current ASAv user in Azure, you will not be able to upgrade to HA from an existing deployment. You have to delete your instance and deploy the ASAv 4 NIC HA offering from the Azure Marketplace.
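Because each unit is configured individually, it can be useful to check a planned configuration against the route-table limits listed above before applying it. The following Python sketch shows one way to do that. The function name and the input structure (a dictionary mapping a route-table name to a list of routes) are assumptions made for this example and do not correspond to any ASAv API.

```python
MAX_ROUTE_TABLES = 16        # maximum route-tables, per the limitations above
MAX_ROUTES_PER_TABLE = 64    # maximum routes within one route-table


def check_route_table_limits(route_tables):
    """Return a list of problems found in a planned route-table configuration.

    route_tables maps a route-table name to a list of its routes.
    An empty result means the plan is within both limits.
    """
    problems = []
    if len(route_tables) > MAX_ROUTE_TABLES:
        problems.append(
            f"{len(route_tables)} route-tables configured; "
            f"the maximum is {MAX_ROUTE_TABLES}"
        )
    for name, routes in route_tables.items():
        if len(routes) > MAX_ROUTES_PER_TABLE:
            problems.append(
                f"route-table {name!r} has {len(routes)} routes; "
                f"the maximum is {MAX_ROUTES_PER_TABLE}"
            )
    return problems
```

Running the same check against the planned configuration of both units also helps catch cases where the two individually maintained configurations have drifted apart.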