High Availability and Disaster Recovery Certification Test Plan

The HA/DR certification test plan validates that the Security Manager application is highly available and can survive various hardware and software failures. The test plan also covers maintenance activities, such as manually switching the application between servers.


Note


Security Manager client sessions require active users to log in again after an application failover. This behavior is equivalent to stopping and starting Security Manager services running on the server.

This appendix contains the following test case categories: manual switches, Ethernet/network failures, server failures, and application failures.

Manual Switches

This section covers two different types of manual switches. In a single cluster with two servers, you can switch between the two servers in the cluster (intracluster switch); in a dual cluster configuration with a single server in each cluster, you can switch between clusters (intercluster switch).
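These two types of switches correspond to the two forms of the VCS switch command used in the procedures that follow. They are shown here only as a quick reference; the server and cluster names are placeholders.

Example:

C:\> hagrp -switch APP -to secondary_server_name
C:\> hagrp -switch APP -any -clus secondary_cluster_name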

This section contains the following topics: IntraCluster Switch and InterCluster Switch.

IntraCluster Switch

Test Case Title: Manual application switch within a cluster.

Description: The application is manually switched to a different server in the same cluster using VCS.

Test Setup: A dual node cluster (Ethernet and Storage Connections for a Dual-Node Site) in a single cluster configuration.

Procedure


Step 1

Ensure that the APP service group is running on the primary server. Using the VCS Cluster Explorer, select the APP service group. From the shortcut menu, select Switch To, and choose the secondary server. Alternatively, issue the following command:

Example:


C:\> hagrp -switch APP -to secondary_server_name

Step 2

From the Resource view of the APP service group, observe that the resources in the service group go offline on the primary server and then come online on the secondary server. Alternatively, issue the following command to observe the status of the APP service group:

Example:


C:\> hagrp -state APP

Step 3

From a client machine, launch the Security Manager client, using the virtual hostname or IP address in the Server Name field of the login dialog box. Verify that you can log in to the application successfully.


InterCluster Switch

Test Case Title: Manual application switch between clusters.

Description: The application is manually switched to a server in a different cluster using VCS.

Test Setup: A dual cluster configuration as shown in Figure with a single server in each cluster.

Procedure


Step 1

Using the VCS Cluster Explorer, select the APP service group. From the shortcut menu, select Switch To, then Remote Switch(...), to open the Switch global dialog box. In the dialog box, specify the remote cluster and, if desired, a specific server in the remote cluster. Alternatively, issue the following command:

Example:


C:\> hagrp -switch APP -any -clus secondary_cluster_name

Step 2

From the Resource view of the APP service group, observe that the resources in the service group go offline in the primary cluster. Select the root cluster node in the tree and use the Remote Cluster Status view to see that the APP service group goes online on the remote cluster. Alternatively, issue the following command to observe the status of the APP service group:

Example:


C:\> hagrp -state APP
#Group       Attribute             System                        Value
APP          State                 csm_primary:<Primary Server>  |OFFLINE|
APP          State                 localclus:<Secondary Server>  |ONLINE|

Step 3

From a client machine, launch the Security Manager client by entering the appropriate hostname or application IP address used in the secondary cluster in the Server Name field of the Login dialog box. Verify that you can successfully log in to the application.

Step 4

Log out of the Security Manager client, and then switch the APP service group to the primary cluster using either the VCS Cluster Explorer or the following command:

Example:


C:\> hagrp -switch APP -any -clus primary_cluster_name

Ethernet/Network Failures

HA/DR configurations use two types of server Ethernet connections: Ethernet connections used for network communications (public interfaces) and Ethernet interfaces dedicated to intracluster communications (private interfaces). This section covers failure test cases for each type of Ethernet interface.

Network Communication Failures

This section describes the tests used to verify that VCS can detect failure of the network Ethernet ports used for network communications. Failures are covered for the secondary and primary servers, in both single cluster and dual cluster configurations.

Network Ethernet Failure on Secondary Server, Single Cluster

Test Case Title: A failure occurs in the network Ethernet connection on the secondary server in a single cluster configuration.

Description: This test case verifies that VCS can detect a failure on the network Ethernet port on the secondary server and then recover after the failure is repaired.

Test Setup: A dual node cluster (Ethernet and Storage Connections for a Dual-Node Site) in a single cluster configuration with a single network connection per server.

Procedure

Step 1

Verify that the application is running on the primary server.
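For example, you can confirm this with the service group state command used elsewhere in this test plan; the APP service group should be ONLINE on the primary server.

Example:

C:\> hagrp -state APP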

Step 2

Log in to the application from a client machine.

Step 3

Remove the Ethernet cable from the network port on the secondary server to isolate the server from communicating with the switch/router network. Wait for at least 60 seconds for VCS to detect the network port failure. Verify that VCS detects a failure of the NIC resource on the secondary server by running the following command:

Example:

C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen              
A <PrimaryServer>       RUNNING              0                    
A  <SecondaryServer>    RUNNING              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <PrimaryServer>      Y          N               ONLINE         
B  APP             <SecondaryServer>    Y          N               OFFLINE|FAULTED
-- RESOURCES FAILED
-- Group           Type                 Resource             System              
C  APP             NIC                  NIC                  <SecondaryServer>     

Step 4

Restore the Ethernet cable to the network port on the secondary server. Verify that VCS detects that the failure was cleared by running the following command:

Example:

C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen              
A  <PrimaryServer>      RUNNING              0                    
A  <SecondaryServer>    RUNNING              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <PrimaryServer>      Y          N               ONLINE         
B  APP             <SecondaryServer>    Y          N               OFFLINE        

Network Ethernet Failure on Primary Server, Single Cluster

Test Case Title: A failure occurs in the network Ethernet connection on the primary server in a single cluster configuration.

Description: This test case verifies that VCS can detect a failure on the network Ethernet port of the primary server and automatically switch the application to the secondary server. After the problem is fixed, you can switch the application back to the primary server manually.

Test Setup: A dual node cluster (Ethernet and Storage Connections for a Dual-Node Site) with a single network connection per server.

Procedure

Step 1

Verify that the application is running on the primary server.
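One way to confirm this is the cluster summary command used throughout this appendix; the APP service group should report ONLINE on the primary server.

Example:

C:\> hastatus -sum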

Step 2

Remove the Ethernet cable from the network port on the primary server to isolate the server from communicating with the switch/router network. Verify that VCS detects a failure of the NIC resource and automatically switches the APP service group to the secondary server:

Example:

C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen 
A  <PrimaryServer>      RUNNING              0                    
A  <SecondaryServer>    RUNNING              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <PrimaryServer>      Y          N               OFFLINE|FAULTED 
B  APP             <SecondaryServer>    Y          N               ONLINE
-- RESOURCES FAILED
-- Group           Type                 Resource             System              
C  APP             NIC                  NIC                  <PrimaryServer>
C  APP             IP                   APP_IP               <PrimaryServer>

Step 3

Verify that you can log in to the application while it is running on the secondary server.

Step 4

Replace the Ethernet cable on the network port of the primary server and manually clear the faulted IP resource on the primary server:

Example:

C:\> hares -clear APP_IP -sys primary_server_name

Step 5

Manually switch the APP service group back to the primary server.

Example:

C:\> hagrp -switch APP -to primary_server_name

Network Ethernet Failure on Secondary Server, Dual Cluster

Test Case Title: A failure occurs in the network Ethernet connection on the secondary server in a dual cluster configuration.

Description: This test case verifies that VCS can detect a failure on the network Ethernet port and then recover after the failure is repaired.

Test Setup: A dual cluster configuration (Figure) with a single node in each cluster and a single Ethernet network connection for each server.

Procedure

Step 1

Verify that the APP service group is running on the primary cluster/server.
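For example, run the cluster summary command on the primary server; the APP, APPrep, and ClusterService service groups should all be ONLINE.

Example:

C:\> hastatus -sum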

Step 2

Log in to the Security Manager from a client machine.

Step 3

Remove the Ethernet cable from the network port on the server in the secondary cluster. This isolates the server from communicating with the switch/router network and interrupts replication. From the primary server, verify that replication was interrupted (disconnected) by running the following command:

Example:

C:\> vxprint -Pl
Diskgroup = datadg
Rlink      : rlk_172_6037
info       : timeout=500 packet_size=1400
             latency_high_mark=10000 latency_low_mark=9950
             bandwidth_limit=none
state      : state=ACTIVE
             synchronous=off latencyprot=off srlprot=off
assoc      : rvg=CSM_RVG
             remote_host=172.25.84.34
             remote_dg=datadg
             remote_rlink=rlk_172_32481
             local_host=172.25.84.33
protocol   : UDP/IP
flags      : write attached consistent disconnected

Step 4

Run the following command from the primary server to verify that communication with the secondary cluster was lost:

Example:

C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen
A  <PrimaryServer>      RUNNING              0
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State
B  APP             <PrimaryServer>      Y          N               ONLINE
B  APPrep          <PrimaryServer>      Y          N               ONLINE
B  ClusterService  <PrimaryServer>      Y          N               ONLINE
-- WAN HEARTBEAT STATE
-- Heartbeat       To                   State
L  Icmp            csm_secondary        ALIVE
-- REMOTE CLUSTER STATE
-- Cluster         State
M  csm_secondary   LOST_CONN
-- REMOTE SYSTEM STATE
-- cluster:system                   State                Frozen
N  csm_secondary:<SecondaryServer>  RUNNING              0
-- REMOTE GROUP STATE
-- Group           cluster:system                  Probed     AutoDisabled    State
O  APP             csm_secondary:<SecondaryServer> Y          N               OFFLINE

Step 5

Reattach the network Ethernet cable to the secondary server and verify that replication resumed.

Example:

C:\> vxprint -Pl
Diskgroup = datadg
Rlink      : rlk_172_6037
info       : timeout=29 packet_size=1400
             latency_high_mark=10000 latency_low_mark=9950
             bandwidth_limit=none
state      : state=ACTIVE
             synchronous=off latencyprot=off srlprot=off
assoc      : rvg=CSM_RVG
             remote_host=172.25.84.34
             remote_dg=datadg
             remote_rlink=rlk_172_32481
             local_host=172.25.84.33
protocol   : UDP/IP
flags      : write attached consistent connected

Step 6

Verify that communication with the secondary cluster has been restored.

Example:

C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen
A  <PrimaryServer>      RUNNING              0
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State
B  APP             <PrimaryServer>      Y          N               ONLINE
B  APPrep          <PrimaryServer>      Y          N               ONLINE
B  ClusterService  <PrimaryServer>      Y          N               ONLINE
-- WAN HEARTBEAT STATE
-- Heartbeat       To                   State
L  Icmp            csm_secondary        ALIVE
-- REMOTE CLUSTER STATE
-- Cluster         State
M  csm_secondary   RUNNING
-- REMOTE SYSTEM STATE
-- cluster:system                   State                Frozen
N  csm_secondary:<SecondaryServer>  RUNNING              0
-- REMOTE GROUP STATE
-- Group           cluster:system                  Probed     AutoDisabled    State
O  APP             csm_secondary:<SecondaryServer> Y          N               OFFLINE

Step 7

If replication has not recovered, you might need to manually clear the IP resource (if it has faulted) and then start the APPrep service group on the secondary server:

Example:

C:\> hares -clear APP_IP
C:\> hagrp -online APPrep -sys secondary_server_name

Network Ethernet Failure on Primary Server, Dual Cluster

Test Case Title: A failure occurs in the network Ethernet connection on the primary server.

Description: This test case verifies that VCS can detect a failure on the primary server network Ethernet port and can recover by starting the application on the secondary server. After the Ethernet connection is restored, you can manually fail back to the original primary server, retaining any data changes that were made while the application was running on the secondary server.

Test Setup: A dual cluster configuration (Figure) with a single node in each cluster.

Procedure

Step 1

Verify that the APP service group is running on the primary cluster.
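For example, you can run the cluster summary command on the primary server and confirm that the APP service group is ONLINE before removing the cable.

Example:

C:\> hastatus -sum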

Step 2

Remove the network Ethernet cable from the port on the server in the primary cluster to isolate the server from communicating with the switch/router network. VCS should detect this as a failure of the IP and NIC resources. Verify that VCS detected the failure and brought down the APP service group.

Example:

C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen              
A <PrimaryServer>       RUNNING              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <PrimaryServer>      Y          N               OFFLINE        
B  APPrep          <PrimaryServer>      Y          N               OFFLINE|FAULTED
B  ClusterService  <PrimaryServer>      Y          N               ONLINE         
-- RESOURCES FAILED
-- Group           Type                 Resource             System              
C  APPrep          IP                   APP_IP               <PrimaryServer> 
C  APPrep          NIC                  NIC                  <PrimaryServer> 
-- WAN HEARTBEAT STATE
-- Heartbeat       To                   State 
L  Icmp            csm_secondary        DOWN      
-- REMOTE CLUSTER STATE
-- Cluster         State     
M  csm_secondary   FAULTED   
-- REMOTE SYSTEM STATE
-- cluster:system       State                Frozen 
N  csm_secondary:<SecondaryServer> FAULTED              0                   
-- REMOTE GROUP STATE
-- Group           cluster:system                  Probed     AutoDisabled    State          
O  APP             csm_secondary:<SecondaryServer> Y          N               OFFLINE        

Step 3

Start the APP service group on the secondary cluster using the following command on the secondary server:

Example:

C:\> hagrp -online -force APP -sys secondary_server_name

Step 4

From your client machine, log in to Security Manager to verify that it is operational. Change some data so that you can verify that changes are retained when you switch back to the primary server.

Step 5

Reconnect the network Ethernet cable to the primary cluster server.

Step 6

Clear any faults on the IP resource and bring the APPrep service group online on the primary server:

Example:

C:\> hares -clear APP_IP
C:\> hagrp -online APPrep -sys primary_server_name

Step 7

Convert the original primary RVG to secondary and synchronize the data volumes in the original primary RVG with the data volumes on the new primary RVG using the fast failback feature. Using the Cluster Explorer for the secondary cluster, right-click the RVGPrimary resource (APP_RVGPrimary), select Actions, then select fbsync from the Actions dialog box, and then click OK. Alternatively, you can issue the following command:

Example:

C:\> hares -action APP_RVGPrimary fbsync 0 -sys secondary_server_name  
 

Step 8

Using the VCS Cluster Explorer on the secondary cluster, select the APP service group. From the shortcut menu, select Switch To, then Remote Switch(...), to open the Switch global dialog box. In the dialog box, specify the primary cluster and the primary server. Alternatively, issue the following command, where primarycluster is the name of the primary cluster:

Example:

C:\> hagrp -switch APP -any -clus primarycluster

Step 9

Log in to the application to verify that the changes you made on the secondary server were retained.


Cluster Communication Failure

Test Case Title: Failures occur in the Ethernet used for cluster communication.

Description: The dedicated Ethernet connections used between servers in the cluster for intracluster communication fail. The test verifies that the cluster communications continue to function when up to two of the three redundant communication paths are lost.

Test Setup: A dual-node cluster (Ethernet and Storage Connections for a Dual-Node Site) in a single cluster configuration, with two dedicated cluster communication Ethernet connections and a low-priority cluster communication connection configured on the network Ethernet connection.


Note


In addition to the commands given in this test case, you can monitor the status of the cluster communications from the Cluster Explorer by selecting the root node in the tree and selecting the System Connectivity tab.

Procedure


Step 1

Issue the following command to verify that all systems are communicating through GAB.

Note

 
Group Membership Services/Atomic Broadcast (GAB) is a VCS protocol responsible for cluster membership and cluster communications.

Example:


# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   e8cc02 membership 01
Port h gen   e8cc01 membership 01

Step 2

Remove the Ethernet cable from the first dedicated Ethernet port used for cluster communication on the primary server.

Step 3

Issue the following command to view the detailed status of the links used for cluster communication and verify that the first dedicated cluster communication port is down.

Note

 
The asterisk (*) in the output indicates the server on which the command is run. The server where the command is run always shows its links up, even if one or more of those ports are the ones that are physically disconnected.

Example:


# lltstat -nvv
LLT node information:
    Node           State    Link  Status  Address
   * 0 <PrimaryServer>   OPEN
                                  Adapter0   UP    00:14:5E:28:52:9C
                                  Adapter1   UP    00:14:5E:28:52:9D
                                  Adapter2   UP    00:0E:0C:9C:20:FE
     1 <SecondaryServer> OPEN
                                  Adapter0   DOWN
                                  Adapter1   UP    00:14:5E:28:27:17
                                  Adapter2   UP    00:0E:0C:9C:21:C2
...

Step 4

If you configured a low-priority heartbeat link on the network interface, remove the Ethernet cable from the second dedicated Ethernet port used for cluster communication on the primary server.

Step 5

Issue the following command to verify that all systems are communicating through GAB. Also confirm that both servers in the cluster are now in a Jeopardy state, since each server has only one heartbeat working.

Example:


# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   e8cc02 membership 01
Port a gen   e8cc02   jeopardy ;1
Port h gen   e8cc01 membership 01
Port h gen   e8cc01   jeopardy ;1

Step 6

Issue the following command to view the detailed status of the links used for cluster communication and verify that the second dedicated Ethernet port for cluster communications on the primary server is down.

Example:


# lltstat -nvv
LLT node information:
    Node           State    Link  Status  Address
   * 0 <PrimaryServer>   OPEN
                                  Adapter0   UP    00:14:5E:28:52:9C
                                  Adapter1   UP    00:14:5E:28:52:9D
                                  Adapter2   UP    00:0E:0C:9C:20:FE
     1 <SecondaryServer> OPEN
                                  Adapter0   DOWN
                                  Adapter1   UP    00:14:5E:28:27:17
                                  Adapter2   DOWN

Step 7

Replace the Ethernet cable on the second dedicated Ethernet port for cluster communications on the primary server.

Step 8

Verify that the Jeopardy condition was removed by issuing the following command:

Example:


# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   e8cc02 membership 01
Port h gen   e8cc01 membership 01

Step 9

Replace the Ethernet cable on the first dedicated Ethernet port for cluster communications on the primary server.
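If you want to confirm that both dedicated cluster communication links are healthy again, you can rerun the link status command used earlier in this test case; all adapters should now report UP.

Example:

# lltstat -nvv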


Server Failures

This section covers server failures, which are simulated by removing power from the server. Four test cases are covered: standby and primary server failures, in both single cluster and dual cluster configurations.

Standby Server Failure, Single Cluster

Test Case Title: The standby server in a single cluster configuration fails.

Description: This test case verifies that the application running on the primary server is unaffected by a standby server failure and that, after the standby server is repaired, the server can successfully rejoin the cluster configuration.

Test Setup: A dual node cluster (Ethernet and Storage Connections for a Dual-Node Site) with two dedicated cluster communication Ethernet connections and a low-priority cluster communication connection on the network Ethernet connection.

Procedure


Step 1

Verify that the application is running on the primary server in the cluster.

Example:


C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen              
A  <PrimaryServer>      RUNNING              0                    
A  <SecondaryServer>    RUNNING              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <PrimaryServer>      Y          N               ONLINE         
B  APP             <SecondaryServer>    Y          N               OFFLINE 

Step 2

Remove the power for the secondary server and verify that VCS detected the failure and that the application continues to operate on the primary server.

Example:


C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen              
A  <PrimaryServer>      RUNNING              0                    
A  <SecondaryServer>    FAULTED              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <PrimaryServer>      Y          N               ONLINE         

Step 3

Reapply power and boot the secondary server. After the server recovers, verify that it rejoined the cluster in a healthy state by running the following command. The output should be identical to the output in Step 1.

Example:


C:\> hastatus -sum

Primary Server Failure, Single Cluster

Test Case Title: The primary server in a single cluster fails.

Description: This test case verifies that if a primary server fails, the application starts running on the secondary server and that after the primary server is restored, the application can be reestablished on the primary server.

Test Setup: A dual node cluster (Ethernet and Storage Connections for a Dual-Node Site).

Procedure


Step 1

Verify that the APP service group is running on the primary server in the cluster by examining the output of the following command:

Example:


C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen              
A  <PrimaryServer>      RUNNING              0                    
A  <SecondaryServer>    RUNNING              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <PrimaryServer>      Y          N               ONLINE         
B  APP             <SecondaryServer>    Y          N               OFFLINE 

Step 2

Remove the power from the primary server and verify that VCS detected the failure and that the APP service group automatically moved to the secondary server.

Example:


C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen              
A  <PrimaryServer>      FAULTED              0                    
A  <SecondaryServer>    RUNNING              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <SecondaryServer>    Y          N               ONLINE         

Step 3

Verify that you can successfully log in to Security Manager from a client machine.

Step 4

Restore the power to the primary server and verify that the server can rejoin the cluster in a healthy condition. Run the following command. The output should be identical to the output in Step 1.

Example:


C:\> hastatus -sum

Step 5

Manually switch the APP service group back to the primary server.

Example:


C:\> hagrp -switch APP -to primary_server_name

Standby Server Failure, Dual Cluster

Test Case Title: The standby server in a dual cluster configuration fails.

Description: This test case verifies that an application running in the primary cluster is unaffected by a standby server failure and that, after the standby server is repaired, the server can successfully rejoin the dual cluster configuration.

Test Setup: A dual cluster configuration, with replication (Figure), with a single node in each cluster.

Procedure


Step 1

Verify that the APP and ClusterService service groups are running in the primary cluster by running the following command on the primary server:

Example:


C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen              
A  <PrimaryServer>      RUNNING              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <PrimaryServer>      Y          N               ONLINE 
B  APPrep          <PrimaryServer>      Y          N               ONLINE         
B  ClusterService  <PrimaryServer>      Y          N               ONLINE         
-- WAN HEARTBEAT STATE
-- Heartbeat       To                   State     
L  Icmp            csm_secondary        ALIVE     
-- REMOTE CLUSTER STATE
-- Cluster         State     
M  csm_secondary   RUNNING   
-- REMOTE SYSTEM STATE
-- cluster:system                  State                Frozen              
N  csm_secondary:<SecondaryServer> RUNNING              0                   
-- REMOTE GROUP STATE
-- Group           cluster:system                  Probed     AutoDisabled    State          
O  APP             csm_secondary:<SecondaryServer> Y          N               OFFLINE        

Step 2

Remove the power from the secondary server and verify that the primary cluster detects a loss of communication to the secondary cluster:

Example:


C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen              
A  <PrimaryServer>      RUNNING              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <PrimaryServer>      Y          N               ONLINE         
B  APPrep          <PrimaryServer>      Y          N               ONLINE         
B  ClusterService  <PrimaryServer>      Y          N               ONLINE         
-- WAN HEARTBEAT STATE
-- Heartbeat       To                   State     
L  Icmp            csm_secondary        ALIVE     
-- REMOTE CLUSTER STATE
-- Cluster         State     
M  csm_secondary   LOST_CONN
-- REMOTE SYSTEM STATE
-- cluster:system                  State                Frozen              
N  csm_secondary:<SecondaryServer> RUNNING              0                   
-- REMOTE GROUP STATE
-- Group           cluster:system                  Probed     AutoDisabled    State          
O  APP             csm_secondary:<SecondaryServer> Y          N               OFFLINE    

Step 3

Restore the power to the secondary server. After the server restarts, verify that the primary cluster reestablished communications with the secondary cluster by running the following command. The output should be identical to the output in Step 1.

Example:


C:\> hastatus -sum

Step 4

Verify that the replication is operational and consistent by running the following command:

Example:


C:\> vxprint -Pl
Diskgroup = BasicGroup
Diskgroup = datadg
Rlink      : rlk_172_6037
info       : timeout=16 packet_size=1400
             latency_high_mark=10000 latency_low_mark=9950
             bandwidth_limit=none
state      : state=ACTIVE
             synchronous=off latencyprot=off srlprot=off
assoc      : rvg=CSM_RVG
             remote_host=172.25.84.34
             remote_dg=datadg
             remote_rlink=rlk_172_32481
             local_host=172.25.84.33
protocol   : UDP/IP
flags      : write attached consistent connected

Primary Server Failure, Dual Cluster

Test Case Title: The primary server in a dual cluster configuration fails.

Description: This test case verifies that if a primary server fails, the application starts running on the secondary server and that after the primary server is restored, the application can be reestablished on the primary server.

Test Setup: A dual cluster configuration, with replication (Figure), with a single node in each cluster.

Procedure


Step 1

Verify that the APP and ClusterService service groups are running in the primary cluster by running the following command from the secondary server:

Example:


C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen              
A  <SecondaryServer>    RUNNING              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <SecondaryServer>    Y          N               OFFLINE        
B  APPrep          <SecondaryServer>    Y          N               ONLINE         
B  ClusterService  <SecondaryServer>    Y          N               ONLINE         
-- WAN HEARTBEAT STATE
-- Heartbeat       To                   State     
L  Icmp            csm_primary          ALIVE     
-- REMOTE CLUSTER STATE
-- Cluster         State     
M  csm_primary     RUNNING   
-- REMOTE SYSTEM STATE
-- cluster:system       State                Frozen              
N  csm_primary:<PrimaryServer> RUNNING              0                   
-- REMOTE GROUP STATE
-- Group           cluster:system              Probed     AutoDisabled    State          
O  APP             csm_primary:<PrimaryServer> Y          N               ONLINE 

Step 2

Remove the power from the primary server to cause a server failure. Verify that the secondary cluster reported a loss of connectivity to the primary cluster.

Example:


C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen              
A  <SecondaryServer>    RUNNING              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <SecondaryServer>    Y          N               OFFLINE        
B  APPrep          <SecondaryServer>    Y          N               ONLINE         
B  ClusterService  <SecondaryServer>    Y          N               ONLINE         
-- WAN HEARTBEAT STATE
-- Heartbeat       To                   State     
L  Icmp            csm_primary          ALIVE     
-- REMOTE CLUSTER STATE
-- Cluster         State     
M  csm_primary     LOST_CONN 
-- REMOTE SYSTEM STATE
-- cluster:system              State                Frozen              
N  csm_primary:<PrimaryServer> RUNNING              0                   
-- REMOTE GROUP STATE
-- Group           cluster:system              Probed     AutoDisabled    State          
O  APP             csm_primary:<PrimaryServer> Y          N               ONLINE         

Step 3

Confirm that the state of the replication is disconnected. You can see this state from the flags parameter in the output of the following command:

Example:


C:\> vxprint -Pl
Diskgroup = BasicGroup
Diskgroup = datadg
Rlink      : rlk_172_32481
info       : timeout=500 packet_size=1400
             latency_high_mark=10000 latency_low_mark=9950
             bandwidth_limit=none
state      : state=ACTIVE
             synchronous=off latencyprot=off srlprot=off
assoc      : rvg=CSM_RVG
             remote_host=172.25.84.33
             remote_dg=datadg
             remote_rlink=rlk_172_6037
             local_host=172.25.84.34
protocol   : UDP/IP
flags      : write attached consistent disconnected

Step 4

Start the application on the secondary server by using the following command.

Example:


C:\> hagrp -online -force APP -sys secondary_server_name

Step 5

Log in to the application and change some data so that you can verify later that changes made while the application is operating on the secondary server are retained when you revert to the primary server.

Step 6

Restore power to the primary server and allow the server to fully start up.
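Before proceeding, you can optionally confirm that the primary server has fully restarted by checking the cluster summary on the primary server.

Example:

C:\> hastatus -sum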

Step 7

Verify the status of the replication; it should show that replication is connected but that the two sides are not yet synchronized.

Example:


C:\> vxprint -Pl
Diskgroup = BasicGroup
Diskgroup = datadg
Rlink      : rlk_172_32481
info       : timeout=500 packet_size=1400
             latency_high_mark=10000 latency_low_mark=9950
             bandwidth_limit=none
state      : state=ACTIVE
             synchronous=off latencyprot=off srlprot=off
assoc      : rvg=CSM_RVG
             remote_host=172.25.84.33
             remote_dg=datadg
             remote_rlink=rlk_172_6037
             local_host=172.25.84.34
protocol   : UDP/IP
flags      : write attached consistent connected dcm_logging failback_logging

Step 8

Convert the original primary RVG to secondary and synchronize the data volumes in the original primary RVG with the data volumes on the new primary RVG using the fast failback feature. Using the Cluster Explorer for the secondary cluster, right-click the RVGPrimary resource (APP_RVGPrimary), select Actions, then select fbsync from the Actions dialog box, and then click OK. Alternatively, you can issue the following command:

Example:


C:\> hares -action APP_RVGPrimary fbsync 0 -sys secondary_server_name  

Step 9

Verify that the current secondary (former primary) is synchronized with the current primary (former secondary) by looking for the keyword consistent in the flags parameter of the output of the following command:

Example:


C:\> vxprint -Pl
Diskgroup = BasicGroup
Diskgroup = datadg
Rlink      : rlk_172_32481
info       : timeout=29 packet_size=1400
             latency_high_mark=10000 latency_low_mark=9950
             bandwidth_limit=none
state      : state=ACTIVE
             synchronous=off latencyprot=off srlprot=off
assoc      : rvg=CSM_RVG
             remote_host=172.25.84.33
             remote_dg=datadg
             remote_rlink=rlk_172_6037
             local_host=172.25.84.34
protocol   : UDP/IP
flags      : write attached consistent connected

Step 10

Using the VCS Cluster Explorer on the secondary cluster, select the APP service group. From the shortcut menu, select Switch To, then Remote Switch(...), to open the Switch global dialog box. In the dialog box, specify the primary cluster and the primary server. Alternatively, issue the following command, where primarycluster is the name of the primary cluster:

Example:


C:\> hagrp -switch APP -any -clus primarycluster

Step 11

Log in to the application to verify that the changes you made on the secondary server were retained.


Application Failures

This section covers test cases in which the Security Manager application fails. Two cases are covered: a single cluster configuration and a dual cluster configuration.

Application Failure, Single Cluster

Test Case Title: The application fails on the primary server in a single cluster configuration.

Description: This test case verifies that VCS detects an application failure and that VCS automatically moves the application to the secondary server.

Test Setup: A dual node cluster (Ethernet and Storage Connections for a Dual-Node Site) using the default application failover behavior.

Procedure


Step 1

Verify that the APP service group is running on the primary server in the cluster by running the following command:

Example:


C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen              
A <PrimaryServer>       RUNNING              0                    
A  <SecondaryServer>    RUNNING              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <PrimaryServer>      Y          N               ONLINE         
B  APP             <SecondaryServer>    Y          N               OFFLINE        

Step 2

On the server where Security Manager is running, stop the application by issuing the following command:

Example:


C:\> net stop crmdmgtd

Step 3

Verify that VCS detects that Security Manager failed on the primary server and starts the application on the secondary server.

Example:


C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen              
A  <PrimaryServer>      RUNNING              0                    
A  <SecondaryServer>    RUNNING              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <PrimaryServer>      Y          N               OFFLINE|FAULTED
B  APP             <SecondaryServer>    Y          N               ONLINE         
-- RESOURCES FAILED
-- Group           Type                 Resource             System              
C  APP             CSManager            APP_CSManager        <PrimaryServer> 

Step 4

Manually clear the fault on the APP service group.

Example:


C:\> hagrp -clear APP -sys primary_server_name

Step 5

Manually switch the APP service group back to the primary server.

Example:


C:\> hagrp -switch APP -to primary_server_name

Application Failure, Dual Cluster

Test Case Title: The application fails on the primary server in a dual cluster configuration.

Description: This test case verifies that VCS detects an application failure.

Test Setup: A dual cluster configuration, with replication (Figure), with a single node in each cluster. The assumption is that the default application failover behavior has not been modified (that is, failover between clusters requires manual intervention).

Procedure


Step 1

Verify that the APP and ClusterService service groups are running in the primary cluster by running the following command from the secondary server:

Example:


C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen              
A  <SecondaryServer>    RUNNING              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <SecondaryServer>    Y          N               OFFLINE        
B  APPrep          <SecondaryServer>    Y          N               ONLINE         
B  ClusterService  <SecondaryServer>    Y          N               ONLINE         
-- WAN HEARTBEAT STATE
-- Heartbeat       To                   State     
L  Icmp            csm_primary          ALIVE     
-- REMOTE CLUSTER STATE
-- Cluster         State     
M  csm_primary     RUNNING   
-- REMOTE SYSTEM STATE
-- cluster:system       State                Frozen              
N  csm_primary:<PrimaryServer> RUNNING              0                   
-- REMOTE GROUP STATE
-- Group           cluster:system              Probed     AutoDisabled    State          
O  APP             csm_primary:<PrimaryServer> Y          N               ONLINE 

Step 2

On the server where Security Manager is running, stop the application by issuing the following command:

Example:


C:\> net stop crmdmgtd

Step 3

Verify that VCS detects that the application failed and stops the APP service group. Issue the following command and observe the output.

Example:


C:\> hastatus -sum
-- SYSTEM STATE
-- System               State                Frozen              
A  <PrimaryServer>      RUNNING              0                    
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State          
B  APP             <PrimaryServer>      Y          N               OFFLINE|FAULTED
B  APPrep          <PrimaryServer>      Y          N               ONLINE         
B  ClusterService  <PrimaryServer>      Y          N               ONLINE         
-- RESOURCES FAILED
-- Group           Type                 Resource             System              
C  APP             CSManager            APP_CSManager        <PrimaryServer> 
-- WAN HEARTBEAT STATE
-- Heartbeat       To                   State     
L  Icmp            csm_secondary        ALIVE     
-- REMOTE CLUSTER STATE
-- Cluster         State     
M  csm_secondary   RUNNING   
-- REMOTE SYSTEM STATE
-- cluster:system                  State                Frozen              
N  csm_secondary:<SecondaryServer> RUNNING              0                   
-- REMOTE GROUP STATE
-- Group           cluster:system                  Probed     AutoDisabled    State          
O  APP             csm_secondary:<SecondaryServer> Y          N               OFFLINE        

Step 4

Manually clear the fault on the APP service group.

Example:


C:\> hagrp -clear APP

Step 5

Bring the APP service group online on the primary server to restart the application.

Example:


C:\> hagrp -online APP -sys primary_server_name