ESC has the following VM recovery types that you can specify when you deploy a VNF:
- Auto Recovery
- Manual Recovery
ESC also supports recovery using the policy-driven framework. For details, see Configuring a Recovery Policy Using the Policy-driven Framework.
There are three types of actions for a VM recovery that can be specified in the deployment data model:
- REBOOT_THEN_REDEPLOY (default): When a VM down event is received or the timer expires, the healing workflow first attempts to reboot the VM. If the reboot fails, it then attempts to redeploy the VM on the same host.
- REBOOT_ONLY: When a VM down event is received or the timer expires, the healing workflow only attempts to reboot the VM.
- REDEPLOY_ONLY: When a VM down event is received or the timer expires, the healing workflow only attempts to redeploy the VM.
Note |
If the policy uses REBOOT_THEN_REDEPLOY or REDEPLOY_ONLY to redeploy the VMs, and the placement policy is not
enforced, then the VIM decides which host to redeploy the VM on.
|
Note |
ESC supports both manual and auto recovery for vCloud Director. All three types of recovery actions are applicable to
vCloud Director, and REBOOT_THEN_REDEPLOY is the default recovery action. For vCD deployment, see Deploying Virtual Network Functions on VMware vCloud Director (vCD).
To ensure that the recovery is successful, any recovery action that involves redeploying the VM automatically recreates and
attaches the ESC-managed ephemeral ports and volumes that are faulty or deleted.
|
Auto Recovery
In auto recovery, the recovery type parameter is set to AUTO. ESC automatically recovers the VM using the
<action_on_recovery> value specified in the recovery policy. If you do not specify a recovery type, it
defaults to AUTO.
<recovery_policy>
  <recovery_type>AUTO</recovery_type>
  <action_on_recovery>REBOOT_THEN_REDEPLOY</action_on_recovery>
  <max_retries>3</max_retries>
</recovery_policy>
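The defaults described above can be illustrated with a small sketch. It is not part of ESC: the function name `effective_recovery_policy` is hypothetical, and the only defaults it applies are the two stated in this section (recovery type AUTO and action REBOOT_THEN_REDEPLOY). No default is assumed for max_retries, so a missing value is returned as None.

```python
import xml.etree.ElementTree as ET

# Defaults stated in this section; anything else is left unset.
DEFAULT_RECOVERY_TYPE = "AUTO"
DEFAULT_ACTION = "REBOOT_THEN_REDEPLOY"


def effective_recovery_policy(policy_xml):
    """Return the recovery policy as a dict, filling in the documented defaults.

    Hypothetical helper for illustration only. It expects a bare
    <recovery_policy> fragment without XML namespaces.
    """
    root = ET.fromstring(policy_xml)

    def text_of(tag, default):
        node = root.find(tag)
        if node is None or node.text is None or not node.text.strip():
            return default
        return node.text.strip()

    return {
        "recovery_type": text_of("recovery_type", DEFAULT_RECOVERY_TYPE),
        "action_on_recovery": text_of("action_on_recovery", DEFAULT_ACTION),
        # No default is documented here, so a missing value stays None.
        "max_retries": text_of("max_retries", None),
    }


if __name__ == "__main__":
    fragment = "<recovery_policy><max_retries>3</max_retries></recovery_policy>"
    print(effective_recovery_policy(fragment))
```

A policy that only sets max_retries therefore resolves to auto recovery with the REBOOT_THEN_REDEPLOY action.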
Manual Recovery
Manual Recovery of a VM
In manual recovery, ESC sends the VM_MANUAL_RECOVERY_NEEDED notification to northbound (NB) and waits for a recovery instruction
from NB. ESC performs the recovery when it receives that instruction. For manual recovery of a complete
deployment, see Manual Recovery of a Deployment.
ESC also supports overriding the recovery action on a per-request basis, using the action_on_recovery parameter in the recovery
policy. In addition to the three recovery actions listed earlier, two more recovery actions are available:
- RESET_STATE_THEN_REBOOT: Before rebooting the VM, ESC resets the VM state so that the VIM can reboot the VM for recovery. This action is applicable
to OpenStack only.
- DISASTER_RECOVERY: When the VIM to which the VNF has been deployed becomes unavailable and the VNF must be moved to a new VIM for
service continuity, this action can be invoked to redeploy the VNF (the entire service, not individual VMs) on the new VIM.
This action must be preceded by a model-only service update that updates the VIM locator; if this step is skipped,
the recovery request fails. See below for details on how to perform this type of service update (via
the REST API only).
ESC does not attempt to remove the original VNF, because use of this recovery action implies that the VNF
is unreachable from the orchestration stack. Once the VIM itself has been recovered, the old deployment must be cleaned up manually.
The manual recovery policy data model is as follows:
<vm_group>
  ...
  <recovery_policy>
    <recovery_type>MANUAL</recovery_type>
    <action_on_recovery>REBOOT_THEN_REDEPLOY</action_on_recovery>
    <max_retries>3</max_retries>
  </recovery_policy>
  ...
</vm_group>
For more information about recovery policy parameters in the data model, see Elastic Services Controller Deployment Attributes. For more information about configuring the recovery policy in the ESC Portal (VMware only), see Deploying VNFs on VMware vCenter using ESC Portal.
The VM_MANUAL_RECOVERY_NEEDED notification is as follows:
===== SEND NOTIFICATION STARTS =====
WARN Type: VM_MANUAL_RECOVERY_NEEDED
WARN Status: SUCCESS
WARN Status Code: 200
WARN Status Msg: Recovery event for VM [manual-recover_error-g1_0_7d96ad0b-4f27-4a5a-bdf7-ec830e93d07e] triggered.
WARN Tenant: manual-recovery-tenant
WARN Service ID: NULL
WARN Deployment ID: 08491863-846a-4294-b305-c0002b9e8daf
WARN Deployment name: dep-error
WARN VM group name: error-g1
WARN VM Source:
WARN VM ID: ffea079d-0ea2-4d47-ba31-26a08e6dff22
WARN Host ID: 3a5351dc4bb7df0ee25e238a8ebbd6c6fcdf225aebcb9dff6ba10249
WARN Host Name: my-server-27
WARN [DEBUG-ONLY] VM IP: 192.168.0.3;
WARN ===== SEND NOTIFICATION ENDS =====
APIs for Manual Recovery of a VM
You can perform the manual recovery using the ConfD and REST APIs. The manual recovery request can be configured to override
the predefined recovery action with any desired action.
NETCONF API
recovery-vm-action DO <generated vm name> [xmlfile]
To perform recovery using the API, log in to esc_nc_cli and run the following command:
$ esc_nc_cli --user <username> --password <password> recovery-vm-action DO <generated vm name> [xmlfile]
The recovery is performed and the recovery notification is sent to NB.
Note |
Recovery (recovery-vm-action DO <VM-NAME>) can be performed after the VM is alive and the service is active. If the deployment
is incomplete, it must be completed before performing recovery.
If a failover happens during a configurable manual recovery, the manual recovery resumes with the predefined recovery action.
The migration of any deployment must always use the default recovery policy.
You must not provide a recovery action for VM/VNF manual recovery in an LCS-based recovery.
You must not use the enable monitor and configurable manual recovery options together.
|
REST API
http://ip:8080/ESCAPI/#!/Recovery_VM_Operations/handleOperation
POST /v0/{internal_tenant_id}/deployments/recovery-vm/{vm_name}
Recovery VM operation payload:
{
  "operation": "recovery_do",
  "properties": {
    "property": [
      {
        "name": "action",
        "value": "REDEPLOY_ONLY"
      }
    ]
  }
}
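As a rough illustration of how a client might assemble this request, the sketch below builds the request path and JSON body shown above. The function name `build_recovery_request` is hypothetical and nothing is sent over the network. The set of accepted actions is an assumption of this sketch: it contains the actions this section lists for VM recovery and leaves out DISASTER_RECOVERY, because the text describes that action as applying to the entire service rather than to individual VMs.

```python
import json
from urllib.parse import quote

# Assumption of this sketch: the VM-level actions named in this section.
# DISASTER_RECOVERY is left out because it is described as service-wide.
VM_RECOVERY_ACTIONS = {
    "REBOOT_THEN_REDEPLOY",
    "REBOOT_ONLY",
    "REDEPLOY_ONLY",
    "RESET_STATE_THEN_REBOOT",
}


def build_recovery_request(tenant_id, vm_name, action):
    """Return (path, json_body) for a recovery-vm request.

    Hypothetical helper: it mirrors the path and payload layout shown in
    this section and performs no HTTP call itself.
    """
    if action not in VM_RECOVERY_ACTIONS:
        raise ValueError("unsupported recovery action: %s" % action)
    path = "/v0/%s/deployments/recovery-vm/%s" % (
        quote(tenant_id, safe=""),
        quote(vm_name, safe=""),
    )
    body = {
        "operation": "recovery_do",
        "properties": {"property": [{"name": "action", "value": action}]},
    }
    return path, json.dumps(body)


if __name__ == "__main__":
    path, body = build_recovery_request("my-tenant", "vm-1", "REDEPLOY_ONLY")
    print("POST", path)
    print(body)
```

Validating the action on the client side gives an early error for a typo instead of a rejected request.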
To perform a model-only service update, supply the modelOnly parameter to the edit-config API. This prevents any action
from being taken on the VIM and limits the update to the ESC data model only, so that the data model can be prepared
ahead of the time when the deployment is ready to be updated on the VIM:
http://ip:8080/ESCManager/v0/conf/edit-config?modelOnly=true
For example, the following payload updates the VIM locator before the recovery API is invoked with DISASTER_RECOVERY
as the recovery action:
<?xml version="1.0" encoding="UTF-8"?>
<esc_datamodel xmlns="http://www.cisco.com/esc/esc" xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0" xmlns:ns1="urn:ietf:params:xml:ns:netconf:notification:1.0">
  <tenants>
    <tenant>
      <name>admin-tenant</name>
      <deployments>
        <deployment>
          <name>test-deploy</name>
          <networks>
            <network>
              <name>test-network</name>
              <locator>
                <vim_id>my-ucs-59</vim_id>
                <vim_project>admin</vim_project>
              </locator>
            </network>
          </networks>
          <vm_group>
            <name>g1</name>
            <locator>
              <vim_id>my-ucs-59</vim_id>
              <vim_project>admin</vim_project>
            </locator>
            <bootup_time>120</bootup_time>
          </vm_group>
        </deployment>
      </deployments>
    </tenant>
  </tenants>
</esc_datamodel>
Note |
In the disaster recovery scenario, the old deployment must be deleted once the VIM is available again.
|
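A payload like the VIM locator update above can also be derived from an existing data model document rather than written by hand. The following is a minimal sketch, assuming the document uses the http://www.cisco.com/esc/esc namespace shown in the example; the function name `retarget_locators` is hypothetical, and the sketch only rewrites the XML, it does not call the edit-config API.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example payload in this section.
ESC_NS = "http://www.cisco.com/esc/esc"


def retarget_locators(datamodel_xml, new_vim_id, new_vim_project=None):
    """Point every <locator> in the document at a different VIM.

    Hypothetical helper for illustration. Returns the modified XML as a
    string; the vim_project is changed only when a new value is given.
    """
    # Serialize the ESC namespace as the default namespace (no prefix).
    ET.register_namespace("", ESC_NS)
    root = ET.fromstring(datamodel_xml)
    for locator in root.iter("{%s}locator" % ESC_NS):
        vim_id = locator.find("{%s}vim_id" % ESC_NS)
        if vim_id is not None:
            vim_id.text = new_vim_id
        project = locator.find("{%s}vim_project" % ESC_NS)
        if project is not None and new_vim_project is not None:
            project.text = new_vim_project
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<esc_datamodel xmlns="http://www.cisco.com/esc/esc">'
        "<locator><vim_id>old-vim</vim_id><vim_project>admin</vim_project></locator>"
        "</esc_datamodel>"
    )
    print(retarget_locators(sample, "my-ucs-59"))
```

Updating every locator in one pass keeps the network and VM group locators consistent, which matches the example payload where both point at the same VIM.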
A further use for this API is to update the persistent volume UUID before rebooting the VM via the recovery API documented
above. This removes the need to delete the VM group and add it again, as was required in earlier versions of ESC. Here
is an example payload:
<esc_datamodel xmlns="http://www.cisco.com/esc/esc" xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0" xmlns:ns1="urn:ietf:params:xml:ns:netconf:notification:1.0">
  <tenants>
    <tenant>
      <name>my-tenant</name>
      <deployments>
        <deployment>
          <name>my-dep</name>
          <vm_group>
            <name>my-vm</name>
            <bootup_time>1800</bootup_time>
            <volumes>
              <volume>
                <name>new-volume</name>
                <volid>1</volid>
                <bus>ide</bus>
                <type>lvm</type>
              </volume>
              <volume nc:operation="delete">
                <name>old-volume</name>
                <volid>1</volid>
              </volume>
            </volumes>
          </vm_group>
        </deployment>
      </deployments>
    </tenant>
  </tenants>
</esc_datamodel>
Supported VM States and Service Combinations for Manual Recovery of a VM
The recovery-vm-action API applies to both auto and manual recovery types, but only for certain combinations of VM state and service state.
The following table shows the details. In general, ESC rejects the manual recovery action while a deployment, service update,
undeployment, or recovery is in progress.
VM State | Service State | recovery-vm-action
ALIVE | ACTIVE | supported
ALIVE | ERROR | supported
ERROR | ERROR | supported
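The table above can be expressed as a simple lookup, for example in a client-side pre-check before a recovery request is sent. This is a sketch only: the function name `vm_recovery_supported` is hypothetical, and treating every combination that is not listed in the table as unsupported is an assumption, since the table does not list unsupported combinations explicitly.

```python
# (VM state, service state) pairs listed as supported in the table above.
SUPPORTED_VM_RECOVERY_STATES = {
    ("ALIVE", "ACTIVE"),
    ("ALIVE", "ERROR"),
    ("ERROR", "ERROR"),
}


def vm_recovery_supported(vm_state, service_state):
    """Return True if recovery-vm-action is listed as supported.

    Hypothetical helper; combinations absent from the table are treated
    as unsupported, which is an assumption of this sketch.
    """
    key = (vm_state.strip().upper(), service_state.strip().upper())
    return key in SUPPORTED_VM_RECOVERY_STATES


if __name__ == "__main__":
    for pair in [("ALIVE", "ACTIVE"), ("ERROR", "ACTIVE")]:
        print(pair, vm_recovery_supported(*pair))
```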
Manual Recovery of a Deployment
Recovery Without Monitoring Parameters
ESC supports manual recovery of VMs at the service level, that is, recovery of a complete deployment. After the successful
deployment of a service, the service may move into an error state because of failed VMs. ESC can manually recover these failed
VMs, or the complete deployment, through a deployment recovery request. For manual recovery of a single VM, see Manual Recovery.
APIs for Manual Recovery of a Deployment
You can perform the manual recovery using the NETCONF and REST APIs.
The manual recovery request can be configured to override the predefined recovery action to any desired action.
Note |
There is no service active notification after the deployment recovery. You must run a query, for example, esc_nc_cli --user <username> --password <password> get esc_datamodel, to see whether the service state of the deployment is active.
If a failover happens during a configurable manual recovery, the manual recovery resumes with the predefined recovery action.
The migration of any deployment must always use the default recovery policy.
You must not provide a recovery action for VM/VNF manual recovery in an LCS-based recovery.
You must not use the enable monitor and configurable manual recovery options together.
|
NETCONF API
svc-action RECOVER tenant-name deployment-name [xmlfile]
To perform recovery using the API, log in to esc_nc_cli.
REST API
POST /v0/{internal_tenant_id}/deployments/service/{internal_deployment_id}
Content-Type: application/xml
Accept: application/json
Callback: http://172.16.0.1:9010/
Callback-ESC-Events: http://172.16.0.1:9010/
<service_operation xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <operation>recover</operation>
</service_operation>
where:
internal_tenant_id is the system admin tenant ID or the tenant name.
internal_deployment_id is the deployment name.
Supported VM States and Service Combinations for Manual Recovery of a Deployment
The svc-action RECOVER API applies to both auto and manual recovery types, but only for certain combinations of VM state and service state.
The following table shows the details. In general, ESC rejects the manual recovery action while a deployment, service update,
undeployment, or recovery is in progress.
Note |
ESC accepts a VM-level recovery request when the service is in the active or error state.
Notifications are not sent to NB if all VMs are in the ALIVE state after a service recovery request.
|
VM State | Service State | svc-action RECOVER
ERROR | ERROR | supported
ERROR | ERROR | supported
Recovery Enabled with Monitoring Parameters
During manual recovery, you can recover a VM depending on its monitoring parameters. If the VM is in the error state, set the
monitoring parameters to bring the VM back from the error state to the alive state. If the VM is recovered, ESC sends a RECOVERY_CANCELLED
notification. If the VM does not come back alive, the recovery process is triggered. See Manual Recovery for more details.
NETCONF API
svc-action SET_MONITOR_AND_RECOVER <tenant-name> <dep-name>
Recovery notification:
===== SEND NOTIFICATION STARTS =====
WARN Type: VM_RECOVERY_INIT
WARN Status: SUCCESS
WARN Status Code: 200
WARN Status Msg: Recovery with enabling monitor first event for VM Generated ID [dep-resource_g1_0_74132737-d0a4-4ef0-bd9e-86465c1017bf] triggered.
Note |
Recovery enabled with monitoring parameters is supported only for manual recovery at the service level.
|
The monitor_on_error parameter enables continuous monitoring of the VMs in error state.
<recovery_policy>
  <recovery_type>AUTO</recovery_type>
  <action_on_recovery>REBOOT_ONLY</action_on_recovery>
  <max_retries>1</max_retries>
  <monitor_on_error>true</monitor_on_error>
</recovery_policy>
The default value is false.
If false, monitoring is unset on the VM in the error state.
If true, monitoring is set on the VM in the error state. If a VM alive event occurs later (after VM_RECOVERY_COMPLETE), the VM is moved back to the alive
state.