Healing Virtual Network Functions Using ETSI API
As part of life cycle management, ESC heals VNFs when a failure occurs. The recovery policy specified during deployment controls the recovery. ESC supports recovery using the policy-driven framework. For more information, see Configuring a Recovery Policy Using the Policy-driven Framework in the Cisco Elastic Services Controller User Guide.
The healing parameters define the behavior that is monitored to trigger a notification to heal a VNF. These parameters are configured, together with rules, in the KPI section of each compute node in the VNFD. The rules define the actions taken to heal a VNF when these KPI conditions occur.
The ETSI VNFM configures monitoring using the following two sections:
- kpi_data: defines the type of monitoring, events, polling interval, and other parameters
- admin_rules: defines the actions when the KPI monitoring events are triggered
Example:
vdu1:
  type: cisco.nodes.nfv.Vdu.Compute
  properties:
    name: Example VDU1
    description: Example VDU
    ...
    configurable_properties:
      additional_vnfc_configurable_properties:
        vim_flavor: { get_input: VIM_FLAVOR }
        bootup_time: { get_input: BOOTUP_TIME }
        vm_name_override: { get_input: VDU1_VM_NAME }
        recovery_action: REBOOT_THEN_REDEPLOY
        recovery_wait_time: 1
        kpi_data:
          VM_ALIVE-1:
            event_name: 'VM_ALIVE'
            metric_value: 1
            metric_cond: 'GT'
            metric_type: 'UINT32'
            metric_occurrences_true: 1
            metric_occurrences_false: 30
            metric_collector:
              type: 'ICMPPing'
              nicid: 1
              address_id: 0
              poll_frequency: 10
              polling_unit: 'seconds'
              continuous_alarm: false
        admin_rules:
          VM_ALIVE:
            event_name: 'VM_ALIVE'
            action:
              - 'ALWAYS log'
              - 'FALSE recover autohealing'
              - 'TRUE esc_vm_alive_notification'
The preceding example shows the default KPI and rule that support the service alive notification required to complete the deployment in ESC. For more information on KPIs, rules, and the underlying data model that is exposed in the VNFD, see KPIs, Rules and Metrics in the Cisco Elastic Services Controller User Guide.
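As a rough illustration of how the polling settings in the example interact, the sketch below estimates how long a VM must stay unreachable before the FALSE rule fires. It assumes that metric_occurrences_false counts consecutive failed polls and that polls are spaced poll_frequency apart. Both are assumptions about the semantics rather than statements from this guide, and the helper name is hypothetical.

```python
def seconds_until_event(poll_frequency: int, occurrences: int) -> int:
    """Estimate the delay before an event fires.

    Assumption: the event fires after `occurrences` consecutive polls,
    each spaced `poll_frequency` seconds apart.
    """
    if poll_frequency <= 0 or occurrences <= 0:
        raise ValueError("poll_frequency and occurrences must be positive")
    return poll_frequency * occurrences


# Values taken from the VM_ALIVE-1 KPI in the example:
# poll_frequency: 10 (seconds), metric_occurrences_false: 30
print(seconds_until_event(poll_frequency=10, occurrences=30))  # 300
```

Under these assumptions, the example KPI would tolerate roughly five minutes of failed pings before recovery is triggered.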
Recovery of the VNF requests action against the affected VNFCs, as determined by the recovery policy defined during the initial deployment or in the recovery request.
There are four types of actions for recovery. When an event denoting that an instance requires attention is received, a timer expires, or a manual recovery request is received, the healing workflow by default uses the recovery policy configured at either the VNF level or the VNFC level within the VNFD. The supported policies are:
- REBOOT_THEN_REDEPLOY: first attempt to reboot the affected VNFCs; if this fails, attempt to redeploy the affected VNFCs (on the same host)
- REBOOT_ONLY: only attempt to reboot the VM
- RESET_THEN_REBOOT: reset the state of the VM (OpenStack only) and then attempt to reboot the VM
- REDEPLOY_ONLY: only attempt to redeploy the VM
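The four policies can be summarized as an ordered list of actions each. The following is a minimal sketch of that mapping, not ESC code: the action labels ("reboot", "reset", "redeploy") and the function name are illustrative assumptions.

```python
from enum import Enum
from typing import Tuple


class RecoveryPolicy(Enum):
    """Recovery policies named in the list above."""
    REBOOT_THEN_REDEPLOY = "REBOOT_THEN_REDEPLOY"
    REBOOT_ONLY = "REBOOT_ONLY"
    RESET_THEN_REBOOT = "RESET_THEN_REBOOT"
    REDEPLOY_ONLY = "REDEPLOY_ONLY"


# Ordered actions per policy. The action names are illustrative labels.
# For REBOOT_THEN_REDEPLOY the second action is a fallback used only if
# the reboot fails; for RESET_THEN_REBOOT both actions run in sequence.
_STEPS = {
    RecoveryPolicy.REBOOT_THEN_REDEPLOY: ("reboot", "redeploy"),
    RecoveryPolicy.REBOOT_ONLY: ("reboot",),
    RecoveryPolicy.RESET_THEN_REBOOT: ("reset", "reboot"),
    RecoveryPolicy.REDEPLOY_ONLY: ("redeploy",),
}


def recovery_steps(policy_name: str) -> Tuple[str, ...]:
    """Return the ordered actions for a policy name; raise on unknown names."""
    try:
        policy = RecoveryPolicy(policy_name)
    except ValueError:
        raise ValueError(f"unsupported recovery policy: {policy_name!r}") from None
    return _STEPS[policy]


print(recovery_steps("REBOOT_THEN_REDEPLOY"))  # ('reboot', 'redeploy')
```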
If the recovery policy is configured at a VNF-level, the policy applies to each constituent VNFC. If it is specified at VNFC-level, then that policy prevails. The monitoring agent monitors each VNFC and when a recovery situation arises, the message is converted to an alarm and sent to any subscribed consumers (e.g. an NFVO or Element Manager).
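The precedence rule described above (a VNFC-level policy overrides the VNF-level one) can be sketched as follows. This is an illustration only: the function name and the dictionary layout are hypothetical and do not reflect how ESC stores policies.

```python
from typing import Dict, Optional


def effective_policies(
    vnf_policy: Optional[str],
    vnfc_policies: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Resolve the policy for each VNFC.

    A VNFC-level value wins; otherwise the VNF-level value applies.
    """
    return {
        vnfc: (own if own is not None else vnf_policy)
        for vnfc, own in vnfc_policies.items()
    }


resolved = effective_policies(
    "REBOOT_THEN_REDEPLOY",
    {"vdu1": None, "vdu2": "REDEPLOY_ONLY"},
)
print(resolved)
# {'vdu1': 'REBOOT_THEN_REDEPLOY', 'vdu2': 'REDEPLOY_ONLY'}
```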
The HealVnfRequest contains a cause parameter that triggers different behaviors within the VNFM while processing the recovery request. If the cause is one of the values supported by the VNFM (and listed in the VNFD for the deployment as a supported cause) then certain additionalParams keys are activated to support the desired recovery action, as mentioned in the following table. If the NFVO supports the cause, the grant receives the additionalParams and allows the inputs to be modified before executing the recovery request.
If the cause is not one of the overriding causes supported by ESC, then it is assumed that the value provided is simply metadata and ignored; the VNFM would then use the recovery policy configured at the time of deployment. If the cause is supported by ESC, but not listed in the VNFD, then the request is rejected.
Cause | additionalParams keys | Recovery behavior
---|---|---
APPLICATION_FAILURE | Optional: vnfcInstanceId | The recovery attempts to reboot the entire VNF unless [...]
VIRTUALISATION_FAILURE | Optional: [...] | The treatment of the [...] In addition, if there is a persistent volume to be replaced in the same request, the identifier for the volume in the VNFD and the VIM is supplied to avoid multiple requests. However, the VNFC to which the volume is attached must be in the list of VNFCs to be healed. This persistent volume update is only applicable to OpenStack VIMs. Any ephemeral ports and volumes managed by VNFM that are faulty or deleted will be recreated and attached to ensure the recovery is successful.
APPLICATION_OR_VIRTUALISATION_FAILURE | Optional: vnfcInstanceId | As per [...]
INVALID_VM_STATE | Optional: vnfcInstanceId | As per [...]
PERSISTENT_VOLUME_FAILURE | Mandatory: [...] Optional: vnfcInstanceId | The treatment of the [...]
CHANGE_PERSISTENT_VOLUME | Mandatory: [...] | The mandatory keys allow a new persistent (including multi-attach) volume to replace the existing volume without redeploying the VM. Once the data model is updated and the volume is replaced, the VM is rebooted. This is only applicable to OpenStack VIMs.
VIM_FAILURE | None | No [...]
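The cause-handling rules described before the table can be expressed as a small decision function. This is a hedged sketch, not the VNFM implementation: the set of overriding causes is copied from the table, while the function name, the result labels, and the treatment of a missing cause as plain metadata are assumptions made for illustration.

```python
from typing import Iterable, Optional

# Causes listed in the table above.
ESC_OVERRIDING_CAUSES = frozenset({
    "APPLICATION_FAILURE",
    "VIRTUALISATION_FAILURE",
    "APPLICATION_OR_VIRTUALISATION_FAILURE",
    "INVALID_VM_STATE",
    "PERSISTENT_VOLUME_FAILURE",
    "CHANGE_PERSISTENT_VOLUME",
    "VIM_FAILURE",
})


def classify_heal_cause(cause: Optional[str],
                        vnfd_supported_causes: Iterable[str]) -> str:
    """Decide how a HealVnfRequest cause is treated.

    Returns one of three illustrative labels:
      USE_DEPLOYMENT_POLICY - cause is not an overriding cause; it is
                              treated as metadata and ignored
      REJECT                - cause is supported by ESC but not listed
                              in the VNFD
      APPLY_CAUSE_BEHAVIOR  - cause is supported and listed; the
                              cause-specific additionalParams apply
    """
    if cause not in ESC_OVERRIDING_CAUSES:
        return "USE_DEPLOYMENT_POLICY"
    if cause not in set(vnfd_supported_causes):
        return "REJECT"
    return "APPLY_CAUSE_BEHAVIOR"


vnfd_causes = ["APPLICATION_FAILURE", "VIRTUALISATION_FAILURE"]
print(classify_heal_cause("VIRTUALISATION_FAILURE", vnfd_causes))  # APPLY_CAUSE_BEHAVIOR
print(classify_heal_cause("VIM_FAILURE", vnfd_causes))             # REJECT
print(classify_heal_cause("operator note", vnfd_causes))           # USE_DEPLOYMENT_POLICY
```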
If autoheal is enabled on the VNF instance, then ESC automatically attempts to recover the VNF based on the recovery policy configured on deployment. This may be configured in the VNFD or modified against the VNF instance before instantiation.
To modify the autoheal flag (isAutohealEnabled) on the VNF instance resource, see Modifying Virtual Network Functions.
If autoheal is not enabled, only the alarm is dispatched to all the subscribers. The subscriber can initiate a manual HealVnfRequest, as per the following examples. The parameters are optional by default but subject to the rules in table 9 for the different causes.
Example for SOL003:
Method type: POST
VNFM Endpoint: /vnf_instances/{vnfInstanceId}/heal
HTTP Request Header: Content-Type: application/json
Request Payload (ETSI data structure: HealVnfRequest):
{
  "cause": "VIRTUALISATION_FAILURE",
  "additionalParams": {
    "virtualStorageDescId": "cf-cdr1-vol",
    "resourceId": "d8771acb-a32f-66dg-7bc2-8f4ec333ccb8"
  },
  "vnfcInstanceId": ["b9909dde-e21e-45ec-9cc0-9e9ae413eee0"]
}
Example for SOL002:
POST /vnf_instances/{vnfInstanceId}/heal
{
  "vnfcInstanceId": ["b9909dde-e21e-45ec-9cc0-9e9ae413eee0"],
  "cause": "b9909dde-e21e-45ec-9cc0-9e9ae413eee0"
}
The list of vnfcInstanceIds constrains recovery to the required VNFCs. However, the absence of this list means the request applies to the entire VNF.
The cause in the SOL002 HealVnfRequest has the same behavior as in the SOL003 API.
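A client could assemble the heal requests shown above along the following lines. This is a minimal sketch using only the Python standard library. The base URL and the function name are hypothetical, authentication is omitted, and the placement of vnfcInstanceId at the top level of the payload simply mirrors the examples in this section. The request is built but not sent.

```python
import json
import urllib.request
from typing import Mapping, Optional, Sequence


def build_heal_request(
    base_url: str,
    vnf_instance_id: str,
    cause: Optional[str] = None,
    additional_params: Optional[Mapping[str, str]] = None,
    vnfc_instance_ids: Optional[Sequence[str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST to .../vnf_instances/{id}/heal."""
    payload = {}
    if cause is not None:
        payload["cause"] = cause
    if additional_params:
        payload["additionalParams"] = dict(additional_params)
    if vnfc_instance_ids:
        # Omitting this list means the request applies to the entire VNF.
        payload["vnfcInstanceId"] = list(vnfc_instance_ids)

    url = f"{base_url.rstrip('/')}/vnf_instances/{vnf_instance_id}/heal"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical base URL and VNF instance identifier.
req = build_heal_request(
    "https://vnfm.example.com/api",
    "my-vnf-instance-id",
    cause="VIRTUALISATION_FAILURE",
    vnfc_instance_ids=["b9909dde-e21e-45ec-9cc0-9e9ae413eee0"],
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To send the request, pass the returned object to urllib.request.urlopen, adding whatever authentication headers the deployment requires.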
For information on monitoring, see Monitoring Virtual Network Functions Using ETSI API.