The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product.
This document describes, at a high level, how to recover CNAT VMs, CUPS VMs, and 5G-UPF VMs.
Cisco recommends that you have knowledge of these topics:
The information in this document is based on these software and hardware versions:
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
The UAME is a new Ultra Automation Services (UAS) software module introduced to:
The UAME provides deployment orchestration for the following:
ESC is the VNFM referenced in this document and is currently the only supported platform.
VMs hosting cloud-native 5G SMI VMs are in ERROR state in ESC.
crucs502-cnat-cn_oam1_0_d7f90c1e-4401-4be9-87f6-f39ecf04ea3a VM_ERROR_STATE
crucs502-cnat-cn_master_0_05487525-c86f-47e1-a07e-fd33720d114f VM_ERROR_STATE
crucs502-4g-CRPC_CRPCF5_0_ee07bf60-a8f8-405f-9a0d-cfa7363e32e7 VM_ERROR_STATE
Check the VM status in UAME and ESC, then start the recovery process from ESC. If ESC is unable to recover the VM, proceed with a redeploy from UAME.
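The decision flow just described can be sketched as a small shell helper. This is a hypothetical illustration only (the function name and arguments are not part of any Cisco CLI); it simply encodes the rule: a healthy VNFR needs no action, a successful ESC recovery only needs verification, and a failed ESC recovery requires a redeploy from UAME.

```shell
# Hypothetical sketch of the recovery decision flow described above.
# Arguments: <vnfr state from "show vnfr state"> <ESC recovery result>
next_action() {
  vnfr_state=$1
  esc_recovery_result=$2
  if [ "$vnfr_state" != "error" ]; then
    # VNFR is alive: nothing to do.
    echo "none"
  elif [ "$esc_recovery_result" = "SUCCESS" ]; then
    # ESC recovered the VM: confirm the state in UAME.
    echo "verify-in-uame"
  else
    # ESC could not recover the VM: fall back to a UAME redeploy.
    echo "redeploy-from-uame"
  fi
}

next_action error FAILURE   # prints "redeploy-from-uame"
```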
Log in to the UAME, navigate to the ConfD CLI, and check the state as shown here.
ubuntu@crucs502-uame-1:~$ /opt/cisco/usp/uas/confd-6.3.8/bin/confd_cli -u admin -C
Enter Password for 'admin':
Welcome to the ConfD CLI
admin connected from 10.249.80.137 using ssh on crucs502-uame-1
crucs502-uame-1#
crucs502-uame-1#show vnfr state
VNFR ID STATE
---------------------------------
crucs502-4g-CRPCF504 alive
crucs502-4g-CRPCF505 alive
crucs502-4g-CRPCF506 alive
crucs502-4g-CRPCF507 error
crucs502-4g-CRPCF604 alive
crucs502-cnat-cnat error
Try to recover from ESC manually.
Note: Recovery might take up to 900s (15 min) to complete.
bootup_time 300
recovery_wait_time 600
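As a rough illustration, the 900-second ceiling in the note above is the sum of the two timers shown (the variable names here are taken from the configuration output; the arithmetic itself is just a sketch):

```shell
# The total recovery window is bootup_time plus recovery_wait_time,
# using the values from the configuration shown above.
bootup_time=300
recovery_wait_time=600
total=$((bootup_time + recovery_wait_time))
echo "Recovery can take up to ${total}s"   # prints "Recovery can take up to 900s"
```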
Log in to the master ESC, check the health, and then run the recovery commands as shown here.
Last login: Wed May 13 02:07:42 2020 from 10.x.x.x
####################################################################
# ESC on crucs502-esc-vnf-esc-core-esc-1 is in MASTER state.
####################################################################
[admin@crucs502-esc-vnf-esc-core-esc-1 ~]$ health.sh
============== ESC HA (MASTER) with DRBD =================
vimmanager (pgid 14643) is running
monitor (pgid 14712) is running
mona (pgid 14768) is running
drbd (pgid 0) is master
snmp is disabled at startup
etsi is disabled at startup
pgsql (pgid 15119) is running
keepalived (pgid 14070) is running
portal is disabled at startup
confd (pgid 15016) is running
filesystem (pgid 0) is running
escmanager (pgid 15254) is running
=======================================
ESC HEALTH PASSED
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli recovery-vm-action DO crucs502-cnat-cn_oam1_0_d7f90c1e-4401-4be9-87f6-f39ecf04ea3a
tail -50f /var/log/esc/yangesc.log
2020-05-05 02:29:01.534 WARN ===== SEND NOTIFICATION STARTS =====
2020-05-05 02:29:01.534 WARN Type: VM_RECOVERY_COMPLETE
2020-05-05 02:29:01.534 WARN Status: SUCCESS
2020-05-05 02:29:01.534 WARN Status Code: 200
2020-05-05 02:29:01.534 WARN Status Msg: Recovery: Successfully recovered VM [crucs502-cnat-cn_oam1_0_d7f90c1e-4401-4be9-87f6-f39ecf04ea3a].
2020-05-05 02:29:01.534 WARN Tenant: core
2020-05-05 02:29:01.534 WARN Deployment name: crucs502-cnat-cnat-core
2020-05-05 02:29:01.534 WARN VM group name: oam1
<output trimmed>
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli recovery-vm-action DO crucs502-cnat-cn_master_0_05487525-c86f-47e1-a07e-fd33720d114f
tail -50f /var/log/esc/yangesc.log
2020-05-05 02:12:51.512 WARN ===== SEND NOTIFICATION STARTS =====
2020-05-05 02:12:51.512 WARN Type: VM_RECOVERY_COMPLETE
2020-05-05 02:12:51.512 WARN Status: SUCCESS
2020-05-05 02:12:51.512 WARN Status Code: 200
2020-05-05 02:12:51.512 WARN Status Msg: Recovery: Successfully recovered VM [crucs502-cnat-cn_master_0_05487525-c86f-47e1-a07e-fd33720d114f].
2020-05-05 02:12:51.512 WARN Tenant: core
2020-05-05 02:12:51.512 WARN Deployment name: crucs502-cnat-cnat-core
<output trimmed>
Check the yangesc log (tail -50f /var/log/esc/yangesc.log) and look for the Status and Status Msg fields as shown above. If recovery is successful, navigate to the ConfD CLI and verify.
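The log check above can be scripted. This is a sketch that assumes the yangesc.log notification format shown earlier (a "Type:" line followed by a "Status:" line); it extracts the status of the most recent VM_RECOVERY_COMPLETE notification. A sample log fragment is embedded here so the snippet is self-contained.

```shell
# Sketch: pull the Status of the latest VM_RECOVERY_COMPLETE notification,
# assuming the yangesc.log format shown above. In practice, point $log at
# /var/log/esc/yangesc.log instead of the sample created here.
log=$(mktemp)
cat > "$log" <<'EOF'
2020-05-05 02:29:01.534 WARN  Type: VM_RECOVERY_COMPLETE
2020-05-05 02:29:01.534 WARN  Status: SUCCESS
EOF

# grep -A1 keeps each Type line plus the Status line that follows it;
# awk prints the last field of the Status line; tail keeps the newest one.
status=$(grep -A1 'Type: VM_RECOVERY_COMPLETE' "$log" \
  | awk '/Status:/ {print $NF}' | tail -1)
echo "Last recovery status: $status"   # prints "Last recovery status: SUCCESS"
rm -f "$log"
```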
[admin@crucs502-esc-vnf-esc-core-esc-1 ~]$ /opt/cisco/esc/confd/bin/confd_cli -u admin -C
admin connected from 10.249.80.137 using ssh on crucs502-esc-vnf-esc-core-esc-1
crucs502-esc-vnf-esc-core-esc-1# show esc_datamodel opdata tenants tenant | select deployments state_machine
NAME DEPLOYMENT NAME STATE VM NAME STATE
-------------------------------------------------------------------------------------------------------------------------------------------
<truncated output>
crucs502-cnat-cn_etcd2_0_7263c87c-ee62-4b81-8e1e-a0f5c463a5b5 VM_ALIVE_STATE
crucs502-cnat-cn_etcd3_0_512ef3c0-96a2-4a10-83b0-4c7d13805856 VM_ALIVE_STATE
crucs502-cnat-cn_master_0_05487525-c86f-47e1-a07e-fd33720d114f VM_ALIVE_STATE
crucs502-cnat-cn_master_0_8cf66daa-9dfe-4c7e-817e-36624f9c98c2 VM_ALIVE_STATE
crucs502-cnat-cn_master_0_dff4ad36-7982-4131-a737-ccb6c8eae348 VM_ALIVE_STATE
crucs502-cnat-cn_oam1_0_d7f90c1e-4401-4be9-87f6-f39ecf04ea3a VM_ALIVE_STATE
When ESC shows VM_ALIVE_STATE, verify the status in UAME.
crucs502-uame-1#show vnfr state
VNFR ID STATE
---------------------------------
crucs502-4g-CRPCF504 alive
crucs502-4g-CRPCF505 alive
crucs502-4g-CRPCF506 alive
crucs502-4g-CRPCF507 alive
crucs502-4g-CRPCF604 alive
crucs502-4g-CRPCF605 alive
crucs502-4g-CRPCF606 alive
crucs502-4g-CRPCF607 alive
crucs502-4g-CRPGW502 alive
crucs502-4g-CRPGW503 alive
crucs502-4g-CRPGW608 alive
crucs502-4g-CRPGW609 alive
crucs502-4g-CRPGW610 alive
crucs502-4g-CRPGW611 alive
crucs502-4g-CRPGW612 alive
crucs502-4g-CRPGW613 alive
crucs502-4g-CRPGW614 alive
crucs502-4g-CRPGW615 alive
crucs502-4g-CRSGW606 alive
crucs502-4g-CRSGW607 alive
crucs502-4g-CRSGW608 alive
crucs502-4g-CRSGW609 alive
crucs502-4g-CRSGW610 alive
crucs502-4g-CRSGW611 alive
crucs502-5g-upf-CRUPF014 alive
crucs502-5g-upf-CRUPF015 alive
crucs502-5g-upf-CRUPF016 alive
crucs502-5g-upf-CRUPF017 alive
crucs502-5g-upf-CRUPF018 alive
crucs502-5g-upf-CRUPF019 alive
crucs502-5g-upf-CRUPF020 alive
crucs502-5g-upf-CRUPF021 alive
crucs502-5g-upf-CRUPF022 alive
crucs502-5g-upf-CRUPF023 alive
crucs502-5g-upf-CRUPF024 alive
crucs502-5g-upf-CRUPF025 alive
crucs502-5g-upf-CRUPF026 alive
crucs502-5g-upf-CRUPF027 alive
crucs502-cnat-cnat alive
crucs502-cnat-smi-cm alive
crucs502-esc-vnf-esc alive
Verify the same in OpenStack (source the correct overcloud rc file).
(crucs502) [stack@crucs502-ospd ~]$ nova list --fields name,status,host |egrep "CRPCF507|cnat"
<truncated output>
| 3eb43fe7-9f41-42d8-afe4-80f6fd62c385 | crucs502-4g-CRPCF507-core-CRPCF5071 | ACTIVE | crucs502-compute-11.localdomain |
| cc678283-2967-4404-a714-e4dd78000e82 | crucs502-cnat-cnat-core-etcd1 | ACTIVE | crucs502-osd-compute-0.localdomain |
| 711d6fcd-b816-49d4-a702-e993765757b0 | crucs502-cnat-cnat-core-master3 | ACTIVE | crucs502-osd-compute-3.localdomain |
| 46f64bde-a8db-48f2-bf3d-fe3b01295f2f | crucs502-cnat-cnat-core-oam1 | ACTIVE | crucs502-osd-compute-3.localdomain |
| f470ba3d-813e-434b-aac8-78bc646fda22 | crucs502-cnat-cnat-core-oam2 | ACTIVE | crucs502-osd-compute-2.localdomain |
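A quick sanity check on the nova output is to confirm that no instance remains in ERROR state. This sketch parses a small inline sample of the pipe-delimited listing shown above (the field positions are assumed from that output); against a live system you would pipe the nova list output in instead.

```shell
# Sketch: flag any instance whose status column is ERROR, assuming the
# pipe-delimited "nova list --fields name,status,host" layout shown above.
listing='| 3eb43fe7 | crucs502-4g-CRPCF507-core-CRPCF5071 | ACTIVE | crucs502-compute-11.localdomain |
| cc678283 | crucs502-cnat-cnat-core-etcd1 | ACTIVE | crucs502-osd-compute-0.localdomain |'

# Field 3 is the instance name, field 4 the status (field 1 is empty
# because each row starts with a "|").
errors=$(echo "$listing" | awk -F'|' '$4 ~ /ERROR/ {print $3}')
if [ -z "$errors" ]; then
  echo "All listed instances are ACTIVE"
fi
```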
This example shows a case of recovery failure from ESC. In this case, the VM is redeployed from UAME.
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli recovery-vm-action DO crucs502-4g-CRPC_CRPCF5_0_ee07bf60-a8f8-405f-9a0d-cfa7363e32e7
This output shows the failure message in yangesc.log.
tail -50f /var/log/esc/yangesc.log
2020-05-05 02:57:21.143 WARN ===== SEND NOTIFICATION STARTS =====
2020-05-05 02:57:21.143 WARN Type: VM_RECOVERY_INIT
2020-05-05 02:57:21.143 WARN Status: SUCCESS
2020-05-05 02:57:21.143 WARN Status Code: 200
2020-05-05 02:57:21.143 WARN Status Msg: Recovery event for VM Generated ID [crucs502-4g-CRPC_CRPCF5_0_ee07bf60-a8f8-405f-9a0d-cfa7363e32e7] triggered.
2020-05-05 02:57:21.143 WARN Tenant: core
2020-05-05 02:57:21.143 WARN Deployment name: crucs502-4g-CRPCF507-core
2020-05-05 02:57:21.143 WARN VM group name: CRPCF5071
<output trimmed>
2020-05-05 02:57:21.144 WARN ===== SEND NOTIFICATION ENDS =====
2020-05-05 03:09:21.655 WARN
2020-05-05 03:09:21.655 WARN ===== SEND NOTIFICATION STARTS =====
2020-05-05 03:09:21.655 WARN Type: VM_RECOVERY_REBOOT
2020-05-05 03:09:21.655 WARN Status: SUCCESS
2020-05-05 03:09:21.655 WARN Status Code: 200
2020-05-05 03:09:21.655 WARN Status Msg: VM Generated ID [crucs502-4g-CRPC_CRPCF5_0_ee07bf60-a8f8-405f-9a0d-cfa7363e32e7] is rebooted.
2020-05-05 03:09:21.655 WARN Tenant: core
2020-05-05 03:09:21.655 WARN Deployment name: crucs502-4g-CRPCF507-core
2020-05-05 03:09:21.655 WARN VM group name: CRPCF5071
<output trimmed>
2020-05-05 03:09:21.656 WARN ===== SEND NOTIFICATION ENDS =====
2020-05-05 03:14:22.079 WARN
2020-05-05 03:14:22.079 WARN ===== SEND NOTIFICATION STARTS =====
2020-05-05 03:14:22.079 WARN Type: VM_RECOVERY_COMPLETE
2020-05-05 03:14:22.079 WARN Status: FAILURE
2020-05-05 03:14:22.079 WARN Status Code: 500
2020-05-05 03:14:22.079 WARN Status Msg: Recovery: Recovery completed with errors for VM: [crucs502-4g-CRPC_CRPCF5_0_ee07bf60-a8f8-405f-9a0d-cfa7363e32e7]
2020-05-05 03:14:22.079 WARN Tenant: core
2020-05-05 03:14:22.079 WARN Deployment name: crucs502-4g-CRPCF507-core
2020-05-05 03:14:22.079 WARN VM group name: CRPCF5071
<output trimmed>
In ESC, the recovery method is reboot only. This output shows that the VM could not be brought back with a reboot, so it must be redeployed.
crucs502-esc-vnf-esc-core-esc-1# show running-config | include recovery_policy
recovery_policy recovery_type AUTO
recovery_policy action_on_recovery REBOOT_ONLY
recovery_policy max_retries 1
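Because action_on_recovery is REBOOT_ONLY, ESC never rebuilds a failed VM on its own, which is why the redeploy must come from UAME. That conclusion can be automated with a small sketch that parses the recovery_policy output shown above (the parsing is an assumption based on that output format, not an ESC feature):

```shell
# Sketch: read the recovery action from the "show running-config" output
# above and decide whether a failed recovery needs a UAME redeploy.
policy="recovery_policy action_on_recovery REBOOT_ONLY"
action=$(echo "$policy" | awk '{print $NF}')

if [ "$action" = "REBOOT_ONLY" ]; then
  # REBOOT_ONLY cannot rebuild a VM, so a failed reboot means redeploy.
  echo "Recovery is reboot-only; redeploy from UAME if the reboot fails"
fi
```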
Re-confirm the VM status in UAME.
ubuntu@crucs502-uame-1:~$ /opt/cisco/usp/uas/confd-6.3.8/bin/confd_cli -u admin -C
Enter Password for 'admin':
Welcome to the ConfD CLI
admin connected from 10.249.80.137 using ssh on crucs502-uame-1
crucs502-uame-1#
crucs502-uame-1#
crucs502-uame-1#show vnfr state
VNFR ID STATE
---------------------------------
crucs502-4g-CRPCF504 alive
crucs502-4g-CRPCF505 alive
crucs502-4g-CRPCF506 alive
crucs502-4g-CRPCF507 error
crucs502-4g-CRPCF604 alive
crucs502-uame-1# recover nsd-id crucs502-4g vnfd CRPCF507 recovery-action redeploy
Monitor the UAME logs as well as the ESC logs. This whole process can take up to 15 minutes.
UAME logs:
tail -50f /var/log/upstart/uame.log
<truncated output>
2020-05-06 08:57:22,252 - | VM_RECOVERY_DEPLOYED | CRPCF5071 | SUCCESS | Waiting for: VM_RECOVERY_COMPLETE|
2020-05-06 08:57:22,255 - Timing out in 143 seconds
2020-05-06 08:57:48,227 - | VM_RECOVERY_COMPLETE | crucs502-4g-CRPC_CRPCF5_0_ee07bf60-a8f8-405f-9a0d-cfa7363e32e7 | SUCCESS | (1/1)
2020-05-06 08:57:48,229 - NETCONF transaction completed successfully!
2020-05-06 08:57:48,231 - Released lock: esc_vnf_req
2020-05-06 08:57:48,347 - Deployment recover-vnf-deployment: crucs502-4g succeeded
2020-05-06 08:57:48,354 - Send Deployment notification for: crucs502-4g-CRPCF507
ESC Logs:
tail -50f /var/log/esc/yangesc.log
2020-05-06 08:58:01.454 WARN Type: VM_RECOVERY_COMPLETE
2020-05-06 08:58:01.454 WARN Status: SUCCESS
2020-05-06 08:58:01.454 WARN Status Code: 200
2020-05-06 08:58:01.454 WARN Status Msg: Recovery: Successfully recovered VM [crucs502-4g-CRPC_CRPCF5_0_ee07bf60-a8f8-405f-9a0d-cfa7363e32e7].
2020-05-06 08:58:01.454 WARN Tenant: core
2020-05-06 08:58:01.454 WARN Deployment ID: 4f958c43-dfa4-45d4-a69d-76289620c337
2020-05-06 08:58:01.454 WARN Deployment name: crucs502-4g-CRPCF507-core
2020-05-06 08:58:01.454 WARN VM group name: CRPCF5071
<output trimmed>
Verify the status of the VM by following the procedure in Step 3.