The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product.
This document describes, at a high level, how to recover CNAT VMs, CUPS VMs, and 5G-UPF VMs.
Cisco recommends that you have knowledge of these topics:
The information in this document is based on these software and hardware versions:
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
The UAME is a new Ultra Automation Services (UAS) software module introduced to:
The UAME provides deployment orchestration for the following:
ESC is the VNFM referenced in this document and is currently the only supported platform.
VMs hosting cloud-native 5G SMI VMs are in ERROR state in ESC.
crucs502-cnat-cn_oam1_0_d7f90c1e-4401-4be9-87f6-f39ecf04ea3a VM_ERROR_STATE
crucs502-cnat-cn_master_0_05487525-c86f-47e1-a07e-fd33720d114f VM_ERROR_STATE
crucs502-4g-CRPC_CRPCF5_0_ee07bf60-a8f8-405f-9a0d-cfa7363e32e7 VM_ERROR_STATE
Check the VM status in UAME and ESC, then start the recovery process from ESC. If ESC is unable to recover the VM, proceed with a redeploy from UAME.
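The decision flow just described can be sketched as a small shell helper. This is a hypothetical illustration only (the function name and arguments are not part of any Cisco CLI); it simply encodes the rule: a healthy VNFR needs no action, a successful ESC recovery only needs verification, and a failed ESC recovery requires a redeploy from UAME.

```shell
# Hypothetical sketch of the recovery decision flow described above.
# Arguments: <vnfr state from "show vnfr state"> <ESC recovery result>
next_action() {
  vnfr_state=$1
  esc_recovery_result=$2
  if [ "$vnfr_state" != "error" ]; then
    # VNFR is alive: nothing to do.
    echo "none"
  elif [ "$esc_recovery_result" = "SUCCESS" ]; then
    # ESC recovered the VM: confirm the state in UAME.
    echo "verify-in-uame"
  else
    # ESC could not recover the VM: fall back to a UAME redeploy.
    echo "redeploy-from-uame"
  fi
}

next_action error FAILURE   # prints "redeploy-from-uame"
```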
Log in to the UAME, navigate to the ConfD CLI, and check the state as shown here.
ubuntu@crucs502-uame-1:~$ /opt/cisco/usp/uas/confd-6.3.8/bin/confd_cli -u admin -C
Enter Password for 'admin':
Welcome to the ConfD CLI
admin connected from 10.249.80.137 using ssh on crucs502-uame-1
crucs502-uame-1#
crucs502-uame-1#show vnfr state
VNFR ID STATE
---------------------------------
crucs502-4g-CRPCF504 alive
crucs502-4g-CRPCF505 alive
crucs502-4g-CRPCF506 alive
crucs502-4g-CRPCF507 error
crucs502-4g-CRPCF604 alive
crucs502-cnat-cnat error
Try to recover from ESC manually.
Note: Recovery might take up to 900s (15 min) to complete.
bootup_time 300
recovery_wait_time 600
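As a rough illustration, the 900-second ceiling in the note above is the sum of the two timers shown (the variable names here are taken from the configuration output; the arithmetic itself is just a sketch):

```shell
# The total recovery window is bootup_time plus recovery_wait_time,
# using the values from the configuration shown above.
bootup_time=300
recovery_wait_time=600
total=$((bootup_time + recovery_wait_time))
echo "Recovery can take up to ${total}s"   # prints "Recovery can take up to 900s"
```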
Log in to the master ESC, check the health, and then run the recovery commands as shown here.
Last login: Wed May 13 02:07:42 2020 from 10.x.x.x
####################################################################
# ESC on crucs502-esc-vnf-esc-core-esc-1 is in MASTER state.
####################################################################
[admin@crucs502-esc-vnf-esc-core-esc-1 ~]$ health.sh
============== ESC HA (MASTER) with DRBD =================
vimmanager (pgid 14643) is running
monitor (pgid 14712) is running
mona (pgid 14768) is running
drbd (pgid 0) is master
snmp is disabled at startup
etsi is disabled at startup
pgsql (pgid 15119) is running
keepalived (pgid 14070) is running
portal is disabled at startup
confd (pgid 15016) is running
filesystem (pgid 0) is running
escmanager (pgid 15254) is running
=======================================
ESC HEALTH PASSED
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli recovery-vm-action DO crucs502-cnat-cn_oam1_0_d7f90c1e-4401-4be9-87f6-f39ecf04ea3a
tail -50f /var/log/esc/yangesc.log
2020-05-05 02:29:01.534 WARN ===== SEND NOTIFICATION STARTS =====
2020-05-05 02:29:01.534 WARN Type: VM_RECOVERY_COMPLETE
2020-05-05 02:29:01.534 WARN Status: SUCCESS
2020-05-05 02:29:01.534 WARN Status Code: 200
2020-05-05 02:29:01.534 WARN Status Msg: Recovery: Successfully recovered VM [crucs502-cnat-cn_oam1_0_d7f90c1e-4401-4be9-87f6-f39ecf04ea3a].
2020-05-05 02:29:01.534 WARN Tenant: core
2020-05-05 02:29:01.534 WARN Deployment name: crucs502-cnat-cnat-core
2020-05-05 02:29:01.534 WARN VM group name: oam1
<output trimmed>
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli recovery-vm-action DO crucs502-cnat-cn_master_0_05487525-c86f-47e1-a07e-fd33720d114f
tail -50f /var/log/esc/yangesc.log
2020-05-05 02:12:51.512 WARN ===== SEND NOTIFICATION STARTS =====
2020-05-05 02:12:51.512 WARN Type: VM_RECOVERY_COMPLETE
2020-05-05 02:12:51.512 WARN Status: SUCCESS
2020-05-05 02:12:51.512 WARN Status Code: 200
2020-05-05 02:12:51.512 WARN Status Msg: Recovery: Successfully recovered VM [crucs502-cnat-cn_master_0_05487525-c86f-47e1-a07e-fd33720d114f].
2020-05-05 02:12:51.512 WARN Tenant: core
2020-05-05 02:12:51.512 WARN Deployment name: crucs502-cnat-cnat-core
<output trimmed>
Check the yangesc log (tail -50f /var/log/esc/yangesc.log) and look for the Status and Status Msg fields as shown above. If recovery is successful, navigate to the ConfD CLI and verify.
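The log check above can be scripted. This is a sketch that assumes the yangesc.log notification format shown earlier (a "Type:" line followed by a "Status:" line); it extracts the status of the most recent VM_RECOVERY_COMPLETE notification. A sample log fragment is embedded here so the snippet is self-contained.

```shell
# Sketch: pull the Status of the latest VM_RECOVERY_COMPLETE notification,
# assuming the yangesc.log format shown above. In practice, point $log at
# /var/log/esc/yangesc.log instead of the sample created here.
log=$(mktemp)
cat > "$log" <<'EOF'
2020-05-05 02:29:01.534 WARN  Type: VM_RECOVERY_COMPLETE
2020-05-05 02:29:01.534 WARN  Status: SUCCESS
EOF

# grep -A1 keeps each Type line plus the Status line that follows it;
# awk prints the last field of the Status line; tail keeps the newest one.
status=$(grep -A1 'Type: VM_RECOVERY_COMPLETE' "$log" \
  | awk '/Status:/ {print $NF}' | tail -1)
echo "Last recovery status: $status"   # prints "Last recovery status: SUCCESS"
rm -f "$log"
```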
[admin@crucs502-esc-vnf-esc-core-esc-1 ~]$ /opt/cisco/esc/confd/bin/confd_cli -u admin -C
admin connected from 10.249.80.137 using ssh on crucs502-esc-vnf-esc-core-esc-1
crucs502-esc-vnf-esc-core-esc-1# show esc_datamodel opdata tenants tenant | select deployments state_machine
NAME DEPLOYMENT NAME STATE VM NAME STATE
-------------------------------------------------------------------------------------------------------------------------------------------
<truncated output>
crucs502-cnat-cn_etcd2_0_7263c87c-ee62-4b81-8e1e-a0f5c463a5b5 VM_ALIVE_STATE
crucs502-cnat-cn_etcd3_0_512ef3c0-96a2-4a10-83b0-4c7d13805856 VM_ALIVE_STATE
crucs502-cnat-cn_master_0_05487525-c86f-47e1-a07e-fd33720d114f VM_ALIVE_STATE
crucs502-cnat-cn_master_0_8cf66daa-9dfe-4c7e-817e-36624f9c98c2 VM_ALIVE_STATE
crucs502-cnat-cn_master_0_dff4ad36-7982-4131-a737-ccb6c8eae348 VM_ALIVE_STATE
crucs502-cnat-cn_oam1_0_d7f90c1e-4401-4be9-87f6-f39ecf04ea3a VM_ALIVE_STATE
When ESC shows VM_ALIVE_STATE, verify the status in UAME.
crucs502-uame-1#show vnfr state
VNFR ID STATE
---------------------------------
crucs502-4g-CRPCF504 alive
crucs502-4g-CRPCF505 alive
crucs502-4g-CRPCF506 alive
crucs502-4g-CRPCF507 alive
crucs502-4g-CRPCF604 alive
crucs502-4g-CRPCF605 alive
crucs502-4g-CRPCF606 alive
crucs502-4g-CRPCF607 alive
crucs502-4g-CRPGW502 alive
crucs502-4g-CRPGW503 alive
crucs502-4g-CRPGW608 alive
crucs502-4g-CRPGW609 alive
crucs502-4g-CRPGW610 alive
crucs502-4g-CRPGW611 alive
crucs502-4g-CRPGW612 alive
crucs502-4g-CRPGW613 alive
crucs502-4g-CRPGW614 alive
crucs502-4g-CRPGW615 alive
crucs502-4g-CRSGW606 alive
crucs502-4g-CRSGW607 alive
crucs502-4g-CRSGW608 alive
crucs502-4g-CRSGW609 alive
crucs502-4g-CRSGW610 alive
crucs502-4g-CRSGW611 alive
crucs502-5g-upf-CRUPF014 alive
crucs502-5g-upf-CRUPF015 alive
crucs502-5g-upf-CRUPF016 alive
crucs502-5g-upf-CRUPF017 alive
crucs502-5g-upf-CRUPF018 alive
crucs502-5g-upf-CRUPF019 alive
crucs502-5g-upf-CRUPF020 alive
crucs502-5g-upf-CRUPF021 alive
crucs502-5g-upf-CRUPF022 alive
crucs502-5g-upf-CRUPF023 alive
crucs502-5g-upf-CRUPF024 alive
crucs502-5g-upf-CRUPF025 alive
crucs502-5g-upf-CRUPF026 alive
crucs502-5g-upf-CRUPF027 alive
crucs502-cnat-cnat alive
crucs502-cnat-smi-cm alive
crucs502-esc-vnf-esc alive
Verify the same in OpenStack (source the correct overcloud rc file).
(crucs502) [stack@crucs502-ospd ~]$ nova list --fields name,status,host |egrep "CRPCF507|cnat"
<truncated output>
| 3eb43fe7-9f41-42d8-afe4-80f6fd62c385 | crucs502-4g-CRPCF507-core-CRPCF5071 | ACTIVE | crucs502-compute-11.localdomain |
| cc678283-2967-4404-a714-e4dd78000e82 | crucs502-cnat-cnat-core-etcd1 | ACTIVE | crucs502-osd-compute-0.localdomain |
| 711d6fcd-b816-49d4-a702-e993765757b0 | crucs502-cnat-cnat-core-master3 | ACTIVE | crucs502-osd-compute-3.localdomain |
| 46f64bde-a8db-48f2-bf3d-fe3b01295f2f | crucs502-cnat-cnat-core-oam1 | ACTIVE | crucs502-osd-compute-3.localdomain |
| f470ba3d-813e-434b-aac8-78bc646fda22 | crucs502-cnat-cnat-core-oam2 | ACTIVE | crucs502-osd-compute-2.localdomain |
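A quick sanity check on the nova output is to confirm that no instance remains in ERROR state. This sketch parses a small inline sample of the pipe-delimited listing shown above (the field positions are assumed from that output); against a live system you would pipe the nova list output in instead.

```shell
# Sketch: flag any instance whose status column is ERROR, assuming the
# pipe-delimited "nova list --fields name,status,host" layout shown above.
listing='| 3eb43fe7 | crucs502-4g-CRPCF507-core-CRPCF5071 | ACTIVE | crucs502-compute-11.localdomain |
| cc678283 | crucs502-cnat-cnat-core-etcd1 | ACTIVE | crucs502-osd-compute-0.localdomain |'

# Field 3 is the instance name, field 4 the status (field 1 is empty
# because each row starts with a "|").
errors=$(echo "$listing" | awk -F'|' '$4 ~ /ERROR/ {print $3}')
if [ -z "$errors" ]; then
  echo "All listed instances are ACTIVE"
fi
```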
This example shows a case of recovery failure from ESC. In this case, the VM is redeployed from UAME.
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli recovery-vm-action DO crucs502-4g-CRPC_CRPCF5_0_ee07bf60-a8f8-405f-9a0d-cfa7363e32e7
This output shows the failure message in yangesc.log.
tail -50f /var/log/esc/yangesc.log
2020-05-05 02:57:21.143 WARN ===== SEND NOTIFICATION STARTS =====
2020-05-05 02:57:21.143 WARN Type: VM_RECOVERY_INIT
2020-05-05 02:57:21.143 WARN Status: SUCCESS
2020-05-05 02:57:21.143 WARN Status Code: 200
2020-05-05 02:57:21.143 WARN Status Msg: Recovery event for VM Generated ID [crucs502-4g-CRPC_CRPCF5_0_ee07bf60-a8f8-405f-9a0d-cfa7363e32e7] triggered.
2020-05-05 02:57:21.143 WARN Tenant: core
2020-05-05 02:57:21.143 WARN Deployment name: crucs502-4g-CRPCF507-core
2020-05-05 02:57:21.143 WARN VM group name: CRPCF5071
<output trimmed>
2020-05-05 02:57:21.144 WARN ===== SEND NOTIFICATION ENDS =====
2020-05-05 03:09:21.655 WARN
2020-05-05 03:09:21.655 WARN ===== SEND NOTIFICATION STARTS =====
2020-05-05 03:09:21.655 WARN Type: VM_RECOVERY_REBOOT
2020-05-05 03:09:21.655 WARN Status: SUCCESS
2020-05-05 03:09:21.655 WARN Status Code: 200
2020-05-05 03:09:21.655 WARN Status Msg: VM Generated ID [crucs502-4g-CRPC_CRPCF5_0_ee07bf60-a8f8-405f-9a0d-cfa7363e32e7] is rebooted.
2020-05-05 03:09:21.655 WARN Tenant: core
2020-05-05 03:09:21.655 WARN Deployment name: crucs502-4g-CRPCF507-core
2020-05-05 03:09:21.655 WARN VM group name: CRPCF5071
<output trimmed>
2020-05-05 03:09:21.656 WARN ===== SEND NOTIFICATION ENDS =====
2020-05-05 03:14:22.079 WARN
2020-05-05 03:14:22.079 WARN ===== SEND NOTIFICATION STARTS =====
2020-05-05 03:14:22.079 WARN Type: VM_RECOVERY_COMPLETE
2020-05-05 03:14:22.079 WARN Status: FAILURE
2020-05-05 03:14:22.079 WARN Status Code: 500
2020-05-05 03:14:22.079 WARN Status Msg: Recovery: Recovery completed with errors for VM: [crucs502-4g-CRPC_CRPCF5_0_ee07bf60-a8f8-405f-9a0d-cfa7363e32e7]
2020-05-05 03:14:22.079 WARN Tenant: core
2020-05-05 03:14:22.079 WARN Deployment name: crucs502-4g-CRPCF507-core
2020-05-05 03:14:22.079 WARN VM group name: CRPCF5071
<output trimmed>
In ESC, the recovery method is reboot only. This output shows that the VM could not be brought back with a reboot, so it must be redeployed.
crucs502-esc-vnf-esc-core-esc-1# show running-config | include recovery_policy
recovery_policy recovery_type AUTO
recovery_policy action_on_recovery REBOOT_ONLY
recovery_policy max_retries 1
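Because action_on_recovery is REBOOT_ONLY, ESC never rebuilds a failed VM on its own, which is why the redeploy must come from UAME. That conclusion can be automated with a small sketch that parses the recovery_policy output shown above (the parsing is an assumption based on that output format, not an ESC feature):

```shell
# Sketch: read the recovery action from the "show running-config" output
# above and decide whether a failed recovery needs a UAME redeploy.
policy="recovery_policy action_on_recovery REBOOT_ONLY"
action=$(echo "$policy" | awk '{print $NF}')

if [ "$action" = "REBOOT_ONLY" ]; then
  # REBOOT_ONLY cannot rebuild a VM, so a failed reboot means redeploy.
  echo "Recovery is reboot-only; redeploy from UAME if the reboot fails"
fi
```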
Re-confirm the VM status in UAME.
ubuntu@crucs502-uame-1:~$ /opt/cisco/usp/uas/confd-6.3.8/bin/confd_cli -u admin -C
Enter Password for 'admin':
Welcome to the ConfD CLI
admin connected from 10.249.80.137 using ssh on crucs502-uame-1
crucs502-uame-1#
crucs502-uame-1#
crucs502-uame-1#show vnfr state
VNFR ID STATE
---------------------------------
crucs502-4g-CRPCF504 alive
crucs502-4g-CRPCF505 alive
crucs502-4g-CRPCF506 alive
crucs502-4g-CRPCF507 error
crucs502-4g-CRPCF604 alive
crucs502-uame-1# recover nsd-id crucs502-4g vnfd CRPCF507 recovery-action redeploy
Monitor the UAME logs as well as the ESC logs. This whole process can take up to 15 minutes.
UAME logs:
tail -50f /var/log/upstart/uame.log
<truncated output>
2020-05-06 08:57:22,252 - | VM_RECOVERY_DEPLOYED | CRPCF5071 | SUCCESS | Waiting for: VM_RECOVERY_COMPLETE|
2020-05-06 08:57:22,255 - Timing out in 143 seconds
2020-05-06 08:57:48,227 - | VM_RECOVERY_COMPLETE | crucs502-4g-CRPC_CRPCF5_0_ee07bf60-a8f8-405f-9a0d-cfa7363e32e7 | SUCCESS | (1/1)
2020-05-06 08:57:48,229 - NETCONF transaction completed successfully!
2020-05-06 08:57:48,231 - Released lock: esc_vnf_req
2020-05-06 08:57:48,347 - Deployment recover-vnf-deployment: crucs502-4g succeeded
2020-05-06 08:57:48,354 - Send Deployment notification for: crucs502-4g-CRPCF507
ESC Logs:
tail -50f /var/log/esc/yangesc.log
2020-05-06 08:58:01.454 WARN Type: VM_RECOVERY_COMPLETE
2020-05-06 08:58:01.454 WARN Status: SUCCESS
2020-05-06 08:58:01.454 WARN Status Code: 200
2020-05-06 08:58:01.454 WARN Status Msg: Recovery: Successfully recovered VM [crucs502-4g-CRPC_CRPCF5_0_ee07bf60-a8f8-405f-9a0d-cfa7363e32e7].
2020-05-06 08:58:01.454 WARN Tenant: core
2020-05-06 08:58:01.454 WARN Deployment ID: 4f958c43-dfa4-45d4-a69d-76289620c337
2020-05-06 08:58:01.454 WARN Deployment name: crucs502-4g-CRPCF507-core
2020-05-06 08:58:01.454 WARN VM group name: CRPCF5071
<output trimmed>
Verify the status of the VM by following the procedure in Step 3.