Introduction
This document describes how to recover the Ultra Automation and Monitoring Engine (UAME) from the memory leakage issue tracked in Cisco bug ID CSCvu73187.
Problem
The Ultra M health monitor raises this Elastic Services Controller (ESC) alarm:
[root@pod1-ospd ~]# cat /var/log/cisco/ultram-health/*.report | grep -i xxx
10.10.10.10/vnf-esc | esc | XXX | vnf-esc:(error)
Solution
Status Check
Step 1. Log in to OpenStack Platform Director (OSP-D) and check for vnf-esc errors. In the health report, healthy components are marked ':-)' and failed components are marked 'XXX', so the first command lists the failures directly and the second filters out all healthy entries.
[stack@pod1-ospd ~]$ cat /var/log/cisco/ultram-health/*.report | grep -i xxx
[stack@pod1-ospd ~]$ cat /var/log/cisco/ultram-health/*.report | grep -iv ':-)'
Step 2. Confirm that you are unable to log in to the UAME via its management IP 10.10.10.10, although the IP is pingable:
(pod1) [stack@pod1-ospd ~]$ ssh ubuntu@10.10.10.10
ssh_exchange_identification: read: Connection reset by peer
(pod1) [stack@pod1-ospd ~]$ ping -c 5 10.10.10.10
PING 10.10.10.10 (10.10.10.10) 56(84) bytes of data.
64 bytes from 10.10.10.10: icmp_seq=1 ttl=57 time=0.242 ms
64 bytes from 10.10.10.10: icmp_seq=2 ttl=57 time=0.214 ms
64 bytes from 10.10.10.10: icmp_seq=3 ttl=57 time=0.240 ms
64 bytes from 10.10.10.10: icmp_seq=4 ttl=57 time=0.255 ms
64 bytes from 10.10.10.10: icmp_seq=5 ttl=57 time=0.240 ms
--- 10.10.10.10 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 0.214/0.238/0.255/0.016 ms
Step 3. Confirm that the VMs related to ESC and UAME are ACTIVE and Running on OSP-D.
[stack@pod1-ospd ~]$ source *core
(pod1) [stack@pod1-ospd ~]$
(pod1) [stack@pod1-ospd ~]$ nova list --field name,status,host,instance_name,power_state | grep esc
| 0c1596bc-e50f-4374-9098-a1234567890e | pod1-esc-vnf-esc-core-esc-1 | ACTIVE | - | Running | pod1-AUTOMATION-ORCH=172.16.180.10; pod1-AUTOMATION-MGMT=172.16.181.10 |
| 3875618d-dcbe-4748-b196-a1234567890e | pod1-esc-vnf-esc-core-esc-2 | ACTIVE | - | Running | pod1-AUTOMATION-ORCH=172.16.180.18; pod1-AUTOMATION-MGMT=172.16.181.5 |
(pod1) [stack@pod1-ospd ~]$ nova list --field name,status,host,instance_name,power_state | grep uame
| 31416ffd-0719-4ce5-9e99-a1234567890e | pod1-uame-1 | ACTIVE | - | Running | pod1-AUTOMATION-ORCH=172.16.180.15; pod1-AUTOMATION-MGMT=172.16.181.33 |
| d6830e97-bd82-4d8e-9467-a1234567890e | pod1-uame-2 | ACTIVE | - | Running | pod1-AUTOMATION-ORCH=172.16.180.8; pod1-AUTOMATION-MGMT=172.16.181.12 |
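To inspect a single instance in more detail (for example, its fault and task state), the same nova CLI offers the show subcommand; a minimal sketch, assuming the instance name from the output above:
(pod1) [stack@pod1-ospd ~]$ nova show pod1-uame-2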
Step 4. Confirm that you are able to connect to the primary and backup ESC, and verify that the ESC health check also passes.
[admin@pod1-esc-vnf-esc-core-esc-2 ~]$ cat /opt/cisco/esc/keepalived_state
[admin@pod1-esc-vnf-esc-core-esc-2 ~]$ health.sh
============== ESC HA with DRBD =================
vimmanager (pgid 14654) is running
monitor (pgid 14719) is running
mona (pgid 14830) is running
snmp is disabled at startup
etsi is disabled at startup
pgsql (pgid 15130) is running
keepalived (pgid 13083) is running
portal is disabled at startup
confd (pgid 15027) is running
filesystem (pgid 0) is running
escmanager (pgid 15316) is running
=======================================
ESC HEALTH PASSED
[admin@pod1-esc-vnf-esc-core-esc-2 ~]$ ssh admin@172.16.180.12
####################################################################
# ESC on pod1-esc-vnf-esc-core-esc-1 is in BACKUP state.
####################################################################
[admin@pod1-esc-vnf-esc-core-esc-1 ~]$ cat /opt/cisco/esc/keepalived_state
BACKUP
Recovery Steps
Step 1. Log in to the Horizon dashboard and open the console for the pod1-uame-2 instance.
Step 2. Soft reboot the pod1-uame-2 VM instance from the Horizon dashboard and observe the console log messages of the instance.
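If the Horizon dashboard is unavailable, the soft reboot and the console log can also be driven from OSP-D with the same nova CLI used above; a sketch, assuming the pod1-uame-2 instance name from the earlier listing (nova reboot performs a soft reboot by default):
(pod1) [stack@pod1-ospd ~]$ nova reboot pod1-uame-2
(pod1) [stack@pod1-ospd ~]$ nova console-log pod1-uame-2 | tail -20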
Step 3. Once the login prompt is shown on the console of the pod1-uame-2 VM instance in the Horizon dashboard, initiate SSH to the UAME via its management IP 10.10.10.10.
(pod1) [stack@pod1-ospd ~]$ ssh ubuntu@10.10.10.10
Note: Proceed to the next step only if this step was successful.
Step 4. Check the disk space, especially the /dev/vda3 filesystem, on the primary UAME.
ubuntu@pod1-uame-1:~$ df -kh
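df can also be pointed at the log directory to report just the partition that holds it; a minimal sketch, assuming /var/log resides on the /dev/vda3 filesystem mentioned in this step:
ubuntu@pod1-uame-1:~$ df -kh /var/log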
Step 5. Truncate the syslog or syslog.1 file (whichever of the two files is larger, usually MBs or GBs in size) on the primary UAME.
ubuntu@pod1-uame-1:~$ sudo su -
root@pod1-uame-1:~#
root@pod1-uame-1:~# cd /var/log
root@pod1-uame-1:/var/log# ls -lrth *syslog*
root@pod1-uame-1:/var/log# > syslog.1
(or, if syslog is the larger file)
root@pod1-uame-1:/var/log# > syslog
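Truncating in place with '>' (or with the coreutils truncate command) releases the space immediately while rsyslog keeps writing to the same file; removing the file with rm would not free the space while the daemon still holds it open. An equivalent sketch with truncate, assuming syslog.1 is the larger file:
root@pod1-uame-1:/var/log# truncate -s 0 syslog.1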
Step 6. Ensure that the syslog or syslog.1 file size is now 0 bytes on the primary UAME.
root@pod1-uame-1:/var/log# ls -lrth *syslog*
Step 7. Ensure that df -kh now shows enough free space in the filesystem partition on the primary UAME.
ubuntu@pod1-uame-1:~$ df -kh
SSH into the secondary UAME.
ubuntu@pod1-uame-1:~$ ssh ubuntu@172.16.180.8
password:
...
ubuntu@pod1-uame-2:~$
Step 8. Truncate the syslog or syslog.1 file (whichever of the two files is larger, usually MBs or GBs in size) on the secondary UAME.
ubuntu@pod1-uame-2:~$ sudo su -
root@pod1-uame-2:~#
root@pod1-uame-2:~# cd /var/log
root@pod1-uame-2:/var/log# ls -lrth *syslog*
root@pod1-uame-2:/var/log# > syslog.1
(or, if syslog is the larger file)
root@pod1-uame-2:/var/log# > syslog
Step 9. Ensure that the syslog or syslog.1 file size is now 0 bytes on the secondary UAME.
root@pod1-uame-2:/var/log# ls -lrth *syslog*
Step 10. Ensure that df -kh now shows enough free space in the filesystem partition on the secondary UAME.
ubuntu@pod1-uame-2:~$ df -kh
Post-Recovery Status Check
Step 1. Wait for at least one iteration of the Ultra M health monitor and confirm that no vnf-esc errors are seen in the health report.
[stack@pod1-ospd ~]$ cat /var/log/cisco/ultram-health/*.report | grep -i xxx
[stack@pod1-ospd ~]$ cat /var/log/cisco/ultram-health/*.report | grep -iv ':-)'
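To re-run the check periodically while you wait for the next monitor iteration, the standard watch utility can wrap the same grep; a convenience sketch, assuming the report path used above:
[stack@pod1-ospd ~]$ watch -n 60 'grep -i xxx /var/log/cisco/ultram-health/*.report'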
Step 2. Confirm that the ESC and UAME VMs are ACTIVE and Running on OSP-D.
[stack@pod1-ospd ~]$ source *core
(pod1) [stack@pod1-ospd ~]$ nova list --field name,status,host,instance_name,power_state | grep esc
(pod1) [stack@pod1-ospd ~]$ nova list --field name,status,host,instance_name,power_state | grep uame
Step 3. SSH into the primary and backup ESC and confirm that the ESC health check passes.
[admin@pod1-esc-vnf-esc-core-esc-2 ~]$ cat /opt/cisco/esc/keepalived_state
[admin@pod1-esc-vnf-esc-core-esc-2 ~]$ health.sh
============== ESC HA with DRBD =================
vimmanager (pgid 14638) is running
monitor (pgid 14703) is running
mona (pgid 14759) is running
snmp is disabled at startup
etsi is disabled at startup
pgsql (pgid 15114) is running
keepalived (pgid 13205) is running
portal is disabled at startup
confd (pgid 15011) is running
filesystem (pgid 0) is running
escmanager (pgid 15300) is running
=======================================
ESC HEALTH PASSED
[admin@pod1-esc-vnf-esc-core-esc-2 ~]$ ssh admin@
admin@172.16.181.26's password:
Last login: Fri May 1 10:28:12 2020 from 172.16.180.13
####################################################################
# ESC on pod1-esc-vnf-esc-core-esc-2 is in BACKUP state.
####################################################################
[admin@pod1-esc-vnf-esc-core-esc-2 ~]$ cat /opt/cisco/esc/keepalived_state
BACKUP
Step 4. From the UAME, confirm that the ESC VNF record (vnfr) is in the ALIVE state.
ubuntu@pod1-uame-1:~$ sudo su
root@pod1-uame-1:~# confd_cli -u admin -C
pod1-uame-1# show vnfr state
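If the vnfr listing is long, the ConfD CLI output can be narrowed with a pipe filter; a sketch, assuming the include modifier is available in this confd_cli build:
pod1-uame-1# show vnfr state | include esc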