Introduction
This document describes how to recover the Ultra Automation and Monitoring Engine (UAME) from the memory leak tracked in Cisco bug ID CSCvu73187.
Problem
Elastic Services Controller (ESC) alarm on the Ultra M health monitor:
[root@pod1-ospd ~]# cat /var/log/cisco/ultram-health/*.report | grep -i xxx
10.10.10.10/vnf-esc | esc | XXX | vnf-esc:(error)
Solution
Health Check
Step 1. Log in to the OpenStack Platform Director (OSP-D) and verify the vnf-esc error.
[stack@pod1-ospd ~]$ cat /var/log/cisco/ultram-health/*.report | grep -i xxx
[stack@pod1-ospd ~]$ cat /var/log/cisco/ultram-health/*.report | grep -iv ':-)'
Step 2. Confirm that you cannot SSH to the UAME through its management IP 10.10.10.10, although the IP responds to ping:
(pod1) [stack@pod1-ospd ~]$ ssh ubuntu@10.10.10.10
ssh_exchange_identification: read: Connection reset by peer
(pod1) [stack@pod1-ospd ~]$ ping -c 5 10.10.10.10
PING 10.10.10.10 (10.10.10.10) 56(84) bytes of data.
64 bytes from 10.10.10.10: icmp_seq=1 ttl=57 time=0.242 ms
64 bytes from 10.10.10.10: icmp_seq=2 ttl=57 time=0.214 ms
64 bytes from 10.10.10.10: icmp_seq=3 ttl=57 time=0.240 ms
64 bytes from 10.10.10.10: icmp_seq=4 ttl=57 time=0.255 ms
64 bytes from 10.10.10.10: icmp_seq=5 ttl=57 time=0.240 ms
--- 10.10.10.10 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 0.214/0.238/0.255/0.016 ms
Step 3. Confirm that the ESC- and UAME-related VMs are ACTIVE and Running on OSP-D.
[stack@pod1-ospd ~]$ source *core
(pod1) [stack@pod1-ospd ~]$
(pod1) [stack@pod1-ospd ~]$ nova list --field name,status,host,instance_name,power_state | grep esc
| 0c1596bc-e50f-4374-9098-a1234567890e | pod1-esc-vnf-esc-core-esc-1 | ACTIVE | - | Running | pod1-AUTOMATION-ORCH=172.16.180.10; pod1-AUTOMATION-MGMT=172.16.181.10 |
| 3875618d-dcbe-4748-b196-a1234567890e | pod1-esc-vnf-esc-core-esc-2 | ACTIVE | - | Running | pod1-AUTOMATION-ORCH=172.16.180.18; pod1-AUTOMATION-MGMT=172.16.181.5 |
(pod1) [stack@pod1-ospd ~]$ nova list --field name,status,host,instance_name,power_state | grep uame
| 31416ffd-0719-4ce5-9e99-a1234567890e | pod1-uame-1 | ACTIVE | - | Running | pod1-AUTOMATION-ORCH=172.16.180.15; pod1-AUTOMATION-MGMT=172.16.181.33 |
| d6830e97-bd82-4d8e-9467-a1234567890e | pod1-uame-2 | ACTIVE | - | Running | pod1-AUTOMATION-ORCH=172.16.180.8; pod1-AUTOMATION-MGMT=172.16.181.12 |
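The ACTIVE/Running check above can be automated with a short parser. This is a sketch, not part of the official procedure: NOVA_OUT here is sample text shaped like the nova list output (with one deliberately unhealthy row to show the flagging); in practice you would capture the real command output, e.g. `nova list --field name,status,host,instance_name,power_state`.

```shell
#!/bin/sh
# Sketch: flag any VM row from "nova list" style output that is not
# ACTIVE/Running. NOVA_OUT is illustrative sample data.
NOVA_OUT='| 31416ffd | pod1-uame-1 | ACTIVE | - | Running |
| 0c1596bc | pod1-esc-vnf-esc-core-esc-1 | SHUTOFF | - | Shutdown |'

# Split on "|": field 3 is the name, 4 the status, 6 the power state.
BAD=$(printf '%s\n' "$NOVA_OUT" | awk -F'|' '
  $4 !~ /ACTIVE/ || $6 !~ /Running/ { gsub(/ /, "", $3); print $3 }')

if [ -z "$BAD" ]; then
  echo "all VMs ACTIVE/Running"
else
  echo "needs attention: $BAD"
fi
```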
Step 4. Confirm that you can connect to the primary and backup ESC, and verify that the ESC health check passes.
[admin@pod1-esc-vnf-esc-core-esc-2 ~]$ cat /opt/cisco/esc/keepalived_state
[admin@pod1-esc-vnf-esc-core-esc-2 ~]$ health.sh
============== ESC HA with DRBD =================
vimmanager (pgid 14654) is running
monitor (pgid 14719) is running
mona (pgid 14830) is running
snmp is disabled at startup
etsi is disabled at startup
pgsql (pgid 15130) is running
keepalived (pgid 13083) is running
portal is disabled at startup
confd (pgid 15027) is running
filesystem (pgid 0) is running
escmanager (pgid 15316) is running
=======================================
ESC HEALTH PASSED
[admin@pod1-esc-vnf-esc-core-esc-2 ~]$ ssh admin@172.16.180.12
####################################################################
# ESC on pod1-esc-vnf-esc-core-esc-2 is in BACKUP state.
####################################################################
[admin@pod1-esc-vnf-esc-core-esc-1 ~]$ cat /opt/cisco/esc/keepalived_state
BACKUP
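The keepalived_state check above can be sketched as a small role probe. STATE_FILE and the demo write are illustrative assumptions so the example is self-contained; on a real ESC the file is /opt/cisco/esc/keepalived_state and holds the HA role (BACKUP in the output above, MASTER expected on the primary).

```shell
#!/bin/sh
# Sketch: interpret the ESC HA role from a keepalived_state-style file.
# The demo file and its contents are stand-ins for the real ESC file.
STATE_FILE=${STATE_FILE:-/tmp/keepalived_state}
printf 'BACKUP\n' > "$STATE_FILE"   # demo value

STATE=$(cat "$STATE_FILE")
case "$STATE" in
  MASTER) echo "this ESC is the HA primary" ;;
  BACKUP) echo "this ESC is the HA backup" ;;
  *)      echo "unexpected keepalived state: $STATE" ;;
esac
```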
Recovery Procedure
Step 1. Log in to the Horizon dashboard console of the pod1-uame-2 instance.
![Horizon dashboard console of the pod1-uame-2 instance](/c/dam/en/us/support/docs/wireless/ultra-cloud-core-subscriber-microservices-infrastructure/217071-recovery-procedure-for-the-uame-memory-a-00.png)
Step 2. Soft reboot the pod1-uame-2 VM instance from the Horizon dashboard and observe the console log messages of the instance.
Step 3. Once the login prompt appears in the console of the pod1-uame-2 VM instance in the Horizon dashboard, initiate SSH to the UAME through its management IP 10.10.10.10.
(pod1) [stack@pod1-ospd ~]$ ssh ubuntu@10.10.10.10
Note: Proceed to the next step only if this step succeeds.
Step 4. Check the disk space on the primary UAME, in particular for the /dev/vda3 file system.
ubuntu@pod1-uame-1:~$ df -kh
Step 5. On the primary UAME, truncate the syslog or syslog.1 file (whichever of the two is larger, usually in the MB or GB range).
ubuntu@pod1-uame-1:~$ sudo su -
root@pod1-uame-1:~#
root@pod1-uame-1:~# cd /var/log
root@pod1-uame-1:/var/log# ls -lrth *syslog*
root@pod1-uame-1:/var/log# > syslog.1   (or: > syslog, whichever is larger)
Step 6. Ensure that the syslog or syslog.1 file size is now 0 bytes on the primary UAME.
root@pod1-uame-1:/var/log# ls -lrth *syslog*
Step 7. Ensure that df -kh shows sufficient free space in the file system partitions on the primary UAME.
ubuntu@pod1-uame-1:~$ df -kh
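Steps 5 through 7 can be sketched as one small script. This is a minimal sketch using a demo directory instead of the real /var/log, so it is safe to run anywhere: it picks the larger of syslog and syslog.1 and empties it in place (`: > file` is equivalent to `> file`), which frees the space without deleting the file the logger holds open.

```shell
#!/bin/sh
# Sketch of the truncation step, using demo files in place of /var/log.
LOGDIR=${1:-/tmp/demo-log}
mkdir -p "$LOGDIR"
# Demo setup standing in for the real syslog files:
printf 'old data\n' > "$LOGDIR/syslog"
printf 'much much older data\n' > "$LOGDIR/syslog.1"

# Pick the larger of the two files, as the procedure instructs.
if [ "$(wc -c < "$LOGDIR/syslog")" -ge "$(wc -c < "$LOGDIR/syslog.1")" ]; then
  TARGET="$LOGDIR/syslog"
else
  TARGET="$LOGDIR/syslog.1"
fi

: > "$TARGET"   # truncate in place, keeping the file and its inode
echo "truncated: $TARGET ($(wc -c < "$TARGET") bytes)"
```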
Then SSH to the secondary UAME.
ubuntu@pod1-uame-1:~$ ssh ubuntu@172.16.180.8
password:
...
ubuntu@pod1-uame-2:~$
Step 8. On the secondary UAME, truncate the syslog or syslog.1 file (whichever of the two is larger, usually in the MB or GB range).
ubuntu@pod1-uame-2:~$ sudo su -
root@pod1-uame-2:~#
root@pod1-uame-2:~# cd /var/log
root@pod1-uame-2:/var/log# ls -lrth *syslog*
root@pod1-uame-2:/var/log# > syslog.1   (or: > syslog, whichever is larger)
Step 9. Ensure that the syslog or syslog.1 file size is now 0 bytes on the secondary UAME.
root@pod1-uame-2:/var/log# ls -lrth *syslog*
Step 10. Ensure that df -kh shows sufficient free space in the file system partitions on the secondary UAME.
ubuntu@pod1-uame-2:~$ df -kh
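The manual df -kh review in steps 7 and 10 can be automated with a usage-threshold filter. In this sketch, SAMPLE is fixed text shaped like `df -kP` output so the logic is reproducible; on the UAME you would pipe live `df -kP` output instead. The 90% threshold is an assumption, not part of the procedure.

```shell
#!/bin/sh
# Sketch: flag file systems at or above a usage threshold.
THRESHOLD=90
SAMPLE='Filesystem 1024-blocks Used Available Capacity Mounted-on
/dev/vda1 1000000 200000 800000 20% /boot
/dev/vda3 10000000 9500000 500000 95% /'

# Field 5 is the capacity ("95%"); field 6 is the mount point.
WARNINGS=$(printf '%s\n' "$SAMPLE" | awk -v t="$THRESHOLD" 'NR > 1 {
  use = $5; sub(/%/, "", use)
  if (use + 0 >= t) printf "WARNING: %s at %s%%\n", $6, use
}')

if [ -z "$WARNINGS" ]; then
  echo "all file systems below ${THRESHOLD}%"
else
  printf '%s\n' "$WARNINGS"
fi
```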
Post-Recovery Health Check
Step 1. Wait for at least one iteration of the Ultra M health monitor and confirm that the vnf-esc error no longer appears in the health report.
[stack@pod1-ospd ~]$ cat /var/log/cisco/ultram-health/*.report | grep -i xxx
[stack@pod1-ospd ~]$ cat /var/log/cisco/ultram-health/*.report | grep -iv ':-)'
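The report check above boils down to grepping for the XXX error marker. This sketch wraps that in a self-contained script: REPORT_DIR and the demo report are assumptions so it runs anywhere; on OSP-D the real reports live under /var/log/cisco/ultram-health/.

```shell
#!/bin/sh
# Sketch: report whether any Ultra M health report still contains the "XXX"
# error marker. The demo report stands in for a real *.report file.
REPORT_DIR=${REPORT_DIR:-/tmp/ultram-demo}
mkdir -p "$REPORT_DIR"
printf '10.10.10.10/vnf-esc | esc | :-) | vnf-esc:(healthy)\n' \
  > "$REPORT_DIR/demo.report"

if grep -qi xxx "$REPORT_DIR"/*.report; then
  STATUS="errors present"
else
  STATUS="health report clean"
fi
echo "$STATUS"
```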
Step 2. Confirm that the ESC and UAME VMs are ACTIVE and Running on OSP-D.
[stack@pod1-ospd ~]$ source *core
(pod1) [stack@pod1-ospd ~]$ nova list --field name,status,host,instance_name,power_state | grep esc
(pod1) [stack@pod1-ospd ~]$ nova list --field name,status,host,instance_name,power_state | grep uame
Step 3. SSH to the primary and backup ESC and confirm that the ESC health check passes.
[admin@pod1-esc-vnf-esc-core-esc-2 ~]$ cat /opt/cisco/esc/keepalived_state
[admin@pod1-esc-vnf-esc-core-esc-2 ~]$ health.sh
============== ESC HA with DRBD =================
vimmanager (pgid 14638) is running
monitor (pgid 14703) is running
mona (pgid 14759) is running
snmp is disabled at startup
etsi is disabled at startup
pgsql (pgid 15114) is running
keepalived (pgid 13205) is running
portal is disabled at startup
confd (pgid 15011) is running
filesystem (pgid 0) is running
escmanager (pgid 15300) is running
=======================================
ESC HEALTH PASSED
[admin@pod1-esc-vnf-esc-core-esc-2 ~]$ ssh admin@
admin@172.16.181.26's password:
Last login: Fri May 1 10:28:12 2020 from 172.16.180.13
####################################################################
# ESC on pod1-esc-vnf-esc-core-esc-2 is in BACKUP state.
####################################################################
[admin@pod1-esc-vnf-esc-core-esc-2 ~]$ cat /opt/cisco/esc/keepalived_state
BACKUP
Step 4. Confirm in the UAME that the ESC vnfd is in the active state.
ubuntu@pod1-uame-1:~$ sudo su
ubuntu@pod1-uame-1:~$ confd_cli -u admin -C
pod1-uame-1# show vnfr state