Replacement of Compute Server UCS C240 M4 - vEPC

Available Languages

Download Options

PDF (129.9 KB)
View with Adobe Reader on a variety of devices
ePub (1.0 MB)
View in various apps on iPhone, iPad, Android, Sony Reader, or Windows Phone
Mobi (Kindle) (643.2 KB)
View on Kindle device or Kindle app on multiple devices

Updated:July 19, 2021

Document ID:213355

Bias-Free Language

The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product. Learn more about how Cisco is using Inclusive Language.

Introduction

Background Information

Identify the VMs Hosted in the Compute Node

Graceful Power Off

Case 1. Compute Node Hosts only SF VM

Migrate SF Card to Standby State

Shutdown SF VM from ESC

Remove the Compute Node from Nova Aggregate List

Case 2. Compute Node Hosts CF/ESC/EM/UAS

Migrate CF Card to Standby State

Shutdown CF and EM VM from ESC

Migrate ESC to Standby Mode

Remove the Compute Node from Nova Aggregate List

Compute Node Deletion

Delete Compute Node from the Service List

Delete Neutron Agents

Delete from the Ironic Database

Delete from Overcloud

Install the New Compute Node

Add the New Compute Node to the Overcloud

Post Server Replacement Settings

Restore the VMs

Case 1. Compute Node Hosts only SF VM

Addition to Nova Aggregate List

SF VM Recovery from ESC

Case 2. Compute Node Hosts CF, ESC, EM and UAS

Addition to Nova Aggregate List

Recovery of UAS VM

Recovery of ESC VM

Handle ESC Recovery Failure

Auto-Deploy Configuration Update

Enabling Syslogs

Related Information

Introduction

This document describes the steps required to replace a faulty compute server in an Ultra-M setup that hosts StarOS Virtual Network Functions (VNFs).

Background Information

Ultra-M is a pre-packaged and validated virtualized mobile packet core solution designed to simplify the deployment of VNFs. OpenStack is the Virtualized Infrastructure Manager (VIM) for Ultra-M and consists of these node types:

Compute
Object Storage Disk - Compute (OSD - Compute)
Controller
OpenStack Platform - Director (OSPD)

The high-level architecture of Ultra-M and the components involved are depicted in this image:

UltraM Architecture

This document is intended for the Cisco personnel familiar with Cisco Ultra-M platform and it details the steps required to be carried out at OpenStack and StarOS VNF level at the time of the Compute Server Replacement.

Note: Ultra M 5.1.x release is considered in order to define the procedures in this document.

Abbreviations

VNF	Virtual Network Function
CF	Control Function
SF	Service Function
ESC	Elastic Service Controller
MOP	Method of Procedure
OSD	Object Storage Disks
HDD	Hard Disk Drive
SSD	Solid State Drive
VIM	Virtual Infrastructure Manager
VM	Virtual Machine
EM	Element Manager
UAS	Ultra Automation Services
UUID	Universally Unique IDentifier

Workflow of the MoP

Highlevel Workflow of the Replacement Procedure

Prerequisites

Backup

Before you replace a Compute node, it is important to check the current state of your Red Hat OpenStack Platform environment. It is recommended you check the current state in order to avoid complications when the Compute replacement process is on. It can be achieved by this flow of replacement.

In case of recovery, Cisco recommends to take a backup of the OSPD database with the use of these steps:

[root@director ~]# mysqldump --opt --all-databases > /root/undercloud-all-databases.sql
[root@director ~]# tar --xattrs -czf undercloud-backup-`date +%F`.tar.gz /root/undercloud-all-databases.sql 
/etc/my.cnf.d/server.cnf /var/lib/glance/images /srv/node /home/stack
tar: Removing leading `/' from member names

This process ensures that a node can be replaced without affecting the availability of any instances. Also, it is recommended to backup the StarOS configuration especially if the compute node to be replaced hosts the Control Function (CF) Virtual Machine (VM).

Identify the VMs Hosted in the Compute Node

Identify the VMs that are hosted on the compute server. There can be two possibilities:

The compute server contains only Service Function (SF) VM:

[stack@director ~]$ nova list --field name,host | grep compute-10
| 49ac5f22-469e-4b84-badc-031083db0533 |  VNF2-DEPLOYM_s9_0_8bc6cc60-15d6-4ead-8b6a-10e75d0e134d     |  
pod1-compute-10.localdomain    |

The compute server contains Control Function (CF)/Elastic Services Controller (ESC)/ Element Manager (EM)/ Ultra Automation Services (UAS) combination of VMs:

[stack@director ~]$ nova list --field name,host | grep compute-8
| 507d67c2-1d00-4321-b9d1-da879af524f8 | VNF2-DEPLOYM_XXXX_0_c8d98f0f-d874-45d0-af75-88a2d6fa82ea | pod1-compute-8.localdomain     |
| f9c0763a-4a4f-4bbd-af51-bc7545774be2 | VNF2-DEPLOYM_c1_0_df4be88d-b4bf-4456-945a-3812653ee229     | pod1-compute-8.localdomain     |
| 75528898-ef4b-4d68-b05d-882014708694 | VNF2-ESC-ESC-0                                             | pod1-compute-8.localdomain     |
| f5bd7b9c-476a-4679-83e5-303f0aae9309 | VNF2-UAS-uas-0                                             | pod1-compute-8.localdomain     |

Note: In the output shown here, the first column corresponds to the Universally Unique IDentifier (UUID), the second column is the VM name and the third column is the hostname where the VM is present. The parameters from this output will be used in subsequent sections.

Graceful Power Off

Case 1. Compute Node Hosts only SF VM

Migrate SF Card to Standby State

Log in to the StarOS VNF and identify the card that corresponds to the SF VM. Use the UUID of the SF VM identified from the section "Identify the VMs hosted in the Compute Node", and identify the card that corresponds to the UUID:

[local]VNF2# show card hardware
Tuesday might 08 16:49:42 UTC 2018
<snip>
Card 8:
  Card Type               : 4-Port Service Function Virtual Card
  CPU Packages            : 26 [#0, #1, #2, #3, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #18, #19, #20, #21, #22, #23, #24, #25]
  CPU Nodes               : 2
  CPU Cores/Threads       : 26
  Memory                  : 98304M (qvpc-di-large)
  UUID/Serial Number      :  49AC5F22-469E-4B84-BADC-031083DB0533
<snip>

Check the status of the card:

[local]VNF2# show card table
Tuesday might 08 16:52:53 UTC 2018
Slot         Card Type                               Oper State     SPOF  Attach
-----------  --------------------------------------  -------------  ----  ------
 1: CFC      Control Function Virtual Card           Active         No         
 2: CFC      Control Function Virtual Card           Standby        -          
 3: FC       4-Port Service Function Virtual Card    Active         No         
 4: FC       4-Port Service Function Virtual Card    Active         No         
 5: FC       4-Port Service Function Virtual Card    Active         No         
 6: FC       4-Port Service Function Virtual Card    Active         No         
 7: FC       4-Port Service Function Virtual Card    Active         No         
8: FC       4-Port Service Function Virtual Card    Active         No         
 9: FC       4-Port Service Function Virtual Card    Active         No         
10: FC       4-Port Service Function Virtual Card    Standby        -

If the card is in the active state, move the card to standby state:

  [local]VNF2# card migrate from 8 to 10

Shutdown SF VM from ESC

[admin@VNF2-esc-esc-0 ~]$ cd /opt/cisco/esc/esc-confd/esc-cli
[admin@VNF2-esc-esc-0 esc-cli]$ ./esc_nc_cli get esc_datamodel | egrep --color "<state>|<vm_name>|<vm_id>|<deployment_name>"
<snip>
<state>SERVICE_ACTIVE_STATE</state>
                    <vm_name>VNF2-DEPLOYM_c1_0_df4be88d-b4bf-4456-945a-3812653ee229</vm_name>
                    <state>VM_ALIVE_STATE</state>
                    <vm_name> VNF2-DEPLOYM_s9_0_8bc6cc60-15d6-4ead-8b6a-10e75d0e134d</vm_name>
                    <state>VM_ALIVE_STATE</state>
<snip>

Stop the SF VM with the use of its VM Name. (VM Name noted from section " Identify the VMs hosted in the Compute Node"):

[admin@VNF2-esc-esc-0 esc-cli]$ ./esc_nc_cli vm-action STOP VNF2-DEPLOYM_s9_0_8bc6cc60-15d6-4ead-8b6a-10e75d0e134d

Once it is stopped, the VM must enter the SHUTOFF state:

[admin@VNF2-esc-esc-0 ~]$ cd /opt/cisco/esc/esc-confd/esc-cli
[admin@VNF2-esc-esc-0 esc-cli]$ ./esc_nc_cli get esc_datamodel | egrep --color "<state>|<vm_name>|<vm_id>|<deployment_name>"
<snip>
<state>SERVICE_ACTIVE_STATE</state>
                    <vm_name>VNF2-DEPLOYM_c1_0_df4be88d-b4bf-4456-945a-3812653ee229</vm_name>
                    <state>VM_ALIVE_STATE</state>
                    <vm_name>VNF2-DEPLOYM_c3_0_3e0db133-c13b-4e3d-ac14-
                    <state>VM_ALIVE_STATE</state>
                    <vm_name>VNF2-DEPLOYM_s9_0_8bc6cc60-15d6-4ead-8b6a-10e75d0e134d</vm_name>
                    <state>VM_SHUTOFF_STATE</state>
<snip>

Remove the Compute Node from Nova Aggregate List

List the nova aggregates and identify the aggregate that corresponds to the compute server based on the VNF hosted by it. Usually, it would be of the format <VNFNAME>-SERVICE<X>:

[stack@director ~]$ nova aggregate-list
+----+-------------------+-------------------+
| Id | Name              | Availability Zone |
+----+-------------------+-------------------+
| 29 | POD1-AUTOIT   | mgmt              |
| 57 | VNF1-SERVICE1 | -                 |
| 60 | VNF1-EM-MGMT1 | -                 |
| 63 | VNF1-CF-MGMT1 | -                 |
| 66 | VNF2-CF-MGMT2 | -                 |
| 69 | VNF2-EM-MGMT2 | -                 |
| 72 | VNF2-SERVICE2 | -                 |
| 75 | VNF3-CF-MGMT3 | -                 |
| 78 | VNF3-EM-MGMT3 | -                 |
| 81 | VNF3-SERVICE3 | -                 |
+----+-------------------+-------------------+

In this case, the compute server to be replaced belongs to VNF2. So, the corresponding aggregate-list will be VNF2-SERVICE2.

Remove the compute node from the aggregate identified (remove by hostname noted from Section "Identify the VMs hosted in the Compute Node"):

nova aggregate-remove-host <Aggregate> <Hostname>

[stack@director ~]$ nova aggregate-remove-host VNF2-SERVICE2 pod1-compute-10.localdomain

Verify if the compute node has been removed from the aggregates. Now, the Host must not be listed under the aggregate:

nova aggregate-show <aggregate-name>

[stack@director ~]$ nova aggregate-show VNF2-SERVICE2

Case 2. Compute Node Hosts CF/ESC/EM/UAS

Migrate CF Card to Standby State

Log in to the StarOS VNF and identify the card that corresponds to the CF VM. Use the UUID of the CF VM identified from the section "Identify the VMs hosted in the Compute Node", and find the card that corresponds to the UUID:

[local]VNF2# show card hardware
Tuesday might 08 16:49:42 UTC 2018
<snip>
Card 2:
  Card Type               : Control Function Virtual Card
  CPU Packages            : 8 [#0, #1, #2, #3, #4, #5, #6, #7]
  CPU Nodes               : 1
  CPU Cores/Threads       : 8
  Memory                  : 16384M (qvpc-di-large)
  UUID/Serial Number      : F9C0763A-4A4F-4BBD-AF51-BC7545774BE2
<snip>

Check the status of the card:

[local]VNF2# show card table
Tuesday might 08 16:52:53 UTC 2018
Slot         Card Type                               Oper State     SPOF  Attach
-----------  --------------------------------------  -------------  ----  ------
 1: CFC      Control Function Virtual Card           Standby        -
 2: CFC      Control Function Virtual Card           Active         No          
 3: FC       4-Port Service Function Virtual Card    Active         No         
 4: FC       4-Port Service Function Virtual Card    Active         No         
 5: FC       4-Port Service Function Virtual Card    Active         No         
 6: FC       4-Port Service Function Virtual Card    Active         No         
 7: FC       4-Port Service Function Virtual Card    Active         No         
 8: FC       4-Port Service Function Virtual Card    Active         No         
 9: FC       4-Port Service Function Virtual Card    Active         No         
10: FC       4-Port Service Function Virtual Card    Standby        -

If the card is in the active state, move the card to standby state:

[local]VNF2# card migrate from 2 to 1

Shutdown CF and EM VM from ESC

[admin@VNF2-esc-esc-0 ~]$ cd /opt/cisco/esc/esc-confd/esc-cli
[admin@VNF2-esc-esc-0 esc-cli]$ ./esc_nc_cli get esc_datamodel | egrep --color "<state>|<vm_name>|<vm_id>|<deployment_name>"
<snip>
<state>SERVICE_ACTIVE_STATE</state>
                    <vm_name>VNF2-DEPLOYM_c1_0_df4be88d-b4bf-4456-945a-3812653ee229</vm_name>
                    <state>VM_ALIVE_STATE</state>
                    <vm_name>VNF2-DEPLOYM_c3_0_3e0db133-c13b-4e3d-ac14-
                    <state>VM_ALIVE_STATE</state>
<deployment_name>VNF2-DEPLOYMENT-em</deployment_name>
                  <vm_id>507d67c2-1d00-4321-b9d1-da879af524f8</vm_id>
                  <vm_id>dc168a6a-4aeb-4e81-abd9-91d7568b5f7c</vm_id>
                  <vm_id>9ffec58b-4b9d-4072-b944-5413bf7fcf07</vm_id>
                <state>SERVICE_ACTIVE_STATE</state>
                    <vm_name>VNF2-DEPLOYM_XXXX_0_c8d98f0f-d874-45d0-af75-88a2d6fa82ea</vm_name>
                    <state>VM_ALIVE_STATE</state>
<snip>

Stop the CF and EM VM one-by-one with the use of its VM Name. (VM Name noted from section "Identify the VMs hosted in the Compute Node"):

[admin@VNF2-esc-esc-0 esc-cli]$ ./esc_nc_cli vm-action STOP VNF2-DEPLOYM_c1_0_df4be88d-b4bf-4456-945a-3812653ee229

[admin@VNF2-esc-esc-0 esc-cli]$ ./esc_nc_cli vm-action STOP VNF2-DEPLOYM_XXXX_0_c8d98f0f-d874-45d0-af75-88a2d6fa82ea

After it stops, the VMs must enter the SHUTOFF state:

[admin@VNF2-esc-esc-0 ~]$ cd /opt/cisco/esc/esc-confd/esc-cli
[admin@VNF2-esc-esc-0 esc-cli]$ ./esc_nc_cli get esc_datamodel | egrep --color "<state>|<vm_name>|<vm_id>|<deployment_name>"
<snip>
<state>SERVICE_ACTIVE_STATE</state>
                    <vm_name>VNF2-DEPLOYM_c1_0_df4be88d-b4bf-4456-945a-3812653ee229</vm_name>
                    <state>VM_SHUTOFF_STATE</state>
                    <vm_name>VNF2-DEPLOYM_c3_0_3e0db133-c13b-4e3d-ac14-
                    <state>VM_ALIVE_STATE</state>
<deployment_name>VNF2-DEPLOYMENT-em</deployment_name>
                  <vm_id>507d67c2-1d00-4321-b9d1-da879af524f8</vm_id>
                  <vm_id>dc168a6a-4aeb-4e81-abd9-91d7568b5f7c</vm_id>
                  <vm_id>9ffec58b-4b9d-4072-b944-5413bf7fcf07</vm_id>
                <state>SERVICE_ACTIVE_STATE</state>
                    <vm_name>VNF2-DEPLOYM_XXXX_0_c8d98f0f-d874-45d0-af75-88a2d6fa82ea</vm_name>
                    <state>VM_SHUTOFF_STATE</state>
<snip>

Migrate ESC to Standby Mode

Log in to the ESC hosted in the compute node and check if it is in the master state. If yes, switch the ESC to standby mode:

[admin@VNF2-esc-esc-0 esc-cli]$ escadm status
0 ESC status=0 ESC Master Healthy


[admin@VNF2-esc-esc-0 ~]$ sudo service keepalived stop
Stopping keepalived:                                       [  OK  ]

[admin@VNF2-esc-esc-0 ~]$ escadm status
1 ESC status=0 In SWITCHING_TO_STOP state. Please check status after a while.

[admin@VNF2-esc-esc-0 ~]$ sudo reboot
Broadcast message from admin@vnf1-esc-esc-0.novalocal
       (/dev/pts/0) at 13:32 ...
The system is going down for reboot NOW!

Remove the Compute Node from Nova Aggregate List

List the nova aggregates and identify the aggregate that corresponds to the compute server based on the VNF hosted by it. Usually, it would be of the format <VNFNAME>-EM-MGMT<X> and <VNFNAME>-CF-MGMT<X>

[stack@director ~]$ nova aggregate-list
+----+-------------------+-------------------+
| Id | Name              | Availability Zone |
+----+-------------------+-------------------+
| 29 | POD1-AUTOIT   | mgmt              |
| 57 | VNF1-SERVICE1 | -                 |
| 60 | VNF1-EM-MGMT1 | -                 |
| 63 | VNF1-CF-MGMT1 | -                 |
| 66 | VNF2-CF-MGMT2 | -                 |
| 69 | VNF2-EM-MGMT2 | -                 |
| 72 | VNF2-SERVICE2 | -                 |
| 75 | VNF3-CF-MGMT3 | -                 |
| 78 | VNF3-EM-MGMT3 | -                 |
| 81 | VNF3-SERVICE3 | -                 |
+----+-------------------+-------------------+

In our case, the compute server belongs to VNF2. So, the aggregates that correspond would be VNF2-CF-MGMT2 and VNF2-EM-MGMT2.

Remove the compute node from the aggregate identified:

nova aggregate-remove-host <Aggregate> <Host>

[stack@director ~]$ nova aggregate-remove-host VNF2-CF-MGMT2 pod1-compute-8.localdomain
[stack@director ~]$ nova aggregate-remove-host VNF2-EM-MGMT2 pod1-compute-8.localdomain

Verify if the compute node has been removed from the aggregates. Now, ensure that the Host is not listed under the aggregates:

nova aggregate-show <aggregate-name>

[stack@director ~]$ nova aggregate-show VNF2-CF-MGMT2
[stack@director ~]$ nova aggregate-show  VNF2-EM-MGMT2

Compute Node Deletion

The steps mentioned in this section are common irrespective of the VMs hosted in the compute node.

Delete Compute Node from the Service List

Delete the compute service from the service list:

[stack@director ~]$ source corerc
[stack@director ~]$ openstack compute service list | grep compute-8
| 404 | nova-compute     | pod1-compute-8.localdomain     | nova     | enabled | up    | 2018-05-08T18:40:56.000000 |

openstack compute service delete <ID>
[stack@director ~]$ openstack compute service delete 404

Delete Neutron Agents

Delete the old associated neutron agent and open vswitch agent for the compute server:

[stack@director ~]$ openstack network agent list | grep compute-8
| c3ee92ba-aa23-480c-ac81-d3d8d01dcc03 | Open vSwitch agent | pod1-compute-8.localdomain     | None              | False  | UP    | neutron-openvswitch-agent |
| ec19cb01-abbb-4773-8397-8739d9b0a349 | NIC Switch agent   | pod1-compute-8.localdomain     | None              | False  | UP    | neutron-sriov-nic-agent   |

openstack network agent delete <ID>

[stack@director ~]$ openstack network agent delete c3ee92ba-aa23-480c-ac81-d3d8d01dcc03
[stack@director ~]$ openstack network agent delete ec19cb01-abbb-4773-8397-8739d9b0a349

Delete from the Ironic Database

Delete a node from the ironic database and verify it:

[stack@director ~]$ source stackrc

nova show <compute-node> | grep hypervisor

[stack@director ~]$ nova show pod1-compute-10 | grep hypervisor
| OS-EXT-SRV-ATTR:hypervisor_hostname  | 4ab21917-32fa-43a6-9260-02538b5c7a5a

ironic node-delete <ID>

[stack@director ~]$ ironic node-delete 4ab21917-32fa-43a6-9260-02538b5c7a5a 
[stack@director ~]$ ironic node-list (node delete must not be listed now)

Delete from Overcloud

Create a script file named delete_node.sh with the contents as shown. Ensure that the templates mentioned are same as the ones used in the deploy.sh script used for the stack deployment:

 delete_node.sh

 openstack overcloud node delete --templates -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-sriov.yaml -e /home/stack/custom-templates/network.yaml -e /home/stack/custom-templates/ceph.yaml -e /home/stack/custom-templates/compute.yaml -e /home/stack/custom-templates/layout.yaml -e /home/stack/custom-templates/layout.yaml --stack <stack-name> <UUID>

[stack@director ~]$ source stackrc
[stack@director ~]$ /bin/sh delete_node.sh
+ openstack overcloud node delete --templates -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-sriov.yaml -e /home/stack/custom-templates/network.yaml -e /home/stack/custom-templates/ceph.yaml -e /home/stack/custom-templates/compute.yaml -e /home/stack/custom-templates/layout.yaml -e /home/stack/custom-templates/layout.yaml --stack pod1 49ac5f22-469e-4b84-badc-031083db0533
Deleting the following nodes from stack pod1:
- 49ac5f22-469e-4b84-badc-031083db0533
Started Mistral Workflow. Execution ID: 4ab4508a-c1d5-4e48-9b95-ad9a5baa20ae

real   0m52.078s
user   0m0.383s
sys    0m0.086s

Wait for the OpenStack stack operation to move to the COMPLETE state:

[stack@director ~]$  openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 5df68458-095d-43bd-a8c4-033e68ba79a0 | pod1 | UPDATE_COMPLETE | 2018-05-08T21:30:06Z | 2018-05-08T20:42:48Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+

Install the New Compute Node

The steps in order to install a new UCS C240 M4 server and the initial setup steps can be referred from:

Cisco UCS C240 M4 Server Installation and Service Guide

After the installation of the server, insert the hard disks in the respective slots as the old server

Perform BIOS upgrade if the firmware is not as per the recommended version used previously. Steps for BIOS upgrade are given here:

Cisco UCS C-Series Rack-Mount Server BIOS Upgrade Guide

Verify the status of Physical drives. It must be “Unconfigured Good”:

Storage > Cisco 12G SAS Modular Raid Controller (SLOT-HBA) > Physical Drive Info

Create a virtual drive from the physical drives with RAID Level 1:

Storage > Cisco 12G SAS Modular Raid Controller (SLOT-HBA) > Controller Info > Create Virtual Drive from Unused Physical Drives

Select the VD and configure “Set as Boot Drive”:

Enable IPMI over LAN:

Admin > Communication Services > Communication Services

Disable hyperthreading:

Compute > BIOS > Configure BIOS > Advanced > Processor Configuration

Note: The image shown here and the configuration steps mentioned in this section are with reference to the firmware version 3.0(3e) and there might be slight variations if you work on other versions.

Add the New Compute Node to the Overcloud

The steps mentioned in this section are common irrespective of the VM hosted by the compute node.

Add Compute server with a different index

Create an add_node.json file with only the details of the new compute server to be added. Ensure that the index number for the new compute server has not been used before. Typically, increment the next highest compute value.

Example: Highest prior was compute-17, therefore, created compute-18 in case of 2-vnf system.

Note: Be mindful of the json format.

[stack@director ~]$ cat add_node.json 
{
    "nodes":[
        {
            "mac":[
                "<MAC_ADDRESS>"
            ],
            "capabilities": "node:compute-18,boot_option:local",
            "cpu":"24",
            "memory":"256000",
            "disk":"3000",
            "arch":"x86_64",
            "pm_type":"pxe_ipmitool",
            "pm_user":"admin",
            "pm_password":"<PASSWORD>",
            "pm_addr":"192.100.0.5"
        }
    ]
}

Import the json file:

[stack@director ~]$ openstack baremetal import --json add_node.json
Started Mistral Workflow. Execution ID: 78f3b22c-5c11-4d08-a00f-8553b09f497d
Successfully registered node UUID 7eddfa87-6ae6-4308-b1d2-78c98689a56e
Started Mistral Workflow. Execution ID: 33a68c16-c6fd-4f2a-9df9-926545f2127e
Successfully set all nodes to available.

Run node introspection with the use of the UUID noted from the previous step:

[stack@director ~]$ openstack baremetal node manage 7eddfa87-6ae6-4308-b1d2-78c98689a56e
[stack@director ~]$ ironic node-list |grep 7eddfa87
| 7eddfa87-6ae6-4308-b1d2-78c98689a56e | None | None                                 | power off   | manageable         | False       |

[stack@director ~]$ openstack overcloud node introspect 7eddfa87-6ae6-4308-b1d2-78c98689a56e --provide
Started Mistral Workflow. Execution ID: e320298a-6562-42e3-8ba6-5ce6d8524e5c
Waiting for introspection to finish...
Successfully introspected all nodes.
Introspection completed.
Started Mistral Workflow. Execution ID: c4a90d7b-ebf2-4fcb-96bf-e3168aa69dc9
Successfully set all nodes to available.

[stack@director ~]$ ironic node-list |grep available
| 7eddfa87-6ae6-4308-b1d2-78c98689a56e | None | None                                 | power off   | available          | False       |

Execute deploy.sh script that was previously used to deploy the stack, in order to add the new compute node to the overcloud stack:

[stack@director ~]$ ./deploy.sh
++ openstack overcloud deploy --templates -r /home/stack/custom-templates/custom-roles.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-sriov.yaml -e /home/stack/custom-templates/network.yaml -e /home/stack/custom-templates/ceph.yaml -e /home/stack/custom-templates/compute.yaml -e /home/stack/custom-templates/layout.yaml --stack ADN-ultram --debug --log-file overcloudDeploy_11_06_17__16_39_26.log --ntp-server 172.24.167.109 --neutron-flat-networks phys_pcie1_0,phys_pcie1_1,phys_pcie4_0,phys_pcie4_1 --neutron-network-vlan-ranges datacentre:1001:1050 --neutron-disable-tunneling --verbose --timeout 180
…
Starting new HTTP connection (1): 192.200.0.1
"POST /v2/action_executions HTTP/1.1" 201 1695
HTTP POST http://192.200.0.1:8989/v2/action_executions 201
Overcloud Endpoint: http://10.1.2.5:5000/v2.0
Overcloud Deployed
clean_up DeployOvercloud: 
END return value: 0

real   38m38.971s
user   0m3.605s
sys    0m0.466s

Wait for the openstack stack status to be Complete:

[stack@director ~]$  openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 5df68458-095d-43bd-a8c4-033e68ba79a0 | ADN-ultram | UPDATE_COMPLETE | 2017-11-02T21:30:06Z | 2017-11-06T21:40:58Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+

Check that new compute node is in the Active state:

[stack@director ~]$ source stackrc
[stack@director ~]$ nova list |grep compute-18
| 0f2d88cd-d2b9-4f28-b2ca-13e305ad49ea | pod1-compute-18    | ACTIVE | -          | Running     | ctlplane=192.200.0.117 |

[stack@director ~]$ source corerc
[stack@director ~]$ openstack hypervisor list |grep compute-18
| 63 | pod1-compute-18.localdomain    |

Post Server Replacement Settings

After adding the server to the overcloud, please refer to the link below to apply the settings that were previously present in the old server:

Restore the VMs

Case 1. Compute Node Hosts only SF VM

Addition to Nova Aggregate List

Add the compute node to the aggregate-host and verify if the host has been added:

nova aggregate-add-host <Aggregate> <Host>
[stack@director ~]$ nova aggregate-add-host VNF2-SERVICE2 pod1-compute-18.localdomain

nova aggregate-show <Aggregate>
[stack@director ~]$ nova aggregate-show VNF2-SERVICE2

SF VM Recovery from ESC

The SF VM would be in error state in the nova list:

[stack@director  ~]$ nova list |grep VNF2-DEPLOYM_s9_0_8bc6cc60-15d6-4ead-8b6a-10e75d0e134d
| 49ac5f22-469e-4b84-badc-031083db0533 | VNF2-DEPLOYM_s9_0_8bc6cc60-15d6-4ead-8b6a-10e75d0e134d     | ERROR  | -          | NOSTATE     |

Recover the SF VM from the ESC:

[admin@VNF2-esc-esc-0 ~]$ sudo /opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli recovery-vm-action DO VNF2-DEPLOYM_s9_0_8bc6cc60-15d6-4ead-8b6a-10e75d0e134d
[sudo] password for admin: 

Recovery VM Action
/opt/cisco/esc/confd/bin/netconf-console --port=830 --host=127.0.0.1 --user=admin --privKeyFile=/root/.ssh/confd_id_dsa --privKeyType=dsa --rpc=/tmp/esc_nc_cli.ZpRCGiieuW
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
  <ok/>
</rpc-reply>

Monitor the yangesc.log:

admin@VNF2-esc-esc-0 ~]$ tail -f /var/log/esc/yangesc.log
…
14:59:50,112 07-Nov-2017 WARN  Type: VM_RECOVERY_COMPLETE
14:59:50,112 07-Nov-2017 WARN  Status: SUCCESS
14:59:50,112 07-Nov-2017 WARN  Status Code: 200
14:59:50,112 07-Nov-2017 WARN  Status Msg: Recovery: Successfully recovered VM [VNF2-DEPLOYM_s9_0_8bc6cc60-15d6-4ead-8b6a-10e75d0e134d].

Ensure that the SF card comes up as standby SF in the VNF

Case 2. Compute Node Hosts CF, ESC, EM and UAS

Addition to Nova Aggregate List

Add the compute node to the aggregate-hosts and verify if the host has been added. In this case, the compute node must be added to both the CF and EM host aggregates.

nova aggregate-add-host <Aggregate> <Host>
[stack@director ~]$ nova aggregate-add-host VNF2-CF-MGMT2 pod1-compute-18.localdomain
[stack@director ~]$ nova aggregate-add-host VNF2-EM-MGMT2 pod1-compute-18.localdomain

nova aggregate-show <Aggregate>
[stack@director ~]$ nova aggregate-show VNF2-CF-MGMT2
[stack@director ~]$ nova aggregate-show VNF2-EM-MGMT2

Recovery of UAS VM

Check the status of the UAS VM in the nova list and delete it:

[stack@director ~]$ nova list | grep VNF2-UAS-uas-0
| 307a704c-a17c-4cdc-8e7a-3d6e7e4332fa | VNF2-UAS-uas-0                                                 | ACTIVE | -          | Running     | VNF2-UAS-uas-orchestration=172.168.11.10; VNF2-UAS-uas-management=172.168.10.3
[stack@tb5-ospd ~]$ nova delete VNF2-UAS-uas-0
Request to delete server VNF2-UAS-uas-0 has been accepted.

In order to recover the autovnf-uas VM, run the uas-check script to check state. It must report an error. Then run again with --fix option to recreate the missing UAS VM:

[stack@director ~]$ cd /opt/cisco/usp/uas-installer/scripts/
[stack@director scripts]$ ./uas-check.py auto-vnf VNF2-UAS
2017-12-08 12:38:05,446 - INFO: Check of AutoVNF cluster started
2017-12-08 12:38:07,925 - INFO: Instance 'vnf1-UAS-uas-0' status is 'ERROR'
2017-12-08 12:38:07,925 - INFO: Check completed, AutoVNF cluster has recoverable errors

[stack@director scripts]$ ./uas-check.py auto-vnf VNF2-UAS --fix
2017-11-22 14:01:07,215 - INFO: Check of AutoVNF cluster started
2017-11-22 14:01:09,575 - INFO: Instance VNF2-UAS-uas-0' status is 'ERROR'
2017-11-22 14:01:09,575 - INFO: Check completed, AutoVNF cluster has recoverable errors
2017-11-22 14:01:09,778 - INFO: Removing instance VNF2-UAS-uas-0'
2017-11-22 14:01:13,568 - INFO: Removed instance VNF2-UAS-uas-0'
2017-11-22 14:01:13,568 - INFO: Creating instance VNF2-UAS-uas-0' and attaching volume ‘VNF2-UAS-uas-vol-0'
2017-11-22 14:01:49,525 - INFO: Created instance ‘VNF2-UAS-uas-0'

VNF2-autovnf-uas-0#show uas
uas version 1.0.1-1
uas state ha-active
uas ha-vip 172.17.181.101
INSTANCE IP   STATE  ROLE
-----------------------------------
172.17.180.6  alive  CONFD-SLAVE
172.17.180.7  alive  CONFD-MASTER
172.17.180.9  alive  NA

Note: If uas-check.py --fix fails, you might need to copy this file and run again.

[stack@director ~]$ mkdir –p /opt/cisco/usp/apps/auto-it/common/uas-deploy/
[stack@director ~]$ cp /opt/cisco/usp/uas-installer/common/uas-deploy/userdata-uas.txt /opt/cisco/usp/apps/auto-it/common/uas-deploy/

Recovery of ESC VM

Check the status of the ESC VM from the nova list and delete it:

stack@director scripts]$ nova list |grep ESC-1
| c566efbf-1274-4588-a2d8-0682e17b0d41 | VNF2-ESC-ESC-1                                                 | ACTIVE | -          | Running     | VNF2-UAS-uas-orchestration=172.168.11.14; VNF2-UAS-uas-management=172.168.10.4                                                                                                 |
[stack@director scripts]$ nova delete VNF2-ESC-ESC-1
Request to delete server VNF2-ESC-ESC-1 has been accepted.

From AutoVNF-UAS, find the ESC deployment transaction and in the log for the transaction find the boot_vm.py command line for creating the ESC instance:

ubuntu@VNF2-uas-uas-0:~$ sudo -i
root@VNF2-uas-uas-0:~# confd_cli -u admin -C
Welcome to the ConfD CLI    
admin connected from 127.0.0.1 using console on VNF2-uas-uas-0
VNF2-uas-uas-0#show transaction
TX ID                                 TX TYPE          DEPLOYMENT ID    TIMESTAMP                         STATUS
-----------------------------------------------------------------------------------------------------------------------------
35eefc4a-d4a9-11e7-bb72-fa163ef8df2b  vnf-deployment   VNF2-DEPLOYMENT  2017-11-29T02:01:27.750692-00:00  deployment-success
73d9c540-d4a8-11e7-bb72-fa163ef8df2b  vnfm-deployment  VNF2-ESC         2017-11-29T01:56:02.133663-00:00  deployment-success


VNF2-uas-uas-0#show logs 73d9c540-d4a8-11e7-bb72-fa163ef8df2b | display xml
<config xmlns="http://tail-f.com/ns/config/1.0">
  <logs xmlns="http://www.cisco.com/usp/nfv/usp-autovnf-oper">
    <tx-id>73d9c540-d4a8-11e7-bb72-fa163ef8df2b</tx-id>
    <log>2017-11-29 01:56:02,142 - VNFM Deployment RPC triggered for deployment: VNF2-ESC, deactivate: 0
2017-11-29 01:56:02,179 - Notify deployment
..
2017-11-29 01:57:30,385 - Creating VNFM 'VNF2-ESC-ESC-1' with [python //opt/cisco/vnf-staging/bootvm.py VNF2-ESC-ESC-1 --flavor VNF2-ESC-ESC-flavor --image 3fe6b197-961b-4651-af22-dfd910436689 --net VNF2-UAS-uas-management --gateway_ip 172.168.10.1 --net VNF2-UAS-uas-orchestration --os_auth_url http://10.1.2.5:5000/v2.0 --os_tenant_name core --os_username ****** --os_password ****** --bs_os_auth_url http://10.1.2.5:5000/v2.0 --bs_os_tenant_name core --bs_os_username ****** --bs_os_password ****** --esc_ui_startup false --esc_params_file /tmp/esc_params.cfg --encrypt_key ****** --user_pass ****** --user_confd_pass ****** --kad_vif eth0 --kad_vip 172.168.10.7 --ipaddr 172.168.10.6 dhcp --ha_node_list 172.168.10.3 172.168.10.6 --file root:0755:/opt/cisco/esc/esc-scripts/esc_volume_em_staging.sh:/opt/cisco/usp/uas/autovnf/vnfms/esc-scripts/esc_volume_em_staging.sh --file root:0755:/opt/cisco/esc/esc-scripts/esc_vpc_chassis_id.py:/opt/cisco/usp/uas/autovnf/vnfms/esc-scripts/esc_vpc_chassis_id.py --file root:0755:/opt/cisco/esc/esc-scripts/esc-vpc-di-internal-keys.sh:/opt/cisco/usp/uas/autovnf/vnfms/esc-scripts/esc-vpc-di-internal-keys.sh

Save the boot_vm.py line to a shell script file (esc.sh) and update all the username ***** and password ***** lines with the correct information (typically core/<PASSWORD>). You need to remove the –encrypt_key option as well. For user_pass and user_confd_pass, you need to use the format – username: password (example - admin:<PASSWORD>).

Find the URL to bootvm.py from running-config and wget the bootvm.py file to the autovnf-uas VM. In this case, 10.1.2.3 is the Auto-IT VM's IP:

root@VNF2-uas-uas-0:~# confd_cli -u admin -C
Welcome to the ConfD CLI
admin connected from 127.0.0.1 using console on VNF2-uas-uas-0
VNF2-uas-uas-0#show running-config autovnf-vnfm:vnfm
…
configs bootvm
  value http:// 10.1.2.3:80/bundles/5.1.7-2007/vnfm-bundle/bootvm-2_3_2_155.py
!

root@VNF2-uas-uas-0:~# wget http://10.1.2.3:80/bundles/5.1.7-2007/vnfm-bundle/bootvm-2_3_2_155.py
--2017-12-01 20:25:52--  http://10.1.2.3 /bundles/5.1.7-2007/vnfm-bundle/bootvm-2_3_2_155.py
Connecting to 10.1.2.3:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 127771 (125K) [text/x-python]
Saving to: ‘bootvm-2_3_2_155.py’
100%[=====================================================================================>] 127,771  --.-K/s   in 0.001s
2017-12-01 20:25:52 (173 MB/s) - ‘bootvm-2_3_2_155.py’ saved [127771/127771]

Create a /tmp/esc_params.cfg file:

root@VNF2-uas-uas-0:~# echo "openstack.endpoint=publicURL" > /tmp/esc_params.cfg

Execute shell script to deploy ESC from the UAS node:

root@VNF2-uas-uas-0:~# /bin/sh esc.sh
+ python ./bootvm.py VNF2-ESC-ESC-1 --flavor VNF2-ESC-ESC-flavor --image 3fe6b197-961b-4651-af22-dfd910436689
 --net VNF2-UAS-uas-management --gateway_ip 172.168.10.1 --net VNF2-UAS-uas-orchestration --os_auth_url 
http://10.1.2.5:5000/v2.0 --os_tenant_name core --os_username core --os_password <PASSWORD> --bs_os_auth_url 
http://10.1.2.5:5000/v2.0 --bs_os_tenant_name core --bs_os_username core --bs_os_password <PASSWORD> 
--esc_ui_startup false --esc_params_file /tmp/esc_params.cfg --user_pass admin:<PASSWORD> --user_confd_pass 
admin:<PASSWORD> --kad_vif eth0 --kad_vip 172.168.10.7 --ipaddr 172.168.10.6 dhcp --ha_node_list 172.168.10.3
172.168.10.6 --file root:0755:/opt/cisco/esc/esc-scripts/esc_volume_em_staging.sh:/opt/cisco/usp/uas/autovnf/vnfms/esc-scripts/esc_volume_em_staging.sh 
--file root:0755:/opt/cisco/esc/esc-scripts/esc_vpc_chassis_id.py:/opt/cisco/usp/uas/autovnf/vnfms/esc-scripts/esc_vpc_chassis_id.py 
--file root:0755:/opt/cisco/esc/esc-scripts/esc-vpc-di-internal-keys.sh:/opt/cisco/usp/uas/autovnf/vnfms/esc-scripts/esc-vpc-di-internal-keys.sh

ubuntu@VNF2-uas-uas-0:~$ ssh admin@172.168.11.14
…
   ####################################################################
   #   ESC on VNF2-esc-esc-1.novalocal is in BACKUP state.
   ####################################################################

[admin@VNF2-esc-esc-1 ~]$ escadm status
0 ESC status=0 ESC Backup Healthy

[admin@VNF2-esc-esc-1 ~]$ health.sh
============== ESC HA (BACKUP) ===================================================
ESC HEALTH PASSED

Recover CF and EM VMs from ESC

Check the status of the CF and EM VMs from the nova list. They must be in the ERROR state

[stack@director ~]$ source corerc
[stack@director ~]$ nova list --field name,host,status |grep -i err   
| 507d67c2-1d00-4321-b9d1-da879af524f8 | VNF2-DEPLOYM_XXXX_0_c8d98f0f-d874-45d0-af75-88a2d6fa82ea | None                                 | ERROR|
| f9c0763a-4a4f-4bbd-af51-bc7545774be2 | VNF2-DEPLOYM_c1_0_df4be88d-b4bf-4456-945a-3812653ee229     |None                                 | ERROR

Log in to ESC Master, run recovery-vm-action for each impacted EM and CF VM. Be patient. ESC would schedule the recovery-action and it might not happen for a few minutes. Monitor the yangesc.log:

sudo /opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli recovery-vm-action DO <VM-Name>

[admin@VNF2-esc-esc-0 ~]$ sudo /opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli recovery-vm-action DO VNF2-DEPLOYMENT-_VNF2-D_0_a6843886-77b4-4f38-b941-74eb527113a8
[sudo] password for admin: 

Recovery VM Action
/opt/cisco/esc/confd/bin/netconf-console --port=830 --host=127.0.0.1 --user=admin --privKeyFile=/root/.ssh/confd_id_dsa --privKeyType=dsa --rpc=/tmp/esc_nc_cli.ZpRCGiieuW
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
  <ok/>
</rpc-reply>

[admin@VNF2-esc-esc-0 ~]$ tail -f /var/log/esc/yangesc.log
…
14:59:50,112 07-Nov-2017 WARN  Type: VM_RECOVERY_COMPLETE
14:59:50,112 07-Nov-2017 WARN  Status: SUCCESS
14:59:50,112 07-Nov-2017 WARN  Status Code: 200
14:59:50,112 07-Nov-2017 WARN  Status Msg: Recovery: Successfully recovered VM [VNF2-DEPLOYMENT-_VNF2-D_0_a6843886-77b4-4f38-b941-74eb527113a8]

ubuntu@VNF2vnfddeploymentem-1:~$ /opt/cisco/ncs/current/bin/ncs_cli -u admin -C
admin connected from 172.17.180.6 using ssh on VNF2vnfddeploymentem-1
admin@scm# show ems
EM            VNFM
ID  SLA  SCM  PROXY
---------------------
2   up   up   up
3   up   up   up

Log into the StarOS VNF and verify that the CF card is in the standby state

Handle ESC Recovery Failure

In cases where ESC fails to start the VM due to an unexpected state, Cisco recommends to perform an ESC switchover by rebooting the Master ESC. The ESC switchover would take about a minute. Run the script “health.sh” on the new Master ESC to check if the status is up. Master ESC to start the VM and fix the VM state. This recovery task would take up to 5 minutes to complete.

You can monitor /var/log/esc/yangesc.log and /var/log/esc/escmanager.log. If you do NOT see VM getting recovered after 5-7 minutes, the user would need to go and do the manual recovery of the impacted VM(s).

Auto-Deploy Configuration Update

From AutoDeploy VM, edit the autodeploy.cfg and replace the old compute server with the new one. Then load replace in confd_cli. This step is required for successful deployment deactivation later.

root@auto-deploy-iso-2007-uas-0:/home/ubuntu# confd_cli -u admin -C
Welcome to the ConfD CLI
admin connected from 127.0.0.1 using console on auto-deploy-iso-2007-uas-0
auto-deploy-iso-2007-uas-0#config
Entering configuration mode terminal
auto-deploy-iso-2007-uas-0(config)#load replace autodeploy.cfg
Loading.     14.63 KiB parsed in 0.42 sec (34.16 KiB/sec)

auto-deploy-iso-2007-uas-0(config)#commit
Commit complete.
auto-deploy-iso-2007-uas-0(config)#end

Restart uas-confd and autodeploy services after the config change.

root@auto-deploy-iso-2007-uas-0:~# service uas-confd restart
uas-confd stop/waiting
uas-confd start/running, process 14078

root@auto-deploy-iso-2007-uas-0:~# service uas-confd status
uas-confd start/running, process 14078

root@auto-deploy-iso-2007-uas-0:~# service autodeploy restart
autodeploy stop/waiting
autodeploy start/running, process 14017
root@auto-deploy-iso-2007-uas-0:~# service autodeploy status
autodeploy start/running, process 14017

Enabling Syslogs

To enable the syslogs for the UCS Server, Openstack components and the recovered VMs, please follow the sections

"Re-Enable syslog for UCS and Openstack components" and "Enable syslog for the VNFs" in the link below:

Related Information

Revision History

Revision	Publish Date	Comments
1.0	01-Jun-2018	Initial Release

Contributed by Cisco Engineers

Partheeban Rajagopal
Cisco Advanced Services
Padmaraj Ramanoudjam
Cisco Advanced Services

Was this Document Helpful?

Feedback

Contact Cisco

Open a Support Case
(Requires a Cisco Service Contract)

Replacement of Compute Server UCS C240 M4 - vEPC

Available Languages

Download Options

Bias-Free Language

Contents

Introduction

Background Information

Abbreviations

Workflow of the MoP

Prerequisites

Backup

Identify the VMs Hosted in the Compute Node

Graceful Power Off

Case 1. Compute Node Hosts only SF VM

Migrate SF Card to Standby State

Shutdown SF VM from ESC

Remove the Compute Node from Nova Aggregate List

Case 2. Compute Node Hosts CF/ESC/EM/UAS

Migrate CF Card to Standby State

Shutdown CF and EM VM from ESC

Migrate ESC to Standby Mode

Remove the Compute Node from Nova Aggregate List

Compute Node Deletion

Delete Compute Node from the Service List

Delete Neutron Agents

Delete from the Ironic Database

Delete from Overcloud

Install the New Compute Node

Add the New Compute Node to the Overcloud

Post Server Replacement Settings

Restore the VMs

Case 1. Compute Node Hosts only SF VM

Addition to Nova Aggregate List

SF VM Recovery from ESC

Case 2. Compute Node Hosts CF, ESC, EM and UAS

Addition to Nova Aggregate List

Recovery of UAS VM

Recovery of ESC VM

Handle ESC Recovery Failure

Auto-Deploy Configuration Update

Enabling Syslogs

Related Information

Revision History

Contributed by Cisco Engineers

Was this Document Helpful?

Contact Cisco

This Document Applies to These Products