Managing Cisco NFVI Pods
You can perform OpenStack management operations on Cisco NFVI pods including addition and removal of Cisco NFVI compute and Ceph nodes, and replacement of controller nodes. Each action is mutually exclusive. You can perform only one pod management action at a time. Before you perform a pod action, ensure that the following requirements are met:
- The node is part of an existing pod.
- The node information exists in the setup_data.yaml file, if the pod management task is removal or replacement of a node.
- The node information does not exist in the setup_data.yaml file, if the pod management task is the addition of a node.
For more information on operations that can be performed on pods, see the Managing Hosts in Cisco VIM or NFVI Pods section.
General Guidelines for Pod Management
The setup_data.yaml file is the only user-generated configuration file that is used to install and manage the cloud. Although many pod management operations modify the setup_data.yaml file, the administrator must never edit the system-generated setup_data.yaml file directly.
Note
To avoid translation errors, do not copy and paste commands from this document into the Linux CLI.
Follow these steps to update the setup_data.yaml file:
- Copy the setup data into a local directory:
[root@mgmt1 ~]# cd /root/
[root@mgmt1 ~]# mkdir MyDir
[root@mgmt1 ~]# cd MyDir
[root@mgmt1 ~]# cp /root/openstack-configs/setup_data.yaml <my_setup_data.yaml>
- Update the setup data manually:
[root@mgmt1 ~]# vi my_setup_data.yaml (update the targeted fields for the setup_data)
- Run the reconfiguration command:
[root@mgmt1 ~]# ciscovim --setupfile ~/MyDir/<my_setup_data.yaml> <pod_management_action>
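For example, to add a new compute node after updating the setup data, the invocation might look as follows. This is an illustrative sketch: the file name and the compute hostname are hypothetical, and add-computes is one of the pod management actions described later in this section.
[root@mgmt1 ~]# ciscovim --setupfile ~/MyDir/my_setup_data.yaml add-computes compute-5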
In Cisco VIM, you can edit and enable a selected set of options in the setup_data.yaml file using the reconfigure option. After installation, you can change the values of the feature parameters. Unless specified, Cisco VIM does not allow you to undo the feature configuration.
The following table provides the list of features that you can reconfigure after installing a pod.
Features Enabled after post-pod deployment | Comment
---|---
Optional OpenStack Services |
Pod Monitoring | CVIM-MON: monitoring at the host and service level, with or without ui_access. NFVIMON: third-party monitoring from the host to the service level with the aid of Cisco Advanced Services.
Export of EFK logs to External Syslog Server | Reduces a single point of failure on the management node and provides data aggregation.
NFS for Elasticsearch Snapshot | An NFS mount point is used for the Elasticsearch snapshot so that the disk on the management node does not get full.
Admin Source Networks | White list filter for accessing management node admin services over IPv4 or IPv6.
NFVBench | Tool to help measure cloud performance. The management node needs a dedicated 10G/40G Intel NIC (4x10G 710, or 2x40G XL710 Intel NIC).
EFK settings | Enables you to set the EFK rotation frequency and size.
OpenStack service password | Implemented for security reasons, so that OpenStack passwords can be reset on demand.
CIMC Password Reconfigure Post Install | Implemented for security reasons, so that CIMC passwords for C-series pods can be reset on demand.
SwiftStack Post Install | Integration with a third-party object store. The SwiftStack Post Install feature works only with Keystone v2.
TENANT_VLAN_RANGES and PROVIDER_VLAN_RANGES | Ability to increase or decrease the tenant and provider VLAN ranges on a pod that is up and running. Gives customers flexibility in network planning.
DHCP reservation for VM MAC addresses | Allows DHCP reservation for virtual machine MAC addresses, so that VMs always get the same IP address regardless of the host hypervisor or operating system they are running on.
Enable TRUSTED_VF on a per (SR-IOV) compute basis | Allows virtual functions to become trusted by the physical function and to perform privileged operations such as enabling VF promiscuous mode and changing the VF MAC address within the guest.
Support of multiple external syslog servers | Ability to offload the OpenStack logs to a maximum of four external syslog servers post-installation.
Replace failed APIC hosts and add more leaf nodes | Ability to replace failed APIC hosts and add more leaf nodes to increase the fabric influence.
Make NetApp block storage endpoint secure | Ability to move the NetApp block storage endpoint from clear to TLS post-deployment.
Auto-backup of Management Node | Ability to enable/disable auto-backup of the management node. It is possible to unconfigure the management node.
VIM Admins | Ability to configure non-root VIM administrators. Ability to configure VIM admins authenticated by LDAP.
EXTERNAL_LB_VIP_FQDN | Ability to enable TLS on external_vip through FQDN.
EXTERNAL_LB_VIP_TLS | Ability to enable TLS on external_vip through an IP address.
http_proxy and/or https_proxy | Ability to reconfigure http and/or https proxy servers.
Admin privileges for VNF Manager (ESC) from a tenant domain | Ability to enable admin privileges for VNF Manager (ESC) from a tenant domain.
SRIOV_CARD_TYPE | Mechanism to switch between 2-X520 and 2-XL710 as an SRIOV option in Cisco VIC NIC settings, at a global and per-compute level, through reconfiguration. In the absence of per-compute and global settings, the X520 card type is set by default.
NETAPP | Migrate the NetApp transport protocol from http to https.
Reset of KVM console passwords for servers | Aids recovery of the KVM console passwords for servers.
Horizon behind NAT or with DNS alias(es) | Ability to host Horizon behind NAT or with DNS alias(es).
Login banner for SSH sessions | Support of a configurable login banner for SSH sessions.
Ability to add Layer 3 BGP session | Ability to switch BGP sessions from Layer 2 to Layer 3 in the presence of a VXLAN configuration.
Add/remove of head-end-replication option | Ability to add or remove the head-end-replication option in the presence of a VXLAN configuration.
Enabling Cloud settings | Ability to set Horizon and Keystone settings as reconfigurable.
Vault | Ability to enable Vault on day 2.
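As an illustration, enabling or changing one of these features follows the same setup data workflow described above, finishing with the reconfigure action (a sketch; the file name is hypothetical):
[root@mgmt1 ~]# cp /root/openstack-configs/setup_data.yaml ~/MyDir/my_setup_data.yaml
[root@mgmt1 ~]# vi ~/MyDir/my_setup_data.yaml (enable or update the targeted feature)
[root@mgmt1 ~]# ciscovim reconfigure --setupfile ~/MyDir/my_setup_data.yaml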
Identifying the Install Directory
[root@mgmt1 ~]# cd /root/
[root@mgmt1 ~]# ls -lrt | grep openstack-configs
lrwxrwxrwx. 1 root root 38 Mar 12 21:33 openstack-configs -> /root/installer-<tagid>/openstack-configs
The output shows that openstack-configs is a symbolic link to the installer directory.
Verify that the REST API server is running from the same installer directory location, by executing the following commands:
# cd installer-<tagid>/tools
#./restapi.py -a status
Status of the REST API Server: active (running) since Thu 2016-08-18 09:15:39 UTC; 9h ago
REST API launch directory: /root/installer-<tagid>/
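To confirm that the symbolic link and the REST API server point to the same installer directory, you can compare the two paths directly (a minimal sketch using standard Linux commands and the restapi.py output shown above):
# readlink -f /root/openstack-configs
/root/installer-<tagid>/openstack-configs
# cd /root/installer-<tagid>/tools && ./restapi.py -a status | grep 'launch directory'
REST API launch directory: /root/installer-<tagid>/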
Managing Hosts in Cisco VIM or NFVI Pods
In Cisco VIM, a node can participate in multiple roles based on the pod type. The following rules apply for hardware management of a node:
- If a node is a Micropod node that acts as controller, compute, and Ceph, the node can only go through the action of replace controller for its swap. You can perform this action on one node at a time.
- If a node is a hyper-converged node (that is, acting as both compute and Ceph), the node is treated as a Ceph node from a hardware management point of view, and can only go through the action of add or remove of Ceph. This action can be done only on one node at a time.
- If a node is a standalone compute node, the node can only go through the action of add or remove of compute. You can add or remove multiple nodes at a time, but you cannot operate the pod with zero compute nodes at any given time.
- If a node is a dedicated controller node, the node can only go through the action of replace controller for its swap. This action can be done only on one node at a time.
- If a node is a dedicated Ceph node, the node can only go through the action of add or remove of Ceph. This action can be done only on one node at a time, and you cannot have a pod with fewer than two Ceph nodes at any time.
Based on the preceding rules, run the commands specified in the following table to perform hardware management actions on the pod. If you log in as root, manually change the directory to /root/installer-xxx to get to the correct working directory for these Cisco NFVI pod commands.
Action | Steps | Restrictions
---|---|---
Remove block_storage or compute node | | You can remove multiple compute nodes and only one storage node at a time. The pod must have a minimum of one compute and two storage nodes after the removal action. In Cisco VIM, the number of Ceph OSD nodes varies from 3 to 20. You can remove one OSD node at a time as part of pod management.
Add block_storage or compute node | | You can add multiple compute nodes and only one storage node at a time. The pod must have a minimum of one compute and two storage nodes before the addition action. In Cisco VIM, the number of Ceph OSD nodes varies from 3 to 20. You can add one OSD node at a time as part of pod management.
Replace controller node | | You can replace only one controller node at a time. The pod can have a maximum of three controller nodes. In Cisco VIM, the replace controller node operation is supported in Micropod.
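For reference, the CLI invocations for these actions follow the pattern below (a sketch; node names and the setup file path are hypothetical, and the action names should be verified against your Cisco VIM release):
# ciscovim --setupfile ~/MyDir/my_setup_data.yaml remove-computes compute-3,compute-4
# ciscovim --setupfile ~/MyDir/my_setup_data.yaml add-computes compute-5
# ciscovim --setupfile ~/MyDir/my_setup_data.yaml remove-storage storage-3
# ciscovim --setupfile ~/MyDir/my_setup_data.yaml add-storage storage-4
# ciscovim --setupfile ~/MyDir/my_setup_data.yaml replace-controller controller-2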
When you add a compute or storage node to a rack-based pod (UCS C-Series or Quanta), you can increase the management/provision address pool. Similarly, for a UCS B-Series pod, you can increase the Cisco IMC pool to provide routing space flexibility for pod networking. Along with the server information, these are the only items that you can change in the setup_data.yaml file after the pod is deployed. To make changes to the management or provisioning sections and/or the CIMC (for UCS B-Series pods) network section, you must not change the existing address block as defined on day 0. You can only add new address pool blocks to the existing information, as shown in the following example:
NETWORKING:
:
:
networks:
-
vlan_id: 99
subnet: 172.31.231.0/25
gateway: 172.31.231.1
## 'pool' can be defined with single ip or a range of ip
pool:
- 172.31.231.2 to 172.31.231.5 → IP address pool on Day-0
- 172.31.231.7 to 172.31.231.12 → IP address pool extension on Day-n
- 172.31.231.20
segments:
## CIMC IP allocation. Needs to be an external routable network
- cimc
-
vlan_id: 2001
subnet: 192.168.11.0/25
gateway: 192.168.11.1
rt_prefix: < Local to POD > #optional, only for segment management/provision, storage, tenant and ToR-type NCS-5500
rt_suffix: < Region>:< pod_region_number > #optional, only for segment management/provision, storage, tenant and ToR-type NCS-5500
## 'pool' can be defined with single ip or a range of ip
pool:
- 192.168.11.2 to 192.168.11.5 → IP address pool on Day-0
- 192.168.11.7 to 192.168.11.12 → IP address pool on day-n
- 192.168.11.20 → IP address pool on day-n
segments:
## management and provision goes together
- management
- provision
:
:
The IP address pool is the only change allowed in the networking space of the specified networks: management/provision and/or CIMC (for B-series). The overall network must have enough address space on day 0 to accommodate future expansion. After making the changes to the servers, roles, and the corresponding address pool, you can execute the add compute/storage CLI shown above to add new nodes to the pod.
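Putting it together, a day-n pool extension followed by a node addition might look like the following sketch (the file name and node name are hypothetical):
[root@mgmt1 ~]# cp /root/openstack-configs/setup_data.yaml ~/MyDir/my_setup_data.yaml
[root@mgmt1 ~]# vi ~/MyDir/my_setup_data.yaml (append the new pool block and the new server entry; do not touch the day-0 blocks)
[root@mgmt1 ~]# ciscovim --setupfile ~/MyDir/my_setup_data.yaml add-storage storage-4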
For C-series M5 pods with Cisco NCS 5500 as ToR and a splitter cable connection onto the server, you have to adjust the entry for splitter_opt_4_10 in the respective SWITCHDETAILS for the Cisco NCS 5500 ToR pairs, along with the server (cimc_ip) and connection (tor_info, dp_tor_info, sriov_tor_info) details.
For example, to add compute or storage with Cisco NCS 5500 as ToR with splitter cable, add the following entry to the respective Cisco NCS 5500:
TORSWITCHINFO:
CONFIGURE_TORS: true # Mandatory
TOR_TYPE: NCS-5500 # Mandatory
ESI_PREFIX: 91.<Pod_number>.<podregion_number>.00.00.00.00 # optional – only for NCS-5500
SWITCHDETAILS:
- hostname: <NCS-5500-1> # hostname of NCS-5500-1
  username: admin
  password: <ssh_password of NCS-5500-1>
  ...
  splitter_opt_4_10: 'FortyGigE<C/D/X/Y>,HundredGigE<E/F/A/B>, …' # Optional for NCS-5500, only when splitter is needed on a per-switch basis (that is, the peer switch may or may not have the entry)
  ESI_PREFIX: 91.<Pod_number>.<podregion_number>.00.00.00.00 # optional for NCS-5500 only
To remove a compute or a storage node, delete the respective information. To replace a controller, swap the relevant port information from which the splitter cables originate.
Note
For replace controller, you can change only a subset of the server information. For C-series, you can change the server information such as CIMC IP, CIMC username, CIMC password, rack_id, and tor_info. For B-series, you can change the rack_id, chassis_id, and blade_id, but not the server hostname and management IP during the replace controller operation.
Recovering Cisco NFVI Pods
This section describes the recovery processes for the Cisco NFVI control node and for a pod that is installed through Cisco VIM. For recovery to succeed, a full Cisco VIM installation must have occurred in the past. Recovery is needed when one or more controller services, such as RabbitMQ and MariaDB, fail. The management node must be up and running, and all the nodes must be accessible through SSH without passwords from the management node. You can also use this procedure to recover from a planned shutdown or an accidental power outage.
Cisco VIM supports the following control node recovery command:
# ciscovim cluster-recovery
The control node recovers after the network partition is resolved.
Note
It may be possible that the database sync between controller nodes takes time, which can result in cluster-recovery failure. In that case, wait for some time for the database sync to complete, and then re-run cluster-recovery.
To make sure that the Nova services are good across the compute nodes, execute the following commands:
# source /root/openstack-configs/openrc
# nova service-list
To check for the overall cloud status, execute the following command:
# ciscovim cloud-sanity create test all
To view the results of cloud-sanity, use the following command:
# ciscovim cloud-sanity show result all --id <uid of the test>
In case of a complete pod outage, you must follow a sequence of steps to bring the pod back up. The first step is to bring up the management node and check that the management node containers are up and running using the docker ps -a command. After you bring up the management node, bring up all the other pod nodes. Make sure that every node is reachable through password-less SSH from the management node. Verify that no network IP changes have occurred. You can get the node SSH IP access information from /root/openstack-configs/mercury_servers_info.
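A quick way to spot containers that are not running on the management node is to filter the docker ps output for any status other than Up (a sketch using the standard Docker CLI; an empty result means all containers are running):
# docker ps -a --format '{{.Names}}: {{.Status}}' | grep -v ': Up'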
Execute the following command sequence:
- Check the setup_data.yaml file and runtime consistency on the management node:
# cd /root/installer-<tagid>/tools
# ciscovim run --perform 1,3 -y
- Execute the cloud sanity using the ciscovim command:
# ciscovim cloud-sanity create test all
- To view the results of cloud-sanity, use the following command:
# ciscovim cloud-sanity show result all --id <uid of the test>
- Check the status of the REST API server and the corresponding directory where it is running:
# cd /root/installer-<tagid>/tools
# ./restapi.py -a status
Status of the REST API Server: active (running) since Thu 2016-08-18 09:15:39 UTC; 9h ago
REST API launch directory: /root/installer-<tagid>/
- If the REST API server is not running from the right installer directory, execute the following commands to get it running from the correct directory:
# cd /root/installer-<tagid>/tools
# ./restapi.py -a setup
Check that the REST API server is running from the correct target directory:
# ./restapi.py -a status
Status of the REST API Server: active (running) since Thu 2016-08-18 09:15:39 UTC; 9h ago
REST API launch directory: /root/new-installer-<tagid>/
- Verify that the Nova services are good across the compute nodes:
# source /root/openstack-configs/openrc
# nova service-list
If cloud-sanity fails, execute cluster-recovery (ciscovim cluster-recovery), and then re-execute the cloud-sanity and nova service-list steps listed above.
Recovery of the compute and OSD nodes requires network connectivity and a reboot, so that they can be accessed through password-less SSH from the management node.
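To confirm that a recovered node is reachable through password-less SSH, a non-interactive probe is enough (a sketch; the node IP is hypothetical and can be taken from mercury_servers_info). BatchMode makes SSH fail immediately instead of prompting for a password:
# ssh -o BatchMode=yes -o ConnectTimeout=5 root@<node_ip> uptime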
To shut down the pod, bring it down in the following sequence:
- Shut down all VMs. Graceful shutdown of the VMs is important: check the VM status from the output of "openstack server list --all-projects", which must show that all VMs are in SHUTOFF state before you proceed (see the sketch after the note below).
- Shut down all the compute nodes.
- Shut down all the storage nodes serially. Wait until the storage node shutdown is complete before proceeding to the next step.
- Shut down the controllers, one at a time. Wait for each controller node shutdown to complete before proceeding to the next step.
- Shut down the management node.
- Shut down the networking gear.
Note
To shut down a node, SSH to the node or connect to the CIMC KVM console, and issue the shutdown command:
# shutdown -h now
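A minimal sketch of the graceful VM shutdown and the SHUTOFF check referenced in the first step above (assumes the OpenStack CLI is available on the management node and openrc is sourced; stop commands are issued per VM ID):
# source /root/openstack-configs/openrc
# openstack server list --all-projects -f value -c ID | xargs -r -n1 openstack server stop
# openstack server list --all-projects -f value -c Status | grep -v SHUTOFF || echo "all VMs are in SHUTOFF state"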
Bring the nodes up in reverse order, that is:
- Bring up the networking gear.
- Bring up the management node.
- Bring up the control nodes.
- Bring up the storage nodes.
- Wait until the Ceph health reports are fine, and then proceed to the next step.
- Bring up the compute nodes.
In each step, ensure that each node type is completely booted up before you move on to the next node type.
- Run the cluster recovery command to bring up the pod after the power outage:
# ciscovim cluster-recovery
- Run cloud sanity using the following command:
# ciscovim cloud-sanity create test all
- Run the docker check using the following command:
# cloudpulse run --name docker_check
- Check the setup_data.yaml file and runtime consistency:
# ciscovim run --perform 1,3 -y
- Bring up all VMs and validate that they are all up (not in SHUTOFF state). If any VMs are down, bring them up using the Horizon dashboard.
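As an alternative to the Horizon dashboard, the remaining VMs can also be started from the CLI (a sketch; assumes openrc is sourced on the management node):
# source /root/openstack-configs/openrc
# openstack server list --all-projects --status SHUTOFF -f value -c ID | xargs -r -n1 openstack server start
# openstack server list --all-projects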