Introduction
This document describes the steps required to replace both faulty HDD drives in a server in an Ultra-M setup that hosts StarOS Virtual Network Functions (VNFs).
Background Information
Ultra-M is a pre-packaged and validated virtualized mobile packet core solution designed to simplify the deployment of VNFs. OpenStack is the Virtualized Infrastructure Manager (VIM) for Ultra-M and consists of these node types:
- Compute
- Object Storage Disk Compute (OSD-Compute)
- Controller
- OpenStack Platform - Director (OSPD)
The high-level architecture of Ultra-M and the components involved are depicted in this image:
[Image: Ultra-M Architecture]
This document is intended for Cisco personnel familiar with the Cisco Ultra-M platform, and it details the steps required to be carried out at the OpenStack and CPS VNF level at the time of the server replacement.
Note: The Ultra M 5.1.x release is considered in order to define the procedures in this document.
Abbreviations
VNF | Virtual Network Function
CF | Control Function
SF | Service Function
ESC | Elastic Services Controller
MOP | Method of Procedure
OSD | Object Storage Disks
HDD | Hard Disk Drive
SSD | Solid State Drive
VIM | Virtualized Infrastructure Manager
VM | Virtual Machine
EM | Element Manager
UAS | Ultra Automation Services
UUID | Universally Unique Identifier
Both HDD Failure
1. Each bare-metal server is provisioned with two HDD drives that act as the boot disk in a RAID 1 configuration. In the case of a single HDD failure, since there is RAID 1 level redundancy, the faulty HDD drive can be hot swapped. However, when both HDD drives fail, the server goes down and access to the server is lost. In order to restore access to the server and its services, both HDD drives must be replaced and the server added back to the existing overcloud stack.
2. The procedure to replace a faulty component on a UCS C240 M4 server can be referred from: Replacing the Server Components
3. In the case of both HDDs failing, replace only the two faulty HDDs in the same UCS C240 M4 server. A BIOS upgrade is not required after you replace the disks.
4. In the OpenStack-based (Ultra-M) solution, a UCS C240 M4 bare-metal server can take on one of these roles: Compute, OSD-Compute, Controller, or OSPD. The steps required in order to handle both HDD failures in each of these server roles are covered in the sections that follow.
Note: In scenarios where both HDD disks are healthy but some other hardware component is faulty in the UCS C240 M4 server, replace the UCS C240 M4 with new hardware and re-use the same HDD drives. Conversely, if only the HDD drives are faulty, re-use the same UCS C240 M4 and replace the faulty HDD drives with new ones.
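As a quick sanity check once the server is added back to the overcloud, the node's status can be verified from the OSPD. This is a minimal sketch, assuming the standard OpenStack CLI on the OSPD; the node name pod1-compute-4 is illustrative only:

```shell
#!/bin/sh
# Hypothetical sketch: after both HDDs are replaced and the node is added
# back to the overcloud, confirm from the OSPD that the node reports ACTIVE.
# node_is_active takes a node name and the output of `openstack server list`
# on stdin, and succeeds only if that node's row shows the ACTIVE status.
node_is_active() {
    grep "| $1 " | grep -q "| ACTIVE |"
}

# On a live OSPD you would pipe the real command output, for example:
#   source ~/stackrc
#   openstack server list | node_is_active pod1-compute-4 && echo "node is back"
```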
Both HDD Failure on Compute Server
If the failure of both HDD drives is observed in a UCS C240 M4 which acts as a Compute node, follow the replacement procedure in this link: PCRF-Replacement-of-Compute-Server-UCS-C240-M4
Both HDD Failure on Controller Server
If the failure of both HDD drives is observed in a UCS C240 M4 which acts as a Controller node, follow the replacement procedure from: PCRF-Replacement-of-Controller-Server-UCS-C240-M4
Since a controller server that experiences both HDD failures is not reachable via SSH, log in to another controller node in order to perform the graceful shutdown procedure listed in that link.
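Before you follow the graceful-shutdown steps from a healthy controller, it helps to confirm which cluster node is actually offline. This is a minimal sketch, assuming the standard Pacemaker pcs CLI used by OpenStack controllers; the node names are illustrative only:

```shell
#!/bin/sh
# Hypothetical sketch: from a healthy controller, confirm the failed node
# is the one the cluster reports as offline. node_is_offline takes a node
# name and `pcs status` output on stdin, and succeeds only if that node is
# listed in the OFFLINE line.
node_is_offline() {
    grep "OFFLINE:" | grep -q " $1 "
}

# On a live controller you would pipe the real command output, for example:
#   pcs status | node_is_offline pod1-controller-2 && echo "safe to proceed"
```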
Both HDD Failure on OSD-Compute Server
If the failure of both HDD drives is observed in a UCS C240 M4 which acts as an OSD-Compute node, follow the replacement procedure from: PCRF-Replacement-of-OSD-Compute-UCS-240M4
In the procedure mentioned in that link, the Ceph storage graceful shutdown cannot be performed, because the failure of both drives makes the server unreachable. Therefore, skip those steps.
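Since a graceful Ceph shutdown is impossible here, Ceph recovery should instead be confirmed once the OSD-Compute node is rebuilt. This is a minimal sketch, assuming the standard Ceph CLI:

```shell
#!/bin/sh
# Hypothetical sketch: after the OSD-Compute node is rebuilt, verify the
# Ceph cluster has recovered. ceph_is_healthy reads `ceph -s` output on
# stdin and succeeds only when the cluster reports HEALTH_OK.
ceph_is_healthy() {
    grep -q "HEALTH_OK"
}

# On the rebuilt node (or any Ceph monitor) you would run, for example:
#   ceph -s | ceph_is_healthy && echo "ceph recovered"
#   ceph osd tree    # also confirm all OSDs are up and in
```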
Both HDD Failure on OSPD Server
If the failure of both HDD drives is observed in a UCS C240 M4 which acts as an OSPD node, follow the replacement procedure from: Replacement-of-OSPD-Server-UCS-240M4-CPS
In this case, the previously stored OSPD backup is needed for restoration after the HDD disk replacement; without it, the recovery amounts to a complete stack re-deployment.
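Before the restore, it is worth confirming that the stored backup archive is actually usable, so the restore does not fail partway through. This is a minimal sketch; the backup path and filename are illustrative only:

```shell
#!/bin/sh
# Hypothetical sketch: check that the stored OSPD backup exists, is
# non-empty, and is a readable gzip tarball before you begin the restore.
backup_is_usable() {
    [ -s "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}

# Example (path is illustrative only):
#   backup_is_usable /var/backups/ospd-backup.tar.gz || echo "backup unusable"
```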