Ultra-M Isolation and Replacement of Failed Disk from Ceph/Storage Cluster - vEPC

Updated:August 23, 2018

Document ID:213588

Introduction

Background Information

Abbreviations

Workflow of the MoP

Prerequisite Health Checks

Isolation and Removal of Faulty OSD Disk from the Cluster

Replace OSD Disk and Create New VD

Add Back the OSD into the Cluster

Introduction

This document describes the steps required to isolate and replace a faulty Object Storage Disk (OSD) in the Ceph storage cluster hosted on the OSD-Compute nodes of an Ultra-M setup.

Background Information

Ultra-M is a pre-packaged and validated virtualized mobile packet core solution designed to simplify the deployment of VNFs. OpenStack is the Virtualized Infrastructure Manager (VIM) for Ultra-M and consists of these node types:

Compute
OSD-Compute
Controller
OpenStack Platform - Director (OSPD)

The high-level architecture of Ultra-M and the components involved are depicted in this image:

[Image: Ultra-M Architecture]

This document is intended for Cisco personnel who are familiar with the Cisco Ultra-M platform. It details the steps to be carried out at the OpenStack level when an OSD disk is replaced.

Note: Ultra M 5.1.x release is considered in order to define the procedures in this document.

Abbreviations

VNF	Virtual Network Function
CF	Control Function
SF	Service Function
ESC	Elastic Service Controller
MoP	Method of Procedure
OSD	Object Storage Disk
HDD	Hard Disk Drive
SSD	Solid State Drive
VIM	Virtualized Infrastructure Manager
VM	Virtual Machine
EM	Element Manager
UAS	Ultra Automation Services
UUID	Universally Unique Identifier

Workflow of the MoP

Prerequisite Health Checks

1. Use the ceph-disk list command to see which journal partition each OSD uses and to identify the disk to be isolated and replaced.

[heat-admin@pod1-osd-compute-3 ~]$ sudo ceph-disk list
/dev/sda :
 /dev/sda1 other, iso9660
 /dev/sda2 other, xfs, mounted on /
/dev/sdb :
 /dev/sdb1 ceph journal, for /dev/sdc1
 /dev/sdb3 ceph journal, for /dev/sdd1
 /dev/sdb2 ceph journal, for /dev/sde1
 /dev/sdb4 ceph journal, for /dev/sdf1
/dev/sdc :
 /dev/sdc1 ceph data, active, cluster ceph, osd.1, journal /dev/sdb1
/dev/sdd :
 /dev/sdd1 ceph data, active, cluster ceph, osd.7, journal /dev/sdb3
/dev/sde :
 /dev/sde1 ceph data, active, cluster ceph, osd.4, journal /dev/sdb2
/dev/sdf :
 /dev/sdf1 ceph data, active, cluster ceph, osd.10, journal /dev/sdb4
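
The mapping in the output above can also be extracted mechanically. The following is a minimal sketch, assuming the ceph-disk list line format shown in this step; the function name osd_journal_map is hypothetical and not part of Ceph.

```shell
# osd_journal_map (hypothetical helper): read `ceph-disk list` output on stdin
# and print "<osd> <data-partition> <journal-partition>" for each ceph data line.
osd_journal_map() {
  awk '
    /ceph data/ {
      part = $1; osd = ""; journal = ""
      for (i = 2; i <= NF; i++) {
        tok = $i
        sub(/,$/, "", tok)                        # strip the trailing comma
        if (tok ~ /^osd\.[0-9]+$/) osd = tok      # e.g. osd.7
        if ($i == "journal" && i < NF) journal = $(i + 1)
      }
      if (osd != "") print osd, part, journal
    }'
}
```

On the OSD-Compute node it could be used as `sudo ceph-disk list | osd_journal_map`; with the output above, the line for the disk to be replaced reads `osd.7 /dev/sdd1 /dev/sdb3`.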

2. Verify the Ceph health and OSD tree mapping before you proceed with the identified OSD disk isolation.

[heat-admin@pod1-osd-compute-3 ~]$ sudo ceph -s
    cluster eb2bb192-b1c9-11e6-9205-525400330666
     health HEALTH_OK
            1 mons down, quorum 0,1 pod1-controller-0,pod1-controller-1
     monmap e1: 3 mons at {pod1-controller-0=11.118.0.10:6789/0,pod1-controller-1=11.118.0.11:6789/0,pod1-controller-2=11.118.0.12:6789/0}
            election epoch 28, quorum 0,1 pod1-controller-0,pod1-controller-1
     osdmap e709: 12 osds: 12 up, 12 in
            flags sortbitwise,require_jewel_osds
      pgmap v941813: 704 pgs, 6 pools, 490 GB data, 163 kobjects
            1470 GB used, 11922 GB / 13393 GB avail
                 704 active+clean
  client io 58580 B/s wr, 0 op/s rd, 7 op/s wr

[heat-admin@pod1-osd-compute-3 ~]$ sudo ceph osd tree
ID WEIGHT   TYPE NAME                   UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 13.07996 root default                                                  
-2  4.35999     host pod1-osd-compute-0                                   
 0  1.09000         osd.0                    up  1.00000          1.00000 
 3  1.09000         osd.3                    up  1.00000          1.00000 
 6  1.09000         osd.6                    up  1.00000          1.00000 
 9  1.09000         osd.9                    up  1.00000          1.00000 
-3        0     host pod1-osd-compute-1                                   
-4  4.35999     host pod1-osd-compute-2                                   
 2  1.09000         osd.2                    up  1.00000          1.00000 
 5  1.09000         osd.5                    up  1.00000          1.00000 
 8  1.09000         osd.8                    up  1.00000          1.00000 
11  1.09000         osd.11                   up  1.00000          1.00000 
-5  4.35999     host pod1-osd-compute-3                                   
 1  1.09000         osd.1                    up  1.00000          1.00000 
 4  1.09000         osd.4                    up  1.00000          1.00000 
 7  1.09000         osd.7                    up  1.00000          1.00000 
10  1.09000         osd.10                   up  1.00000          1.00000
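
Before isolation starts, every OSD in the tree is expected to be up. As a rough sketch, assuming the column layout of the ceph osd tree output shown above, a small filter can list any OSD that is not up; the name osds_not_up is hypothetical.

```shell
# osds_not_up (hypothetical helper): read `ceph osd tree` output on stdin and
# print the name of every OSD whose UP/DOWN column is not "up".
osds_not_up() {
  awk '$3 ~ /^osd\.[0-9]+$/ && $4 != "up" { print $3 }'
}
```

For example, `sudo ceph osd tree | osds_not_up` prints nothing for the healthy tree above.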

Isolation and Removal of Faulty OSD Disk from the Cluster

1. Disable and stop the OSD process.

[heat-admin@pod1-osd-compute-3 ~]$ sudo systemctl disable ceph-osd@7
[heat-admin@pod1-osd-compute-3 ~]$ sudo systemctl stop ceph-osd@7

2. Set the noout and norebalance flags, and then mark the OSD out.

[heat-admin@pod1-osd-compute-3 ~]$ sudo su

[root@pod1-osd-compute-3 heat-admin]# ceph osd set noout
set noout

[root@pod1-osd-compute-3 heat-admin]# ceph osd set norebalance 
set norebalance

[root@pod1-osd-compute-3 heat-admin]# ceph osd out 7
marked out osd.7.

Note: Wait for the data rebalance to complete and for all the PGs to return to active+clean in order to avoid issues.

3. Confirm that the OSD is marked out and wait for the Ceph rebalance to complete before you proceed.

[root@pod1-osd-compute-3 heat-admin]# watch -n1 ceph -s
                  95 active+undersized+degraded+remapped+wait_backfill
                  28 active+recovery_wait+degraded
                   2 active+undersized+degraded+remapped+backfilling
                   1 active+recovering+degraded
                   2 active+undersized+degraded+remapped+backfilling
                   1 active+recovering+degraded
                   2 active+undersized+degraded+remapped+backfilling
                  67 active+undersized+degraded+remapped+wait_backfill
                   3 active+undersized+degraded+remapped+backfilling
                  24 active+undersized+degraded+remapped+wait_backfill
                  22 active+undersized+degraded+remapped+wait_backfill
                   1 active+undersized+degraded+remapped+backfilling
                   8 active+undersized+degraded+remapped+wait_backfill
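
Instead of watching the screen, the wait can be scripted as a polling loop with a timeout. This is only a sketch: wait_until is a hypothetical helper, and the check command passed to it is left to the operator.

```shell
# wait_until TIMEOUT_S INTERVAL_S CMD [ARGS...]   (hypothetical helper)
# Re-run CMD every INTERVAL_S seconds until it succeeds or TIMEOUT_S elapses.
wait_until() {
  wu_timeout=$1
  wu_interval=$2
  shift 2
  wu_deadline=$(( $(date +%s) + wu_timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$wu_deadline" ]; then
      echo "timed out after ${wu_timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$wu_interval"
  done
}
```

A call such as `wait_until 3600 10 cluster_settled` would poll every 10 seconds for up to an hour, where cluster_settled stands for any operator-supplied command that exits 0 once the PG states shown by ceph -s are acceptable.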

4. Remove the authentication key for the OSD.

[root@pod1-osd-compute-3 heat-admin]# ceph auth del osd.7
updated

5. Confirm that the key for osd.7 is no longer listed.


[root@pod1-osd-compute-3 heat-admin]# ceph auth list
installed auth entries:

osd.0
        key: AQCgpB5blV9dNhAAzDN1SVdnuJyTN2f7PAdtFw==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.1
        key: AQBdwyBbbuD6IBAAcvG+oQOz5vk62faOqv/CEw==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.10
        key: AQCwwyBb7xvHJhAAZKPprXWT7UnvnAXBV9W2rg==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.11
        key: AQDxpB5b9/rGFRAAkcCEkpSN1YZVDdeW+Bho7w==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.2
        key: AQCppB5btekoNBAAACoWpDz0VL9bZfyIygDpBQ==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.3
        key: AQC4pB5bBaUlORAAhi3KPzetwvWhYGnerAkAsg==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.4
        key: AQB1wyBbvMIQLRAAXefFVnZxMX6lVtObQt9KoA==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.5
        key: AQDBpB5buKHqOhAAW1Q861qoYqW6fAYHlOxsLg==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.6
        key: AQDQpB5b1BveFxAAfCLM3tvDUSnYneutyTmaEg==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.8
        key: AQDZpB5bd4nlGRAAkkzbmGPnEDAWV0dUhrhE6w==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.9
        key: AQDopB5bKCZPGBAAfYtp1GLA7QIi/YxJa8O1yw==
        caps: [mon] allow profile osd
        caps: [osd] allow *
client.admin
        key: AQDpmx5bAAAAABAA3hLK8O2tGgaAK+X2Lly5Aw==
        caps: [mds] allow *
        caps: [mon] allow *
        caps: [osd] allow *
client.bootstrap-mds
        key: AQBDpB5bjR1GJhAAB6CKKxXulve9WIiC6ZGXgA==
        caps: [mon] allow profile bootstrap-mds
client.bootstrap-osd
        key: AQDpmx5bAAAAABAA3hLK8O2tGgaAK+X2Lly5Aw==
        caps: [mon] allow profile bootstrap-osd
client.bootstrap-rgw
        key: AQBDpB5b7OWXHBAAlATmBAOX/QWW+2mLxPqlkQ==
        caps: [mon] allow profile bootstrap-rgw
client.openstack
        key: AQDpmx5bAAAAABAAULxfs9cYG1wkSVTjrtiaDg==
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics
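
In a long listing it is easy to confuse osd.7 with entries such as osd.1 or osd.10, so an exact whole-line match is safer than reading by eye. A minimal sketch, assuming entity names appear alone on a line as in the output above; auth_has_entity is a hypothetical name.

```shell
# auth_has_entity NAME (hypothetical helper): read `ceph auth list` output on
# stdin and succeed only if NAME appears as a complete line (an entity header).
auth_has_entity() {
  grep -Fxq -- "$1"
}
```

For example, `ceph auth list | auth_has_entity osd.7 || echo "osd.7 key removed"` reports the removal once the key is gone.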

7. Remove the OSD from the cluster.

[root@pod1-osd-compute-3 heat-admin]# ceph osd rm 7
removed osd.7

8. Unmount the OSD disk that needs to be replaced.

[root@pod1-osd-compute-3 heat-admin]# umount /var/lib/ceph/osd/ceph-7

9. Unset the noscrub and nodeep-scrub flags.

[root@pod1-osd-compute-3 heat-admin]# ceph osd unset noscrub
unset noscrub

[root@pod1-osd-compute-3 heat-admin]# ceph osd unset nodeep-scrub
unset nodeep-scrub

10. Verify the Ceph health and wait for HEALTH_OK, with all the PGs back in the active+clean state.

[root@pod1-osd-compute-3 heat-admin]# ceph -s
    cluster eb2bb192-b1c9-11e6-9205-525400330666
     health HEALTH_WARN
            28 pgs backfill_wait
            4 pgs backfilling
            5 pgs degraded
            5 pgs recovery_wait
            83 pgs stuck unclean
            recovery 1697/516881 objects degraded (0.328%)
            recovery 76428/516881 objects misplaced (14.786%)
            noout,norebalance,sortbitwise,require_jewel_osds flag(s) set
            1 mons down, quorum 0,1 pod1-controller-0,pod1-controller-1
     monmap e1: 3 mons at {pod1-controller-0=11.118.0.10:6789/0,pod1-controller-1=11.118.0.11:6789/0,pod1-controller-2=11.118.0.12:6789/0}
            election epoch 28, quorum 0,1 pod1-controller-0,pod1-controller-1
     osdmap e877: 11 osds: 11 up, 11 in; 193 remapped pgs
            flags noout,norebalance,sortbitwise,require_jewel_osds
      pgmap v942974: 704 pgs, 6 pools, 490 GB data, 163 kobjects
            1470 GB used, 10806 GB / 12277 GB avail
            1697/516881 objects degraded (0.328%)
            76428/516881 objects misplaced (14.786%)
                 511 active+clean
                 156 active+remapped
                  28 active+remapped+wait_backfill
                   5 active+recovery_wait+degraded+remapped
                   4 active+remapped+backfilling
  client io 331 kB/s wr, 0 op/s rd, 56 op/s wr
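
In the output above, only 511 of the 704 PGs are active+clean, and the health stays at HEALTH_WARN while the noout and norebalance flags are set. A sketch of a check on the PG counts alone, assuming the ceph -s layout shown in this document; all_pgs_clean is a hypothetical name.

```shell
# all_pgs_clean (hypothetical helper): read `ceph -s` output on stdin and
# succeed only when the active+clean count equals the pgmap total.
all_pgs_clean() {
  awk '
    /pgmap/ {
      # the total is the number just before the token "pgs," on the pgmap line
      for (i = 2; i <= NF; i++) if ($i ~ /^pgs,?$/) total = $(i - 1)
    }
    $2 == "active+clean" { clean = $1 }
    END { exit !(total > 0 && clean == total) }'
}
```

For example, `ceph -s | all_pgs_clean && echo "all PGs active+clean"` prints the message only once recovery has finished.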

Replace OSD Disk and Create New VD

1. Remove the faulty drive and replace it with a new drive. Refer to the Cisco UCS C240 M4 Server Installation and Service Guide for the procedure.

2. Log in to the CIMC of the OSD-Compute node and verify that the slot where the disk was replaced shows the drive in good health.

3. Create a virtual drive (VD) for the new HDD. It must be a fresh HDD without any metadata.

4. Verify that the newly added disk is in the Unconfigured Good state.

Storage > Cisco 12G SAS Modular Raid Controller (SLOT-HBA) > Physical Drive Info

5. Select the Create Virtual Drive from Unused Physical Drives option in order to create the VD.

Storage > Cisco 12G SAS Modular Raid Controller (SLOT-HBA)

6. Use Physical Drive 9 in order to create a new VD and name it OSD3.

Storage > Cisco 12G SAS Modular Raid Controller (SLOT-HBA) > Controller Info > Create Virtual Drive from Unused Physical Drives

7. Enable IPMI over LAN: Admin > Communication Services > Communication Services.

8. Disable hyperthreading: Compute > BIOS > Configure BIOS > Advanced > Processor Configuration.

Note: The image shown here and the configuration steps mentioned in this section are with reference to the firmware version 3.0(3e) and there might be slight variations if you work on other versions.

Add Back the OSD into the Cluster

1. After the disk is replaced, run partprobe in order to discover the new device.

[root@pod1-osd-compute-3 heat-admin]# partprobe 
[root@pod1-osd-compute-3 heat-admin]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 278.5G  0 disk
├─sda1   8:1    0     1M  0 part
└─sda2   8:2    0 278.5G  0 part /
sdb      8:16   0 446.1G  0 disk
├─sdb1   8:17   0   107G  0 part
├─sdb2   8:18   0   107G  0 part
├─sdb3   8:19   0   107G  0 part
└─sdb4   8:20   0   107G  0 part
sdc      8:32   0   1.1T  0 disk
└─sdc1   8:33   0   1.1T  0 part /var/lib/ceph/osd/ceph-1
sdd      8:48   0   1.1T  0 disk
└─sdd1   8:49   0   1.1T  0 part
sde      8:64   0   1.1T  0 disk
└─sde1   8:65   0   1.1T  0 part /var/lib/ceph/osd/ceph-4
sdf      8:80   0   1.1T  0 disk
└─sdf1   8:81   0   1.1T  0 part /var/lib/ceph/osd/ceph-10

2. Find a device that is available on the server.

[root@pod1-osd-compute-3 heat-admin]# fdisk -l

Disk /dev/sda: 299.0 GB, 298999349248 bytes, 583983104 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000b5e87

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048        4095        1024   83  Linux
/dev/sda2   *        4096   583983070   291989487+  83  Linux
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sdb: 479.0 GB, 478998953984 bytes, 935544832 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: gpt


#         Start          End    Size  Type            Name
 1         2048    224462847    107G  unknown         ceph journal
 2    224462848    448923647    107G  unknown         ceph journal
 3    448923648    673384447    107G  unknown         ceph journal
 4    673384448    897845247    107G  unknown         ceph journal
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sdd: 1199.0 GB, 1198999470080 bytes, 2341795840 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt


#         Start          End    Size  Type            Name
 1         2048   2341795806    1.1T  unknown         ceph data
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sdc: 1199.0 GB, 1198999470080 bytes, 2341795840 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt


#         Start          End    Size  Type            Name
 1         2048   2341795806    1.1T  unknown         ceph data
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sde: 1199.0 GB, 1198999470080 bytes, 2341795840 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt


#         Start          End    Size  Type            Name
 1         2048   2341795806    1.1T  unknown         ceph data
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sdf: 1199.0 GB, 1198999470080 bytes, 2341795840 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt


#         Start          End    Size  Type            Name
 1         2048   2341795806    1.1T  unknown         ceph data
[root@pod1-osd-compute-3 heat-admin]#

3. Use ceph-disk list in order to identify the journal disk partition map.


[root@pod1-osd-compute-3 heat-admin]# ceph-disk list
/dev/sda :
 /dev/sda1 other, iso9660
 /dev/sda2 other, xfs, mounted on /
/dev/sdb :
 /dev/sdb1 ceph journal, for /dev/sdc1
 /dev/sdb3 ceph journal
 /dev/sdb2 ceph journal, for /dev/sde1
 /dev/sdb4 ceph journal, for /dev/sdf1
/dev/sdc :
 /dev/sdc1 ceph data, active, cluster ceph, osd.1, journal /dev/sdb1
/dev/sdd :
 /dev/sdd1 other, xfs
/dev/sde :
 /dev/sde1 ceph data, active, cluster ceph, osd.4, journal /dev/sdb2
/dev/sdf :
 /dev/sdf1 ceph data, active, cluster ceph, osd.10, journal /dev/sdb4

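Note: In the ceph-disk list output, each journal partition names the data partition it serves; for example, sdb2 is the journal partition for sde1. The journal partition sdb3 no longer lists a data partition, and it is the one used for the new disk. Check the ceph-disk list output and pass the correct journal partition to the ceph-disk prepare command. As soon as you run the command below, osd.7 comes up/in and data rebalance (backfill/recovery) starts.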
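
The unassigned journal partition can be picked out of the listing with a short filter. A minimal sketch, assuming the line format shown in step 3, where an assigned journal reads "ceph journal, for <partition>" and a free one reads just "ceph journal"; free_journals is a hypothetical name.

```shell
# free_journals (hypothetical helper): read `ceph-disk list` output on stdin
# and print journal partitions that are not associated with a data partition.
free_journals() {
  awk '$2 == "ceph" && $3 == "journal" && NF == 3 { print $1 }'
}
```

With the output in step 3, `ceph-disk list | free_journals` prints /dev/sdb3.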

4. Prepare the new disk with ceph-disk and add it back to the cluster.

[root@pod1-osd-compute-3 heat-admin]#  ceph-disk --setuser ceph --setgroup ceph prepare --fs-type xfs /dev/sdd /dev/sdb3

prepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data
Creating new GPT entries.
The operation has completed successfully.
meta-data=/dev/sdd1              isize=2048   agcount=4, agsize=73181055 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=292724219, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=142931, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.



Hint:
sdd is the new drive added as the OSD.
sdb3 is the journal disk partition.
The data partition mapping is sdc1 for sdc, sdd1 for sdd, sde1 for sde, sdf1 for sdf (and so on).
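
Because the data disk and the journal partition are easy to swap or mistype, it can help to build the command from the two values and review it before it is run. This is a sketch only: prepare_command is a hypothetical helper that prints the command from step 4 and does not execute it, and its argument checks are simple heuristics for sdX-style names.

```shell
# prepare_command DATA_DISK JOURNAL_PART   (hypothetical helper)
# Print, but do not run, the ceph-disk prepare command used in step 4.
prepare_command() {
  case "$1" in
    /dev/*) ;;
    *) echo "data disk must be a /dev path: $1" >&2; return 1 ;;
  esac
  case "$2" in
    /dev/*[0-9]) ;;   # heuristic: a journal partition name ends in a digit
    *) echo "journal must be a partition such as /dev/sdb3: $2" >&2; return 1 ;;
  esac
  echo "ceph-disk --setuser ceph --setgroup ceph prepare --fs-type xfs $1 $2"
}
```

For example, `prepare_command /dev/sdd /dev/sdb3` prints the exact command shown in step 4, while `prepare_command /dev/sdd /dev/sdb` is rejected because the second argument is a whole disk.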

5. Activate the Ceph disks and unset the noout, norebalance, noscrub, and nodeep-scrub flags.

[root@pod1-osd-compute-3 heat-admin]# ceph-disk activate-all
[root@pod1-osd-compute-3 heat-admin]# ceph osd unset noout
unset noout
[root@pod1-osd-compute-3 heat-admin]# ceph osd unset norebalance
unset norebalance
[root@pod1-osd-compute-3 heat-admin]# ceph osd unset noscrub
unset noscrub
[root@pod1-osd-compute-3 heat-admin]# ceph osd unset nodeep-scrub
unset nodeep-scrub
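
Whether a flag is really cleared can be confirmed from the flags line that ceph -s prints under osdmap. A minimal sketch, assuming the ceph -s layout shown in this document; osd_flag_set is a hypothetical name.

```shell
# osd_flag_set FLAG (hypothetical helper): read `ceph -s` output on stdin and
# succeed if FLAG appears in the comma-separated osdmap "flags" line.
osd_flag_set() {
  awk -v flag="$1" '
    $1 == "flags" {
      n = split($2, f, ",")
      for (i = 1; i <= n; i++) if (f[i] == flag) found = 1
    }
    END { exit !found }'
}
```

For example, `ceph -s | osd_flag_set noout && echo "noout is still set"` stays silent once the flag has been removed.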

6. Wait for the rebalance to finish and verify that the Ceph health and the OSD tree are fine.

[root@pod1-osd-compute-3 heat-admin]# watch -n 3 ceph -s

[heat-admin@pod1-osd-compute-3 ~]$ sudo ceph -s
    cluster eb2bb192-b1c9-11e6-9205-525400330666
     health HEALTH_OK
            1 mons down, quorum 0,1 pod1-controller-0,pod1-controller-1
     monmap e1: 3 mons at {pod1-controller-0=11.118.0.10:6789/0,pod1-controller-1=11.118.0.11:6789/0,pod1-controller-2=11.118.0.12:6789/0}
            election epoch 28, quorum 0,1 pod1-controller-0,pod1-controller-1
     osdmap e709: 12 osds: 12 up, 12 in
            flags sortbitwise,require_jewel_osds
      pgmap v941813: 704 pgs, 6 pools, 490 GB data, 163 kobjects
            1470 GB used, 11922 GB / 13393 GB avail
                 704 active+clean
  client io 58580 B/s wr, 0 op/s rd, 7 op/s wr
 
[heat-admin@pod1-osd-compute-3 ~]$ sudo ceph osd tree
ID WEIGHT   TYPE NAME                   UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 13.07996 root default                                                  
-2  4.35999     host pod1-osd-compute-0                                   
 0  1.09000         osd.0                    up  1.00000          1.00000 
 3  1.09000         osd.3                    up  1.00000          1.00000 
 6  1.09000         osd.6                    up  1.00000          1.00000 
 9  1.09000         osd.9                    up  1.00000          1.00000                                  
-4  4.35999     host pod1-osd-compute-2                                   
 2  1.09000         osd.2                    up  1.00000          1.00000 
 5  1.09000         osd.5                    up  1.00000          1.00000 
 8  1.09000         osd.8                    up  1.00000          1.00000 
11  1.09000         osd.11                   up  1.00000          1.00000 
-5  4.35999     host pod1-osd-compute-3                                   
 1  1.09000         osd.1                    up  1.00000          1.00000 
 4  1.09000         osd.4                    up  1.00000          1.00000 
 7  1.09000         osd.7                    up  1.00000          1.00000 
10  1.09000         osd.10                   up  1.00000          1.00000
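
As a final cross-check, the number of OSDs that are up under each host can be counted from the tree; pod1-osd-compute-3 is expected to show four again. A sketch under the same column-layout assumption as the output above; up_osds_per_host is a hypothetical name.

```shell
# up_osds_per_host (hypothetical helper): read `ceph osd tree` output on stdin
# and print "<host> <number of OSDs marked up>" for each host, in tree order.
up_osds_per_host() {
  awk '
    $3 == "host" {
      host = $4
      if (!(host in count)) { count[host] = 0; order[++n] = host }
    }
    $3 ~ /^osd\.[0-9]+$/ && $4 == "up" && host != "" { count[host]++ }
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }'
}
```

For example, `sudo ceph osd tree | up_osds_per_host` gives one line per host, such as `pod1-osd-compute-3 4` for the tree above.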

Contributed by Cisco Engineers

Partheeban Rajagopal
Cisco Advanced Services
Padmaraj Ramanoudjam
Cisco Advanced Services

This Document Applies to These Products

Ultra Packet Core
