The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product.
This document describes the steps that are required to replace a faulty Object Storage Disk (OSD) - Compute server in an Ultra-M setup.
This procedure applies to an OpenStack environment running the Newton version, where ESC does not manage CPAR and CPAR is installed directly on the Virtual Machine (VM) deployed on OpenStack.
Ultra-M is a pre-packaged and validated virtualized mobile packet core solution designed to simplify the deployment of VNFs. OpenStack is the Virtual Infrastructure Manager (VIM) for Ultra-M and consists of these node types:
The high-level architecture of Ultra-M and the components involved are depicted in this image:
Note: The Ultra M 5.1.x release is considered in order to define the procedures in this document.
MoP  | Method of Procedure
OSD  | Object Storage Disks
OSPD | OpenStack Platform Director
HDD  | Hard Disk Drive
SSD  | Solid State Drive
VIM  | Virtual Infrastructure Manager
VM   | Virtual Machine
EM   | Element Manager
UAS  | Ultra Automation Services
UUID | Universally Unique IDentifier
Backup
Before you replace a Compute node, it is important to check the current state of your Red Hat OpenStack Platform environment. It is recommended that you check the current state in order to avoid complications while the Compute replacement process is in progress. It can be achieved by this replacement flow.
Cisco recommends that you take a backup of the OSPD database, for use in case of recovery, with these steps:
[root@director ~]# mysqldump --opt --all-databases > /root/undercloud-all-databases.sql
[root@director ~]# tar --xattrs -czf undercloud-backup-`date +%F`.tar.gz /root/undercloud-all-databases.sql /etc/my.cnf.d/server.cnf /var/lib/glance/images /srv/node /home/stack
tar: Removing leading `/' from member names
This process ensures that a node can be replaced without affecting the availability of any instances.
Note: Ensure that you have a snapshot of the instance so that you can restore the VM when needed. Follow the procedure on how to take a snapshot of the VM.
[stack@director ~]$ nova list --field name,host | grep osd-compute-0
| 46b4b9eb-a1a6-425d-b886-a0ba760e6114 | AAA-CPAR-testing-instance | pod2-stack-compute-4.localdomain |
Note: In the output shown here, the first column corresponds to the Universally Unique IDentifier (UUID), the second column is the VM name and the third column is the hostname where the VM is present. The parameters from this output are used in subsequent sections.
Step 1. Open any Secure Shell (SSH) client connected to the network and connect to the CPAR instance.
It is important not to shut down all 4 AAA instances within one site at the same time; shut them down one by one.
Step 2. In order to shut down the CPAR application, run the command:
/opt/CSCOar/bin/arserver stop
The message “Cisco Prime Access Registrar Server Agent shutdown complete.” must appear.
Note: If a user left a Command Line Interface (CLI) session open, the arserver stop command won’t work and this message is displayed.
ERROR: You cannot shut down Cisco Prime Access Registrar while the CLI is being used.
Current list of running CLI with process id is:
2903 /opt/CSCOar/bin/aregcmd -s
In this example, the highlighted process ID 2903 needs to be terminated before CPAR can be stopped. If this is the case, run this command in order to terminate the process:
kill -9 *process_id*
Then repeat Step 1.
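For reference, this is a minimal sketch of that cleanup using the PID 2903 from the example error above; confirm the actual PID in your own output before you terminate anything.

ps -ef | grep aregcmd | grep -v grep     # confirm the PID of the open CLI session
kill -9 2903                             # replace 2903 with the PID reported in your environment
/opt/CSCOar/bin/arserver stop            # then stop CPAR again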
Step 3. In order to verify that the CPAR application was indeed shut down, run the command:
/opt/CSCOar/bin/arstatus
These messages must appear:
Cisco Prime Access Registrar Server Agent not running
Cisco Prime Access Registrar GUI not running
Step 1. Enter the Horizon GUI website that corresponds to the Site (City) currently being worked on.
When you access Horizon, the screen observed is as shown in this image.
Step 2. Navigate to Project > Instances as shown in this image.
If the user used was CPAR, then only the 4 AAA instances appear in this menu.
Step 3. Shut down only one instance at a time and repeat the whole process in this document. In order to shut down the VM, navigate to Actions > Shut Off Instance as shown in the image and confirm your selection.
Step 4. Validate that the instance was indeed shut down by checking the Status = Shutoff and Power State = Shut Down as shown in this image.
This step ends the CPAR shutdown process.
Once the CPAR VMs are down, the snapshots can be taken in parallel as they belong to independent computes.
The four QCOW2 files are created in parallel.
Take a snapshot of each AAA instance (25 minutes to 1 hour: 25 minutes for instances that used a QCOW image as a source and 1 hour for instances that use a raw image as a source).
3. Click Create Snapshot in order to proceed with snapshot creation (this needs to be executed on the corresponding AAA instance) as shown in this image.
4. Once the snapshot is executed, click Images and verify that all of the snapshots finish and report no problems, as shown in this image.
5. The next step is to download the snapshot in QCOW2 format and transfer it to a remote entity, in case the OSPD is lost during this process. In order to achieve this, identify the snapshot by running the command glance image-list at the OSPD level.
[root@elospd01 stack]# glance image-list
+--------------------------------------+---------------------------+
| ID                                   | Name                      |
+--------------------------------------+---------------------------+
| 80f083cb-66f9-4fcf-8b8a-7d8965e47b1d | AAA-Temporary             |
| 22f8536b-3f3c-4bcc-ae1a-8f2ab0d8b950 | ELP1 cluman 10_09_2017    |
| 70ef5911-208e-4cac-93e2-6fe9033db560 | ELP2 cluman 10_09_2017    |
| e0b57fc9-e5c3-4b51-8b94-56cbccdf5401 | ESC-image                 |
| 92dfe18c-df35-4aa9-8c52-9c663d3f839b | lgnaaa01-sept102017       |
| 1461226b-4362-428b-bc90-0a98cbf33500 | tmobile-pcrf-13.1.1.iso   |
| 98275e15-37cf-4681-9bcc-d6ba18947d7b | tmobile-pcrf-13.1.1.qcow2 |
+--------------------------------------+---------------------------+
6. Once you identify the snapshot to be downloaded (the one marked in green), you can download it in QCOW2 format with the command glance image-download, as depicted.
[root@elospd01 stack]# glance image-download 92dfe18c-df35-4aa9-8c52-9c663d3f839b --file /tmp/AAA-CPAR-LGNoct192017.qcow2 &
7. Once the download process finishes, a compression process needs to be executed, as that snapshot might be filled with zeroes because of processes, tasks, and temporary files handled by the OS. The command to be used for file compression is virt-sparsify.
[root@elospd01 stack]# virt-sparsify AAA-CPAR-LGNoct192017.qcow2 AAA-CPAR-LGNoct192017_compressed.qcow2
This process can take some time (around 10-15 minutes). Once finished, the resulting file is the one that needs to be transferred to an external entity, as specified in the next step.
Verification of the file integrity is required. In order to achieve this, run the next command and look for the “corrupt” attribute at the end of its output.
[root@wsospd01 tmp]# qemu-img info AAA-CPAR-LGNoct192017_compressed.qcow2
image: AAA-CPAR-LGNoct192017_compressed.qcow2
file format: qcow2
virtual size: 150G (161061273600 bytes)
disk size: 18G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
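As a convenience, this minimal sketch combines the integrity check with the transfer to an external entity; root@backup-host and /backups/ are placeholders for the remote destination, and the file name is the compressed snapshot produced in the previous step.

qemu-img info AAA-CPAR-LGNoct192017_compressed.qcow2 | grep corrupt      # must report "corrupt: false"
scp AAA-CPAR-LGNoct192017_compressed.qcow2 root@backup-host:/backups/    # placeholder destination outside the OSPD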
[stack@director ~]$ nova list --field name,host | grep osd-compute-0
| 46b4b9eb-a1a6-425d-b886-a0ba760e6114 | AAA-CPAR-testing-instance | pod2-stack-compute-4.localdomain |
Note: In the output shown here, the first column corresponds to the Universally Unique IDentifier (UUID), the second column is the VM name and the third column is the hostname where the VM is present. The parameters from this output are used in subsequent sections.
[heat-admin@pod2-stack-osd-compute-0 ~]$ sudo ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    13393G     11088G        2305G         17.21
POOLS:
    NAME        ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd         0          0         0         3635G           0
    metrics     1      3452M      0.09         3635G      219421
    images      2       138G      3.67         3635G       43127
    backups     3          0         0         3635G           0
    volumes     4       139G      3.70         3635G       36581
    vms         5       490G     11.89         3635G      126247
[heat-admin@pod2-stack-osd-compute-0 ~]$ sudo ceph osd tree
ID WEIGHT   TYPE NAME                           UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 13.07996 root default
-2  4.35999     host pod2-stack-osd-compute-0
 0  1.09000         osd.0                            up  1.00000          1.00000
 3  1.09000         osd.3                            up  1.00000          1.00000
 6  1.09000         osd.6                            up  1.00000          1.00000
 9  1.09000         osd.9                            up  1.00000          1.00000
-3  4.35999     host pod2-stack-osd-compute-1
 1  1.09000         osd.1                            up  1.00000          1.00000
 4  1.09000         osd.4                            up  1.00000          1.00000
 7  1.09000         osd.7                            up  1.00000          1.00000
10  1.09000         osd.10                           up  1.00000          1.00000
-4  4.35999     host pod2-stack-osd-compute-2
 2  1.09000         osd.2                            up  1.00000          1.00000
 5  1.09000         osd.5                            up  1.00000          1.00000
 8  1.09000         osd.8                            up  1.00000          1.00000
11  1.09000         osd.11                           up  1.00000          1.00000
[heat-admin@pod2-stack-osd-compute-0 ~]$ systemctl list-units *ceph*
UNIT                               LOAD   ACTIVE SUB     DESCRIPTION
var-lib-ceph-osd-ceph\x2d0.mount   loaded active mounted /var/lib/ceph/osd/ceph-0
var-lib-ceph-osd-ceph\x2d3.mount   loaded active mounted /var/lib/ceph/osd/ceph-3
var-lib-ceph-osd-ceph\x2d6.mount   loaded active mounted /var/lib/ceph/osd/ceph-6
var-lib-ceph-osd-ceph\x2d9.mount   loaded active mounted /var/lib/ceph/osd/ceph-9
ceph-osd@0.service                 loaded active running Ceph object storage daemon
ceph-osd@3.service                 loaded active running Ceph object storage daemon
ceph-osd@6.service                 loaded active running Ceph object storage daemon
ceph-osd@9.service                 loaded active running Ceph object storage daemon
system-ceph\x2ddisk.slice          loaded active active  system-ceph\x2ddisk.slice
system-ceph\x2dosd.slice           loaded active active  system-ceph\x2dosd.slice
ceph-mon.target                    loaded active active  ceph target allowing to start/stop all ceph-mon@.service instances at once
ceph-osd.target                    loaded active active  ceph target allowing to start/stop all ceph-osd@.service instances at once
ceph-radosgw.target                loaded active active  ceph target allowing to start/stop all ceph-radosgw@.service instances at once
ceph.target                        loaded active active  ceph target allowing to start/stop all ceph*@.service instances at once

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
14 loaded units listed. Pass --all to see loaded but inactive units, too. To show all installed unit files use 'systemctl list-unit-files'.
[heat-admin@pod2-stack-osd-compute-0 ~]# systemctl disable ceph-osd@0
[heat-admin@pod2-stack-osd-compute-0 ~]# systemctl stop ceph-osd@0
[heat-admin@pod2-stack-osd-compute-0 ~]# ceph osd out 0
[heat-admin@pod2-stack-osd-compute-0 ~]# ceph osd crush remove osd.0
[heat-admin@pod2-stack-osd-compute-0 ~]# ceph auth del osd.0
[heat-admin@pod2-stack-osd-compute-0 ~]# ceph osd rm 0
[heat-admin@pod2-stack-osd-compute-0 ~]# umount /var/lib/ceph/osd/ceph-0
[heat-admin@pod2-stack-osd-compute-0 ~]# rm -rf /var/lib/ceph/osd/ceph-0
Or,
[heat-admin@pod2-stack-osd-compute-0 ~]$ sudo ls /var/lib/ceph/osd
ceph-0  ceph-3  ceph-6  ceph-9
[heat-admin@pod2-stack-osd-compute-0 ~]$ /bin/sh clean.sh
[heat-admin@pod2-stack-osd-compute-0 ~]$ cat clean.sh
#!/bin/sh
set -x
CEPH=`sudo ls /var/lib/ceph/osd`
for c in $CEPH
do
  i=`echo $c | cut -d'-' -f2`
  sudo systemctl disable ceph-osd@$i || (echo "error rc:$?"; exit 1)
  sleep 2
  sudo systemctl stop ceph-osd@$i || (echo "error rc:$?"; exit 1)
  sleep 2
  sudo ceph osd out $i || (echo "error rc:$?"; exit 1)
  sleep 2
  sudo ceph osd crush remove osd.$i || (echo "error rc:$?"; exit 1)
  sleep 2
  sudo ceph auth del osd.$i || (echo "error rc:$?"; exit 1)
  sleep 2
  sudo ceph osd rm $i || (echo "error rc:$?"; exit 1)
  sleep 2
  sudo umount /var/lib/ceph/osd/$c || (echo "error rc:$?"; exit 1)
  sleep 2
  sudo rm -rf /var/lib/ceph/osd/$c || (echo "error rc:$?"; exit 1)
  sleep 2
done
sudo ceph osd tree
After all the OSD processes have been migrated/deleted, the node can be removed from the overcloud.
Note: When CEPH is removed, the VNF HD RAID goes into the Degraded state, but the hd-disk must still be accessible.
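Before you remove the node from the overcloud, it can help to confirm that the OSDs of the removed host are gone from the CRUSH map and to monitor the rebalancing; a minimal sketch run from any remaining osd-compute node:

sudo ceph osd tree     # the OSDs of the removed host must no longer be listed
sudo ceph -s           # watch the recovery/backfill progress until the cluster settles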
Graceful Power Off
[stack@director ~]$ nova stop aaa2-21
Request to stop server aaa2-21 has been accepted.

[stack@director ~]$ nova list
+--------------------------------------+---------------------------+---------+------------+-------------+-------------------------------------------------------------------------------------------------------------+
| ID                                   | Name                      | Status  | Task State | Power State | Networks                                                                                                      |
+--------------------------------------+---------------------------+---------+------------+-------------+-------------------------------------------------------------------------------------------------------------+
| 46b4b9eb-a1a6-425d-b886-a0ba760e6114 | AAA-CPAR-testing-instance | ACTIVE  | -          | Running     | tb1-mgmt=172.16.181.14, 10.225.247.233; radius-routable1=10.160.132.245; diameter-routable1=10.160.132.231   |
| 3bc14173-876b-4d56-88e7-b890d67a4122 | aaa2-21                   | SHUTOFF | -          | Shutdown    | diameter-routable1=10.160.132.230; radius-routable1=10.160.132.248; tb1-mgmt=172.16.181.7, 10.225.247.234    |
| f404f6ad-34c8-4a5f-a757-14c8ed7fa30e | aaa21june                 | ACTIVE  | -          | Running     | diameter-routable1=10.160.132.233; radius-routable1=10.160.132.244; tb1-mgmt=172.16.181.10                   |
+--------------------------------------+---------------------------+---------+------------+-------------+-------------------------------------------------------------------------------------------------------------+
The steps mentioned in this section are common irrespective of the VMs hosted in the compute node.
Delete OSD-Compute Node from the Service List.
[stack@director ~]$ openstack compute service list | grep osd-compute
| 135 | nova-compute | pod2-stack-osd-compute-1.localdomain | AZ-esc2 | enabled | up | 2018-06-22T11:05:22.000000 |
| 150 | nova-compute | pod2-stack-osd-compute-2.localdomain | nova    | enabled | up | 2018-06-22T11:05:17.000000 |
| 153 | nova-compute | pod2-stack-osd-compute-0.localdomain | AZ-esc1 | enabled | up | 2018-06-22T11:05:25.000000 |
[stack@director ~]$ openstack compute service delete 150
Delete Neutron Agents
[stack@director ~]$ openstack network agent list | grep osd-compute-0
| eaecff95-b163-4cde-a99d-90bd26682b22 | Open vSwitch agent | pod2-stack-osd-compute-0.localdomain | None | True | UP | neutron-openvswitch-agent |
[stack@director ~]$ openstack network agent delete eaecff95-b163-4cde-a99d-90bd26682b22
Delete from Ironic Database
[root@director ~]# nova list | grep osd-compute-0
| 6810c884-1cb9-4321-9a07-192443920f1f | pod2-stack-osd-compute-0 | ACTIVE | - | Running | ctlplane=192.200.0.109 |

[root@al03-pod2-ospd ~]$ nova delete 6810c884-1cb9-4321-9a07-192443920f1f
[root@director ~]# source stackrc
[root@director ~]# nova show pod2-stack-osd-compute-0 | grep hypervisor
| OS-EXT-SRV-ATTR:hypervisor_hostname | 05ceb513-e159-417d-a6d6-cbbcc4b167d7 |
[stack@director ~]$ ironic node-delete 05ceb513-e159-417d-a6d6-cbbcc4b167d7
[stack@director ~]$ ironic node-list
The deleted node must no longer appear in the ironic node-list output.
Delete from Overcloud
openstack overcloud node delete --templates -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-sriov.yaml -e /home/stack/custom-templates/network.yaml -e /home/stack/custom-templates/ceph.yaml -e /home/stack/custom-templates/compute.yaml -e /home/stack/custom-templates/layout.yaml -e /home/stack/custom-templates/layout.yaml --stack <stack-name> <UUID>
[stack@director ~]$ source stackrc
[stack@director ~]$ /bin/sh delete_node.sh
+ openstack overcloud node delete --templates -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-sriov.yaml -e /home/stack/custom-templates/network.yaml -e /home/stack/custom-templates/ceph.yaml -e /home/stack/custom-templates/compute.yaml -e /home/stack/custom-templates/layout.yaml -e /home/stack/custom-templates/layout.yaml --stack pod2-stack 7439ea6c-3a88-47c2-9ff5-0a4f24647444
Deleting the following nodes from stack pod2-stack:
- 7439ea6c-3a88-47c2-9ff5-0a4f24647444
Started Mistral Workflow. Execution ID: 4ab4508a-c1d5-4e48-9b95-ad9a5baa20ae

real    0m52.078s
user    0m0.383s
sys     0m0.086s
[stack@director ~]$ openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 5df68458-095d-43bd-a8c4-033e68ba79a0 | pod2-stack | UPDATE_COMPLETE | 2018-05-08T21:30:06Z | 2018-05-08T20:42:48Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
Install New Compute Node
Cisco UCS C240 M4 Server Installation and Service Guide
Cisco UCS C-Series Rack-Mount Server BIOS Upgrade Guide
Navigate to Storage > Cisco 12G SAS Modular Raid Controller (SLOT-HBA) > Physical Drive Info as shown in this image.
Navigate to Storage > Cisco 12G SAS Modular Raid Controller (SLOT-HBA) > Controller Info > Create Virtual Drive from Unused Physical Drives as shown in this image.
Navigate to Admin > Communication Services > Communication Services as shown in the image.
Navigate to Compute > BIOS > Configure BIOS > Advanced > Processor Configuration as shown in the image.
JOURNAL > From physical drive number 3
OSD1    > From physical drive number 7
OSD2    > From physical drive number 8
OSD3    > From physical drive number 9
OSD4    > From physical drive number 10
Note: The image shown here and the configuration steps mentioned in this section are with reference to the firmware version 3.0(3e) and there might be slight variations if you work on other versions.
Add New OSD-Compute Node to Overcloud
The steps mentioned in this section are common irrespective of the VM hosted by the compute node.
Create an add_node.json file with only the details of the new compute server to be added. Ensure that the index number for the new compute server has not been used before. Typically, increment from the highest existing compute index.
Example: The highest prior index was osd-compute-17, so osd-compute-18 was created in the case of a 2-VNF system.
Note: Be mindful of the json format.
[stack@director ~]$ cat add_node.json
{
    "nodes":[
        {
            "mac":[
                "<MAC_ADDRESS>"
            ],
            "capabilities": "node:osd-compute-3,boot_option:local",
            "cpu":"24",
            "memory":"256000",
            "disk":"3000",
            "arch":"x86_64",
            "pm_type":"pxe_ipmitool",
            "pm_user":"admin",
            "pm_password":"<PASSWORD>",
            "pm_addr":"192.100.0.5"
        }
    ]
}
[stack@director ~]$ openstack baremetal import --json add_node.json
Started Mistral Workflow. Execution ID: 78f3b22c-5c11-4d08-a00f-8553b09f497d
Successfully registered node UUID 7eddfa87-6ae6-4308-b1d2-78c98689a56e
Started Mistral Workflow. Execution ID: 33a68c16-c6fd-4f2a-9df9-926545f2127e
Successfully set all nodes to available.
[stack@director ~]$ openstack baremetal node manage 7eddfa87-6ae6-4308-b1d2-78c98689a56e

[stack@director ~]$ ironic node-list | grep 7eddfa87
| 7eddfa87-6ae6-4308-b1d2-78c98689a56e | None | None | power off | manageable | False |

[stack@director ~]$ openstack overcloud node introspect 7eddfa87-6ae6-4308-b1d2-78c98689a56e --provide
Started Mistral Workflow. Execution ID: e320298a-6562-42e3-8ba6-5ce6d8524e5c
Waiting for introspection to finish...
Successfully introspected all nodes.
Introspection completed.
Started Mistral Workflow. Execution ID: c4a90d7b-ebf2-4fcb-96bf-e3168aa69dc9
Successfully set all nodes to available.

[stack@director ~]$ ironic node-list | grep available
| 7eddfa87-6ae6-4308-b1d2-78c98689a56e | None | None | power off | available | False |
OsdComputeIPs:
    internal_api:
    - 11.120.0.43
    - 11.120.0.44
    - 11.120.0.45
    - 11.120.0.43     <<< take osd-compute-0 .43 and add here
    tenant:
    - 11.117.0.43
    - 11.117.0.44
    - 11.117.0.45
    - 11.117.0.43     << and here
    storage:
    - 11.118.0.43
    - 11.118.0.44
    - 11.118.0.45
    - 11.118.0.43     << and here
    storage_mgmt:
    - 11.119.0.43
    - 11.119.0.44
    - 11.119.0.45
    - 11.119.0.43     << and here
[stack@director ~]$ ./deploy.sh
++ openstack overcloud deploy --templates -r /home/stack/custom-templates/custom-roles.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-sriov.yaml -e /home/stack/custom-templates/network.yaml -e /home/stack/custom-templates/ceph.yaml -e /home/stack/custom-templates/compute.yaml -e /home/stack/custom-templates/layout.yaml --stack ADN-ultram --debug --log-file overcloudDeploy_11_06_17__16_39_26.log --ntp-server 172.24.167.109 --neutron-flat-networks phys_pcie1_0,phys_pcie1_1,phys_pcie4_0,phys_pcie4_1 --neutron-network-vlan-ranges datacentre:1001:1050 --neutron-disable-tunneling --verbose --timeout 180
…
Starting new HTTP connection (1): 192.200.0.1
"POST /v2/action_executions HTTP/1.1" 201 1695
HTTP POST http://192.200.0.1:8989/v2/action_executions 201
Overcloud Endpoint: http://10.1.2.5:5000/v2.0
Overcloud Deployed
clean_up DeployOvercloud:
END return value: 0

real    38m38.971s
user    0m3.605s
sys     0m0.466s
[stack@director ~]$ openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 5df68458-095d-43bd-a8c4-033e68ba79a0 | ADN-ultram | UPDATE_COMPLETE | 2017-11-02T21:30:06Z | 2017-11-06T21:40:58Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
[stack@director ~]$ source stackrc
[stack@director ~]$ nova list | grep osd-compute-3
| 0f2d88cd-d2b9-4f28-b2ca-13e305ad49ea | pod1-osd-compute-3 | ACTIVE | - | Running | ctlplane=192.200.0.117 |

[stack@director ~]$ source corerc
[stack@director ~]$ openstack hypervisor list | grep osd-compute-3
| 63 | pod1-osd-compute-3.localdomain |
[heat-admin@pod1-osd-compute-3 ~]$ sudo ceph -s
    cluster eb2bb192-b1c9-11e6-9205-525400330666
     health HEALTH_WARN
            223 pgs backfill_wait
            4 pgs backfilling
            41 pgs degraded
            227 pgs stuck unclean
            41 pgs undersized
            recovery 45229/1300136 objects degraded (3.479%)
            recovery 525016/1300136 objects misplaced (40.382%)
     monmap e1: 3 mons at {Pod1-controller-0=11.118.0.40:6789/0,Pod1-controller-1=11.118.0.41:6789/0,Pod1-controller-2=11.118.0.42:6789/0}
            election epoch 58, quorum 0,1,2 Pod1-controller-0,Pod1-controller-1,Pod1-controller-2
     osdmap e986: 12 osds: 12 up, 12 in; 225 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v781746: 704 pgs, 6 pools, 533 GB data, 344 kobjects
            1553 GB used, 11840 GB / 13393 GB avail
            45229/1300136 objects degraded (3.479%)
            525016/1300136 objects misplaced (40.382%)
                 477 active+clean
                 186 active+remapped+wait_backfill
                  37 active+undersized+degraded+remapped+wait_backfill
                   4 active+undersized+degraded+remapped+backfilling
[heat-admin@pod1-osd-compute-3 ~]$ sudo ceph -s
    cluster eb2bb192-b1c9-11e6-9205-525400330666
     health HEALTH_OK
     monmap e1: 3 mons at {Pod1-controller-0=11.118.0.40:6789/0,Pod1-controller-1=11.118.0.41:6789/0,Pod1-controller-2=11.118.0.42:6789/0}
            election epoch 58, quorum 0,1,2 Pod1-controller-0,Pod1-controller-1,Pod1-controller-2
     osdmap e1398: 12 osds: 12 up, 12 in
            flags sortbitwise,require_jewel_osds
      pgmap v784311: 704 pgs, 6 pools, 533 GB data, 344 kobjects
            1599 GB used, 11793 GB / 13393 GB avail
                 704 active+clean
  client io 8168 kB/s wr, 0 op/s rd, 32 op/s wr

[heat-admin@pod1-osd-compute-3 ~]$ sudo ceph osd tree
ID WEIGHT   TYPE NAME                           UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 13.07996 root default
-2        0     host pod1-osd-compute-0
-3  4.35999     host pod1-osd-compute-2
 1  1.09000         osd.1                            up  1.00000          1.00000
 4  1.09000         osd.4                            up  1.00000          1.00000
 7  1.09000         osd.7                            up  1.00000          1.00000
10  1.09000         osd.10                           up  1.00000          1.00000
-4  4.35999     host pod1-osd-compute-1
 2  1.09000         osd.2                            up  1.00000          1.00000
 5  1.09000         osd.5                            up  1.00000          1.00000
 8  1.09000         osd.8                            up  1.00000          1.00000
11  1.09000         osd.11                           up  1.00000          1.00000
-5  4.35999     host pod1-osd-compute-3
 0  1.09000         osd.0                            up  1.00000          1.00000
 3  1.09000         osd.3                            up  1.00000          1.00000
 6  1.09000         osd.6                            up  1.00000          1.00000
 9  1.09000         osd.9                            up  1.00000          1.00000
It is possible to redeploy the previous instance with the snapshot taken in previous steps.
Step 1. (Optional) If there is no previous VM snapshot available, connect to the OSPD node where the backup was sent and SFTP the backup back to its original OSPD node with sftp root@x.x.x.x, where x.x.x.x is the IP of the original OSPD. Save the snapshot file in the /tmp directory.
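A minimal sketch of that transfer, assuming the compressed snapshot file name used earlier in this document and x.x.x.x as the IP of the original OSPD:

sftp root@x.x.x.x
sftp> put /tmp/AAA-CPAR-LGNoct192017_compressed.qcow2 /tmp/
sftp> bye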
Step 2. Connect to the OSPD node where the instance is redeployed.
Source the environment variables with this command:
# source /home/stack/pod1-stackrc-Core-CPAR
Step 3. In order to use the snapshot as an image, it is necessary to upload it to Horizon. Run the next command to do so.
#glance image-create --file AAA-CPAR-Date-snapshot.qcow2 --container-format bare --disk-format qcow2 --name AAA-CPAR-Date-snapshot
The process can be seen in horizon as shown in this image.
Step 4. In Horizon, navigate to Project > Instances and click Launch Instance as shown in this image.
Step 5. Enter the Instance Name and choose the Availability Zone as shown in this image.
Step 6. In the Source tab, choose the image in order to create the instance. In the Select Boot Source menu, select Image; a list of images is shown. Choose the one that was previously uploaded by clicking its + sign, as shown in this image.
Step 7. In the Flavor tab, choose the AAA flavor by clicking on the + sign as shown in this image.
Step 8. Finally, navigate to the Network tab and choose the networks that the instance needs by clicking on the + sign. For this case, select diameter-routable1, radius-routable1 and tb1-mgmt as shown in this image.
Step 9. Finally, click Launch Instance to create it. The progress can be monitored in Horizon as shown in this image.
After a few minutes the instance will be completely deployed and ready for use.
A floating IP address is a routable address, which means that it is reachable from outside of the Ultra M/OpenStack architecture and is able to communicate with other nodes from the network.
Step 1. In the Horizon top menu, navigate to Admin > Floating IPs.
Step 2. Click Allocate IP to Project.
Step 3. In the Allocate Floating IP window, select the Pool from which the new floating IP belongs, the Project where it is going to be assigned, and the new Floating IP Address itself.
For example:
Step 4. Click Allocate Floating IP.
Step 5. In the Horizon top menu, navigate to Project > Instances.
Step 6. In the Action column, click the arrow that points down next to the Create Snapshot button; a menu is displayed. Select the Associate Floating IP option.
Step 7. In the IP Address field, select the corresponding floating IP address intended to be used, and in Port to be associated, choose the corresponding management interface (eth0) of the new instance where this floating IP is going to be assigned. Refer to the next image as an example of this procedure.
Step 8. Finally, click Associate.
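Alternatively, the same allocation and association can be done from the OpenStack CLI. This is only a sketch; ext-net is a placeholder for the external network pool, and the instance name and address come from the examples in this document.

openstack floating ip create ext-net                                        # allocate a floating IP from the pool
openstack server add floating ip AAA-CPAR-testing-instance 10.145.0.249     # associate it with the instance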
Step 1. In the Horizon top menu, navigate to Project > Instances.
Step 2. Click on the name of the instance/VM that was created in the section Launch a New Instance.
Step 3. Click Console. This will display the CLI of the VM.
Step 4. Once the CLI is displayed, enter the proper login credentials:
Username: root
Password: cisco123 as shown in this image.
Step 5. In the CLI, run the command vi /etc/ssh/sshd_config in order to edit the ssh configuration.
Step 6. Once the SSH configuration file is open, press i in order to edit the file. Then look for the section shown here and change the first line from PasswordAuthentication no to PasswordAuthentication yes.
Step 7. Press ESC and enter :wq! to save sshd_config file changes.
Step 8. Run the command service sshd restart.
Step 9. In order to verify that the SSH configuration changes have been correctly applied, open any SSH client and try to establish a remote secure connection with the floating IP assigned to the instance (for example, 10.145.0.249) and the user root.
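Equivalently, Steps 5 through 9 can be run as commands from the VM console; a minimal sketch, assuming the example floating IP 10.145.0.249:

sed -i 's/^PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config   # Steps 5-7
service sshd restart                                                                     # Step 8
ssh root@10.145.0.249                                                                    # Step 9, from any SSH client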
Step 1. Open an SSH session with the IP address of the corresponding VM/server where the application is installed, as shown in this image.
Follow these steps once the activity has been completed, so that CPAR services can be re-established in the site that was shut down.
Step 1. Log back in to Horizon and navigate to Project > Instance > Start Instance.
Step 2. Verify that the status of the instance is Active and the power state is Running as shown in this image.
Step 1. Run the command /opt/CSCOar/bin/arstatus at OS level:
[root@wscaaa04 ~]# /opt/CSCOar/bin/arstatus
Cisco Prime AR RADIUS server running       (pid: 24834)
Cisco Prime AR Server Agent running        (pid: 24821)
Cisco Prime AR MCD lock manager running    (pid: 24824)
Cisco Prime AR MCD server running          (pid: 24833)
Cisco Prime AR GUI running                 (pid: 24836)
SNMP Master Agent running                  (pid: 24835)
[root@wscaaa04 ~]#
Step 2. Run the command /opt/CSCOar/bin/aregcmd at OS level and enter the admin credentials. Verify that CPAR Health is 10 out of 10, and then exit the CPAR CLI.
[root@aaa02 logs]# /opt/CSCOar/bin/aregcmd
Cisco Prime Access Registrar 7.3.0.1 Configuration Utility
Copyright (C) 1995-2017 by Cisco Systems, Inc. All rights reserved.
Cluster:
User: admin
Passphrase:
Logging in to localhost

[ //localhost ]
    LicenseInfo = PAR-NG-TPS 7.2(100TPS:)
                  PAR-ADD-TPS 7.2(2000TPS:)
                  PAR-RDDR-TRX 7.2()
                  PAR-HSS 7.2()
    Radius/
    Administrators/

Server 'Radius' is Running, its health is 10 out of 10

--> exit
Step 3. Run the command netstat | grep diameter and verify that all DRA connections are established.
The output mentioned here is for an environment where Diameter links are expected. If fewer links are displayed, this represents a disconnection from the DRA that needs to be analyzed.
[root@aa02 logs]# netstat | grep diameter
tcp        0      0 aaa02.aaa.epc.:77 mp1.dra01.d:diameter ESTABLISHED
tcp        0      0 aaa02.aaa.epc.:36 tsa6.dra01:diameter  ESTABLISHED
tcp        0      0 aaa02.aaa.epc.:47 mp2.dra01.d:diameter ESTABLISHED
tcp        0      0 aaa02.aaa.epc.:07 tsa5.dra01:diameter  ESTABLISHED
tcp        0      0 aaa02.aaa.epc.:08 np2.dra01.d:diameter ESTABLISHED
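To compare the number of established DRA connections against the number of Diameter links expected for the site, a simple count can be used; this is just a sketch based on the output format shown above.

netstat | grep diameter | grep -c ESTABLISHED     # the count must match the expected number of DRA links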
Step 4. Check that the TPS log shows requests being processed by CPAR. The highlighted values represent the TPS, and those are the ones you need to pay attention to.
The value of TPS must not exceed 1500.
[root@wscaaa04 ~]# tail -f /opt/CSCOar/logs/tps-11-21-2017.csv
11-21-2017,23:57:35,263,0
11-21-2017,23:57:50,237,0
11-21-2017,23:58:05,237,0
11-21-2017,23:58:20,257,0
11-21-2017,23:58:35,254,0
11-21-2017,23:58:50,248,0
11-21-2017,23:59:05,272,0
11-21-2017,23:59:20,243,0
11-21-2017,23:59:35,244,0
11-21-2017,23:59:50,233,0
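A quick way to flag any sample above the 1500 TPS limit, assuming the CSV columns date,time,TPS,errors shown above; the log file name is the one from this example.

awk -F, '$3 > 1500 {print "TPS limit exceeded:", $0}' /opt/CSCOar/logs/tps-11-21-2017.csv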
Step 5. Look for any “error” or “alarm” messages in name_radius_1_log.
[root@aaa02 logs]# grep -E "error|alarm" name_radius_1_log
Step 6. In order to verify the amount of memory that the CPAR process used, run the command:
top | grep radius
[root@sfraaa02 ~]# top | grep radius
27008 root      20   0 20.228g 2.413g  11408 S 128.3  7.7 1165:41 radius
This highlighted value must be lower than 7 GB, which is the maximum allowed at the application level.
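An equivalent check in gigabytes can be scripted; a minimal sketch, assuming the CPAR process is named radius as in the top output above.

ps -C radius -o rss= | awk '{ printf "%.1f GB\n", $1/1024/1024 }'     # resident memory; must stay below 7 GB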