This document describes the steps required to replace a faulty Object Storage Disk (OSD)-Compute server in an Ultra-M setup.

This procedure applies to an OpenStack environment with the NEWTON release, where ESC does not manage CPAR and CPAR is installed directly on the virtual machine (VM) deployed on OpenStack.

Ultra-M is a pre-packaged and validated virtualized mobile packet core solution designed to simplify the deployment of VNFs. OpenStack is the Virtual Infrastructure Manager (VIM) for Ultra-M and consists of these node types:

The high-level architecture of Ultra-M and the components involved are depicted in this image:

Note: The Ultra M 5.1.x release is considered in order to define the procedures in this document.
MoP | Method of Procedure |
OSD | Object Storage Disks |
OSPD | OpenStack Platform Director |
HDD | Hard Disk Drive |
SSD | Solid State Drive |
VIM | Virtual Infrastructure Manager |
VM | Virtual Machine |
EM | Element Manager |
UAS | Ultra Automation Services |
UUID | Universally Unique IDentifier |
Backup
Before you replace a Compute node, it is important to check the current state of your Red Hat OpenStack Platform environment. It is recommended that you check the current state in order to avoid complications when the Compute replacement process is initiated. This flow of replacement helps you achieve that.

In case of recovery, Cisco recommends taking a backup of the OSPD database with the use of these steps:
[root@director ~]# mysqldump --opt --all-databases > /root/undercloud-all-databases.sql
[root@director ~]# tar --xattrs -czf undercloud-backup-`date +%F`.tar.gz /root/undercloud-all-databases.sql /etc/my.cnf.d/server.cnf /var/lib/glance/images /srv/node /home/stack
tar: Removing leading `/' from member names
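Before you rely on the archive, you can list its contents to confirm that the database dump and the configuration files were captured. This is a minimal sketch, assuming it is run on the same day the backup was taken (the archive name embeds the date):

[root@director ~]# tar -tzf undercloud-backup-`date +%F`.tar.gz | head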
This process ensures that a node can be replaced without affecting the availability of any instances.

Note: Ensure that you have a snapshot of the instance so that you can restore the VM when needed. Follow this procedure on how to take a snapshot of the VM.
[stack@director ~]$ nova list --field name,host | grep osd-compute-0
| 46b4b9eb-a1a6-425d-b886-a0ba760e6114 | AAA-CPAR-testing-instance | pod2-stack-compute-4.localdomain |
Note: In the output shown here, the first column corresponds to the Universally Unique IDentifier (UUID), the second column is the VM name, and the third column is the hostname where the VM is present. The parameters from this output are used in subsequent sections.

Step 1. Open any Secure Shell (SSH) client connected to the network and connect to the CPAR instance.

It is important not to shut down all 4 AAA instances within one site at the same time; do it one by one.

Step 2. To shut down the CPAR application, run this command:
/opt/CSCOar/bin/arserver stop
The message "Cisco Prime Access Registrar Server Agent shutdown complete." must appear.

Note: If a user left a Command Line Interface (CLI) session open, the arserver stop command does not work and this message is displayed:
ERROR: You cannot shut down Cisco Prime Access Registrar while the CLI is being used.
Current list of running CLI with process id is:
2903 /opt/CSCOar/bin/aregcmd -s
In this example, the highlighted process ID 2903 must be terminated before CPAR can be stopped. If this is the case, run this command to terminate the process:
kill -9 *process_id*
Then repeat Step 1.

Step 3. To verify that the CPAR application was indeed shut down, run this command:
/opt/CSCOar/bin/arstatus
This message must appear:
Cisco Prime Access Registrar Server Agent not running
Cisco Prime Access Registrar GUI not running
Step 1. Enter the Horizon GUI website that corresponds to the site (city) currently being worked on.

When you access Horizon, the screen observed is the one shown in this image.

Step 2. Navigate to Project > Instances, as shown in this image.

If the user used is CPAR, then only the 4 AAA instances appear in this menu.

Step 3. Shut off only one instance at a time and repeat the whole process in this document. To shut down the VM, navigate to Actions > Shut Off Instance, as shown in the image, and confirm your selection.

Step 4. Validate that the instance was indeed shut off by checking Status = Shutoff and Power State = Shut Down, as shown in this image.

This step ends the CPAR shutdown process.
Once the CPAR VMs are down, the snapshots can be taken in parallel, as they belong to independent computes.

The four QCOW2 files are created in parallel.

Take a snapshot of each AAA instance (25 minutes - 1 hour: 25 minutes for instances that used a QCOW image as a source and 1 hour for instances that used a raw image as a source).
3. Click Create Snapshot in order to proceed with the snapshot creation (this needs to be executed on the corresponding AAA instance), as shown in this image.

4. Once the snapshot is executed, click Images and verify that all of them finish and report no problems, as shown in this image.

5. The next step is to download the snapshot in QCOW2 format and transfer it to a remote entity, in case the OSPD is lost during this process. In order to achieve this, identify the snapshot by running the command glance image-list at the OSPD level.
[root@elospd01 stack]# glance image-list

+--------------------------------------+---------------------------+
| ID                                   | Name                      |
+--------------------------------------+---------------------------+
| 80f083cb-66f9-4fcf-8b8a-7d8965e47b1d | AAA-Temporary             |
| 22f8536b-3f3c-4bcc-ae1a-8f2ab0d8b950 | ELP1 cluman 10_09_2017    |
| 70ef5911-208e-4cac-93e2-6fe9033db560 | ELP2 cluman 10_09_2017    |
| e0b57fc9-e5c3-4b51-8b94-56cbccdf5401 | ESC-image                 |
| 92dfe18c-df35-4aa9-8c52-9c663d3f839b | lgnaaa01-sept102017       |
| 1461226b-4362-428b-bc90-0a98cbf33500 | tmobile-pcrf-13.1.1.iso   |
| 98275e15-37cf-4681-9bcc-d6ba18947d7b | tmobile-pcrf-13.1.1.qcow2 |
+--------------------------------------+---------------------------+
6. Once you identify the snapshot to download (in this case, the one marked in green), you can download it in QCOW2 format with the command glance image-download, as shown here.
[root@elospd01 stack]# glance image-download 92dfe18c-df35-4aa9-8c52-9c663d3f839b --file /tmp/AAA-CPAR-LGNoct192017.qcow2 &
7. Once the download process finishes, a compression process needs to be executed, as the snapshot can be filled with ZEROES because of processes, tasks, and temporary files handled by the operating system. The command to be used for file compression is virt-sparsify.
[root@elospd01 stack]# virt-sparsify AAA-CPAR-LGNoct192017.qcow2 AAA-CPAR-LGNoct192017_compressed.qcow2
This process can take some time (around 10-15 minutes). Once it finishes, the resulting file is the one that needs to be transferred to an external entity, as specified in the next step.

Verification of the file integrity is required. In order to achieve this, run the next command and look for the corrupt attribute at the end of its output.
[root@wsospd01 tmp]# qemu-img info AAA-CPAR-LGNoct192017_compressed.qcow2
image: AAA-CPAR-LGNoct192017_compressed.qcow2
file format: qcow2
virtual size: 150G (161061273600 bytes)
disk size: 18G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
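To move the compressed snapshot off the OSPD, any secure copy method works. This is a sketch only; the destination host and path are illustrative placeholders:

[root@elospd01 stack]# scp /tmp/AAA-CPAR-LGNoct192017_compressed.qcow2 root@<remote_host>:/tmp/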
[stack@director ~]$ nova list --field name,host | grep osd-compute-0
| 46b4b9eb-a1a6-425d-b886-a0ba760e6114 | AAA-CPAR-testing-instance | pod2-stack-compute-4.localdomain |
Note: In the output shown here, the first column corresponds to the Universally Unique IDentifier (UUID), the second column is the VM name, and the third column is the hostname where the VM is present. The parameters from this output are used in subsequent sections.
[heat-admin@pod2-stack-osd-compute-0 ~]$ sudo ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    13393G     11088G        2305G         17.21
POOLS:
    NAME        ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd         0          0         0         3635G           0
    metrics     1      3452M      0.09         3635G      219421
    images      2       138G      3.67         3635G       43127
    backups     3          0         0         3635G           0
    volumes     4       139G      3.70         3635G       36581
    vms         5       490G     11.89         3635G      126247
[heat-admin@pod2-stack-osd-compute-0 ~]$ sudo ceph osd tree
ID WEIGHT   TYPE NAME                           UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 13.07996 root default
-2  4.35999     host pod2-stack-osd-compute-0
 0  1.09000         osd.0                            up  1.00000          1.00000
 3  1.09000         osd.3                            up  1.00000          1.00000
 6  1.09000         osd.6                            up  1.00000          1.00000
 9  1.09000         osd.9                            up  1.00000          1.00000
-3  4.35999     host pod2-stack-osd-compute-1
 1  1.09000         osd.1                            up  1.00000          1.00000
 4  1.09000         osd.4                            up  1.00000          1.00000
 7  1.09000         osd.7                            up  1.00000          1.00000
10  1.09000         osd.10                           up  1.00000          1.00000
-4  4.35999     host pod2-stack-osd-compute-2
 2  1.09000         osd.2                            up  1.00000          1.00000
 5  1.09000         osd.5                            up  1.00000          1.00000
 8  1.09000         osd.8                            up  1.00000          1.00000
11  1.09000         osd.11                           up  1.00000          1.00000
[heat-admin@pod2-stack-osd-compute-0 ~]$ systemctl list-units *ceph*
UNIT                             LOAD   ACTIVE SUB     DESCRIPTION
var-lib-ceph-osd-ceph\x2d0.mount loaded active mounted /var/lib/ceph/osd/ceph-0
var-lib-ceph-osd-ceph\x2d3.mount loaded active mounted /var/lib/ceph/osd/ceph-3
var-lib-ceph-osd-ceph\x2d6.mount loaded active mounted /var/lib/ceph/osd/ceph-6
var-lib-ceph-osd-ceph\x2d9.mount loaded active mounted /var/lib/ceph/osd/ceph-9
ceph-osd@0.service               loaded active running Ceph object storage daemon
ceph-osd@3.service               loaded active running Ceph object storage daemon
ceph-osd@6.service               loaded active running Ceph object storage daemon
ceph-osd@9.service               loaded active running Ceph object storage daemon
system-ceph\x2ddisk.slice        loaded active active  system-ceph\x2ddisk.slice
system-ceph\x2dosd.slice         loaded active active  system-ceph\x2dosd.slice
ceph-mon.target                  loaded active active  ceph target allowing to start/stop all ceph-mon@.service instances at once
ceph-osd.target                  loaded active active  ceph target allowing to start/stop all ceph-osd@.service instances at once
ceph-radosgw.target              loaded active active  ceph target allowing to start/stop all ceph-radosgw@.service instances at once
ceph.target                      loaded active active  ceph target allowing to start/stop all ceph*@.service instances at once

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

14 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
[heat-admin@pod2-stack-osd-compute-0 ~]# systemctl disable ceph-osd@0
[heat-admin@pod2-stack-osd-compute-0 ~]# systemctl stop ceph-osd@0
[heat-admin@pod2-stack-osd-compute-0 ~]# ceph osd out 0
[heat-admin@pod2-stack-osd-compute-0 ~]# ceph osd crush remove osd.0
[heat-admin@pod2-stack-osd-compute-0 ~]# ceph auth del osd.0
[heat-admin@pod2-stack-osd-compute-0 ~]# ceph osd rm 0
[heat-admin@pod2-stack-osd-compute-0 ~]# umount /var/lib/ceph/osd/ceph-0
[heat-admin@pod2-stack-osd-compute-0 ~]# rm -rf /var/lib/ceph/osd/ceph-0
Alternatively, you can script the same cleanup:
[heat-admin@pod2-stack-osd-compute-0 ~]$ sudo ls /var/lib/ceph/osd
ceph-0  ceph-3  ceph-6  ceph-9
[heat-admin@pod2-stack-osd-compute-0 ~]$ /bin/sh clean.sh
[heat-admin@pod2-stack-osd-compute-0 ~]$ cat clean.sh
#!/bin/sh
set -x
CEPH=`sudo ls /var/lib/ceph/osd`
for c in $CEPH
do
  i=`echo $c | cut -d'-' -f2`
  sudo systemctl disable ceph-osd@$i || (echo "error rc:$?"; exit 1)
  sleep 2
  sudo systemctl stop ceph-osd@$i || (echo "error rc:$?"; exit 1)
  sleep 2
  sudo ceph osd out $i || (echo "error rc:$?"; exit 1)
  sleep 2
  sudo ceph osd crush remove osd.$i || (echo "error rc:$?"; exit 1)
  sleep 2
  sudo ceph auth del osd.$i || (echo "error rc:$?"; exit 1)
  sleep 2
  sudo ceph osd rm $i || (echo "error rc:$?"; exit 1)
  sleep 2
  sudo umount /var/lib/ceph/osd/$c || (echo "error rc:$?"; exit 1)
  sleep 2
  sudo rm -rf /var/lib/ceph/osd/$c || (echo "error rc:$?"; exit 1)
  sleep 2
done
sudo ceph osd tree
Once all of the OSD processes have been migrated or deleted, the node can be removed from the overcloud.

Note: When CEPH is removed, the VNF HD RAID goes into a Degraded state, but the hard disk must still be accessible.
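Before you proceed with the node removal, it can be worth confirming that the OSDs of the node no longer appear in the CRUSH map and that the cluster rebalances with the remaining OSDs. A minimal check, with the commands already used in this document:

[heat-admin@pod2-stack-osd-compute-0 ~]$ sudo ceph osd tree
[heat-admin@pod2-stack-osd-compute-0 ~]$ sudo ceph -s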
Graceful Power Off
[stack@director ~]$ nova stop aaa2-21
Request to stop server aaa2-21 has been accepted.

[stack@director ~]$ nova list
+--------------------------------------+---------------------------+---------+------------+-------------+------------------------------------------------------------------------------------------------------------+
| ID                                   | Name                      | Status  | Task State | Power State | Networks                                                                                                     |
+--------------------------------------+---------------------------+---------+------------+-------------+------------------------------------------------------------------------------------------------------------+
| 46b4b9eb-a1a6-425d-b886-a0ba760e6114 | AAA-CPAR-testing-instance | ACTIVE  | -          | Running     | tb1-mgmt=172.16.181.14, 10.225.247.233; radius-routable1=10.160.132.245; diameter-routable1=10.160.132.231  |
| 3bc14173-876b-4d56-88e7-b890d67a4122 | aaa2-21                   | SHUTOFF | -          | Shutdown    | diameter-routable1=10.160.132.230; radius-routable1=10.160.132.248; tb1-mgmt=172.16.181.7, 10.225.247.234   |
| f404f6ad-34c8-4a5f-a757-14c8ed7fa30e | aaa21june                 | ACTIVE  | -          | Running     | diameter-routable1=10.160.132.233; radius-routable1=10.160.132.244; tb1-mgmt=172.16.181.10                  |
+--------------------------------------+---------------------------+---------+------------+-------------+------------------------------------------------------------------------------------------------------------+
The steps mentioned in this section are common, irrespective of the VMs hosted in the Compute node.

Delete the OSD-Compute node from the service list.
[stack@director ~]$ openstack compute service list | grep osd-compute
| 135 | nova-compute | pod2-stack-osd-compute-1.localdomain | AZ-esc2 | enabled | up | 2018-06-22T11:05:22.000000 |
| 150 | nova-compute | pod2-stack-osd-compute-2.localdomain | nova    | enabled | up | 2018-06-22T11:05:17.000000 |
| 153 | nova-compute | pod2-stack-osd-compute-0.localdomain | AZ-esc1 | enabled | up | 2018-06-22T11:05:25.000000 |
[stack@director ~]$ openstack compute service delete 153
Delete the Neutron agents.
[stack@director ~]$ openstack network agent list | grep osd-compute-0
| eaecff95-b163-4cde-a99d-90bd26682b22 | Open vSwitch agent | pod2-stack-osd-compute-0.localdomain | None | True | UP | neutron-openvswitch-agent |
[stack@director ~]$ openstack network agent delete eaecff95-b163-4cde-a99d-90bd26682b22
Delete from the Ironic database.
[root@director ~]# nova list | grep osd-compute-0
| 6810c884-1cb9-4321-9a07-192443920f1f | pod2-stack-osd-compute-0 | ACTIVE | - | Running | ctlplane=192.200.0.109 |

[root@al03-pod2-ospd ~]$ nova delete 6810c884-1cb9-4321-9a07-192443920f1f
[root@director ~]# source stackrc
[root@director ~]# nova show pod2-stack-osd-compute-0 | grep hypervisor
| OS-EXT-SRV-ATTR:hypervisor_hostname | 05ceb513-e159-417d-a6d6-cbbcc4b167d7 |
[stack@director ~]$ ironic node-delete 05ceb513-e159-417d-a6d6-cbbcc4b167d7
[stack@director ~]$ ironic node-list
The deleted node must now no longer be listed in the ironic node-list output.

Delete from the Overcloud
openstack overcloud node delete --templates -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-sriov.yaml -e /home/stack/custom-templates/network.yaml -e /home/stack/custom-templates/ceph.yaml -e /home/stack/custom-templates/compute.yaml -e /home/stack/custom-templates/layout.yaml -e /home/stack/custom-templates/layout.yaml --stack <stack-name> <UUID>
[stack@director ~]$ source stackrc
[stack@director ~]$ /bin/sh delete_node.sh
+ openstack overcloud node delete --templates -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-sriov.yaml -e /home/stack/custom-templates/network.yaml -e /home/stack/custom-templates/ceph.yaml -e /home/stack/custom-templates/compute.yaml -e /home/stack/custom-templates/layout.yaml -e /home/stack/custom-templates/layout.yaml --stack pod2-stack 7439ea6c-3a88-47c2-9ff5-0a4f24647444
Deleting the following nodes from stack pod2-stack:
- 7439ea6c-3a88-47c2-9ff5-0a4f24647444
Started Mistral Workflow. Execution ID: 4ab4508a-c1d5-4e48-9b95-ad9a5baa20ae

real    0m52.078s
user    0m0.383s
sys     0m0.086s
[stack@director ~]$ openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 5df68458-095d-43bd-a8c4-033e68ba79a0 | pod2-stack | UPDATE_COMPLETE | 2018-05-08T21:30:06Z | 2018-05-08T20:42:48Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
Install the New Compute Node

Navigate to Storage > Cisco 12G SAS Modular Raid Controller (SLOT-HBA) > Physical Drive Info, as shown in this image.

Navigate to Storage > Cisco 12G SAS Modular Raid Controller (SLOT-HBA) > Controller Info > Create Virtual Drive from Unused Physical Drives, as shown in this image.

Navigate to Admin > Communication Services > Communication Services, as shown in the image.

Navigate to Compute > BIOS > Configure BIOS > Advanced > Processor Configuration, as shown in the image.
JOURNAL > From physical drive number 3
OSD1    > From physical drive number 7
OSD2    > From physical drive number 8
OSD3    > From physical drive number 9
OSD4    > From physical drive number 10
Note: The image shown here and the configuration steps mentioned in this section refer to firmware version 3.0(3e); there can be slight variations if you work on other versions.

Add the New OSD-Compute Node to the Overcloud

The steps mentioned in this section are common, irrespective of the VM hosted by the Compute node.

Create an add_node.json file with only the details of the new compute server to be added. Ensure that the index number for the new compute server has not been used before. Typically, increment the next highest compute value.

Example: The highest prior was osd-compute-17; therefore, osd-compute-18 was created in the case of a 2-vnf system.

Note: Be mindful of the JSON format.
[stack@director ~]$ cat add_node.json
{
    "nodes":[
        {
            "mac":[
                "<MAC_ADDRESS>"
            ],
            "capabilities": "node:osd-compute-3,boot_option:local",
            "cpu":"24",
            "memory":"256000",
            "disk":"3000",
            "arch":"x86_64",
            "pm_type":"pxe_ipmitool",
            "pm_user":"admin",
            "pm_password":"<PASSWORD>",
            "pm_addr":"192.100.0.5"
        }
    ]
}
[stack@director ~]$ openstack baremetal import --json add_node.json
Started Mistral Workflow. Execution ID: 78f3b22c-5c11-4d08-a00f-8553b09f497d
Successfully registered node UUID 7eddfa87-6ae6-4308-b1d2-78c98689a56e
Started Mistral Workflow. Execution ID: 33a68c16-c6fd-4f2a-9df9-926545f2127e
Successfully set all nodes to available.
[stack@director ~]$ openstack baremetal node manage 7eddfa87-6ae6-4308-b1d2-78c98689a56e

[stack@director ~]$ ironic node-list | grep 7eddfa87
| 7eddfa87-6ae6-4308-b1d2-78c98689a56e | None | None | power off | manageable | False |

[stack@director ~]$ openstack overcloud node introspect 7eddfa87-6ae6-4308-b1d2-78c98689a56e --provide
Started Mistral Workflow. Execution ID: e320298a-6562-42e3-8ba6-5ce6d8524e5c
Waiting for introspection to finish...
Successfully introspected all nodes.
Introspection completed.
Started Mistral Workflow. Execution ID: c4a90d7b-ebf2-4fcb-96bf-e3168aa69dc9
Successfully set all nodes to available.

[stack@director ~]$ ironic node-list | grep available
| 7eddfa87-6ae6-4308-b1d2-78c98689a56e | None | None | power off | available | False |
Reuse the IP addresses of the removed osd-compute-0 node by adding them as the last entry of each list in the custom templates:

OsdComputeIP:
    internal_api:
        - 11.120.0.43
        - 11.120.0.44
        - 11.120.0.45
        - 11.120.0.43    <<< take osd-compute-0 .43 and add here
    tenant:
        - 11.117.0.43
        - 11.117.0.44
        - 11.117.0.45
        - 11.117.0.43    <<< and here
    storage:
        - 11.118.0.43
        - 11.118.0.44
        - 11.118.0.45
        - 11.118.0.43    <<< and here
    storage_mgmt:
        - 11.119.0.43
        - 11.119.0.44
        - 11.119.0.45
        - 11.119.0.43    <<< and here
[stack@director ~]$ ./deploy.sh
++ openstack overcloud deploy --templates -r /home/stack/custom-templates/custom-roles.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-sriov.yaml -e /home/stack/custom-templates/network.yaml -e /home/stack/custom-templates/ceph.yaml -e /home/stack/custom-templates/compute.yaml -e /home/stack/custom-templates/layout.yaml --stack ADN-ultram --debug --log-file overcloudDeploy_11_06_17__16_39_26.log --ntp-server 172.24.167.109 --neutron-flat-networks phys_pcie1_0,phys_pcie1_1,phys_pcie4_0,phys_pcie4_1 --neutron-network-vlan-ranges datacentre:1001:1050 --neutron-disable-tunneling --verbose --timeout 180
…
Starting new HTTP connection (1): 192.200.0.1
"POST /v2/action_executions HTTP/1.1" 201 1695
HTTP POST http://192.200.0.1:8989/v2/action_executions 201
Overcloud Endpoint: http://10.1.2.5:5000/v2.0
Overcloud Deployed
clean_up DeployOvercloud:
END return value: 0

real    38m38.971s
user    0m3.605s
sys     0m0.466s
[stack@director ~]$ openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 5df68458-095d-43bd-a8c4-033e68ba79a0 | ADN-ultram | UPDATE_COMPLETE | 2017-11-02T21:30:06Z | 2017-11-06T21:40:58Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
[stack@director ~]$ source stackrc
[stack@director ~]$ nova list | grep osd-compute-3
| 0f2d88cd-d2b9-4f28-b2ca-13e305ad49ea | pod1-osd-compute-3 | ACTIVE | - | Running | ctlplane=192.200.0.117 |

[stack@director ~]$ source corerc
[stack@director ~]$ openstack hypervisor list | grep osd-compute-3
| 63 | pod1-osd-compute-3.localdomain |
[heat-admin@pod1-osd-compute-3 ~]$ sudo ceph -s
    cluster eb2bb192-b1c9-11e6-9205-525400330666
     health HEALTH_WARN
            223 pgs backfill_wait
            4 pgs backfilling
            41 pgs degraded
            227 pgs stuck unclean
            41 pgs undersized
            recovery 45229/1300136 objects degraded (3.479%)
            recovery 525016/1300136 objects misplaced (40.382%)
     monmap e1: 3 mons at {Pod1-controller-0=11.118.0.40:6789/0,Pod1-controller-1=11.118.0.41:6789/0,Pod1-controller-2=11.118.0.42:6789/0}
            election epoch 58, quorum 0,1,2 Pod1-controller-0,Pod1-controller-1,Pod1-controller-2
     osdmap e986: 12 osds: 12 up, 12 in; 225 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v781746: 704 pgs, 6 pools, 533 GB data, 344 kobjects
            1553 GB used, 11840 GB / 13393 GB avail
            45229/1300136 objects degraded (3.479%)
            525016/1300136 objects misplaced (40.382%)
                 477 active+clean
                 186 active+remapped+wait_backfill
                  37 active+undersized+degraded+remapped+wait_backfill
                   4 active+undersized+degraded+remapped+backfilling
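The backfill and recovery can take a while to complete. One simple way to track progress, as a sketch, is to poll the cluster status periodically until it reports HEALTH_OK:

[heat-admin@pod1-osd-compute-3 ~]$ watch -n 60 sudo ceph -s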
[heat-admin@pod1-osd-compute-3 ~]$ sudo ceph -s
    cluster eb2bb192-b1c9-11e6-9205-525400330666
     health HEALTH_OK
     monmap e1: 3 mons at {Pod1-controller-0=11.118.0.40:6789/0,Pod1-controller-1=11.118.0.41:6789/0,Pod1-controller-2=11.118.0.42:6789/0}
            election epoch 58, quorum 0,1,2 Pod1-controller-0,Pod1-controller-1,Pod1-controller-2
     osdmap e1398: 12 osds: 12 up, 12 in
            flags sortbitwise,require_jewel_osds
      pgmap v784311: 704 pgs, 6 pools, 533 GB data, 344 kobjects
            1599 GB used, 11793 GB / 13393 GB avail
                 704 active+clean
  client io 8168 kB/s wr, 0 op/s rd, 32 op/s wr

[heat-admin@pod1-osd-compute-3 ~]$ sudo ceph osd tree
ID WEIGHT   TYPE NAME                           UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 13.07996 root default
-2        0     host pod1-osd-compute-0
-3  4.35999     host pod1-osd-compute-2
 1  1.09000         osd.1                            up  1.00000          1.00000
 4  1.09000         osd.4                            up  1.00000          1.00000
 7  1.09000         osd.7                            up  1.00000          1.00000
10  1.09000         osd.10                           up  1.00000          1.00000
-4  4.35999     host pod1-osd-compute-1
 2  1.09000         osd.2                            up  1.00000          1.00000
 5  1.09000         osd.5                            up  1.00000          1.00000
 8  1.09000         osd.8                            up  1.00000          1.00000
11  1.09000         osd.11                           up  1.00000          1.00000
-5  4.35999     host pod1-osd-compute-3
 0  1.09000         osd.0                            up  1.00000          1.00000
 3  1.09000         osd.3                            up  1.00000          1.00000
 6  1.09000         osd.6                            up  1.00000          1.00000
 9  1.09000         osd.9                            up  1.00000          1.00000
It is possible to redeploy the previous instance with the snapshot taken in the previous steps.

Step 1. (Optional) If there is no previous VM snapshot available, connect to the OSPD node where the backup was sent and SFTP the backup back to its original OSPD node. Use sftp root@x.x.x.x, where x.x.x.x is the IP of the original OSPD. Save the snapshot file in the /tmp directory.
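A minimal sketch of such a transfer, run from the node that holds the backup; the snapshot file name is the one used later in this section:

sftp root@x.x.x.x
sftp> put /tmp/AAA-CPAR-Date-snapshot.qcow2 /tmp/
sftp> bye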
Step 2. Connect to the OSPD node where the instance is redeployed.

Source the environment variables with this command:
# source /home/stack/pod1-stackrc-Core-CPAR
Step 3. To use the snapshot as an image, it must be uploaded to Horizon as such. Run the next command to do so.
# glance image-create --file AAA-CPAR-Date-snapshot.qcow2 --container-format bare --disk-format qcow2 --name AAA-CPAR-Date-snapshot
The process can be seen in Horizon, as shown in this image.
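The upload can also be confirmed from the CLI with the glance image-list command used earlier in this document:

# glance image-list | grep AAA-CPAR-Date-snapshot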
Step 4. In Horizon, navigate to Project > Instances and click Launch Instance, as shown in this image.

Step 5. Enter the Instance Name and choose the Availability Zone, as shown in this image.

Step 6. In the Source tab, choose the image to create the instance. In the Select Boot Source menu, select Image; a list of images is displayed. Choose the one that was previously uploaded by clicking its + sign, as shown in this image.

Step 7. In the Flavor tab, choose the AAA flavor by clicking its + sign, as shown in this image.

Step 8. Navigate to the Network tab and choose the networks that the instance needs by clicking the + sign. For this case, select diameter-routable1, radius-routable1 and tb1-mgmt, as shown in this image.

Step 9. Finally, click Launch Instance to create it. The progress can be monitored in Horizon, as shown in this image.

After a few minutes, the instance is completely deployed and ready for use.
A floating IP address is a routable address, which means that it is reachable from outside of the Ultra M/OpenStack architecture and is able to communicate with other nodes from the network.

Step 1. In the Horizon top menu, navigate to Admin > Floating IPs.

Step 2. Click Allocate IP to Project.

Step 3. In the Allocate Floating IP window, select the Pool from which the new floating IP belongs, the Project where it is going to be assigned, and the new Floating IP Address itself.

For example:

Step 4. Click Allocate Floating IP.

Step 5. In the Horizon top menu, navigate to Project > Instances.

Step 6. In the Action column, click the arrow that points down next to the Create Snapshot button; a menu must be displayed. Select the Associate Floating IP option.

Step 7. Select the corresponding floating IP address intended to be used in the IP Address field, and choose the corresponding management interface (eth0) of the new instance where this floating IP is going to be assigned in the Port to be associated field. Refer to the next image as an example of this procedure.

Step 8. Finally, click Associate.
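The same allocation and association can also be done from the CLI once the project environment variables are sourced. This is a sketch only; the external network/pool name is an illustrative placeholder, and the exact command set depends on the client version installed:

# openstack floating ip create <external-network>
# openstack server add floating ip AAA-CPAR-testing-instance 10.145.0.249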
Step 1. In the Horizon top menu, navigate to Project > Instances.

Step 2. Click the name of the instance/VM that was created in the Launch a New Instance section.

Step 3. Click Console. This displays the CLI of the VM.

Step 4. Once the CLI is displayed, enter the proper login credentials, as shown in this image:

Username: root
Password: cisco123
Step 5. In the CLI, run the command vi /etc/ssh/sshd_config to edit the SSH configuration.

Step 6. Once the SSH configuration file is open, press I to edit the file. Then look for the section shown here and change the first line from PasswordAuthentication no to PasswordAuthentication yes.
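The relevant section of /etc/ssh/sshd_config typically looks like this excerpt (representative only; the surrounding comments can vary between builds):

# To disable tunneled clear text passwords, change to no here!
PasswordAuthentication yes      <-- changed from: PasswordAuthentication no
#PermitEmptyPasswords no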
Step 7. Press ESC and enter :wq! to save the sshd_config file changes.

Step 8. Run the command service sshd restart.

Step 9. In order to test that the SSH configuration changes have been correctly applied, open any SSH client and try to establish a remote secure connection with the floating IP assigned to the instance (that is, 10.145.0.249) and the user root.
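For example, from any host that can reach the floating IP:

ssh root@10.145.0.249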
Step 1. Open an SSH session with the IP address of the corresponding VM/server where the application is installed, as shown in this image.
Follow these steps once the activity has been completed and CPAR services can be re-established in the site that was shut down.

Step 1. Log back into Horizon, navigate to Project > Instances and click Start Instance.

Step 2. Verify that the status of the instance is Active and the power state is Running, as shown in this image.
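If you prefer the CLI, and assuming the instance was stopped with nova stop as shown earlier, it can be started again from the OSPD; the instance name here is the one from the earlier example:

[stack@director ~]$ nova start aaa2-21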
Step 1. Run the command /opt/CSCOar/bin/arstatus at the OS level:
[root@wscaaa04 ~]# /opt/CSCOar/bin/arstatus
Cisco Prime AR RADIUS server running       (pid: 24834)
Cisco Prime AR Server Agent running        (pid: 24821)
Cisco Prime AR MCD lock manager running    (pid: 24824)
Cisco Prime AR MCD server running          (pid: 24833)
Cisco Prime AR GUI running                 (pid: 24836)
SNMP Master Agent running                  (pid: 24835)
[root@wscaaa04 ~]#
Step 2. Run the command /opt/CSCOar/bin/aregcmd at the OS level and enter the admin credentials. Verify that CPAR Health is 10 out of 10, then exit the CPAR CLI.
[root@aaa02 logs]# /opt/CSCOar/bin/aregcmd
Cisco Prime Access Registrar 7.3.0.1 Configuration Utility
Copyright (C) 1995-2017 by Cisco Systems, Inc.  All rights reserved.
Cluster:
User: admin
Passphrase:
Logging in to localhost

[ //localhost ]
    LicenseInfo = PAR-NG-TPS 7.2(100TPS:)
                  PAR-ADD-TPS 7.2(2000TPS:)
                  PAR-RDDR-TRX 7.2()
                  PAR-HSS 7.2()
    Radius/
    Administrators/

Server 'Radius' is Running, its health is 10 out of 10

--> exit
Step 3. Run the command netstat | grep diameter and verify that all DRA connections are established.

The output mentioned here is for an environment where Diameter links are expected. If fewer links are displayed, this represents a disconnection from the DRA that needs to be analyzed.
[root@aa02 logs]# netstat | grep diameter
tcp        0      0 aaa02.aaa.epc.:77 mp1.dra01.d:diameter ESTABLISHED
tcp        0      0 aaa02.aaa.epc.:36 tsa6.dra01:diameter  ESTABLISHED
tcp        0      0 aaa02.aaa.epc.:47 mp2.dra01.d:diameter ESTABLISHED
tcp        0      0 aaa02.aaa.epc.:07 tsa5.dra01:diameter  ESTABLISHED
tcp        0      0 aaa02.aaa.epc.:08 np2.dra01.d:diameter ESTABLISHED
Step 4. Check that the TPS log shows requests being handled by CPAR. The highlighted values represent the TPS, and those are the ones you need to pay attention to.

The value of TPS must not exceed 1500.
[root@wscaaa04 ~]# tail -f /opt/CSCOar/logs/tps-11-21-2017.csv
11-21-2017,23:57:35,263,0
11-21-2017,23:57:50,237,0
11-21-2017,23:58:05,237,0
11-21-2017,23:58:20,257,0
11-21-2017,23:58:35,254,0
11-21-2017,23:58:50,248,0
11-21-2017,23:59:05,272,0
11-21-2017,23:59:20,243,0
11-21-2017,23:59:35,244,0
11-21-2017,23:59:50,233,0
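To quickly find the peak TPS recorded in a file, the third comma-separated column can be scanned; a minimal sketch:

[root@wscaaa04 ~]# awk -F, '$3 > max { max = $3 } END { print max }' /opt/CSCOar/logs/tps-11-21-2017.csv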
Step 5. Look for any "error" or "alarm" messages in name_radius_1_log.
[root@aaa02 logs]# grep -E "error|alarm" name_radius_1_log
Step 6. To verify the amount of memory used by the CPAR process, run this command:
top | grep radius
[root@sfraaa02 ~]# top | grep radius
27008 root      20   0 20.228g 2.413g  11408 S 128.3  7.7   1165:41 radius
This highlighted value must be lower than 7 GB, which is the maximum allowed at the application level.
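For non-interactive or scripted checks, top's batch mode avoids terminal issues when its output is piped; a sketch:

[root@sfraaa02 ~]# top -b -n 1 | grep radius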