Introduction
This document describes the procedure to recover Cisco Virtualized Policy and Charging Rules Function (vPCRF) instances deployed on an Ultra-M/OpenStack deployment.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
- Openstack
- CPS
In addition, ensure that these conditions are met:
- The compute node on which the affected instances were deployed is available again.
- Compute resources are available in the same availability zone as the affected instance.
- The backup procedures described in this document are followed and scheduled periodically.
Components Used
The information in this document is based on CPS and is applicable to all versions.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Troubleshoot
Power on Load Balancer from SHUTOFF State
If an instance is in SHUTOFF state due to a planned shutdown or some other reason, use this procedure to start the instance and enable its monitoring in ESC.
- Check the state of the instance in OpenStack.
source /home/stack/destackovsrc-Pcrf
nova list --fields name,host,status | grep PD
| c5e4ebd4-803d-45c1-bd96-fd6e459b7ed6 | r5-PD_arb_0_2eb86cbf-07e5-4e14-9002-8990588b8957 | destackovs-compute-2 | SHUTOFF |
- Check that the compute node is available and that its state is up (a scripted version of this check is sketched after the output below).
source /home/stack/destackovsrc
nova hypervisor-show destackovs-compute-2 | egrep 'status|state'
| state | up |
| status | enabled |
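If you prefer to script this check, here is a minimal sketch, assuming the same rc file and compute host name as in the example above:
#!/bin/bash
# Sketch: abort the recovery if the compute hypervisor is not up and enabled.
source /home/stack/destackovsrc
COMPUTE=destackovs-compute-2
STATE=$(nova hypervisor-show "$COMPUTE" | awk '$2 == "state" {print $4}')
STATUS=$(nova hypervisor-show "$COMPUTE" | awk '$2 == "status" {print $4}')
if [ "$STATE" != "up" ] || [ "$STATUS" != "enabled" ]; then
    echo "Compute $COMPUTE is not ready (state=$STATE, status=$STATUS)"
    exit 1
fi
echo "Compute $COMPUTE is up and enabled"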
- Log in to the active ESC as the admin user and check the state of the instance in opdata.
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli get esc_datamodel/opdata | grep PD
r5-PD_arb_0_2eb86cbf-07e5-4e14-9002-8990588b8957 VM_ERROR_STATE
- Power on the instance from OpenStack.
source /home/stack/destackovsrc-Pcrf
nova start r5-PD_arb_0_2eb86cbf-07e5-4e14-9002-8990588b8957
- Wait about five minutes for the instance to boot up and reach the ACTIVE state; a polling alternative is shown after the output below.
source /home/stack/destackovsrc-Pcrf
nova list --fields name,status | grep PD
| c5e4ebd4-803d-45c1-bd96-fd6e459b7ed6 | r5-PD_arb_0_2eb86cbf-07e5-4e14-9002-8990588b8957 | ACTIVE |
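Instead of a fixed wait, the status can be polled until nova reports ACTIVE. This is a minimal sketch that uses the instance name from the example above and gives up after about five minutes:
#!/bin/bash
# Sketch: poll the instance status until it becomes ACTIVE or a timeout is reached.
source /home/stack/destackovsrc-Pcrf
INSTANCE=r5-PD_arb_0_2eb86cbf-07e5-4e14-9002-8990588b8957
for i in $(seq 1 30); do
    STATUS=$(nova list --fields name,status | grep "$INSTANCE" | awk -F'|' '{print $4}' | tr -d ' ')
    if [ "$STATUS" = "ACTIVE" ]; then
        echo "$INSTANCE is ACTIVE"
        exit 0
    fi
    echo "Current status: $STATUS, retry in 10 seconds..."
    sleep 10
done
echo "$INSTANCE did not reach ACTIVE state in time"
exit 1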
- After the instance is in the ACTIVE state, enable the VM monitor in ESC; a follow-up check is shown after the command.
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli vm-action ENABLE_MONITOR r5-PD_arb_0_2eb86cbf-07e5-4e14-9002-8990588b8957
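As a follow-up check, the opdata query shown earlier can be repeated; once the monitor is re-enabled and the VM is healthy, the entry is expected to move out of VM_ERROR_STATE:
# Re-check the VM state in ESC opdata after the monitor is enabled.
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli get esc_datamodel/opdata | grep PD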
- For further recovery of instance configurations, refer to the instance-type-specific procedures provided later in this document.
Recover any Instance from ERROR State
Use this procedure if the state of a CPS instance in OpenStack is ERROR:
- Check the state of the instance in OpenStack.
source /home/stack/destackovsrc-Pcrf
nova list --fields name,host,status | grep PD
| c5e4ebd4-803d-45c1-bd96-fd6e459b7ed6 | r5-PD_arb_0_2eb86cbf-07e5-4e14-9002-8990588b8957 | destackovs-compute-2 | ERROR |
- Check that the compute node is available and runs fine (the compute check sketch shown earlier can be reused here).
source /home/stack/destackovsrc
nova hypervisor-show destackovs-compute-2 | egrep 'status|state'
| state | up |
| status | enabled |
- Log in to the active ESC as the admin user and check the state of the instance in opdata.
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli get esc_datamodel/opdata | grep PD
r5-PD_arb_0_2eb86cbf-07e5-4e14-9002-8990588b8957 VM_ERROR_STATE
- Reset the state of the instance to force it back to an active state instead of an error state. Once that is done, reboot the instance; a combined sketch follows the commands below.
source /home/stack/destackovsrc-Pcrf
nova reset-state --active r5-PD_arb_0_2eb86cbf-07e5-4e14-9002-8990588b8957
nova reboot --hard r5-PD_arb_0_2eb86cbf-07e5-4e14-9002-8990588b8957
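The reset and reboot can also be wrapped in a small script that stops on the first failure. This is a minimal sketch that uses the rc file and instance name from the example above:
#!/bin/bash
# Sketch: reset the instance state to active, then hard-reboot it.
source /home/stack/destackovsrc-Pcrf
INSTANCE=r5-PD_arb_0_2eb86cbf-07e5-4e14-9002-8990588b8957
nova reset-state --active "$INSTANCE" || { echo "reset-state failed"; exit 1; }
nova reboot --hard "$INSTANCE" || { echo "hard reboot failed"; exit 1; }
echo "Hard reboot requested for $INSTANCE"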
- Wait about five minutes for the instance to boot up and reach the ACTIVE state (the polling sketch shown earlier also applies here).
source /home/stack/destackovsrc-Pcrf
nova list --fields name,status | grep PD
| c5e4ebd4-803d-45c1-bd96-fd6e459b7ed6 | r5-PD_arb_0_2eb86cbf-07e5-4e14-9002-8990588b8957 | ACTIVE |
- If the instance changes state to ACTIVE after the reboot, enable the VM monitor in ESC.
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli vm-action ENABLE_MONITOR r5-PD_arb_0_2eb86cbf-07e5-4e14-9002-8990588b8957
- After the instance is recovered to the running/active state, refer to the instance-type-specific procedure to recover configuration/data from a backup.
Load Balancer Recovery
If a load balancer was recently recovered, use this procedure to restore the haproxy and network settings:
- The backup and restore script is a Python script that takes a backup of the requested configuration items, available locally on the Cluster Manager VM or on other VMs. When a restore is required, the supplied configuration is copied to the requested location on the Cluster Manager VM or on the specific VM.
Name : config_br.py
Path : /var/qps/install/current/scripts/modules
VM : Cluster Manager
When you run this script, you provide options and specify the location of the backup file.
If you need to restore the LB configuration data on the Cluster Manager, run this command:
config_br.py -a import --network --haproxy --users /mnt/backup/<backup_27092016.tar.gz>
Usage Examples:
config_br.py -a export --etc --etc-oam --svn --stats /mnt/backup/backup_27092016.tar.gz
Back up /etc/broadhop configuration data from the OAM (pcrfclient) VM, Policy Builder configuration, and logstash
config_br.py -a import --etc --etc-oam --svn --stats /mnt/backup/backup_27092016.tar.gz
Restore /etc/broadhop configuration data on the OAM (pcrfclient) VM, Policy Builder configuration, and logstash
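To keep the backups current, as required in the Prerequisites, the export can be scheduled from cron on the Cluster Manager. This is a sketch, assuming the script is invoked with its full path and that /mnt/backup is an existing backup location:
# Sketch: nightly export at 02:00 (add with "crontab -e" as root on the Cluster Manager).
# The % characters must be escaped in crontab entries.
0 2 * * * /var/qps/install/current/scripts/modules/config_br.py -a export --etc --etc-oam --svn --stats /mnt/backup/backup_$(date +\%d\%m\%Y).tar.gz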
If there is still a problem with stability and you need to reinitialize the load balancer VM with the Cluster Manager puppet configuration files, perform steps 2 and 3 below.
- To generate the VM archive files with the latest configurations, run this command on the Cluster Manager:
/var/qps/install/current/scripts/build/build_all.sh
- To update the load balancer with the latest configuration, log in to the load balancer and run these commands (a combined sketch follows):
ssh lbxx
/etc/init.d/vm-init
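Steps 2 and 3 can also be scripted from the Cluster Manager. This is a minimal sketch, assuming passwordless SSH to the load balancer and lb01 as a placeholder host name:
#!/bin/bash
# Sketch: rebuild the VM archives on the Cluster Manager, then reinitialize a load balancer.
LB_HOST=lb01   # placeholder; replace with your load balancer host name
# Step 2: regenerate the VM archive files with the latest configuration
/var/qps/install/current/scripts/build/build_all.sh || { echo "build_all.sh failed"; exit 1; }
# Step 3: apply the new configuration on the load balancer
ssh "$LB_HOST" /etc/init.d/vm-init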
Verify
- To verify whether the LB is fully recovered, run "monit summary" on the LB. This command verifies that all qns processes and all other processes monitored by monit are in a good state.
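As a quick filter on that output, a sketch such as this one lists any monit-managed item that does not report a healthy status (status strings vary between monit versions, so review the result manually):
# Sketch: flag monit-managed items that do not report a healthy status.
# Header lines and version-specific status strings may still appear; review manually.
monit summary | egrep -vi 'running|accessible|status ok'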
- Inspect the qns-x.log files in /var/log/broadhop. These logs can be checked for any errors or failures that could affect the processing of traffic.
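For a quick scan of recent problems, a grep sketch such as this one can be run on the LB; the pattern list is an assumption and can be adjusted to the failures you care about:
# Sketch: scan the qns logs for recent errors or exceptions (adjust the pattern as needed).
grep -iE 'error|exception|fail' /var/log/broadhop/qns-*.log | tail -n 50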