Overview
This guide provides step-by-step instructions for troubleshooting issues with the VNFs deployed and managed by ESC.
Before performing ESC troubleshooting for VNF deployment, the first step is to collect and check logs. To collect ESC logs:
For ESC Release 2.3.2:
sudo /opt/cisco/esc/esc-scripts/collect_esc_log.sh
For ESC Release 3.0 and later:
sudo escadm log collect
There are several important logs containing useful error messages.
YangESC log contains the incoming request and notification:
/var/log/esc/yangesc.log
ESCManager log contains the ESC processing details:
/var/log/esc/escmanager.log
VimManager log contains the VimManager processing details:
/var/log/esc/vimmanager/vimmanager.log
Vim_VimManager log contains the raw requests and responses between VimManager and the VIM:
/var/log/esc/vimmanager/vim_vimmanager.log
Mona log contains the monitoring processing details plus script executions:
/var/log/esc/mona/mona.log
Problem
When you receive the VM_DEPLOYED notification (through Netconf, REST, or the Portal) with a non-200 status code, the deployment has failed because of an error.
Notification found in /var/log/esc/yangesc.log
02:19:25,758 23-Jan-2018 WARN ===== SEND NOTIFICATION STARTS =====
02:19:25,758 23-Jan-2018 WARN Type: VM_DEPLOYED
02:19:25,758 23-Jan-2018 WARN Status: FAILURE
02:19:25,758 23-Jan-2018 WARN Status Code: 500
02:19:25,759 23-Jan-2018 WARN Status Msg: VIM Driver: Exception while creating VM: Error creating VM from template, the host [10.67.103.255] does not exist
02:19:25,759 23-Jan-2018 WARN Tenant: admin
02:19:25,759 23-Jan-2018 WARN Deployment ID: 169384d7-c67b-40a4-bcaa-dd3294305ba3
02:19:25,759 23-Jan-2018 WARN Deployment name: Vmware-NetConf-Intra-Affinity-Anti-Affinity-With-InvalidCluster-InvalidHost
02:19:25,759 23-Jan-2018 WARN VM group name: Group2-uLinux-Intra-Anti-Affinity-With-InvalidHost
02:19:25,759 23-Jan-2018 WARN User configs: 1
02:19:25,759 23-Jan-2018 WARN VM Name: Vmware-NetConf-I_Group2_0_65aa6ca8-3b53-4eb3-a39f-a3f12394a190
02:19:25,759 23-Jan-2018 WARN Host ID:
02:19:25,759 23-Jan-2018 WARN Host Name:
02:19:25,760 23-Jan-2018 WARN ===== SEND NOTIFICATION ENDS =====
Solution
The reason for the failure is in the status message itself. The previous example shows that the targeted host for deployment does not exist. The following is another example of a failed VM deployment:
Notification found in /var/log/esc/yangesc.log
07:20:56,164 25-Jan-2018 WARN Status Msg: Failed to create VM ports for instance : jenkins-ErrHandl_ErrorG_2_cc0f8c28-8900-4977-90d9-b9f996c8ca71. Create port operation failed: Exception during processing: com.cisco.esc.vimmanager.exceptions.CreatePortException: Create port failed: ClientResponseException{message=No more IP addresses available on network 0b7965b4-c604-444c-8cbb-7c2399e912d4., status=409, status-code=CONFLICT}.
The previous example shows a deployment failure message that contains the direct response from the VIM, including an error message and a status code. When a deployment fails because of VIM-related issues such as these, perform the required action on the VIM instance, or adjust the ESC deployment datamodel with the proper configuration. Here are some common VIM-related issues:
Out of quota errors
Either delete some of the resources in question through ESC under that tenant/project/user, or
Configure your VIM's resource limit per tenant/project/user.
Already in use errors
Change the resource name or configuration. Some VIMs impose restrictions that disallow reuse, and these restrictions are sometimes configurable.
Delete the resource in question.
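As a rough aid for spotting failed deployments, the following sketch parses notification blocks in the format shown in the yangesc.log excerpts above and keeps the VM_DEPLOYED notifications whose status code is not 200. The function names are hypothetical, and the line layout (time, date, level, then "Key: Value") is assumed from the excerpts only.

```python
import re

# Assumed line layout, taken from the excerpts: "<time> <date> <LEVEL> <body>"
LINE_RE = re.compile(r"^\S+ \S+ \w+ (?P<body>.*)$")


def parse_notifications(lines):
    """Collect each SEND NOTIFICATION block as a dict of its 'Key: Value' lines."""
    notifications = []
    current = None
    for raw in lines:
        match = LINE_RE.match(raw.rstrip("\n"))
        if not match:
            continue
        body = match.group("body").strip()
        if "SEND NOTIFICATION STARTS" in body:
            current = {}
        elif "SEND NOTIFICATION ENDS" in body:
            if current is not None:
                notifications.append(current)
            current = None
        elif current is not None and ":" in body:
            # Split on the first colon only, so messages containing colons stay whole.
            key, _, value = body.partition(":")
            current[key.strip()] = value.strip()
    return notifications


def failed_deployments(notifications):
    """Return VM_DEPLOYED notifications with a non-200 status code."""
    return [
        n for n in notifications
        if n.get("Type") == "VM_DEPLOYED" and n.get("Status Code") != "200"
    ]
```

For example, opening /var/log/esc/yangesc.log, passing the file object to parse_notifications, and printing the "Status Msg" of each entry returned by failed_deployments gives a short list of failure reasons to investigate.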
Problem
If the VNF deployment datamodel involves an LCM action (for example, a staging script), the deployment may have failed because the action failed to complete. In this case, you get the following error message in /var/log/esc/escmanager.log:
22:12:11,912 25-Jan-2018 VM_STATE_MACHINE-ab-auto-test-vnf_ab-aut_0_31ebad33-e12f-4772-a89c-3bdc239acf69 ERROR [StateMachineCloudUtils.java:setupPersonalities():1081] [tid=fffae7af-a321-4ea5-a1bc-3b30c903f3a5]
com.cisco.esc.datamodel.exceptions.ESCException: Action [GEN_VPC_ISO] failed
at com.cisco.esc.statemachines.utils.StateMachineCloudUtils.setupPersonalities(StateMachineCloudUtils.java:1069)
Description
To find out the details of a staging script failure, look for entries like the following in the mona.log file:
/var/log/esc/mona/mona.log
2018-01-25 19:34:45.751 [http-nio-127.0.0.1-8090-exec-5] Script: [/opt/cisco/esc/esc-scripts/esc_volume_em_staging.sh] execution in progress
2018-01-25 19:34:45.751 [http-nio-127.0.0.1-8090-exec-5] Use the original script path and skip downloading: no protocol: /opt/cisco/esc/esc-scripts/esc_volume_em_staging.sh
2018-01-25 19:49:45.772 [http-nio-127.0.0.1-8090-exec-5] Script execution failed, timer expired for script: /opt/cisco/esc/esc-scripts/esc_volume_em_staging.sh
2018-01-25 19:49:45.805 [http-nio-127.0.0.1-8090-exec-5] Script execution failed
com.cisco.esc.mona.exceptions.ActionExecutionException: Script execution failed, timer expired for script:/opt/cisco/esc/esc-scripts/esc_volume_em_staging.sh
Solution
Common causes are a permission issue or a script execution timeout. Do a dry run of the script on the ESC VM to ensure that it works.
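A dry run can be sketched as follows. This is only an illustration: the helper name dry_run is hypothetical, the arguments a given staging script expects are not shown in this guide, and the timeout you pass is your own choice (in the mona.log excerpt above, the timer expired 15 minutes after the script started).

```python
import os
import subprocess


def dry_run(argv, timeout_sec):
    """Run a script once and report the two common failure causes.

    argv is the script path followed by its arguments (assumed, not
    taken from ESC); timeout_sec is how long to wait before giving up.
    """
    # Permission problem: the file is missing or lacks the execute bit.
    if not os.access(argv[0], os.X_OK):
        return "not executable (or missing): " + argv[0]
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_sec
        )
    except subprocess.TimeoutExpired:
        # Timeout problem: the script did not finish in time.
        return "timed out after %s seconds" % timeout_sec
    if done.returncode != 0:
        return "exit code %d: %s" % (done.returncode, done.stderr.strip())
    return "ok"
```

Calling dry_run with the script path from the log, its arguments, and a timeout of 900 seconds would reproduce the two failure modes described above without involving ESC.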
Problem
When deploying a VNF on OpenStack as a non-admin user, the deployment may encounter role access errors such as:
02:19:25,758 23-Jan-2018 WARN ===== SEND NOTIFICATION STARTS =====
02:19:25,758 23-Jan-2018 WARN Type: VM_DEPLOYED
02:19:25,758 23-Jan-2018 WARN Status: FAILURE
02:19:25,758 23-Jan-2018 WARN Status Code: 500
02:19:25,759 23-Jan-2018 WARN Status Msg: VIM Driver: Exception while creating VM: {"message": "You are not authorized to perform the requested action: identity:create_project", "code": 403, "title": "Forbidden"}}
02:19:25,759 23-Jan-2018 WARN Tenant: admin
02:19:25,759 23-Jan-2018 WARN Deployment ID: 169384d7-c67b-40a4-bcaa-dd3294305ba3
02:19:25,759 23-Jan-2018 WARN Deployment name: Vmware-NetConf-Intra-Affinity-Anti-Affinity-With-InvalidCluster-InvalidHost
02:19:25,759 23-Jan-2018 WARN VM group name: Group2-uLinux-Intra-Anti-Affinity-With-InvalidHost
02:19:25,759 23-Jan-2018 WARN User configs: 1
02:19:25,759 23-Jan-2018 WARN VM Name: Vmware-NetConf-I_Group2_0_65aa6ca8-3b53-4eb3-a39f-a3f12394a190
02:19:25,759 23-Jan-2018 WARN Host ID:
02:19:25,759 23-Jan-2018 WARN Host Name:
02:19:25,760 23-Jan-2018 WARN ===== SEND NOTIFICATION ENDS =====
Solution
ESC Release 3.1 and later requires two permissions to be granted in Neutron:
create_port:fixed_ips
create_port:mac_address
Create a new role on OpenStack for ESC. In OpenStack Horizon, go to Identity -> Roles and create a new role named vnfm, or any other name you prefer.
Assign the user with the vnfm role to a project managed by ESC in OpenStack Horizon (Identity -> Projects). Click Manage members and make sure the ESC user has the vnfm role.
Modify the following items on the OpenStack controller by appending 'or role:vnfm' to the default values. Changes to the policy.json file take effect immediately and do not require a service restart.
File: /etc/neutron/policy.json
Default value: "create_port:fixed_ips": "rule:context_is_advsvc or rule:admin_or_network_owner",
Modified value: "create_port:fixed_ips": "rule:context_is_advsvc or rule:admin_or_network_owner or role:vnfm",
File: /etc/neutron/policy.json
Default value: "create_port:mac_address": "rule:context_is_advsvc or rule:admin_or_network_owner"
Modified value: "create_port:mac_address": "rule:context_is_advsvc or rule:admin_or_network_owner or role:vnfm"
Problem
Suppose the VNF is deployed and the VM_DEPLOYED notification with a 200 status code is received, but VM_ALIVE is not yet received. When you check the VNF's console through the VIM UI (for example, OpenStack Horizon or VMware vCenter), the VNF is in a reboot cycle/loop. In most cases, this is a symptom of incorrect day 0 data being passed in. To verify the day 0 data passed in for OpenStack, check /var/log/esc/vimmanager/vim_vimmanager.log and look for the POST request sent to OpenStack to create a server.
2018-01-26 16:02:55.648 INFO os - 1 * Sending client request on thread http-nio-127.0.0.1-8095-exec-4
1 > POST http://ocata1-external-controller:8774/v2/d6aee06abdbe42edaade348280199a64/servers
1 > Accept: application/json
1 > Content-Type: application/json
1 > User-Agent: OpenStack4j / OpenStack Client
1 > X-Auth-Token: ***masked***
{
"server" : {
"name" : "jenkins-jenkinsy_MAKULA_0_bbc61ba6-6c63-4fb9-b9cd-5ae92a898943",
"imageRef" : "67fc9890-230e-406c-bd01-f2e1ffa2437f",
"flavorRef" : "cc12dec2-411a-46bd-b8c2-4ff8738ddb02",
"personality" : [ {
"path" : "iosxe_config.txt",
"contents" : "aG9zdG5hbWUgY3NyCiEKcGxhdGZvcm0gY29uc29sZSBzZXJpYWwKIQppcCBzdWJuZXQtemVybwpubyBpcCBkb21haW4tbG9va3VwCmlwIGRvbWFpbiBuYW1lIGNpc2NvLmNvbQohCmVuYWJsZSBwYXNzd29yZCBjaXNjbzEyMwp1c2VybmFtZSBhZG1pbiBwYXNzd29yZCBjaXNjbzEyMwp1c2VybmFtZSBhZG1pbiBwcml2aWxlZ2UgMTUKIQppbnRlcmZhY2UgR2lnYWJpdEV0aGVybmV0MQogZGVzY3JpcHRpb24gbWFuYWdlbWVudCBuZXR3b3JrCiBpcCBhZGRyZXNzIGRoY3AKIG5vIHNodXQKIQppbnRlcmZhY2UgR2lnYWJpdEV0aGVybmV0MgogZGVzY3JpcHRpb24gc2VydmljZSBuZXR3b3JrCiBpcCBhZGRyZXNzIGRoY3AKIG5vIHNodXQKIQppbnRlcmZhY2UgR2lnYWJpdEV0aGVybmV0MwogZGVzY3JpcHRpb24gc2VydmljZSBuZXR3b3JrCiBpcCBhZGRyZXNzIGRoY3AKIG5vIHNodXQKIQpjcnlwdG8ga2V5IGdlbmVyYXRlIHJzYSBtb2R1bHVzIDEwMjQKaXAgc3NoIHZlcnNpb24gMgppcCBzc2ggYXV0aGVudGljYXRpb24tcmV0cmllcyA1CmlwIHNjcCBzZXJ2ZXIgZW5hYmxlCmZpbGUgcHJvbXB0IHF1aWV0CiEKbGluZSBjb24gMAogc3RvcGJpdHMgMQpsaW5lIHZ0eSAwIDQKIGxvZ2luIGxvY2FsCiBwcml2aWxlZ2UgbGV2ZWwgMTUKIHRyYW5zcG9ydCBpbnB1dCBzc2ggdGVsbmV0CiB0cmFuc3BvcnQgb3V0cHV0IHNzaCB0ZWxuZXQKIQpzbm1wLXNlcnZlciBjb21tdW5pdHkgcHVibGljIFJPCiEKZW5kCg=="
} ],
"config_drive" : true,
"networks" : [ {
"port" : "01b3b168-8fab-4da4-b195-f9652d36674e"
}, {
"port" : "dfe709c9-4ca4-400f-a765-2dbc88828585"
} ],
"block_device_mapping_v2" : [ {
"source_type" : "image",
"destination_type" : "local",
"uuid" : "67fc9890-230e-406c-bd01-f2e1ffa2437f",
"boot_index" : 0,
"delete_on_termination" : true
} ]
}
}
Solution
Decode the base64 encoded string value of the personality content.
hostname csr
!
platform console serial
!
ip subnet-zero
no ip domain-lookup
ip domain name cisco.com
!
enable password cisco123
username admin password cisco123
username admin privilege 15
!
interface GigabitEthernet1
description management network
ip address dhcp
no shut
!
interface GigabitEthernet2
description service network
ip address dhcp
no shut
!
interface GigabitEthernet3
description service network
ip address dhcp
no shut
!
crypto key generate rsa modulus 1024
ip ssh version 2
ip ssh authentication-retries 5
ip scp server enable
file prompt quiet
!
line con 0
stopbits 1
line vty 0 4
login local
privilege level 15
transport input ssh telnet
transport output ssh telnet
!
snmp-server community public RO
!
end
Verify that the content contains the correct day 0 configuration. If the day 0 config is passed through a volume, detach the volume from the VNF and attach it to another VM to check its content.
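The decoding step can be sketched in a few lines. This assumes you have copied the JSON body of the POST request out of vim_vimmanager.log into a string; the function name decode_personality is hypothetical.

```python
import base64
import json


def decode_personality(request_body):
    """Return {path: decoded text} for each personality file in a
    'create server' request body copied from vim_vimmanager.log."""
    server = json.loads(request_body)["server"]
    files = {}
    for entry in server.get("personality", []):
        decoded = base64.b64decode(entry["contents"])
        # Day 0 files are normally text; replace undecodable bytes
        # instead of failing so the rest can still be inspected.
        files[entry["path"]] = decoded.decode("utf-8", errors="replace")
    return files
```

Printing the value returned for "iosxe_config.txt" shows the day 0 configuration exactly as it was handed to the VIM, which can then be compared with the configuration you intended to pass.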
For VMware, if the day 0 config is passed through the ovf settings, verify it from the vCenter:
Open VM settings.
Under Options, select 'OVF Settings'.
Click 'View' under OVF Environment.
If the day 0 config is passed through CDROM, you can verify through the ISO file that was mounted to the CDROM.
Download the ISO file from the specific datastore, then mount the ISO file locally to verify its content.
Problem
If a VNF is deployed successfully but a VM_ALIVE notification is never received, ESC was probably unable to reach the newly deployed VNF. This is mostly caused by a network issue. Check the KPI section of the VNF deployment datamodel:
<kpi_data>
<kpi>
<event_name>VM_ALIVE</event_name>
<metric_value>1</metric_value>
<metric_cond>GT</metric_cond>
<metric_type>UINT32</metric_type>
<metric_occurrences_true>1</metric_occurrences_true>
<metric_occurrences_false>30</metric_occurrences_false>
<metric_collector>
<type>ICMPPing</type>
<nicid>2</nicid>
<poll_frequency>10</poll_frequency>
<polling_unit>seconds</polling_unit>
<continuous_alarm>false</continuous_alarm>
</metric_collector>
</kpi>
</kpi_data>
Solution
To verify that the VNF is alive, send an ICMP ping from the ESC VM to the VNF, using the IP address of the interface indicated by:
<nicid>2</nicid>
Here, nicid 2 refers to the interface whose IP address ESC pings, which is defined as follows:
<interface>
<nicid>2</nicid>
<network>NVPGW100-UAS-uas-orchestration</network>
<allowed_address_pairs>
<address>
<ip_address>172.168.11.0</ip_address>
<netmask>255.255.255.0</netmask>
</address>
</allowed_address_pairs>
</interface>
Here, 172.168.11.0 is the IP address. Ensure that the interface shares the same network with ESC. In the previous example, the network is NVPGW100-UAS-uas-orchestration. If the ping fails, ping the gateway or another IP address available on the subnet to find out whether the issue is with the network.
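The lookup from the KPI section to the interface can be sketched as follows. This is an illustration only: it assumes both fragments sit under one common parent element in the XML you pass in, it ignores XML namespaces, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def descendants(elem, name):
    return [e for e in elem.iter() if local(e.tag) == name]


def child_text(elem, name):
    for child in elem:
        if local(child.tag) == name:
            return (child.text or "").strip()
    return None


def ping_targets(xml_text):
    """Find the interfaces that the VM_ALIVE ICMPPing collector points at."""
    root = ET.fromstring(xml_text)
    nicids = set()
    for kpi in descendants(root, "kpi"):
        if child_text(kpi, "event_name") != "VM_ALIVE":
            continue
        for collector in descendants(kpi, "metric_collector"):
            if child_text(collector, "type") == "ICMPPing":
                nicids.add(child_text(collector, "nicid"))
    targets = []
    for iface in descendants(root, "interface"):
        nicid = child_text(iface, "nicid")
        if nicid in nicids:
            ips = [(e.text or "").strip() for e in descendants(iface, "ip_address")]
            targets.append({
                "nicid": nicid,
                "network": child_text(iface, "network"),
                "ip_addresses": ips,
            })
    return targets
```

Each returned entry names the network that ESC must be able to reach and the addresses listed for that interface, which are the candidates for a manual ping from the ESC VM.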
The following are some common recovery issues:
Recovery does not behave as expected: ESC does not attempt a redeploy after a failed reboot.
Ensure that the recovery policy in the XML file is set to REBOOT_THEN_REDEPLOY, and not to REBOOT_ONLY. Read the recovery documentation to understand the recovery options and expectations.
ESC attempts the recovery only once, or too many times.
Check the configuration parameter VM_RECOVERY_RETRIES_MAX; its default value is 3. To check the value, run the following REST call from within the ESC VM:
curl -H "accept: Application/json" http://127.0.0.1:8080/ESCManager/v0/config/default/VM_RECOVERY_RETRIES_MAX | python -mjson.tool
If it is set correctly, ensure that ESC was healthy at the time of recovery and that a switchover did not occur; ESC may have continued the recovery attempt on the second ESC VM.
Problem
VNF VM recovery fails because of a VM reboot failed event. This depends on the VM recovery policy definition:
<recovery_policy>
<recovery_type>AUTO</recovery_type>
<action_on_recovery>REBOOT_ONLY</action_on_recovery>
<max_retries>3</max_retries>
</recovery_policy>
Description
ESC tries to reboot the VNF VM up to three times. If every attempt ends in a non-success state, the RECOVERY_COMPLETED event is reported with an error state. The reboot operation also depends on two other system-wide configuration parameters:
VM_STATUS_POLLING_VM_OPERATION_RETRIES
VM_STATUS_POLLING_WAIT_TIME_SEC
After ESC asks the VIM to reboot the VM, it keeps polling the VM status. VM_STATUS_POLLING_VM_OPERATION_RETRIES defines how many times ESC polls, and VM_STATUS_POLLING_WAIT_TIME_SEC defines how long ESC waits between polls. Their default values are:
VM_STATUS_POLLING_VM_OPERATION_RETRIES=10
VM_STATUS_POLLING_WAIT_TIME_SEC=5
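The time ESC allows for a reboot follows from these two values. A minimal worked example, assuming the window is simply the number of polls multiplied by the wait between polls (the function name is hypothetical):

```python
def polling_window_sec(retries, wait_time_sec):
    """Approximate time ESC keeps polling before it gives up on a reboot."""
    return retries * wait_time_sec


# Defaults: 10 polls, 5 seconds apart.
default_window = polling_window_sec(10, 5)

# Raising the wait time to 20 seconds, for example, widens the window.
wider_window = polling_window_sec(10, 20)

print(default_window, wider_window)
```

With the defaults, a VM that needs longer than roughly 50 seconds to become ACTIVE again is treated as a failed reboot.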
Solution
If the VNF VM takes more than 50 seconds (10 polls at 5-second intervals) to transition from the REBOOT state to the ACTIVE state in OpenStack, change VM_STATUS_POLLING_WAIT_TIME_SEC to a higher number through the ESC REST API:
curl -X PUT -H "accept:application/json" http://localhost:8080/ESCManager/v0/config/openstack/vm_status_polling_wait_time_sec/20 -k | python -mjson.tool
After receiving a success response, do a VM manual recovery again.
If a VNF VM is in an ERROR state in ESC, there are two options for transitioning it back to an ALIVE state, provided that the external issue that caused the ERROR state (for example, an issue on the VIM) is resolved. Before using either option, ensure that no operation is in progress on the same deployment. Check this in /var/log/esc/yangesc.log: look for any previously initiated action without a completion notification (either success or failure). If an operation is in progress, wait for it to complete before performing either of the following actions:
Execute the following command:
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli recovery-vm-action DO {ESC generated VM Name}
Execute the following command to unset the monitoring:
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli vm-action DISABLE_MONITOR {ESC generated VM Name}
Then enable it again:
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli vm-action ENABLE_MONITOR {ESC generated VM Name}
Using Script to Prepare Datamodels
Run the script on the ESC VM. The script generates two datamodel XML files: one for deleting the VM group(s), and one for adding the VM group(s) back.
[admin@abc-test-232 ~]$ ./genVMGroupDeletionDM.py -h
usage: genVMGroupDeletionDM.py [-h] vm_group_name [vm_group_name ...]
*****************************************************************
Utility tool for generating VM group removing datamodel for ESC
Check the following wiki for details
https://confluence-eng-sjc1.cisco.com/conf/display/ESCWIKI/How+to+Use+Service+Update+to+Remove+a+VM+Group
positional arguments:
vm_group_name <Required> VM group name(s) separate by space
optional arguments:
-h, --help show this help message and exit
Example
[admin@abc-test-232 ~]$ ./genVMGroupDeletionDM.py g1 g2
Datamodel is generated:
[/home/admin/delete_g1_g2.xml]
[/home/admin/add_g1_g2.xml]
** Use on your own risk! **
[admin@abc-test-232 ~]$
Manually Preparing Datamodels
Get the current ESC datamodel
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli get esc_datamodel/tenants > {file name}
Remove the extra wrappers <data> and <rpc-reply> from the original file from step 1 (and any CLI output before the <xml> tag). The end result looks like:
Example Datamodel After Step #2
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
<data>
<esc_datamodel xmlns="http://www.cisco.com/esc/esc">
Note: If the datamodel contains multiple deployments, either keep the other deployments exactly as they are when you edit the datamodel, without changing any text formatting, or remove the other deployment sections from the datamodel completely. Either way, those deployments remain untouched when the service update happens. For example, if the VM group to be deleted is c3, you can remove the deployment portion for EM from the datamodel when you edit it.
The following example shows how to delete a VM group c3. First, check whether any <placement_group> or <placement> policies defined under <policies> involve VM group c3. Mark those policies and delete them. There are 8 placement policies in the following example:
...
<placement>
<target_vm_group_ref>c3</target_vm_group_ref>
<type>anti_affinity</type>
<enforcement>strict</enforcement>
<vm_group_ref>c1</vm_group_ref>
<vm_group_ref>s2</vm_group_ref>
</placement>
<placement>
<target_vm_group_ref>s10</target_vm_group_ref>
<type>anti_affinity</type>
<enforcement>strict</enforcement>
<vm_group_ref>c1</vm_group_ref>
<vm_group_ref>c3</vm_group_ref>
<vm_group_ref>s2</vm_group_ref>
<vm_group_ref>s4</vm_group_ref>
<vm_group_ref>s5</vm_group_ref>
<vm_group_ref>s6</vm_group_ref>
<vm_group_ref>s7</vm_group_ref>
<vm_group_ref>s8</vm_group_ref>
<vm_group_ref>s9</vm_group_ref>
</placement>
...
<placement>
<target_vm_group_ref>s4</target_vm_group_ref>
<type>anti_affinity</type>
<enforcement>strict</enforcement>
<vm_group_ref>c1</vm_group_ref>
<vm_group_ref>c3</vm_group_ref>
<vm_group_ref>s2</vm_group_ref>
</placement>
<placement>
<target_vm_group_ref>s5</target_vm_group_ref>
<type>anti_affinity</type>
<enforcement>strict</enforcement>
<vm_group_ref>c1</vm_group_ref>
<vm_group_ref nc:operation='delete'>c3</vm_group_ref>
<vm_group_ref>s2</vm_group_ref>
<vm_group_ref>s4</vm_group_ref>
</placement>
<placement>
<target_vm_group_ref>s6</target_vm_group_ref>
<type>anti_affinity</type>
<enforcement>strict</enforcement>
<vm_group_ref>c1</vm_group_ref>
<vm_group_ref>c3</vm_group_ref>
<vm_group_ref>s2</vm_group_ref>
<vm_group_ref>s4</vm_group_ref>
<vm_group_ref>s5</vm_group_ref>
</placement>
<placement>
<target_vm_group_ref>s7</target_vm_group_ref>
<type>anti_affinity</type>
<enforcement>strict</enforcement>
<vm_group_ref>c1</vm_group_ref>
<vm_group_ref>c3</vm_group_ref>
<vm_group_ref>s2</vm_group_ref>
<vm_group_ref>s4</vm_group_ref>
<vm_group_ref>s5</vm_group_ref>
<vm_group_ref>s6</vm_group_ref>
</placement>
<placement>
<target_vm_group_ref>s8</target_vm_group_ref>
<type>anti_affinity</type>
<enforcement>strict</enforcement>
<vm_group_ref>c1</vm_group_ref>
<vm_group_ref>c3</vm_group_ref>
<vm_group_ref>s2</vm_group_ref>
<vm_group_ref>s4</vm_group_ref>
<vm_group_ref>s5</vm_group_ref>
<vm_group_ref>s6</vm_group_ref>
<vm_group_ref>s7</vm_group_ref>
</placement>
<placement>
<target_vm_group_ref>s9</target_vm_group_ref>
<type>anti_affinity</type>
<enforcement>strict</enforcement>
<vm_group_ref>c1</vm_group_ref>
<vm_group_ref>c3</vm_group_ref>
<vm_group_ref>s2</vm_group_ref>
<vm_group_ref>s4</vm_group_ref>
<vm_group_ref>s5</vm_group_ref>
<vm_group_ref>s6</vm_group_ref>
<vm_group_ref>s7</vm_group_ref>
<vm_group_ref>s8</vm_group_ref>
</placement>
For c3 as the target_vm_group, delete the whole placement policy by adding the attribute nc:operation='delete' to the XML element.
<placement nc:operation='delete'>
<target_vm_group_ref>c3</target_vm_group_ref>
<type>anti_affinity</type>
<enforcement>strict</enforcement>
<vm_group_ref>c1</vm_group_ref>
<vm_group_ref>s2</vm_group_ref>
</placement>
For c3 as the vm_group_ref, remove the vm_group_ref entry itself, and keep other relationships as is.
<placement>
<target_vm_group_ref>s10</target_vm_group_ref>
<type>anti_affinity</type>
<enforcement>strict</enforcement>
<vm_group_ref>c1</vm_group_ref>
<vm_group_ref nc:operation='delete'>c3</vm_group_ref>
<vm_group_ref>s2</vm_group_ref>
<vm_group_ref>s4</vm_group_ref>
<vm_group_ref>s5</vm_group_ref>
<vm_group_ref>s6</vm_group_ref>
<vm_group_ref>s7</vm_group_ref>
<vm_group_ref>s8</vm_group_ref>
<vm_group_ref>s9</vm_group_ref>
</placement>
For a placement policy that has only one vm_group_ref element, where either the vm_group_ref is c3 or the target_vm_group is c3, remove the whole policy. This is because the policy has no meaning once c3 is removed:
<placement nc:operation='delete'>
<target_vm_group_ref>c11</target_vm_group_ref>
<type>anti_affinity</type>
<enforcement>strict</enforcement>
<vm_group_ref>c3</vm_group_ref>
</placement>
The last step is to mark the VM group itself for deletion by adding the attribute nc:operation='delete' to the XML element:
<vm_group nc:operation='delete'>
<name>c3</name>
<flavor>SFPCF101-DEPLOYMENT-control-function</flavor>
<bootup_time>1800</bootup_time>
<recovery_wait_time>1</recovery_wait_time>
...
To prepare the datamodel for adding the same VM group back, take the deletion datamodel and remove every occurrence of nc:operation='delete'.
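Removing the attribute by hand is error-prone in a large datamodel, so the step can be sketched as a text substitution that leaves all other formatting untouched. The function name is hypothetical, and the sketch assumes the attribute is always written as nc:operation='delete' or nc:operation="delete".

```python
import re

# Matches the attribute together with the whitespace in front of it,
# with either single or double quotes around the value.
DELETE_ATTR = re.compile(r"""\s+nc:operation\s*=\s*(['"])delete\1""")


def to_add_datamodel(deletion_xml):
    """Return the deletion datamodel with every delete marker removed."""
    return DELETE_ATTR.sub("", deletion_xml)
```

Reading the deletion datamodel file as text, passing it through to_add_datamodel, and writing the result to a second file yields the datamodel for adding the VM group back. Review the output before using it in a service update.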
Once the two datamodel files are ready, use the following command for service update:
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli edit-config {deleting datamodel file}
Wait until the service update is complete. Then add the VNF back:
Warning: The service state must be ALIVE before you add the VNF back. If it is not, recover any VMs that are not in the ALIVE state.
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli edit-config {adding datamodel file}
For ESC Release 3.1 and later, the service may get stuck in the inert state when a stop VM operation fails and no recovery is triggered. In that situation, one VM is in error but the service is inert. You can use ENABLE_MONITOR to bring the VM and the service back to the alive or error state.
/opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli vm-action ENABLE_MONITOR {VM Name}
This operation enables monitoring of the VM. If the VM is running in the VIM, the VM_ALIVE event is delivered to the VM state machine again, and the VM eventually transitions to the alive state. If the VM is not running in the VIM, the timer expires and the recovery procedure can bring the VM back. Meanwhile, the service transitions to the active or error state.
Problem
When you attempt a manual recovery of a VNF VM, the request is rejected because of the wrong service state (ESC 3.1 and later releases), as follows:
$ /opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli recovery-vm-action DO vm-name
Recovery VM Action
/opt/cisco/esc/confd/bin/netconf-console --port=830 --host=127.0.0.1 --user=admin --privKeyFile=/home/admin/.ssh/confd_id_dsa --privKeyType=dsa --rpc=/tmp/esc_nc_cli.L1WdqyIE7r
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
<rpc-error>
<error-type>application</error-type>
<error-tag>operation-failed</error-tag>
<error-severity>error</error-severity>
<error-path xmlns:esc="http://www.cisco.com/esc/esc" xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
/nc:rpc/esc:recoveryVmAction
</error-path>
<error-message xml:lang="en">Exception from action callback: Recovery VM Operation: recovery_do is not applicable since the service is in [SERVICE_INERT_STATE] state.</error-message>
<error-info>
<bad-element>recoveryVmAction</bad-element>
</error-info>
</rpc-error>
</rpc-reply>
Solution
At this point, check the opdata to find out the service state and VM state.
$ /opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli get esc_datamodel/opdata
<state_machine>
<state>SERVICE_INERT_STATE</state>
<vm_state_machines>
<vm_state_machine>
<vm_name>depz_g1_0_b6d19896-bc3b-400a-ad50-6d84c522067d</vm_name>
<state>VM_MONITOR_UNSET_STATE</state>
</vm_state_machine>
<vm_state_machine>
<vm_name>depz_g1_1_f8445a8a-29ba-457d-9224-c46eeaa97f72</vm_name>
<state>VM_ALIVE_STATE</state>
</vm_state_machine>
</vm_state_machines>
</state_machine>
Enable monitor for the VM in the unset monitor state:
$ /opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli vm-action ENABLE_MONITOR vm-name
After some time, check the opdata again. The service should transition to the active or error state:
$ /opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli get esc_datamodel/opdata
<state_machine>
<state>SERVICE_ACTIVE_STATE</state>
<vm_state_machines>
<vm_state_machine>
<vm_name>depz_g1_0_b6d19896-bc3b-400a-ad50-6d84c522067d</vm_name>
<state>VM_ALIVE_STATE</state>
</vm_state_machine>
<vm_state_machine>
<vm_name>depz_g1_1_f8445a8a-29ba-457d-9224-c46eeaa97f72</vm_name>
<state>VM_ALIVE_STATE</state>
</vm_state_machine>
</vm_state_machines>
</state_machine>
Now do a manual recovery of the VM in error state:
$ /opt/cisco/esc/esc-confd/esc-cli/esc_nc_cli recovery-vm-action DO vm-name
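When a deployment has many VMs, the check of the opdata can be sketched as follows. The sketch assumes you pass in a single <state_machine> element without XML namespaces, as in the excerpts above; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def vms_not_alive(state_machine_xml):
    """Return the service state and the VMs that are not in VM_ALIVE_STATE."""
    root = ET.fromstring(state_machine_xml)
    service_state = root.findtext("state")
    pending = {}
    for vm in root.iter("vm_state_machine"):
        state = vm.findtext("state")
        if state != "VM_ALIVE_STATE":
            pending[vm.findtext("vm_name")] = state
    return service_state, pending
```

A VM reported in VM_MONITOR_UNSET_STATE is a candidate for ENABLE_MONITOR, and an empty result together with SERVICE_ACTIVE_STATE indicates that the service has recovered.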
Problem
When a VNF operation (for example, deploy or recovery) is denied with the following reason:
Default VIM Connector is not set up, or is unreachable. Please check your VIM Connector credentials and VIM status.
Description
If ESC was set up for multi-VIM, ensure that at least one VIM connector is marked as "default". Otherwise, check the ESC VIM connector status.
[admin@leishi-test ~]$ escadm vim show
[
{
"status": "CONNECTION_SUCCESSFUL",
"status_message": "Successfully connected to VIM",
"type": "OPENSTACK",
"id": "default_openstack_vim",
"properties": {
"property": [
{
"name": "os_project_domain_name",
"value": "default"
},
{
"name": "os_auth_url",
"value": "http://10.85.103.143:35357/v3"
},
{
"name": "os_project_name",
"value": "admin"
}
]
}
}
]
{
"user": [
{
"credentials": {
"properties": {
"property": [
{
"name": "os_password",
"value": "cisco123"
},
{
"name": "os_user_domain_name",
"value": "default"
}
]
}
},
"vim_id": "default_openstack_vim",
"id": "admin"
}
]
}
Solution
If there is no VIM connector returned, add one. If one VIM connector is returned but the status is not "CONNECTION_SUCCESSFUL", check the /var/log/esc/vimmanager/vimmanager.log for the following entry:
2017-12-07 23:11:49.760 [http-nio-127.0.0.1-8095-exec-5] INFO c.c.e.v.c.VimConnectionManagerService - Registering an user.
If there is an exception or error after the entry, it indicates the root cause. For example, if there is an error related to SSL, it means the certificate is missing or wrong.
2017-12-07 23:11:49.818 [http-nio-127.0.0.1-8095-exec-5] ERROR c.c.e.v.p.i.o.OpenStackProvider - Failed to register a user
org.openstack4j.api.exceptions.ConnectionException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at org.openstack4j.connectors.jersey2.HttpExecutorServiceImpl.invoke(HttpExecutorServiceImpl.java:58)
If there is a connection timeout or an unreachable host name exception, make a curl GET call to the authUrl provided to ensure that OpenStack is reachable from the ESC VM.
curl -k https://www.xxxxx.com:5000/
If there is no error or exception after the "Registering an user" entry, the authentication information provided is not correct. In this case, check /var/log/esc/vimmanager/vim_vimmanager.log. Look at the beginning of the log file, where the initial authentication happens:
2017-12-07 23:11:49.748 INFO os - 1 * Sending client request on thread http-nio-127.0.0.1-8095-exec-4
1 > POST https://10.85.103.49:35357/v3/auth/tokens
1 > Accept: application/json
1 > Content-Type: application/json
1 > OS4J-Auth-Command: Tokens
{
"auth" : {
"identity" : {
"password" : {
"user" : {
"name" : "admin",
"domain" : {
"name" : "default"
},
"password" : "****"
}
},
"methods" : [ "password" ]
},
"scope" : {
"project" : {
"name" : "admin",
"domain" : {
"name" : "default"
}
}
}
}
}
Double-check the authUrl, user, and project/tenant. For V3 authentication, ensure that the authUrl is the actual V3 endpoint; otherwise, a 404 is returned. Also for V3 authentication, ensure that the user domain and project domain are provided. If you use an openrc file from Horizon to boot the ESC VM and the openrc file does not contain the project domain or user domain, declare them explicitly:
OS_PROJECT_DOMAIN_NAME=default
OS_USER_DOMAIN_NAME=default
To verify that ESC gets the correct password for the default VIM connector with bootvm, do the following:
[admin@leishi-test ~]$ sudo escadm reload
[sudo] password for admin:
[admin@leishi-test ~]$ cat /opt/cisco/esc/esc-config/esc_params.conf
openstack.os_auth_url= http://10.85.103.153:35357/v3
openstack.os_project_name= admin
openstack.os_tenant_name= admin
openstack.os_user_domain_name= default
openstack.os_project_domain_name= default
openstack.os_identity_api_version= 3
openstack.os_username = admin
openstack.os_password = cisco123
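A quick check of the listing above can be sketched as follows. The list of settings treated as required for V3 authentication is an assumption drawn from the keys shown in this section, not an official ESC list, and the function names are hypothetical.

```python
# Assumed set of keys needed for V3 authentication, based on the
# listing above; adjust it to match your environment.
REQUIRED_V3 = (
    "openstack.os_auth_url",
    "openstack.os_username",
    "openstack.os_password",
    "openstack.os_project_name",
    "openstack.os_user_domain_name",
    "openstack.os_project_domain_name",
)


def parse_params(text):
    """Parse 'key = value' lines; spacing around '=' varies in the file."""
    params = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


def missing_v3_settings(params):
    """Return the assumed-required keys that are absent or empty."""
    return [key for key in REQUIRED_V3 if not params.get(key)]
```

Passing the contents of /opt/cisco/esc/esc-config/esc_params.conf to parse_params and then to missing_v3_settings points at settings, such as the user or project domain, that were not picked up from the openrc file.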