Introduction
This document describes how to troubleshoot the CloudCenter error "Unable to communicate with orchestrator" with error 408.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
- Linux Interface
- Virtual Machine Environment
- VIM
Components Used
Cisco recommends knowledge in:
- CloudCenter appliances
- CloudCenter architecture
- Linux O.S.
- CCM (CloudCenter Management)
- CCO (CloudCenter Orchestrator)
- AMQP (Advanced Message Queuing Protocol)
The information in this document was created from the devices in a specific private lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Problem
Power outages, unexpected restarts, or prolonged network failures might cause the CloudCenter appliances to fall out of sync. The checks in this document confirm that the appliances are correctly connected. When the orchestrator is configured in the CloudCenter Manager Graphical User Interface (CCM GUI), users might get the error shown in the image.
The CCO logs can show an error like this one:
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:337)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
... 87 more
java.lang.RuntimeException: Failed to connect to CCM, please check network connection between CCM and CCO. JobId: 21912
at com.osmosix.commons.mgmtserver.impl.MgmtServerServiceImpl.getUserCloudAccountByJobId(MgmtServerServiceImpl.java:236)
at com.osmosix.gateway.persistence.impl.hazelcast.AbstractDistributedJobDaoImpl.find(AbstractDistributedJobDaoImpl.java:109)
at com.osmosix.gateway.persistence.impl.hazelcast.AbstractDistributedJobDaoImpl.find(AbstractDistributedJobDaoImpl.java:17)
at com.osmosix.gateway.lifecycle.impl.AbstractLifecycle.getJob(AbstractLifecycle.java:207)
at com.osmosix.gateway.lifecycle.helpers.LifecycleReaper.reapApp(LifecycleReaper.java:62)
at com.osmosix.gateway.lifecycle.helpers.LifecycleReaper.reapDeadApps(LifecycleReaper.java:45)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:65)
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:81)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
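To see which jobs are affected, the log can be filtered for the "Failed to connect to CCM" message shown above. The sketch below is a minimal example, not a Cisco tool: the function name ccm_connect_failures is hypothetical, and the log file path is passed as an argument because this document does not state where the CCO log is stored.

```shell
#!/usr/bin/env bash
# Sketch only: lists each JobId that hit "Failed to connect to CCM" in a log
# file, with the number of occurrences. The function name is hypothetical and
# the log path must be supplied by the caller.
# Usage: ccm_connect_failures <log_file>
ccm_connect_failures() {
    grep -F 'Failed to connect to CCM' "$1" \
        | sed -n 's/.*JobId: \([0-9][0-9]*\).*/\1/p' \
        | awk '{ count[$0]++ } END { for (job in count) print job, count[job] }' \
        | sort -n
}
```

Each output line contains a JobId followed by the number of times the message was logged for that job. No output means the message was not found in the file.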
Solution
It is necessary to restart the CloudCenter components one by one in order to refresh the handshake between them.
AMQP
Step 1. Log in as root
Step 2. Restart AMQP service
On all versions up to 4.8.1.2
# /etc/init.d/tomcatgua restart
On versions starting from 4.8.2
# systemctl restart rabbit
CCO
Step 1. Log in as root
Step 2. Restart CCO service
On all versions up to 4.8.1.2
# /etc/init.d/tomcat restart
On versions starting from 4.8.2
# systemctl restart cco
CCM
Step 1. Log in as root
Step 2. Restart CCM service
On all versions up to 4.8.1.2
# /etc/init.d/tomcat restart
On versions starting from 4.8.2
# systemctl restart ccm
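The version split in the steps above can be summarized in a small helper. This is a sketch under stated assumptions: the function names version_ge and restart_command are hypothetical, any version lower than 4.8.2 is treated as "up to 4.8.1.2", and the version comparison relies on GNU sort -V. The helper only prints the command so that it can be reviewed before it is run as root.

```shell
#!/usr/bin/env bash
# Sketch only: prints the restart command for a component and CloudCenter
# version. It does not run anything. Function names are hypothetical.

# Returns 0 when version $1 is greater than or equal to version $2.
# Assumes GNU sort with the -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage: restart_command <amqp|cco|ccm> <CloudCenter version>
restart_command() {
    local component="$1" version="$2"
    if version_ge "$version" "4.8.2"; then
        # Versions starting from 4.8.2 use systemd services.
        case "$component" in
            amqp) echo "systemctl restart rabbit" ;;
            cco)  echo "systemctl restart cco" ;;
            ccm)  echo "systemctl restart ccm" ;;
            *)    echo "unknown component: $component" >&2; return 1 ;;
        esac
    else
        # Assumption: every version below 4.8.2 uses the init.d scripts.
        case "$component" in
            amqp)    echo "/etc/init.d/tomcatgua restart" ;;
            cco|ccm) echo "/etc/init.d/tomcat restart" ;;
            *)       echo "unknown component: $component" >&2; return 1 ;;
        esac
    fi
}
```

For example, restart_command cco 4.8.1.2 prints /etc/init.d/tomcat restart, and restart_command amqp 4.8.2 prints systemctl restart rabbit. Run the printed command on the matching appliance, in the AMQP, CCO, CCM order used in this section.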
Verify
All appliances must be correctly connected. To confirm this, check each of the CloudCenter components.
CCM
Step 1. Log in as root
Step 2. Check that tomcat (versions earlier than 4.8.2) or the CCM service (4.8.2 and later) is actually running
On all versions up to 4.8.1.2
[root@localhost ~]# ps -ef | grep -i tomcat
On versions starting from 4.8.2
[root@localhost ~]# systemctl status ccm
Step 3. If telnet is installed, attempt a connection from the CCO towards the CCM to confirm that communication is possible
[root@cliqr-centos7-base-image ~]# telnet 10.31.127.41 8443
Trying 10.31.127.41...
Connected to 10.31.127.41.
Escape character is '^]'.
If an error occurs, then no communication is possible. This has to be fixed.
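If telnet is not installed, the same reachability test can be approximated with bash alone. The sketch below assumes a bash build with /dev/tcp redirection support (the default on CentOS and RHEL) and the coreutils timeout command. The function name port_open is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: returns 0 if a TCP connection to host:port can be opened
# within the timeout, non-zero otherwise. The function name is hypothetical.
# Assumes bash was built with /dev/tcp support and that coreutils timeout
# is available.
# Usage: port_open <host> <port> [timeout_seconds]
port_open() {
    local host="$1" port="$2" wait_s="${3:-5}"
    # The child shell opens the socket on file descriptor 3 and then exits,
    # which closes the connection again.
    timeout "$wait_s" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

With the addresses used in this document, port_open 10.31.127.41 8443 && echo reachable, run from the CCO, prints reachable only when the CCM accepts connections on port 8443.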
Step 4. If a hostname is used to configure the orchestrator on the CCM GUI, ensure that the hostname is present in the /etc/hosts file
[root@cliqr-centos7-base-image ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1 devCC
10.31.127.42 CCO
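The hosts file check can also be scripted. This is a minimal sketch with a hypothetical function name, hosts_lookup. It only parses a hosts-format file and does not query DNS. The name match is case-insensitive, which is an assumption about how the entry should be compared.

```shell
#!/usr/bin/env bash
# Sketch only: prints the IP address mapped to a hostname in a hosts-format
# file and returns non-zero when the name is absent. The function name is
# hypothetical.
# Usage: hosts_lookup <hostname> [hosts_file]
hosts_lookup() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v name="$name" '
        { sub(/#.*/, "") }    # ignore comments
        {
            # Field 1 is the address; every later field is a name or alias.
            for (i = 2; i <= NF; i++) {
                if (tolower($i) == tolower(name)) { print $1; found = 1; exit }
            }
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}
```

With the file shown above, hosts_lookup CCO prints 10.31.127.42. The standard getent hosts CCO command gives a similar answer and also consults the other name sources configured on the system.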
AMQP
Step 1. Log in as root
Step 2. Check that there is a connection established from the AMQP to each one of the existing CCOs.
[root@localhost ~]# rabbitmqctl list_connections -p /cliqr
Listing connections ...
cliqr 10.31.127.42 33062 running
cliqr_worker 10.31.127.42 33130 running
cliqr_worker 10.31.127.59 38596 running
cliqr_worker 10.31.127.67 49781 running
cliqr_worker 10.31.127.79 49778 running
cliqr_worker 10.31.127.85 49786 running
In the previous output, the connections towards the CCO are the lines with the cliqr user (in this case there is only one CCO).
In a High Availability (HA) setup with the AMQP behind a load balancer, you see one connection per CCO, each with the IP address of the AMQP load balancer (in the following example there are two CCOs).
[root@amqp-azre1 ~]# rabbitmqctl list_connections -p /cliqr
Listing connections ...
cliqr 15.1.0.10 35788 running
cliqr 15.1.0.10 36212 running
cliqr_worker 15.1.0.10 37714 running
cliqr_worker 15.1.0.10 38362 running
cliqr_worker 15.1.0.10 41102 running
If this is not the case, restart the tomcatgua process (previous to 4.8.2) or the rabbit service (post 4.8.2)
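To compare the number of CCO connections against the number of CCOs without reading the list by eye, the output can be piped through a small filter. This is a sketch with a hypothetical function name, count_cco_connections. It assumes the column layout shown above (user, peer address, peer port, state).

```shell
#!/usr/bin/env bash
# Sketch only: reads "rabbitmqctl list_connections -p /cliqr" output on stdin
# and prints how many running connections belong to the user named exactly
# "cliqr" (cliqr_worker lines are not counted). The function name is
# hypothetical.
count_cco_connections() {
    awk '$1 == "cliqr" && $NF == "running" { n++ } END { print n + 0 }'
}
```

Usage: rabbitmqctl list_connections -p /cliqr | count_cco_connections. The result is expected to equal the number of CCOs, which is 1 for the first output above and 2 for the HA output.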
CCO
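Step 1. Log in as root
Step 2. Check that tomcat (versions earlier than 4.8.2) or the CCO service (4.8.2 and later) is actually running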
On all versions up to 4.8.1.2
[root@localhost ~]# ps -ef | grep -i tomcat
On versions starting from 4.8.2
[root@localhost ~]# systemctl status cco
Step 3. Check that connections towards the CCM are established. Connections can appear in CLOSE_WAIT status as well (in this example the CCM is at 10.31.127.41)
[root@cliqr-centos7-base-image ~]# netstat -anp | grep 10.31.127.41
tcp 86 0 10.31.127.42:38542 10.31.127.41:8443 CLOSE_WAIT 1330/java
tcp 86 0 10.31.127.42:38562 10.31.127.41:8443 CLOSE_WAIT 1330/java
tcp 86 0 10.31.127.42:38546 10.31.127.41:8443 CLOSE_WAIT 1330/java
tcp 86 0 10.31.127.42:38566 10.31.127.41:8443 CLOSE_WAIT 1330/java
tcp 86 0 10.31.127.42:38556 10.31.127.41:8443 CLOSE_WAIT 1330/java
tcp 86 0 10.31.127.42:38554 10.31.127.41:8443 CLOSE_WAIT 1330/java
tcp 86 0 10.31.127.42:38550 10.31.127.41:8443 CLOSE_WAIT 1330/java
tcp 86 0 10.31.127.42:38564 10.31.127.41:8443 CLOSE_WAIT 1330/java
tcp 86 0 10.31.127.42:38560 10.31.127.41:8443 CLOSE_WAIT 1330/java
tcp 86 0 10.31.127.42:38568 10.31.127.41:8443 CLOSE_WAIT 1330/java
tcp 86 0 10.31.127.42:38552 10.31.127.41:8443 CLOSE_WAIT 1330/java
tcp 86 0 10.31.127.42:38558 10.31.127.41:8443 CLOSE_WAIT 1330/java
tcp 86 0 10.31.127.42:38570 10.31.127.41:8443 CLOSE_WAIT 1330/java
tcp 86 0 10.31.127.42:38548 10.31.127.41:8443 CLOSE_WAIT 1330/java
tcp 86 0 10.31.127.42:38572 10.31.127.41:8443 CLOSE_WAIT 1330/java
tcp 86 0 10.31.127.42:38544 10.31.127.41:8443 CLOSE_WAIT 1330/java
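A long netstat listing such as the one above is easier to read as a count per TCP state. This is a minimal sketch with a hypothetical function name, ccm_connection_states. It assumes the standard netstat -anp column order (protocol, Recv-Q, Send-Q, local address, foreign address, state) and uses port 8443 as the default, as in the telnet test.

```shell
#!/usr/bin/env bash
# Sketch only: reads "netstat -anp" output on stdin and prints one line per
# TCP state with the number of sockets whose foreign address is
# <ccm_ip>:<port>. The function name is hypothetical.
# Usage: netstat -anp | ccm_connection_states <ccm_ip> [port]
ccm_connection_states() {
    local ip="$1" port="${2:-8443}"
    awk -v peer="$ip:$port" '
        # Column 5 is the foreign address, column 6 is the state.
        $1 ~ /^tcp/ && $5 == peer { count[$6]++ }
        END { for (state in count) print state, count[state] }
    ' | sort
}
```

Usage on the CCO: netstat -anp | ccm_connection_states 10.31.127.41. No output means that the CCO has no TCP sockets towards the CCM on that port.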