Troubleshoot High Availability Issues
Open the event file that is generated for the node event. This file is a debug log of the attempt to read and update the route described by the redundancy node. If the HA setup works as expected, the log displays the status "Event handling completed". If this status does not appear, examine the log file in detail to determine which step of the verification failed.
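The check described above can be scripted from the guestshell. The following Python sketch makes several assumptions that are not confirmed by the HA package: event logs are plain text in ~/cloud/HA/events, their file names start with "node" (as in the `cat node*` examples later in this section), and success is reported by the exact text "Event handling completed". The helper names are hypothetical.

```python
import os

# Assumptions (hypothetical, not part of the HA package): event logs live in
# this directory, their names start with "node", and success is this exact text.
EVENTS_DIR = os.path.expanduser("~/cloud/HA/events")
SUCCESS_MARKER = "Event handling completed"

def latest_event_file(events_dir=EVENTS_DIR, prefix="node"):
    """Return the newest file in events_dir whose name starts with prefix, or None."""
    if not os.path.isdir(events_dir):
        return None
    paths = [os.path.join(events_dir, name)
             for name in os.listdir(events_dir) if name.startswith(prefix)]
    files = [p for p in paths if os.path.isfile(p)]
    return max(files, key=os.path.getmtime) if files else None

def event_succeeded(log_text):
    """True if the event log reports that event handling completed."""
    return SUCCESS_MARKER in log_text

if __name__ == "__main__":
    path = latest_event_file()
    if path is None:
        print("No event files found in", EVENTS_DIR)
    else:
        with open(path) as log:
            completed = event_succeeded(log.read())
        print(path, "->", "completed" if completed else "check this log for the failing step")
```

Filtering on the file-name prefix keeps the routeTableGetRsp response file, which is written to the same directory, from being mistaken for the latest event log.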
Some of the common causes for failure include:
- Inability to obtain authentication credentials:
  - The guestshell does not have network access.
  - The authentication service is not running in the guestshell.
  - The credentials for Cisco Catalyst 8000V are missing or incorrect.
- The router cannot access the route table entry:
  - The route table was not correctly identified in the redundancy node.
  - The router was not granted permission to access the route table.
  - The route specified in the redundancy node does not exist.
Note: Cisco recommends that you use the …
Example: Troubleshooting Issues for High Availability
The following examples describe possible issues, how to check for each one, and how to resolve it.
Problem:
IOx is not running, or the guestshell does not have network access.
How to Check:
Run the following commands on the router and in the guestshell:
Router#show iox
IOx Infrastructure Summary:
---------------------------
IOx service (CAF) : Running
IOx service (HA) : Not Supported
IOx service (IOxman) : Running
Libvirtd : Running
Router#guestshell enable
Router#show app-hosting list
App id State
------------------------------------------------------
guestshell RUNNING
Router#guestshell
[guestshell@guestshell ~]$
[guestshell@guestshell ~]$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=38 time=25.7 ms
Possible Cause:
The IOx configuration and the VirtualPortGroup interface that gives the guestshell network access are part of the "day zero" configuration of the C8000V. If any of the checks above fail, check whether the startup configuration of the C8000V has been altered.
How to Fix:
A reload of the C8000V will re-apply the day zero configuration.
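The show iox check above can be automated when you have the command output as text, for example pasted from the router console. This Python sketch assumes the format shown in the example, one "name : state" pair per line; the list of required services and the helper names are hypothetical. It does not treat the IOx service (HA) line as a failure, because the example output above shows it as "Not Supported".

```python
import re

# Hypothetical list of services expected to be "Running" in 'show iox' output.
# "IOx service (HA)" is left out: the example output shows it as "Not Supported".
REQUIRED_SERVICES = ("IOx service (CAF)", "IOx service (IOxman)", "Libvirtd")

def parse_show_iox(output):
    """Map each 'name : state' line of 'show iox' output to its state."""
    states = {}
    for line in output.splitlines():
        match = re.match(r"^\s*(.+?)\s*:\s*(.+?)\s*$", line)
        if match:
            states[match.group(1)] = match.group(2)
    return states

def stopped_services(output, required=REQUIRED_SERVICES):
    """Return the required services that are missing or not 'Running'."""
    states = parse_show_iox(output)
    return [name for name in required if states.get(name) != "Running"]
```

An empty result from `stopped_services()` means only that IOx itself looks healthy; guestshell network access still has to be confirmed with the ping test.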
---------
Problem:
HA package installation failure
How to Check:
Router#guestshell
[guestshell@guestshell ~]$ ls
cloud
[guestshell@guestshell ~]$ cd cloud
[guestshell@guestshell cloud]$ ls
HA
You should see the directory ~/cloud/HA.
On Azure, you should also see a ~/cloud/authMgr directory.
Possible Cause:
The HA package was not installed, or was not installed using the --user option.
How to Fix:
Install the package and set up the environment:
pip3 install c8000v_<provider>_ha --user
source ~/.bashrc
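The directory check described above can be expressed as a small Python sketch. It only tests for the directories named in this section (~/cloud/HA, plus ~/cloud/authMgr on Azure); the function name and the `azure` flag are hypothetical.

```python
import os

def missing_ha_dirs(home=None, azure=False):
    """Return the expected HA package directories that are missing under home."""
    home = home or os.path.expanduser("~")
    expected = [os.path.join(home, "cloud", "HA")]
    if azure:
        # On Azure the authMgr directory should also be present.
        expected.append(os.path.join(home, "cloud", "authMgr"))
    return [path for path in expected if not os.path.isdir(path)]

if __name__ == "__main__":
    missing = missing_ha_dirs(azure=True)  # set azure=False on other clouds
    for path in missing:
        print("missing:", path)
    if missing:
        print("Reinstall with: pip3 install c8000v_<provider>_ha --user")
```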
---------
Problem:
HA server not running.
How to Check:
[guestshell@guestshell ~]$ systemctl status csr_ha
● csr_ha.service - C8000V High Availability service
Loaded: loaded (/etc/systemd/user/csr_ha.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2019-04-08 15:01:51 UTC; 2h 1min ago
Main PID: 286 (python)
CGroup: /system.slice/libvirtd.service/system.slice/csr_ha.service
├─286 python /home/guestshell/.local/lib/python2.7/site-packages/c...
└─295 python /home/guestshell/.local/lib/python2.7/site-packages/c...
On Azure, the auth-token service should also be running:
[guestshell@guestshell ~]$ systemctl status auth-token
● auth-token.service - Authentication Token service
Loaded: loaded (/etc/systemd/user/auth-token.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2019-04-08 16:08:15 UTC; 57min ago
Main PID: 542 (python)
CGroup: /system.slice/libvirtd.service/system.slice/auth-token.service
└─542 /usr/bin/python /home/guestshell/.local/lib/python2.7/site-p...
Possible Cause:
The HA server encountered an error and crashed. Although the service is normally restarted automatically after a crash, it may remain stopped.
How to Fix:
If the service is not running, you can start it manually:
[guestshell@guestshell ~]$ sudo systemctl start csr_ha
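The service checks above can be wrapped in a short Python sketch that runs `systemctl status` for each service and looks for the "Active: active (running)" line seen in the example output. The helper names are hypothetical; auth-token is checked only because this section says it should run on Azure.

```python
import subprocess

def is_active_running(status_output):
    """True if 'systemctl status' output reports 'Active: active (running)'."""
    for line in status_output.splitlines():
        if line.strip().startswith("Active:"):
            return "active (running)" in line
    return False

def service_status(name):
    """Run 'systemctl status <name>' and return its output ('' if systemctl is absent)."""
    try:
        result = subprocess.run(["systemctl", "status", name],
                                stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                                universal_newlines=True)
    except FileNotFoundError:
        return ""
    return result.stdout

if __name__ == "__main__":
    # auth-token applies only to Azure deployments.
    for svc in ("csr_ha", "auth-token"):
        state = "running" if is_active_running(service_status(svc)) else "NOT running"
        print(svc, state)
```

The exit code of `systemctl status` is not used, because the output text is what this section asks you to inspect.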
---------
Problem:
C8000V authentication not working on Azure.
This is an Azure-specific error.
How to Check:
If you perform a node_event on a redundancy node and it fails while trying to read the route table, the file ~/cloud/HA/events/routeTableGetRsp is generated.
[guestshell@guestshell ~]$ cat ~/cloud/HA/events/routeTableGetRsp
{"error":{"code":"AuthenticationFailedMissingToken","message":"Authentication failed. The 'Authorization' header is missing the access token."}}
Possible Cause:
There are multiple possible causes, depending on the authentication mechanism you use:
- System-assigned managed identity
- Registered application in Azure Active Directory (AAD)
When you use a system-assigned managed identity, the likely cause of the failure is that the managed identity is not enabled on the C8000V.
How to Fix:
Verify that system-assigned managed identity is enabled on the C8000V:
In the Azure portal, navigate to the virtual machine running the C8000V.
Under the Settings menu, select Identity.
On the System assigned tab, verify that the status is set to On.
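After enabling the managed identity, you may want to confirm from inside the VM that a token can actually be issued. The following Python sketch uses the Azure Instance Metadata Service (IMDS) managed-identity token endpoint, a general Azure VM feature that this guide does not itself describe. It assumes that the guestshell can reach 169.254.169.254 through its network path; the helper names are hypothetical, and the HA package's own authentication code may obtain tokens differently.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Azure IMDS managed-identity token endpoint and the Azure Resource Manager audience.
IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"
ARM_RESOURCE = "https://management.azure.com/"

def build_token_url(resource=ARM_RESOURCE, api_version="2018-02-01"):
    """Build the IMDS token request URL for the given resource."""
    query = urllib.parse.urlencode({"api-version": api_version, "resource": resource})
    return IMDS_TOKEN_ENDPOINT + "?" + query

def try_get_managed_identity_token(timeout=5):
    """Return (True, None) if IMDS issued a token, else (False, reason)."""
    request = urllib.request.Request(build_token_url(), headers={"Metadata": "true"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # An HTTP error here typically means no identity is available to the VM.
        return False, "HTTP {}: {}".format(err.code, err.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError) as err:
        return False, "IMDS unreachable: {}".format(err)
    if "access_token" in body:
        return True, None
    return False, "response did not contain access_token"
```

Calling `try_get_managed_identity_token()` in the guestshell returns `(True, None)` only if IMDS issued a token. A failure with an "unreachable" reason points to a network problem rather than the identity setting.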
When you use AAD for authentication, the likely cause of the error is a misconfiguration of the application or a mismatch between the application identifiers in AAD and those configured in the guestshell.
How to Fix:
The application in AAD must be given the proper permissions to read and write a route table.
In the Azure portal, navigate to the registered application you have created.
Under the API Access menu, select the Required permissions item.
Select the Windows Azure Active Directory API. In the Enable Access pane, verify the following permissions are set:
- Application permission to read and write directory data
- Delegated permission to sign in and read user profile
Select the Windows Azure Service Management API. In the Enable Access pane, verify the following permissions are set:
- Delegated permission to access Azure service management as organization users
How to Fix:
In the Azure portal, navigate to the registered application you have created.
Select the Settings button for the application.
Verify that the application_id, tenant_id, and application key in the portal match the values configured in the guestshell. Also verify that the application key configured in the guestshell is not URL-encoded.
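The URL-encoding check described above can be done with Python's standard `urllib.parse.unquote`. This sketch flags a key as encoded if decoding changes it. That heuristic is an assumption: a raw key that legitimately contains "%" followed by two hex digits would also be flagged. The helper name is hypothetical.

```python
from urllib.parse import unquote

def looks_url_encoded(key):
    """True if key contains percent-escapes, i.e. decoding changes it."""
    return unquote(key) != key
```

For example, a key stored as "abc%2Bdef%3D" is flagged, and `unquote()` returns the unencoded form "abc+def=" that should be configured in the guestshell. Note that `unquote` (unlike `unquote_plus`) leaves a literal "+" unchanged, which matters for keys that contain "+".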
-------------
Problem:
Route table entry not updated by a peer failure event.
How to Check:
A log file is generated in the directory ~/cloud/HA/events for every node event.
This file indicates the event that was processed and its result. Examine this file for possible errors. If an error occurred, the file ~/cloud/HA/events/routeTableGetRsp is likely to have been written as well; examine it for additional insights.
Possible Causes:
A route was not correctly identified in a redundancy node. Depending on which parameter in the redundancy node is incorrect, you may see different results.
Some examples:
[guestshell@guestshell events]$ cat routeTableGetRsp
{"error":{"code":"SubscriptionNotFound","message":"The subscription 'b0b1a9e2-444c-4ca5-acd9-bebd1e6874ef' could not be found."}}
This implies the Azure subscription ID was not entered correctly.
[guestshell@guestshell events]$ cat node*
Route GET request failed with code 403
Route table get response:
{"error":{"code":"AuthorizationFailed","message":"The client 'b3ce41c0-bcef-41d7-9741-26bea31221c1' with object id 'b3ce41c0-bcef-41d7-9741-26bea31221c1' does not have authorization to perform action 'Microsoft.Network/routeTables/read' over scope '/subscriptions/b0b1a9e2-444c-4ca5-acd9-bebd1e6873eb/resourceGroups/gsday0-rg/providers/Microsoft.Network/routeTables/gsday0-sub4-RouteTable'."}}
Route table not found.
This implies that the route table name is incorrect or that the route table does not exist.
[guestshell@guestshell events]$ cat node*
Did not find route 17.0.0.0/8 event type peerFail
This implies that the route does not exist.
How to Fix:
Make sure the identifiers in the redundancy node match the values in the cloud provider's portal.
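The error patterns shown in these examples can be collected into a small Python sketch that scans an event log or a routeTableGetRsp file and maps what it finds to the likely causes described in this section. The hint mapping and the helper name are hypothetical and cover only the messages shown in this guide.

```python
import json

# Hints based on the examples in this section (hypothetical mapping).
ERROR_CODE_HINTS = {
    "AuthenticationFailedMissingToken": "Authentication failed: check the managed identity or the AAD application settings.",
    "SubscriptionNotFound": "The Azure subscription ID in the redundancy node is incorrect.",
    "AuthorizationFailed": "The route table name may be wrong, or the C8000V lacks permission to read the route table.",
}
TEXT_HINTS = {
    "Did not find route": "The route in the redundancy node does not exist in the route table.",
    "Route table not found": "The route table name in the redundancy node is incorrect or the table does not exist.",
}

def diagnose(log_text):
    """Return hints for every known error code or message found in log_text."""
    hints = []
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("{"):
            # Azure error responses are single-line JSON: {"error":{"code":...}}
            try:
                code = json.loads(line).get("error", {}).get("code")
            except ValueError:
                code = None
            if code in ERROR_CODE_HINTS:
                hints.append(ERROR_CODE_HINTS[code])
        for marker, hint in TEXT_HINTS.items():
            if marker.lower() in line.lower():
                hints.append(hint)
    # Keep the first occurrence of each hint; logs often repeat the same error.
    return list(dict.fromkeys(hints))
```

The marker match is case-insensitive because the logs in this section print both "Route table not found" and "Route Table not found".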
------------
Problem:
Route table entry not updated by a peer failure event.
How to Check:
A log file is generated in the directory ~/cloud/HA/events for every node event.
This file indicates the event that was processed and its result. Examine this file for possible errors. If an error occurred, the file ~/cloud/HA/events/routeTableGetRsp is likely to have been written as well; examine it for additional insights.
Possible Causes:
The C8000V has not been given permission to access the route table. The event log contains output similar to the following:
Fetching the route table
Route table get response:
{"error":{"code":"AuthorizationFailed","message":"The client 'b3ce41c0-bcef-41d7-9741-26bea31221c1' with object id 'b3ce41c0-bcef-41d7-9741-26bea31221c1' does not have authorization to perform action 'Microsoft.Network/routeTables/read' over scope '/subscriptions/b0b1a9e2-444c-4ca5-acd9-bebd1e6873eb/resourceGroups/gsday0-rg/providers/Microsoft.Network/routeTables/gsday0-sub2-RouteTable'."}}
Route GET request failed with code 403
Route table get response:
{"error":{"code":"AuthorizationFailed","message":"The client 'b3ce41c0-bcef-41d7-9741-26bea31221c1' with object id 'b3ce41c0-bcef-41d7-9741-26bea31221c1' does not have authorization to perform action 'Microsoft.Network/routeTables/read' over scope '/subscriptions/b0b1a9e2-444c-4ca5-acd9-bebd1e6873eb/resourceGroups/gsday0-rg/providers/Microsoft.Network/routeTables/gsday0-sub2-RouteTable'."}}
Route table not found.
C8000V HA: Set route table for verify
Route Table not found
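The AuthorizationFailed message above names both the denied action and the exact scope, which identifies the route table and resource group that the C8000V identity needs access to. This Python sketch extracts them. It assumes the message wording shown in this example ("perform action '…' over scope '…'"); the helper names are hypothetical.

```python
import json
import re

# Matches the wording of the AuthorizationFailed message shown in the example.
SCOPE_PATTERN = re.compile(r"perform action '([^']+)' over scope '([^']+)'")

def denied_action_and_scope(response_text):
    """Return (action, scope) from an AuthorizationFailed JSON error, or None."""
    error = json.loads(response_text).get("error", {})
    if error.get("code") != "AuthorizationFailed":
        return None
    match = SCOPE_PATTERN.search(error.get("message", ""))
    return match.groups() if match else None

def route_table_name(scope):
    """Return the last path segment of the scope, which is the route table name."""
    return scope.rstrip("/").rsplit("/", 1)[-1]
```

For the response above, the action is Microsoft.Network/routeTables/read and the route table is gsday0-sub2-RouteTable. Compare that name with the route table in the redundancy node before you change permissions.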
If none of these troubleshooting tips resolves your problem, run the following commands:
[guestshell@guestshell ~]$ cd ~/cloud/HA
[guestshell@guestshell HA]$ bash debug_ha.sh
[guestshell@guestshell HA]$ ls /bootflash
You should see a file named ha_debug.tar. Copy this file off the C8000V and provide it to Cisco Technical Support for analysis.
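Before copying the bundle off the router, you can confirm that debug_ha.sh produced a readable archive. This Python sketch uses the standard `tarfile` module and the /bootflash/ha_debug.tar path from the step above; the helper name is hypothetical.

```python
import os
import tarfile

def debug_bundle_members(path="/bootflash/ha_debug.tar"):
    """Return the member names of the debug tar, or None if it is missing or not a tar."""
    if not os.path.isfile(path) or not tarfile.is_tarfile(path):
        return None
    with tarfile.open(path) as bundle:
        return bundle.getnames()
```

A `None` result means that the script did not produce a valid archive, so rerun debug_ha.sh before you contact support.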