Troubleshooting High Availability Issues

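Open the generated event file. This file is a debug log of the attempt to read and update the route described by the redundancy node. If your HA setup is working as expected, the configuration output shows the status "Event Handling completed". If the system does not show this status, examine the log file closely to determine which step of the verification failed.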
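The status check described above can be sketched in a few lines of Python. This is only an illustration, assuming Python 3 and that the log is plain text; the helper names and the sample log contents are hypothetical, and only the status string "Event Handling completed" comes from this section.

```python
def event_handling_completed(log_text):
    """Return True if the log text contains the success status."""
    return "Event Handling completed" in log_text


def check_event_file(path):
    """Read an event log file and report whether event handling completed."""
    with open(path) as handle:
        return event_handling_completed(handle.read())


# Made-up sample logs for illustration; a real log is read from the events directory.
sample_ok = "Fetching the route table\nEvent Handling completed\n"
sample_bad = "Route GET request failed with code 403\nRoute table not found.\n"
print(event_handling_completed(sample_ok))
print(event_handling_completed(sample_bad))
```

If the check returns False, the log still needs to be read by hand to find the failing step.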

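Common causes of failure include the following:

  • The authentication credentials cannot be obtained.

  • The guestshell does not have network access.

  • The authentication service is not running in the guestshell.

  • The credentials for the Cisco Catalyst 8000V are missing or incorrect.

  • The router cannot access the route table entry.

  • The route table was not correctly identified in the redundancy node.

  • The router was not given permission to access the route table.

  • The specific route specified in the redundancy node does not exist.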


Note

We recommend that you use the node_event script with a verify event to test the configuration and operation of a redundancy node.


Example: Troubleshooting High Availability Issues

Run the show iox command at the Router# prompt. The following examples show possible problems and how to check and resolve them.

Router#show iox

IOx Infrastructure Summary:
---------------------------
IOx service (CAF)    : Running 
IOx service (HA)     : Not Supported 
IOx service (IOxman) : Running 
Libvirtd             : Running 

Router#guestshell enable

Router#show app-hosting list
App id                           State
------------------------------------------------------
guestshell                       RUNNING

Router#guestshell 
[guestshell@guestshell ~]$ 

[guestshell@guestshell ~]$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=38 time=25.7 ms
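If the output of show app-hosting list has been captured as text, the guestshell state can be read from it programmatically. The following is a minimal sketch, assuming Python 3 and the two-column layout shown above; the function name app_state is hypothetical.

```python
def app_state(list_output, app_id="guestshell"):
    """Return the State column for app_id, or None if it is not listed."""
    for line in list_output.splitlines():
        fields = line.split()
        # Data rows have exactly two fields: the app id and its state.
        if len(fields) == 2 and fields[0] == app_id:
            return fields[1]
    return None


# Output copied from the transcript above.
captured = (
    "App id                           State\n"
    "------------------------------------------------------\n"
    "guestshell                       RUNNING\n"
)
print(app_state(captured))
```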

Possible Cause:
The configuration of IOx and the creation of the VirtualPortGroup interface that provides the guestshell with network access are part of the "day zero" configuration of the C8000V. If any of the above steps did not work, check whether the startup configuration of the C8000V has been altered.

How to Fix:
A reload of the C8000V will re-apply the day zero configuration.

---------

Problem:
HA package installation failure

How to Check:
Router#guestshell
[guestshell@guestshell ~]$ ls
cloud
[guestshell@guestshell ~]$ cd cloud
[guestshell@guestshell cloud]$ ls
HA

You should see the directory ~/cloud/HA.  
On an Azure provided cloud, you should also see a ~/cloud/authMgr directory.
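The same directory check can be scripted. The sketch below assumes Python 3 and the directory names mentioned above; the function name and the on_azure flag are hypothetical.

```python
import os


def missing_ha_dirs(home, on_azure=False):
    """Return the expected HA directories that do not exist under home."""
    expected = [os.path.join(home, "cloud", "HA")]
    if on_azure:
        # The authMgr directory is only expected on Azure.
        expected.append(os.path.join(home, "cloud", "authMgr"))
    return [path for path in expected if not os.path.isdir(path)]


# An empty list means every expected directory is present.
print(missing_ha_dirs(os.path.expanduser("~"), on_azure=True))
```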

Possible Cause:
The HA package was not installed, or was not installed using the --user option. 

How to Fix:
Install the package and set up the environment:
	pip install c8000v_<provider>_ha --user
	source ~/.bashrc

---------

Problem:
HA server not running.

How to Check:
[guestshell@guestshell ~]$ systemctl status c8000v_ha
● c8000v_ha.service - C8000V High Availability service
   Loaded: loaded (/etc/systemd/user/c8000v_ha.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-04-08 15:01:51 UTC; 2h 1min ago
 Main PID: 286 (python)
   CGroup: /system.slice/libvirtd.service/system.slice/c8000v_ha.service
           ├─286 python /home/guestshell/.local/lib/python2.7/site-packages/c...
           └─295 python /home/guestshell/.local/lib/python2.7/site-packages/c...

On an Azure provided network, the auth-token service should also be running.
[guestshell@guestshell ~]$ systemctl status auth-token
● auth-token.service - Authentication Token service
   Loaded: loaded (/etc/systemd/user/auth-token.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-04-08 16:08:15 UTC; 57min ago
 Main PID: 542 (python)
   CGroup: /system.slice/libvirtd.service/system.slice/auth-token.service
           └─542 /usr/bin/python /home/guestshell/.local/lib/python2.7/site-p...
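When the systemctl status output is captured as text, the "Active:" line can be checked automatically. This is a minimal sketch, assuming Python 3 and the output format shown above; the function name service_is_running is hypothetical.

```python
import re

# Matches lines such as "   Active: active (running) since ...".
ACTIVE_LINE = re.compile(r"^\s*Active:\s*(\S+)\s*\((\w+)\)", re.MULTILINE)


def service_is_running(status_text):
    """Return True if the status output reports 'active (running)'."""
    match = ACTIVE_LINE.search(status_text)
    if match is None:
        return False
    return match.group(1) == "active" and match.group(2) == "running"


# Lines copied from the auth-token output above.
captured = (
    "● auth-token.service - Authentication Token service\n"
    "   Active: active (running) since Mon 2019-04-08 16:08:15 UTC; 57min ago\n"
)
print(service_is_running(captured))
```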

Possible Cause:
If the HA server has an error and crashes, it is automatically restarted.

How to Fix:
If the service is not running, you can start it manually:
[guestshell@guestshell ~]$ sudo systemctl start c8000v_ha


---------

Problem:
C8000V authentication not working on Azure.
This is an Azure specific error.

How to Check:
If you perform a node_event on a redundancy node and it fails while trying to read the route table, it generates the file ~/cloud/HA/events/routeTableGetRsp.
[guestshell@guestshell ~]$ cat ~/cloud/HA/events/routeTableGetRsp
{"error":{"code":"AuthenticationFailedMissingToken","message":"Authentication failed. The 'Authorization' header is missing the access token."}}
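The response file contains JSON, so the error code and message can be extracted rather than read by eye. The sketch below assumes Python 3 and the error layout shown above; the function name is hypothetical.

```python
import json


def parse_route_table_error(response_text):
    """Return (code, message) from an error response, or None if there is none."""
    try:
        body = json.loads(response_text)
    except ValueError:
        # The text is not valid JSON.
        return None
    error = body.get("error") if isinstance(body, dict) else None
    if not isinstance(error, dict):
        return None
    return error.get("code"), error.get("message")


# Response copied from the example above.
captured = (
    '{"error":{"code":"AuthenticationFailedMissingToken",'
    '"message":"Authentication failed. The \'Authorization\' header '
    'is missing the access token."}}'
)
print(parse_route_table_error(captured))
```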

Possible Cause:
There are several possible causes, depending on the authentication mechanism you are using:
 - System assigned managed identity
 - Registered application in Azure Active Directory (AAD)

When using system assigned managed identity, the likely cause of a failure is that the identity is not enabled on the C8000V.

How to Fix:
Verify the C8000V is enabled for system assigned managed identity.
In the Azure portal, navigate to the virtual machine running the C8000V.  
Under the Settings menu, select the Identity item.
Under the system assigned tab, verify the status is set to On.

When using AAD for authentication, the likely cause of the error is a misconfiguration of the application or a mismatch in the identifiers for the application configured in the guestshell.

How to Fix:
The application in AAD must be given the proper permissions to read and write a route table.
In the Azure portal, navigate to the registered application you have created.
Under the API Access menu, select the Required permissions item.  
Select the Windows Azure Active Directory API.  In the Enable Access pane, verify the following permissions are set:
	- Application permission to read and write directory data
	- Delegated permission to sign in and read user profile
Select the Windows Azure Service Management API.  In the Enable Access pane, verify the following permissions are set:
	- Delegated permission to access Azure service management as organization users

How to Fix:
In the Azure portal, navigate to the registered application you have created.
Select the Setting button for the application.
Verify the application_id, tenant_id, and application key in the portal match the values configured in guestshell.  Verify the application key configured in guestshell is in URL unencoded format.
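One rough way to spot an application key that is still URL encoded is to decode it and see whether anything changes. This is a heuristic sketch only, assuming Python 3; the function name and the sample keys are hypothetical, and a key that legitimately contains a sequence such as "%41" would be flagged incorrectly.

```python
from urllib.parse import unquote


def looks_url_encoded(key):
    """Return True if decoding percent-escapes would change the key."""
    return unquote(key) != key


# Made-up keys for illustration; they are not real credentials.
print(looks_url_encoded("abc%2Bdef%3D"))
print(looks_url_encoded("abc+def="))
```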

-------------

Problem:
Route table entry not updated by a peer failure event.

How to Check:
For every node event a log file is generated in the directory ~/cloud/HA/events.
This file indicates the event that was processed and its result. Examine it for possible errors. If an error occurred, a file ~/cloud/HA/events/routeTableGetRsp is likely also written; examine that file for additional details.
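Because each node event produces a file, it can help to list the most recent ones first. The sketch below assumes Python 3 and that file modification time reflects event order; the function name and the limit parameter are hypothetical.

```python
import os


def newest_event_files(events_dir, limit=3):
    """Return up to `limit` file paths in events_dir, newest first."""
    paths = [os.path.join(events_dir, name) for name in os.listdir(events_dir)]
    files = [path for path in paths if os.path.isfile(path)]
    files.sort(key=os.path.getmtime, reverse=True)
    return files[:limit]


events_dir = os.path.expanduser("~/cloud/HA/events")
if os.path.isdir(events_dir):
    for path in newest_event_files(events_dir):
        print(path)
```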

Possible Causes:
A route was not correctly identified in a redundancy node.  Depending upon what parameter in the redundancy node is in error, you may see different results.

Some examples:
[guestshell@guestshell events]$ cat routeTableGetRsp 
{"error":{"code":"SubscriptionNotFound","message":"The subscription 'b0b1a9e2-444c-4ca5-acd9-bebd1e6874ef' could not be found."}}
This implies the Azure subscription ID was not entered correctly.

[guestshell@guestshell events]$ cat node*
Route GET request failed with code 403
Route table get response:
 {"error":{"code":"AuthorizationFailed","message":"The client 'b3ce41c0-bcef-41d7-9741-26bea31221c1' with object id 'b3ce41c0-bcef-41d7-9741-26bea31221c1' does not have authorization to perform action 'Microsoft.Network/routeTables/read' over scope '/subscriptions/b0b1a9e2-444c-4ca5-acd9-bebd1e6873eb/resourceGroups/gsday0-rg/providers/Microsoft.Network/routeTables/gsday0-sub4-RouteTable'."}}
Route table not found.
This implies that the name of the route table is incorrect or that the route table does not exist.

[guestshell@guestshell events]$ cat node*
Did not find route 17.0.0.0/8 event type peerFail
This implies that the route does not exist.

How to Fix:
Make sure the identifiers in the redundancy node match the values in the cloud provider's portal.
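The failure lines in the examples above follow a few fixed patterns, so an event log can be scanned for them. This is a minimal sketch, assuming Python 3 and that the log wording matches the samples in this section; the function name and the labels it returns are hypothetical.

```python
import re

# Patterns taken from the sample log lines in this section.
GET_FAILED = re.compile(r"Route GET request failed with code (\d+)")
ROUTE_MISSING = re.compile(r"Did not find route (\S+) event type (\S+)")


def summarize_event_log(log_text):
    """Return a list of (kind, detail) tuples for known failure lines."""
    findings = []
    for line in log_text.splitlines():
        failed = GET_FAILED.search(line)
        if failed:
            findings.append(("get_failed", failed.group(1)))
        missing = ROUTE_MISSING.search(line)
        if missing:
            findings.append(("route_missing", missing.group(1)))
        if "Route table not found" in line:
            findings.append(("table_not_found", ""))
    return findings


sample = (
    "Route GET request failed with code 403\n"
    "Route table not found.\n"
    "Did not find route 17.0.0.0/8 event type peerFail\n"
)
for kind, detail in summarize_event_log(sample):
    print(kind, detail)
```

Each finding points to the redundancy node parameter that should be compared with the value in the cloud provider's portal.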

------------

Problem:
Route table entry not updated by a peer failure event.

How to Check:
For every node event a log file is generated in the directory ~/cloud/HA/events.
This file indicates the event that was processed and its result. Examine it for possible errors. If an error occurred, a file ~/cloud/HA/events/routeTableGetRsp is likely also written; examine that file for additional details.

Possible Causes:
The C8000V has not been given permission to access the route table.
Fetching the route table
Route table get response:
 {"error":{"code":"AuthorizationFailed","message":"The client 'b3ce41c0-bcef-41d7-9741-26bea31221c1' with object id 'b3ce41c0-bcef-41d7-9741-26bea31221c1' does not have authorization to perform action 'Microsoft.Network/routeTables/read' over scope '/subscriptions/b0b1a9e2-444c-4ca5-acd9-bebd1e6873eb/resourceGroups/gsday0-rg/providers/Microsoft.Network/routeTables/gsday0-sub2-RouteTable'."}}
Route GET request failed with code 403
Route table get response:
 {"error":{"code":"AuthorizationFailed","message":"The client 'b3ce41c0-bcef-41d7-9741-26bea31221c1' with object id 'b3ce41c0-bcef-41d7-9741-26bea31221c1' does not have authorization to perform action 'Microsoft.Network/routeTables/read' over scope '/subscriptions/b0b1a9e2-444c-4ca5-acd9-bebd1e6873eb/resourceGroups/gsday0-rg/providers/Microsoft.Network/routeTables/gsday0-sub2-RouteTable'."}}
Route table not found.
C8000V HA: Set route table for verify
Route Table not found


If none of these troubleshooting tips resolves your problem, run these commands:
[guestshell@guestshell ~]$ cd ~/cloud/HA
[guestshell@guestshell HA]$ bash debug_ha.sh
[guestshell@guestshell HA]$ ls /bootflash
You should see a file named ha_debug.tar. Copy this file off the C8000V and provide it to Cisco Technical Support for analysis.