CNDPソリューションのサーバ問題のトラブルシューティング

ダウンロードオプション

PDF (167.1 KB)
Adobe Reader を使ってさまざまなデバイスで表示
ePub (80.1 KB)
iPhone、iPad、Android、ソニーの Reader、または Windows Phone 上で、さまざまなアプリを使って表示
Mobi (Kindle) (66.0 KB)
Kindle デバイスで、または Kindle アプリを使って複数のデバイスで表示

Updated: 2022 年 5 月 26 日

Document ID:217899

偏向のない言語

この製品のドキュメントセットは、偏向のない言語を使用するように配慮されています。このドキュメントセットでの偏向のない言語とは、年齢、障害、性別、人種的アイデンティティ、民族的アイデンティティ、性的指向、社会経済的地位、およびインターセクショナリティに基づく差別を意味しない言語として定義されています。製品ソフトウェアのユーザインターフェイスにハードコードされている言語、RFP のドキュメントに基づいて使用されている言語、または参照されているサードパーティ製品で使用されている言語によりドキュメントに例外が存在する場合があります。シスコのインクルーシブランゲージの取り組みの詳細は、こちらをご覧ください。

翻訳について

シスコは世界中のユーザにそれぞれの言語でサポートコンテンツを提供するために、機械と人による翻訳を組み合わせて、本ドキュメントを翻訳しています。ただし、最高度の機械翻訳であっても、専門家による翻訳のような正確性は確保されません。シスコは、これら翻訳の正確性について法的責任を負いません。原典である英語版（リンクからアクセス可能）もあわせて参照することを推奨します。

内容

概要

背景説明

問題

解決方法

コンテナの出力例

VMの出力例

UCSホストへのSSH

概要

このドキュメントでは、Unified Computing System(UCS)を特定し、Cloud Native Deployment Platform(CNDP)の障害エントリを確認する方法について説明します。

背景説明

ハードウェア関連のアラートは、Ultra Cloud Core Subscriber Microservices Infrastructure(SMI)Cluster Manager(CM)Common Execution Environment(CEE)で報告されます。 Kubernetes(K8s)、dockerなどの関連情報は、CM仮想IP(VIP)で報告されます。

注意：IPを確認するには、ネットワーク設計および顧客情報アンケート(CIQ)を参照してください。

問題

「Equipment Alarm」エラーがshow alertsで報告されます。

CM-CEEにログインし、show alerts active detailコマンドとshow alerts history summaryコマンドを実行して、すべてのアクティブおよび履歴アラートを表示します。
アラートでレポートされるサーバIPに注意してください。

[lab-deployer/labceec01] cee# show alerts active detail 
alerts active detail server-alert 9c367ce5ee48
 severity    major
 type        "Equipment Alarm"
 startsAt    2021-10-27T17:10:37.025Z
 source      10.10.10.10
 summary     "DDR4_P1_C1_ECC: DIMM 5 is inoperable : Check or replace DIMM"
 labels      [ "alertname: server-alert" "cluster: cr-chr-deployer" "description: DDR4_P1_C1_ECC: DIMM 5 is inoperable : Check or replace DIMM" "fault_id: sys/rack-unit-1/board/memarray-1/mem-5/fault-F0185" "id: 134219020" "monitor: prometheus" "replica: cr-chr-deployer" "server: 10.10.10.10" "severity: major" ]
 annotations [ "dn: cr-chr-deployer/10.10.10.10/sys/rack-unit-1/board/memarray-1/mem-5/fault-F0185/134219020" "summary: DDR4_P1_C1_ECC: DIMM 5 is inoperable : Check or replace DIMM" "type: Equipment Alarm" ]

[lab-deployer/labceec01] cee# show alerts history summary
NAME      UID           SEVERITY  STARTS AT       DURATION  SOURCE       SUMMARY            
---------------------------------------------------------------------------------------------
vm-alive  f6a65030b593  minor     09-02T10:28:28  1m40s     10-192-0-13  labd0123 is alive. 
vm-error  3a6d840e3eda  major     09-02T10:27:18  1m        10-192-0-13  labd0123 is down.  
vm-alive  49b2c1941dc6  minor     09-02T10:25:38  1m40s     10-192-0-14  labd0123 is alive.

解決方法

SMI CMのサーバ上でホストされているサービス（コンテナ）や仮想マシン(VM)やカーネルベースの仮想マシン(KVM)を特定し、show running-configコマンドを実行してサーバIPの設定を確認します。

CM VIP(ユーザ名：cloud-user)
OPS Centerからsmi-cmネームスペースのIPを取得する
OPSセンターにログインし、クラスタ構成を確認します
サーバ上で実行されているノードとVMを特定する

cloud-user@lab-deployer-cm-primary:~$ kubectl get svc -n smi-cm
NAME                                          TYPE        CLUSTER-IP       EXTERNAL-IP      PORT(S)                                                 AGE
cluster-files-offline-smi-cluster-deployer    ClusterIP   10.102.200.178   <none>           8080/TCP                                                98d
iso-host-cluster-files-smi-cluster-deployer   ClusterIP   10.102.100.208     192.168.1.102    80/TCP                                                  98d
iso-host-ops-center-smi-cluster-deployer      ClusterIP   10.102.200.73    192.168.1.102    3001/TCP                                                98d
netconf-ops-center-smi-cluster-deployer       ClusterIP   10.102.100.207   192.168.184.193   3022/TCP,22/TCP                                         98d
ops-center-smi-cluster-deployer               ClusterIP   10.10.20.20     <none>           8008/TCP,2024/TCP,2022/TCP,7681/TCP,3000/TCP,3001/TCP   98d
squid-proxy-node-port                         NodePort    10.102.60.114    <none>           3128:32261/TCP                                          98d

cloud-user@lab-deployer-cm-primary:~$ ssh -p 2024 admin@10.10.20.20
admin@10.10.20.20's password:
      Welcome to the Cisco SMI Cluster Deployer on lab-deployer-cm-primary
      Copyright © 2016-2020, Cisco Systems, Inc.
      All rights reserved.
admin connected from 192.168.1.100 using ssh on ops-center-smi-cluster-deployer-7848c69844-xzdw6
[lab-deployer-cm-primary] SMI Cluster Deployer# show running-config clusters

コンテナの出力例

この例では、サーバはnode primary-1によって使用されます。

[lab-deployer-cm-primary] SMI Cluster Deployer# show running-config clusters lab01-smf nodes primary-1
clusters lab01-smf
nodes primary-1
  maintenance false
  k8s node-type       primary
  k8s ssh-ip          10.192.10.22
  k8s sshd-bind-to-ssh-ip true
  k8s node-ip         10.192.10.22
  k8s node-labels smi.cisco.com/node-type oam
  exit
  k8s node-labels smi.cisco.com/node-type-1 proto
  exit
  ucs-server cimc user admin
  ucs-server cimc ip-address 10.10.10.10

VMの出力例

サーバは、KVMベースのVMに使用できます。

この例では、サーバにユーザプレーン機能(UPF)（upf1およびupf2）があります。

[lab-deployer-cm-primary] SMI Cluster Deployer# show running-config clusters lab01-upf nodes labupf
clusters lab01-upf
nodes labupf
  maintenance false
  ssh-ip      10.192.30.7
  type        kvm
  vms upf1
   upf software lab...
...
   type upf
  exit
  vms upf2
   upf software lab...
...
   type upf
  exit
  ucs-server cimc user admin
...
  ucs-server cimc ip-address 10.10.10.10
...
  exit

UCSホストへのSSH

UCSホストに接続し、スコープ障害、show fault entries、およびshow fault historyで障害エントリを確認します。

labucs111-cmp1-11 /fault # show fault-entries 
Time Severity Description ------------------------- ------------- --------------------------------------- 
2021-03-26T10:10:10 major "DDR4_P1_C1_ECC: DIMM 19 is inoperable : Check or replace DIMM"

LABCP0222-Server22-02 /fault # show fault-history
Time                Severity      Source          Cause                     Description                             
------------------- ------------- --------------- ------------------------- ----------------------------------------
2021 Dec 10 02:02:02 UTC info          %CIMC           EQUIPMENT_INOPERABLE      "[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Cleared "
2021 Dec 1 01:01:01 UTC critical      %CIMC           EQUIPMENT_INOPERABLE      "[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Please check the processor's status. "

更新履歴

改定	発行日	コメント
1.0	26-May-2022	初版

シスコエンジニア提供

Cinthia Janneth Martinez
Cisco TACエンジニア
Nebojsa Kosanovic
Cisco TACエンジニア

このドキュメントは役に立ちましたか?

フィードバック

シスコに問い合わせ

サポートケースをオープン
(シスコサービス契約が必要です。)

CNDPソリューションのサーバ問題のトラブルシューティング

ダウンロード オプション

偏向のない言語

翻訳について

内容

概要

背景説明

問題

解決方法

コンテナの出力例

VMの出力例

UCSホストへのSSH

更新履歴

シスコ エンジニア提供

このドキュメントは役に立ちましたか?

シスコに問い合わせ

このドキュメントは次の製品に対応しています

ダウンロードオプション

シスコエンジニア提供