排除CNDP解决方案中的服务器问题

下载选项

PDF (173.9 KB)
在各种设备上使用 Adobe Reader 查看
ePub (79.0 KB)
在 iPhone、iPad、Android、Sony Reader 或 Windows Phone 上使用各种应用查看
Mobi (Kindle) (64.3 KB)
在 Kindle 设备上查看或在多个设备上使用 Kindle 应用查看

已更新: 2022 年 5 月 26 日

文档 ID:217899

非歧视性语言

此产品的文档集力求使用非歧视性语言。在本文档集中，非歧视性语言是指不隐含针对年龄、残障、性别、种族身份、族群身份、性取向、社会经济地位和交叉性的歧视的语言。由于产品软件的用户界面中使用的硬编码语言、基于 RFP 文档使用的语言或引用的第三方产品使用的语言，文档中可能无法确保完全使用非歧视性语言。深入了解思科如何使用包容性语言。

关于此翻译

思科采用人工翻译与机器翻译相结合的方式将此文档翻译成不同语言，希望全球的用户都能通过各自的语言得到支持性的内容。请注意：即使是最好的机器翻译，其准确度也不及专业翻译人员的水平。 Cisco Systems, Inc. 对于翻译的准确性不承担任何责任，并建议您总是参考英文原始文档（已提供链接）。

简介

本文档介绍如何在云本地部署平台(CNDP)中识别统一计算系统(UCS)并检查其上的故障条目。

背景信息

与硬件相关的警报在Ultra Cloud Core Subscriber Microservices Infrastructure(SMI)Cluster Manager(CM)Common Execution Environment(CEE)中报告。 CM虚拟IP(VIP)中报告Kubernetes(K8s)、docker等相关信息。

警告：请参阅网络设计和客户信息调查表(CIQ)以验证IP。

问题

show alerts中报告了错误“Equipment Alarm”。

登录CM-CEE，运行命令show alerts active detail和show alerts history summary，以便显示所有活动和历史记录警报。
注意警报中报告的服务器IP。

[lab-deployer/labceec01] cee# show alerts active detail 
alerts active detail server-alert 9c367ce5ee48
 severity    major
 type        "Equipment Alarm"
 startsAt    2021-10-27T17:10:37.025Z
 source      10.10.10.10
 summary     "DDR4_P1_C1_ECC: DIMM 5 is inoperable : Check or replace DIMM"
 labels      [ "alertname: server-alert" "cluster: cr-chr-deployer" "description: DDR4_P1_C1_ECC: DIMM 5 is inoperable : Check or replace DIMM" "fault_id: sys/rack-unit-1/board/memarray-1/mem-5/fault-F0185" "id: 134219020" "monitor: prometheus" "replica: cr-chr-deployer" "server: 10.10.10.10" "severity: major" ]
 annotations [ "dn: cr-chr-deployer/10.10.10.10/sys/rack-unit-1/board/memarray-1/mem-5/fault-F0185/134219020" "summary: DDR4_P1_C1_ECC: DIMM 5 is inoperable : Check or replace DIMM" "type: Equipment Alarm" ]

[lab-deployer/labceec01] cee# show alerts history summary
NAME      UID           SEVERITY  STARTS AT       DURATION  SOURCE       SUMMARY            
---------------------------------------------------------------------------------------------
vm-alive  f6a65030b593  minor     09-02T10:28:28  1m40s     10-192-0-13  labd0123 is alive. 
vm-error  3a6d840e3eda  major     09-02T10:27:18  1m        10-192-0-13  labd0123 is down.  
vm-alive  49b2c1941dc6  minor     09-02T10:25:38  1m40s     10-192-0-14  labd0123 is alive.

解决方案

确定托管在SMI CM服务器上的服务（容器）和/或虚拟机(VM)或基于内核的虚拟机(KVM)，运行命令show running-config并查找服务器IP的配置。

登录CM VIP(用户名：云用户)
从OPS中心获取smi-cm命名空间
登录到OPS中心，并检查集群配置
确定在服务器上运行的节点和虚拟机

cloud-user@lab-deployer-cm-primary:~$ kubectl get svc -n smi-cm
NAME                                          TYPE        CLUSTER-IP       EXTERNAL-IP      PORT(S)                                                 AGE
cluster-files-offline-smi-cluster-deployer    ClusterIP   10.102.200.178   <none>           8080/TCP                                                98d
iso-host-cluster-files-smi-cluster-deployer   ClusterIP   10.102.100.208     192.168.1.102    80/TCP                                                  98d
iso-host-ops-center-smi-cluster-deployer      ClusterIP   10.102.200.73    192.168.1.102    3001/TCP                                                98d
netconf-ops-center-smi-cluster-deployer       ClusterIP   10.102.100.207   192.168.184.193   3022/TCP,22/TCP                                         98d
ops-center-smi-cluster-deployer               ClusterIP   10.10.20.20     <none>           8008/TCP,2024/TCP,2022/TCP,7681/TCP,3000/TCP,3001/TCP   98d
squid-proxy-node-port                         NodePort    10.102.60.114    <none>           3128:32261/TCP                                          98d

cloud-user@lab-deployer-cm-primary:~$ ssh -p 2024 admin@10.10.20.20
admin@10.10.20.20's password:
      Welcome to the Cisco SMI Cluster Deployer on lab-deployer-cm-primary
      Copyright © 2016-2020, Cisco Systems, Inc.
      All rights reserved.
admin connected from 192.168.1.100 using ssh on ops-center-smi-cluster-deployer-7848c69844-xzdw6
[lab-deployer-cm-primary] SMI Cluster Deployer# show running-config clusters

容器输出示例

在本例中，节点primary-1使用服务器。

[lab-deployer-cm-primary] SMI Cluster Deployer# show running-config clusters lab01-smf nodes primary-1
clusters lab01-smf
nodes primary-1
  maintenance false
  k8s node-type       primary
  k8s ssh-ip          10.192.10.22
  k8s sshd-bind-to-ssh-ip true
  k8s node-ip         10.192.10.22
  k8s node-labels smi.cisco.com/node-type oam
  exit
  k8s node-labels smi.cisco.com/node-type-1 proto
  exit
  ucs-server cimc user admin
  ucs-server cimc ip-address 10.10.10.10

VM的输出示例

服务器可用于基于KVM的VM。

在本示例中，服务器具有用户平面功能(UPF)- upf1和upf2。

[lab-deployer-cm-primary] SMI Cluster Deployer# show running-config clusters lab01-upf nodes labupf
clusters lab01-upf
nodes labupf
  maintenance false
  ssh-ip      10.192.30.7
  type        kvm
  vms upf1
   upf software lab...
...
   type upf
  exit
  vms upf2
   upf software lab...
...
   type upf
  exit
  ucs-server cimc user admin
...
  ucs-server cimc ip-address 10.10.10.10
...
  exit

SSH到UCS主机

连接到UCS主机并验证故障条目(范围故障)、显示故障条目和显示故障历史记录。

labucs111-cmp1-11 /fault # show fault-entries 
Time Severity Description ------------------------- ------------- --------------------------------------- 
2021-03-26T10:10:10 major "DDR4_P1_C1_ECC: DIMM 19 is inoperable : Check or replace DIMM"

LABCP0222-Server22-02 /fault # show fault-history
Time                Severity      Source          Cause                     Description                             
------------------- ------------- --------------- ------------------------- ----------------------------------------
2021 Dec 10 02:02:02 UTC info          %CIMC           EQUIPMENT_INOPERABLE      "[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Cleared "
2021 Dec 1 01:01:01 UTC critical      %CIMC           EQUIPMENT_INOPERABLE      "[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Please check the processor's status. "

修订历史记录

版本	发布日期	备注
1.0	26-May-2022	初始版本

由思科工程师提供

Cinthia Janneth Martinez
思科TAC工程师
Nebojsa Kosanovic
思科TAC工程师

此文档是否有帮助?

反馈

联系我们

提交支持案例
(需要思科服务合同)