Introduction
This document describes how to identify a Unified Computing System (UCS) server and check the fault entries on it in the Cloud Native Deployment Platform (CNDP).
Background Information
Hardware-related alerts are reported in the Ultra Cloud Core Subscriber Microservices Infrastructure (SMI) Cluster Manager (CM) Common Execution Environment (CEE). Kubernetes (K8s), Docker, and other platform-related information is reported in the CM virtual IP (VIP).
Caution: Refer to the Network Design and Customer Information Questionnaire (CIQ) documents in order to verify the IP addresses.
Problem
The error "Equipment Alarm" is reported in the show alerts output.
- Log in to the CM CEE and run the commands show alerts active detail and show alerts history summary in order to display all active and historical alerts.
- Note the server IP reported in the alert.
[lab-deployer/labceec01] cee# show alerts active detail
alerts active detail server-alert 9c367ce5ee48
severity major
type "Equipment Alarm"
startsAt 2021-10-27T17:10:37.025Z
source 10.10.10.10
summary "DDR4_P1_C1_ECC: DIMM 5 is inoperable : Check or replace DIMM"
labels [ "alertname: server-alert" "cluster: cr-chr-deployer" "description: DDR4_P1_C1_ECC: DIMM 5 is inoperable : Check or replace DIMM" "fault_id: sys/rack-unit-1/board/memarray-1/mem-5/fault-F0185" "id: 134219020" "monitor: prometheus" "replica: cr-chr-deployer" "server: 10.10.10.10" "severity: major" ]
annotations [ "dn: cr-chr-deployer/10.10.10.10/sys/rack-unit-1/board/memarray-1/mem-5/fault-F0185/134219020" "summary: DDR4_P1_C1_ECC: DIMM 5 is inoperable : Check or replace DIMM" "type: Equipment Alarm" ]
[lab-deployer/labceec01] cee# show alerts history summary
NAME UID SEVERITY STARTS AT DURATION SOURCE SUMMARY
---------------------------------------------------------------------------------------------
vm-alive f6a65030b593 minor 09-02T10:28:28 1m40s 10-192-0-13 labd0123 is alive.
vm-error 3a6d840e3eda major 09-02T10:27:18 1m 10-192-0-13 labd0123 is down.
vm-alive 49b2c1941dc6 minor 09-02T10:25:38 1m40s 10-192-0-14 labd0123 is alive.
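If your CEE release supports output filters, you can narrow the detailed output to the reporting server; this is an optional sketch, and the filter keyword is illustrative:
[lab-deployer/labceec01] cee# show alerts active detail | include source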
Solution
Identify the services (containers), Virtual Machines (VMs), or Kernel-based Virtual Machine (KVM) instances hosted on the server. In the SMI CM, run the command show running-config and find the configuration that references the server IP.
- Log in to the CM VIP (username: cloud-user)
- Get the Ops Center IP for the smi-cm namespace
- Log in to the Ops Center and check the cluster configuration
- Identify nodes and VMs that run on the server
cloud-user@lab-deployer-cm-primary:~$ kubectl get svc -n smi-cm
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cluster-files-offline-smi-cluster-deployer ClusterIP 10.102.200.178 <none> 8080/TCP 98d
iso-host-cluster-files-smi-cluster-deployer ClusterIP 10.102.100.208 192.168.1.102 80/TCP 98d
iso-host-ops-center-smi-cluster-deployer ClusterIP 10.102.200.73 192.168.1.102 3001/TCP 98d
netconf-ops-center-smi-cluster-deployer ClusterIP 10.102.100.207 192.168.184.193 3022/TCP,22/TCP 98d
ops-center-smi-cluster-deployer ClusterIP 10.10.20.20 <none> 8008/TCP,2024/TCP,2022/TCP,7681/TCP,3000/TCP,3001/TCP 98d
squid-proxy-node-port NodePort 10.102.60.114 <none> 3128:32261/TCP 98d
cloud-user@lab-deployer-cm-primary:~$ ssh -p 2024 admin@10.10.20.20
admin@10.10.20.20's password:
Welcome to the Cisco SMI Cluster Deployer on lab-deployer-cm-primary
Copyright © 2016-2020, Cisco Systems, Inc.
All rights reserved.
admin connected from 192.168.1.100 using ssh on ops-center-smi-cluster-deployer-7848c69844-xzdw6
[lab-deployer-cm-primary] SMI Cluster Deployer# show running-config clusters
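Optionally, if output filters are supported in your Cluster Deployer release, you can filter the configuration for the CIMC IP reported in the alert in order to find the owning cluster and node faster:
[lab-deployer-cm-primary] SMI Cluster Deployer# show running-config clusters | include 10.10.10.10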
Example Output for Containers
In this example, the server is used by node primary-1.
[lab-deployer-cm-primary] SMI Cluster Deployer# show running-config clusters lab01-smf nodes primary-1
clusters lab01-smf
nodes primary-1
maintenance false
k8s node-type primary
k8s ssh-ip 10.192.10.22
k8s sshd-bind-to-ssh-ip true
k8s node-ip 10.192.10.22
k8s node-labels smi.cisco.com/node-type oam
exit
k8s node-labels smi.cisco.com/node-type-1 proto
exit
ucs-server cimc user admin
ucs-server cimc ip-address 10.10.10.10
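After the K8s node is identified, you can optionally list the pods scheduled on it from that cluster's control plane node; the node name used here is illustrative and must match the name shown by kubectl get nodes:
cloud-user@lab01-smf-primary-1:~$ kubectl get pods -A -o wide --field-selector spec.nodeName=lab01-smf-primary-1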
Example Output for VMs
The server can also be used for KVM-based VMs.
In this example, the server hosts two User Plane Functions (UPFs): upf1 and upf2.
[lab-deployer-cm-primary] SMI Cluster Deployer# show running-config clusters lab01-upf nodes labupf
clusters lab01-upf
nodes labupf
maintenance false
ssh-ip 10.192.30.7
type kvm
vms upf1
upf software lab...
...
type upf
exit
vms upf2
upf software lab...
...
type upf
exit
ucs-server cimc user admin
...
ucs-server cimc ip-address 10.10.10.10
...
exit
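For the KVM case, you can optionally confirm the VMs defined on the node from its ssh-ip (10.192.30.7 in this example), assuming libvirt manages the VMs and virsh is available on the host:
cloud-user@labupf:~$ sudo virsh list --all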
SSH Into the UCS Host
Connect to the UCS host (CIMC) through SSH and verify the fault entries with the commands scope fault, show fault-entries, and show fault-history.
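A minimal login sketch follows, with the CIMC IP (10.10.10.10) and user (admin) taken from the cluster configuration shown earlier; the hostname in the prompt is illustrative:
cloud-user@lab-deployer-cm-primary:~$ ssh admin@10.10.10.10
admin@10.10.10.10's password:
labucs111-cmp1-11# scope fault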
labucs111-cmp1-11 /fault # show fault-entries
Time                Severity      Description
------------------- ------------- ---------------------------------------
2021-03-26T10:10:10 major "DDR4_P1_C1_ECC: DIMM 19 is inoperable : Check or replace DIMM"
LABCP0222-Server22-02 /fault # show fault-history
Time Severity Source Cause Description
------------------- ------------- --------------- ------------------------- ----------------------------------------
2021 Dec 10 02:02:02 UTC info %CIMC EQUIPMENT_INOPERABLE "[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Cleared "
2021 Dec 1 01:01:01 UTC critical %CIMC EQUIPMENT_INOPERABLE "[F0174][critical][equipment-inoperable][sys/rack-unit-1/board] IERR: A catastrophic fault has occurred on one of the processors: Please check the processor's status. "