Troubleshoot CentOS Kernel Crash in CPS

Updated: May 18, 2023

Document ID: 220468

Introduction

This document describes how to troubleshoot a Cisco Policy Suite (CPS) VM restart caused by a CentOS kernel crash.

Problem

Each CPS VM (qns, lb, pcrfclient, and so on) runs on CentOS. These VMs can reboot because of a problem on the CentOS side rather than in the CPS application. If a reboot is caused by a problem with the CentOS kernel, the root cause cannot be found by investigating the CPS capture_env output, because the capture_env logs do not contain any error logs from the rebooted VM for the time of the reboot. In such cases, the logs under /var/crash can be used for the investigation.

Solution

CentOS can generate a kernel crash dump when a problem occurs in the kernel. By default, CPS is configured to collect kernel crash dumps on all VMs.

The status of the kdump service can be checked with this command:

[root@dc1-qns01 ~]# systemctl status kdump.service
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: active (exited) since Tue 2023-01-10 07:29:35 UTC; 4 months 4 days ago
 Main PID: 1023 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 75300)
   Memory: 0
   CGroup: /system.slice/kdump.service
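
For scripted checks across several VMs, the same status can be reduced to a single line. The following is a minimal sketch, not part of CPS: the function name check_kdump is hypothetical, and it relies only on the standard systemctl is-enabled and systemctl is-active subcommands.

```shell
# Report whether a unit (default: kdump.service) is enabled at boot and
# currently active. Returns 0 only when both conditions hold.
# check_kdump is an illustrative name, not a CPS or CentOS command.
check_kdump() {
    unit="${1:-kdump.service}"
    enabled=$(systemctl is-enabled "$unit" 2>/dev/null) || true
    active=$(systemctl is-active "$unit" 2>/dev/null) || true
    echo "$unit: enabled=${enabled:-unknown} active=${active:-unknown}"
    [ "$enabled" = "enabled" ] && [ "$active" = "active" ]
}
```

After the function is defined, it can be called as check_kdump || echo "kdump is not armed on this VM".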

If a kernel crash occurs while kdump.service is enabled, a directory named "address-YYYY-MM-DD-HH:MM:SS" is created under /var/crash. CentOS generates two files in this directory.

[root@dc1-lb02 127.0.0.1-2022-10-18-06:18:41]# pwd
/var/crash/127.0.0.1-2022-10-18-06:18:41

[root@dc1-lb02 127.0.0.1-2022-10-18-06:18:41]# ls -rtl
total 161436
-rw-r--r-- 1 root root     89787 Oct 18  2022 vmcore-dmesg.txt
-rw------- 1 root root 165215218 Oct 18  2022 vmcore
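
When a VM has several crash directories, it can help to list them together with the crash time encoded in each name. The sketch below assumes the "address-YYYY-MM-DD-HH:MM:SS" naming shown above; list_crash_dumps is a hypothetical helper name, and the output layout is only an illustration.

```shell
# List crash directories under a base path (default: /var/crash).
# For each one, print the timestamp taken from the directory name, the
# path, and which of the two expected files (if any) are missing.
list_crash_dumps() {
    base="${1:-/var/crash}"
    for dir in "$base"/*; do
        [ -d "$dir" ] || continue
        # Extract the trailing YYYY-MM-DD-HH:MM:SS part of the name.
        stamp=$(printf '%s\n' "${dir##*/}" |
            sed -n 's/.*\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)-\([0-9:]\{8\}\)$/\1 \2/p')
        [ -n "$stamp" ] || continue   # skip names without a timestamp
        missing=""
        for f in vmcore vmcore-dmesg.txt; do
            [ -f "$dir/$f" ] || missing="$missing $f"
        done
        echo "$stamp  $dir  missing:${missing:- none}"
    done
}
```

Called without arguments, the function scans /var/crash. A different base directory can be passed as the first argument.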

vmcore:
A binary file that stores the contents of kernel memory at the time of the crash. Analysis requires tools such as kernel-debuginfo and crash.

vmcore-dmesg.txt:
A text file that contains the kernel message (dmesg) log at the time of the crash.
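
Because vmcore-dmesg.txt is plain text, a first look does not need any debug tools. The following sketch searches the file for strings that the Linux kernel commonly prints when it fails. The function name show_crash_markers is hypothetical, and the pattern list is an assumption that is illustrative rather than exhaustive.

```shell
# Print lines of a vmcore-dmesg.txt file that commonly mark a kernel
# failure, each prefixed with its line number (grep -n).
# The patterns are examples only; a real crash may use other wording.
show_crash_markers() {
    grep -n -E 'Kernel panic|BUG:|Oops|Call Trace:' "$1"
}
```

For example, show_crash_markers "/var/crash/127.0.0.1-2022-10-18-06:18:41/vmcore-dmesg.txt" prints the matching lines. The function returns a non-zero status when nothing matches.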

As an example, in one case the CPS-side logs from the rebooted VM contained no error logs from just before the reboot. Analysis on the VMware side showed that the reboot was triggered with this error log, which points to a cause in the guest OS.

The CPU has been disabled by the guest operating system. Power off or reset the virtual machine.

Check /var/crash on the rebooted VM for a directory whose timestamp matches the reboot time. In this case such a directory existed, which showed that the reboot was due to a kernel problem on the CentOS side, and the investigation could proceed further.
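
The comparison between the reboot time and the crash directory names can also be scripted. The sketch below is only an illustration under stated assumptions: find_crash_near is a hypothetical helper, the 600-second default window is an arbitrary choice, and the date -d option requires GNU date, which CentOS provides. The reboot time itself must come from another source, for example the VMware event log or the output of last reboot.

```shell
# Print crash directories whose name timestamp lies within a window
# (seconds, default 600) of a given reboot time. Returns 0 if at least
# one directory matches, 1 if none match, 2 if the time cannot be parsed.
# Usage: find_crash_near "YYYY-MM-DD HH:MM:SS" [window_seconds] [base_dir]
find_crash_near() {
    reboot_epoch=$(date -d "$1" +%s) || return 2
    window="${2:-600}"
    base="${3:-/var/crash}"
    found=1
    for dir in "$base"/*; do
        [ -d "$dir" ] || continue
        # Extract the trailing YYYY-MM-DD-HH:MM:SS part of the name.
        stamp=$(printf '%s\n' "${dir##*/}" |
            sed -n 's/.*\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)-\([0-9:]\{8\}\)$/\1 \2/p')
        [ -n "$stamp" ] || continue
        crash_epoch=$(date -d "$stamp" +%s) || continue
        diff=$((reboot_epoch - crash_epoch))
        if [ "$diff" -lt 0 ]; then diff=$((0 - diff)); fi
        if [ "$diff" -le "$window" ]; then
            echo "$dir"
            found=0
        fi
    done
    return "$found"
}
```

Both timestamps are interpreted in the local time zone of the VM, so the reboot time must be given in that same time zone.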

Revision History

Revision	Publish Date	Comments
1.0	22-May-2023	Initial Release

Contributed by Cisco Engineers

Yasuaki Nambu
Cisco TAC Engineer

This Document Applies to These Products

Policy Suite for Mobile