Troubleshoot Mongod Instance Failure Due to Increased DATA_PATH Space Utilization

Available Languages

Download Options

PDF (44.1 KB)
View with Adobe Reader on a variety of devices
ePub (86.8 KB)
View in various apps on iPhone, iPad, Android, Sony Reader, or Windows Phone
Mobi (Kindle) (75.6 KB)
View on Kindle device or Kindle app on multiple devices

Updated:January 26, 2022

Document ID:217661

Bias-Free Language

The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product. Learn more about how Cisco is using Inclusive Language.

Background Information

Problem

Restore the Mongod Instance in Sessionmgr

Introduction

This document describes how to troubleshoot the mongod instance failure in Cisco Policy Suite (CPS) sessionmgr due to the increased DATA_PATH space utilization.

Prerequisites

Requirements

Cisco recommends that you have knowledge of these topics:

Linux
CPS
MongoDB

Components Used

The information in this document is based on these software and hardware versions:

CPS 20.2
MongoDB v3.6.17
UCS-B

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.

Background Information

CPS uses MongoDB where mongod processes run on sessionmgr virtual machines (VMs) in order to constitute its basic database structure.

Multiple mongod instances run on a sessionmgr, and each of them has been assigned different port numbers. These mongod instances take part in various Replica Sets.

Problem

Whenever any particular mongod instance stops due to increased DATA_PATH space consumption of its associated DATA PATH, you will notice the same in diagnostics for that sessionmgr. Connections to the specific port failed and there is 100% utilization of the /var/data/sessions.X partition. Hence, that mongod instance goes into the OFF-LINE state in the respective Replica Set. Subsequently, its participation status in that Replica Set becomes UNKNOWN.

A sample error in the diagnostics is provided. Enter the diagnostics.sh command from ClusterManager or pcrfclient in order to check the current status of mongod and Replica Set.

Could not connect to port 27718 on sessionmgr02 (set02)...[FAIL]
Disk usage on sessionmgr02...[FAIL]
Disk usage is above critical threshold (97%) on sessionmgr02.
Results of: ssh root@sessionmgr02 -x 'df -hP -x iso9660'
-----------------------------------
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 95G 28G 62G 32% /
tmpfs 48G 0 48G 0% /dev/shm
tmpfs 57G 0 57G 0% /var/data/sessions.1
tmpfs 12G 12G 0 100% /var/data/sessions.2
-----------------------------------

|---------------------------------------------------------------------------------------------|
| BALANCE:set02 |
| Status via arbitervip:27718 sessionmgr01:27718 |
| Member-1 - 27718 : - UNKNOWN - sessionmgr02 - OFF-LINE - 19003 days - 2 |
| Member-2 - 27718 : - PRIMARY - sessionmgr01 - ON-LINE - -------- - 3 |
| Member-3 - 27718 : 192.168.10.146 - ARBITER - arbitervip - ON-LINE - -------- - 0 |
|---------------------------------------------------------------------------------------------|

Restore the Mongod Instance in Sessionmgr

The section details the procedure to restore the mongod instance in sessionmgr if it is down due to increased DATA_PATH space consumption.

Before you begin this procedure, you must have privilege access to:

Root access to the CPS CLI
"qns-svn" user access to the CPS GUIs - Policy builder and CPS Central

Provided here is the procedure for sessionmg02 and port 27718, which is part of set02.

Enter this command in order to identify the partition where it stored data for that specific set02.

[root@dc1-sessionmgr02 ~]# cat /etc/broadhop/mongoConfig.cfg | grep -A6 set02 | grep "DATA_PATH"
ARBITER_DATA_PATH=/var/data/sessions.2
DATA_PATH=/var/data/sessions.2

Enter this command in order to verify if the aido_client process is present or not.

[root@dc1-sessionmgr02 ~]# monsum
Monit 5.26.0 uptime: 11d 2h 9m
┌─────────────────────────────────┬────────────────────────────┬───────────────┐
│ Service Name │ Status │ Type │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ dc1-sessionmgr02 │ OK │ System │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ whisper │ OK │ Process │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ snmpd │ OK │ Process │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ memcached │ OK │ Process │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ collectd │ OK │ Process │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ auditrpms.sh │ OK │ Process │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ aido_client │ OK │ Process │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ primary_db_frag │ OK │ Program │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ cpu_load_monitor │ OK │ Program │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ cpu_load_trap │ OK │ Program │
├─────────────────────────────────┼────────────────────────────┼───────────────┤
│ gen_low_mem_trap │ OK │ Program │
└─────────────────────────────────┴────────────────────────────┴───────────────┘

If the aido_client process is present, enter the monit stop aido_client command in order to stop it.

Enter this command in order to verify if the respective mongod instance process is still active or not.

[root@dc1-sessionmgr02 ~]# ps -ef | grep 27718
root 12292 11114 0 02:05 pts/0 00:00:00 grep --color=auto 27718
root 19620 1 0 2021 ? 01:36:51 /usr/bin/mongod --ipv6 --syncdelay 1 --slowms 500 --storageEngine
 mmapv1 --bind_ip_all --port 27718 --dbpath=/var/data/sessions.2 --replSet set02 --fork --pidfilepath
 /var/run/sessionmgr-27718.pid --oplogSize 5120 --logpath /var/log/mongodb-27718.log --logappend --quiet
[root@dc1-sessionmgr02 ~]#

If the mongod instance is still active, enter this command to stop it.

[root@dc1-sessionmgr02 ~]# /etc/init.d/sessionmgr-27718 stop
Stopping sessionmgr-27718 (via systemctl): [ OK ]
[root@dc1-sessionmgr02 ~]#

Navigate to the DATA_PATH received in step 1.

[root@dc1-sessionmgr02 ~]# cd /var/data/sessions.2
[root@dc1-sessionmgr02 sessions.2]# ls -lrt
total 6616100
-rw------- 1 root root 16777216 Jun 22 2018 admin.ns
-rw------- 1 root root 67108864 Jun 22 2018 admin.0
-rw------- 1 root root 69 Nov 10 07:27 storage.bson
-rw------- 1 root root 16777216 Nov 10 07:27 vouchers.ns
-rw------- 1 root root 67108864 Nov 10 07:27 vouchers.0
-rw------- 1 root root 2146435072 Nov 10 07:27 local.2
drwx------ 2 root root 4096 Nov 10 07:27 local
-rw------- 1 root root 67108864 Nov 10 07:27 local.0
-rw------- 1 root root 16777216 Jan 7 14:38 config.ns
-rw------- 1 root root 67108864 Jan 7 14:38 config.0
-rw------- 1 root root 16777216 Jan 11 02:06 local.ns
-rw------- 1 root root 2146435072 Jan 11 02:06 local.1
drwx------ 2 root root 4096 Jan 11 02:06 diagnostic.data
-rw------- 1 root root 2146435072 Jan 11 02:06 local.3
-rw------- 1 root root 0 Jan 11 02:07 mongod.lock
drwx------ 2 root root 4096 Jan 11 02:08 journal
[root@dc1-sessionmgr02 sessions.2]#

Enter the command rm -rf * in order to clear the DATA_PATH.

Enter this command in order to start the mongod instance. This command takes a couple of minutes to complete.

[root@dc1-sessionmgr02 ~]# /etc/init.d/sessionmgr-27718 start
Starting sessionmgr-27718 (via systemctl): [ OK ]
[root@dc1-sessionmgr02 ~]#

If you have stopped the aido_client process in step 3, enter the monit start adio_client command in order to start it again.

Enter the diagnostics.sh command from ClusterManager or pcrfclient in order to confirm the respective mongod instance is restored and became ON-LINE in Replica Set.

|---------------------------------------------------------------------------------------------|
| BALANCE:set02 |
| Status via arbitervip:27718 sessionmgr01:27718 sessionmgr02:27718 |
| Member-1 - 27718 : - SECONDARY - sessionmgr02 - ON-LINE - 0 sec - 2 |
| Member-2 - 27718 : - PRIMARY - sessionmgr01 - ON-LINE - -------- - 3 |
| Member-3 - 27718 : XX.XX.XX.XX - ARBITER - arbitervip - ON-LINE - -------- - 0 |
|---------------------------------------------------------------------------------------------|

Revision History

Revision	Publish Date	Comments
1.0	27-Jan-2022	Initial Release

Contributed by Cisco Engineers

Midhun P
Cisco TAC Engineer

Was this Document Helpful?

Feedback

Contact Cisco

Open a Support Case
(Requires a Cisco Service Contract)

Troubleshoot Mongod Instance Failure Due to Increased DATA_PATH Space Utilization

Available Languages

Download Options

Bias-Free Language

Contents

Introduction

Prerequisites

Requirements

Components Used

Background Information

Problem

Restore the Mongod Instance in Sessionmgr

Revision History

Contributed by Cisco Engineers

Was this Document Helpful?

Contact Cisco

This Document Applies to These Products