Introduction
This document describes how to implement the workaround for the Subscriber Microservices Infrastructure (SMI) Common Execution Environment (CEE) pgpool POD restart issues.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
- Cisco SMI CEE (Ultra Cloud Core CEE)
- 5G Cloud Native Deployment Platform (CNDP) or SMI Bare Metal (BM) architecture
- Docker and Kubernetes
Components Used
The information in this document is based on these software and hardware versions:
- SMI 2020.02.2.35
- Kubernetes v1.21.0
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Background Information
What is SMI?
Cisco SMI is a layered stack of cloud technologies and standards that enable microservices-based applications from Cisco Mobility, Cable, and Broadband Network Gateway (BNG) business units – all of which have similar subscriber management functions and similar datastore requirements.
The attributes are:
- Layered cloud stack (technologies and standards) that provides top-to-bottom deployments and also accommodates current customer cloud infrastructures.
- The CEE is shared by all applications for non-application functions (data storage, deployment, configuration, telemetry, and alarm). This provides consistent interaction and experience for all customer touchpoints and integration points.
- Applications and the CEE are deployed in microservice containers and connected with an Intelligent Service Mesh.
- Exposed API for deployment, configuration, and management in order to enable automation.
What is SMI CEE?
The CEE is a software solution developed to monitor mobile and cable applications that are deployed on the SMI. The CEE captures information (key metrics) from the applications in a centralized way for engineers to debug and troubleshoot.
The CEE is the common set of tools that are installed for all the applications. It comes equipped with a dedicated Ops Center, which provides the command line interface (CLI) and APIs to manage the monitoring tools. There is only one CEE available for each cluster.
What are CEE PODs?
A POD is the smallest deployable unit that runs on your Kubernetes cluster. A POD encapsulates one or more containers.
Kubernetes deploys one or more PODs on a single node, which can be a physical or virtual machine. Each POD has a discrete identity with an internal IP address and port space, but the containers within a POD can share storage and network resources. The CEE has a number of PODs with unique functions; pgpool and postgres are among them.
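If you want to list the CEE PODs and the nodes where they run, a standard Kubernetes command such as the one shown here can be used from the master; the namespace is a placeholder for your CEE namespace:
kubectl get pods -n <cee-namespace> -o wide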
What is pgpool POD?
Pgpool manages the Postgres resource pool for connection pooling, replication, load balancing, and so on. Pgpool is middleware that works between PostgreSQL servers and a PostgreSQL database client.
What is Postgres POD?
Postgres provides a Structured Query Language (SQL) database with redundancy in order to store alerts and Grafana dashboards.
Problem
The pgpool PODs restart repeatedly while the postgres PODs run with no issues.
In order to display the alerts, enter this command:
show alerts active summary | include "POD_|k8s-pod-"
A sample alert from the CEE is shown here.
[pod-name-smf-data/podname] cee# show alerts active summary | include "POD_|k8s-pod-"
k8s-pod-crashing-loop 1d9d2b113073 critical 12-15T21:47:39 pod-name-smf-data-mas
Pod cee-podname/grafana-65cbdb9846-krgfq (grafana) is restarting 1.03 times / 5 minutes.
POD_Restarted 04d42efb81de major 12-15T21:45:44 pgpool-67f48f6565-vjt Container=
k8s_pgpool_pgpool-67f48f6565-vjttd_cee-podname_a9f68607-eac4-40a9-86ef-db8176e0a22a_1474 of pod= pgpool-...
POD_Restarted f7657a0505c2 major 12-15T21:45:44 postgres-0 Container=
k8s_postgres_postgres-0_cee-podname_59e0a768-6870-4550-8db3-32e2ab047ce2_1385 of pod= postgres-0 in name...
POD_Restarted 6e57ae945677 major 12-15T21:45:44 alert-logger-d96644d4 Container=
k8s_alert-logger_alert-logger-d96644d4-dsc8h_cee-podname_2143c464-068a-418e-b5dd-ce1075b9360e_2421 of po...
k8s-pod-crashing-loop 5b8e6a207aad critical 12-15T21:45:09 pod-name-smf-data-mas Pod
cee-podname/pgpool-67f48f6565-vjttd (pgpool) is restarting 1.03 times / 5 minutes.
POD_Down 45a6b9bf73dc major 12-15T20:30:44 pgpool-67f48f6565-qbw Pod= pgpool-67f48f6565-qbw52 in namespace=
cee-podname is DOWN for more than 15min
POD_Down 4857f398a0ca major 12-15T16:40:44 pgpool-67f48f6565-vjt Pod= pgpool-67f48f6565-vjttd in namespace=
cee-podname is DOWN for more than 15min
k8s-pod-not-ready fc65254c2639 critical 12-11T21:07:29 pgpool-67f48f6565-qbw Pod
cee-podname/pgpool-67f48f6565-qbw52 has been in a non-ready state for longer than 1 minute.
k8s-pod-not-ready 008b859e7333 critical 12-11T16:35:49 pgpool-67f48f6565-vjt Pod
cee-podname/pgpool-67f48f6565-vjttd has been in a non-ready state for longer than 1 minute.
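You can also confirm the restart counts of the pgpool and postgres PODs from the Kubernetes master, for example:
kubectl get pods -A -o wide | egrep 'postgres|pgpool'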
Troubleshoot
From the Kubernetes master, enter this command:
kubectl describe pods -n <namespace> postgres-0
The sample output of the POD description is shown here. The output is truncated.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned cee-pod-name-l1/postgres-2
to pod-name-master-3
Normal Pulling 14m kubelet Pulling image "docker.10.192.x.x.nip.io/cee-2020.02.2.i38/
smi-libraries/postgresql/2020.02.2/postgres:1.3.0-946d87d"
Normal Pulled 13m kubelet Successfully pulled image "docker.10.192.x.x.nip.io/cee-2020.02.2.i38/
smi-libraries/postgresql/2020.02.2/postgres:1.3.0-946d87d" in 29.048094722s
Warning Unhealthy 12m kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:09:48]
pod is not ready
Warning Unhealthy 10m kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:11:18]
pod is not ready
Warning Unhealthy 10m kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:11:48]
pod is not ready
Warning Unhealthy 9m49s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:12:18]
pod is not ready
Warning Unhealthy 9m19s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:12:48]
pod is not ready
Warning Unhealthy 8m49s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:13:18]
pod is not ready
Warning Unhealthy 8m19s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:13:48]
pod is not ready
Warning Unhealthy 7m49s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:14:18]
pod is not ready
Warning Unhealthy 7m19s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:14:48]
pod is not ready
Warning BackOff 6m44s kubelet Back-off restarting failed container
Or
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 13m default-scheduler 0/5 nodes are available: 2 node(s)
didn't match Pod's node affinity/selector, 3 node(s) didn't find available persistent
volumes to bind.
Normal Scheduled 13m default-scheduler Successfully assigned cee-pod-name-l1/postgres-0
to pod-name-master-1
Warning FailedScheduling 13m default-scheduler 0/5 nodes are available: 2 node(s)
didn't match Pod's node affinity/selector, 3 node(s) didn't find available
persistent volumes to bind.
Normal Pulling 13m kubelet Pulling image "docker.10.192.x.x.nip.io/cee-2020.02.2.i38/
smi-libraries/postgresql/2020.02.2/postgres:1.3.0-946d87d"
Normal Pulled 12m kubelet Successfully pulled image "docker.10.192.x.x.nip.io/
cee-2020.02.2.i38/smi-libraries/postgresql/2020.02.2/postgres:1.3.0-946d87d"
in 43.011763302s
Warning Unhealthy 7m20s kubelet Liveness probe failed: [bin][h][imm] >>>
[2021-10-11 18:09:16] My name is pg-postgres-0
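As an additional check that is not part of the outputs shown here, you can review the previous container logs of a pgpool POD that restarts and the state of the Postgres persistent volume claims from the Kubernetes master. The POD name and namespace are placeholders for values from your own cluster:
kubectl logs -n <namespace> <pgpool-pod-name> --previous
kubectl get pvc -n <namespace>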
Workaround
Note: This procedure does not cause any downtime in the application.
Shut Down the CEE
In order to shut down the CEE, enter these commands from the CEE:
[pod-name-smf-data/podname] cee#
[pod-name-smf-data/podname] cee# config terminal
Entering configuration mode terminal
[pod-name-smf-data/podname] cee(config)# system mode shutdown
[pod-name-smf-data/podname] cee(config)# commit
Commit complete.
Wait for the system to go to 100%.
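While the shutdown is in progress, you can optionally watch the CEE PODs terminate from the Kubernetes master; the namespace is a placeholder for your CEE namespace:
watch kubectl get pods -n <cee-namespace>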
Delete Content From Folders
From the master VIP, SSH to each of the master VMs and remove these folders: /data/cee-podname/data-postgres-[0-2].
Master 1
cloud-user@pod-name-smf-data-master-1:~$ sudo rm -rf /data/cee-podname/data-postgres-0
Master 2
cloud-user@pod-name-smf-data-master-2:~$ sudo rm -rf /data/cee-podname/data-postgres-1
Master 3
cloud-user@pod-name-smf-data-master-3:~$ sudo rm -rf /data/cee-podname/data-postgres-2
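Optionally, confirm on each master that the respective folder no longer exists before you proceed; the path uses the same placeholder namespace as the previous commands:
ls -ld /data/cee-podname/data-postgres-*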
Restore the CEE
In order to restore the CEE, enter these commands from the CEE:
[pod-name-smf-data/podname] cee#
[pod-name-smf-data/podname] cee# config terminal
Entering configuration mode terminal
[pod-name-smf-data/podname] cee(config)# system mode running
[pod-name-smf-data/podname] cee(config)# commit
Commit complete.
Wait for the system to go to 100%.
Post Checks
Verify Kubernetes from the master.
cloud-user@pod-name-smf-data-master-1:~$ kubectl get pods -A -o wide | egrep 'postgres|pgpool'
All PODs must display as up and running without any restarts.
Verify Alerts are Cleared From the CEE
In order to verify that alerts are cleared from the CEE, enter this command:
show alerts active summary | include "POD_|k8s-pod-"
Also, you can enter this command in order to ensure there is one master and two standby DBs:
echo "0----------------------------------";kubectl
exec -it postgres-0 -n $(kubectl get pods -A | grep postgres | awk '{print $1}' | head -1)
-- /usr/local/bin/cluster/healthcheck/is_major_master.sh;echo "1--------------------------
--------";kubectl exec -it postgres-1 -n $(kubectl get pods -A | grep postgres | awk '{print $1}'
| head -1) -- /usr/local/bin/cluster/healthcheck/is_major_master.sh;echo "2---------------
-------------------"; kubectl exec -it postgres-2 -n $(kubectl get pods -A | grep postgres |
awk '{print $1}' | head -1) -- /usr/local/bin/cluster/healthcheck/is_major_master.sh;
The sample expected output is:
cloud-user@pod-name-smf-data-master-1:~$ echo "0----------------------------------";kubectl
exec -it postgres-0 -n $(kubectl get pods -A | grep postgres | awk '{print $1}' | head -1)
-- /usr/local/bin/cluster/healthcheck/is_major_master.sh;echo "1--------------------------
--------";kubectl exec -it postgres-1 -n $(kubectl get pods -A | grep postgres | awk '{print $1}'
| head -1) -- /usr/local/bin/cluster/healthcheck/is_major_master.sh;echo "2---------------
-------------------"; kubectl exec -it postgres-2 -n $(kubectl get pods -A | grep postgres |
awk '{print $1}' | head -1) -- /usr/local/bin/cluster/healthcheck/is_major_master.sh;
0----------------------------------
[bin][h][imm] >>> [2021-12-15 22:05:18] My name is pg-postgres-0
[bin][h][imm] >>> My state is good.
[bin][h][imm] >>> I'm not a master, nothing else to do!
1----------------------------------
[bin][h][imm] >>> [2021-12-15 22:05:19] My name is pg-postgres-1
[bin][h][imm] >>> My state is good.
[bin][h][imm] >>> I think I'm master. Will ask my neighbors if they agree.
[bin][h][imm] >>> Will ask nodes from PARTNER_NODES list
[bin][h][imm] >>> Checking node pg-postgres-0
[bin][h][imm] >>>>>>>>> Count of references to potential master pg-postgres-1 is 1 now
[bin][h][imm] >>> Checking node pg-postgres-1
[bin][h][imm] >>> Checking node pg-postgres-2
[bin][h][imm] >>>>>>>>> Count of references to potential master pg-postgres-1 is 2 now
[bin][h][imm] >>> Potential masters got references:
[bin][h][imm] >>>>>> Node: pg-postgres-1, references: 2
[bin][h][imm] >>> I have 2/2 incoming reference[s]!
[bin][h][imm] >>>> 2 - Does anyone have more?
[bin][h][imm] >>> Yahoo! I'm real master...so I think!
2----------------------------------
[bin][h][imm] >>> [2021-12-15 22:05:21] My name is pg-postgres-2
[bin][h][imm] >>> My state is good.
[bin][h][imm] >>> I'm not a master, nothing else to do!