Introduction
This document describes how to implement the workaround for the Subscriber Microservices Infrastructure (SMI) Common Execution Environment (CEE) POD (pgpool) restart problem.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
- Cisco SMI CEE (Ultra Cloud Core CEE)
- 5G Cloud Native Deployment Platform (CNDP) or SMI Bare Metal (BM) architecture
- Docker and Kubernetes
Components Used
The information in this document is based on these software and hardware versions:
- SMI 2020.02.2.35
- Kubernetes v1.21.0
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Background Information
What Is SMI?
Cisco SMI is a layered stack of cloud technologies and standards that enable microservices-based applications from the Cisco Mobility, Cable, and Broadband Network Gateway (BNG) business units, all of which have similar subscriber management functions and similar data store requirements.
The attributes include:
- A layered cloud stack (technologies and standards) that provides top-to-bottom deployments and also accommodates current customer cloud infrastructures.
- The CEE is shared by all applications for non-application functions (data storage, deployment, configuration, telemetry, and alarms). This provides consistent interaction and experience across all customer touch points and integration points.
- Applications and the CEE are deployed in microservices containers and connected with an intelligent service mesh.
- APIs are exposed for deployment, configuration, and management in order to enable automation.
What Is SMI CEE?
The CEE is a software solution developed to monitor the mobile and cable applications that are deployed on the SMI. The CEE captures information (key metrics) from the applications in a centralized way so that engineers can debug and troubleshoot.
The CEE is the common set of tools that is installed for all the applications. It comes equipped with a dedicated Ops Center, which provides a command line interface (CLI) and APIs to manage the monitoring tools. Only one CEE is available for each cluster.
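As an illustration, the Ops Center CLI is usually reached over SSH once the service IP of the CEE Ops Center is known. This is a minimal sketch based on typical SMI deployments; the namespace placeholder and the SSH port 2024 are assumptions and can differ in your environment:
# Find the CEE Ops Center service (the namespace placeholder is an assumption)
kubectl get svc -n <cee-namespace> | grep ops-center
# Log in to the Ops Center CLI (port 2024 is an assumption based on typical SMI setups)
ssh admin@<ops-center-service-ip> -p 2024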
What Is a CEE POD?
A POD is a process that runs on your Kubernetes cluster. A POD encapsulates a granular unit known as a container. A POD contains one or more containers.
Kubernetes deploys one or more PODs on a single node, which can be a physical or a virtual machine. Each POD has a discrete identity with an internal IP address and port space. However, the containers within a POD can share the storage and network resources. The CEE has a number of PODs with unique functions. Pgpool and Postgres are among these CEE PODs.
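For example, the CEE PODs, pgpool and postgres among them, can be listed from the Kubernetes master node. This is a minimal sketch; replace the namespace placeholder with the CEE namespace of your deployment:
# List all PODs in the CEE namespace together with the nodes that host them
kubectl get pods -n <cee-namespace> -o wide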
What Is the Pgpool POD?
Pgpool manages the Postgres pools for connections, replication, load balancing, and so on. Pgpool is middleware that works between the PostgreSQL servers and a PostgreSQL database client.
What Is the Postgres POD?
Postgres provides a Structured Query Language (SQL) database with redundancy that stores the alerts and the Grafana dashboards.
Problem
The pgpool PODs restart periodically, whereas the postgres PODs run without any issues.
To display the alerts, enter this command:
show alerts active summary | include "POD_|k8s-pod-"
An example of the alerts from the CEE is shown here.
[pod-name-smf-data/podname] cee# show alerts active summary | include "POD_|k8s-pod-"
k8s-pod-crashing-loop 1d9d2b113073 critical 12-15T21:47:39 pod-name-smf-data-mas
Pod cee-podname/grafana-65cbdb9846-krgfq (grafana) is restarting 1.03 times / 5 minutes.
POD_Restarted 04d42efb81de major 12-15T21:45:44 pgpool-67f48f6565-vjt Container=
k8s_pgpool_pgpool-67f48f6565-vjttd_cee-podname_a9f68607-eac4-40a9-86ef-db8176e0a22a_1474 of pod= pgpool-...
POD_Restarted f7657a0505c2 major 12-15T21:45:44 postgres-0 Container=
k8s_postgres_postgres-0_cee-podname_59e0a768-6870-4550-8db3-32e2ab047ce2_1385 of pod= postgres-0 in name...
POD_Restarted 6e57ae945677 major 12-15T21:45:44 alert-logger-d96644d4 Container=
k8s_alert-logger_alert-logger-d96644d4-dsc8h_cee-podname_2143c464-068a-418e-b5dd-ce1075b9360e_2421 of po...
k8s-pod-crashing-loop 5b8e6a207aad critical 12-15T21:45:09 pod-name-smf-data-mas Pod
cee-podname/pgpool-67f48f6565-vjttd (pgpool) is restarting 1.03 times / 5 minutes.
POD_Down 45a6b9bf73dc major 12-15T20:30:44 pgpool-67f48f6565-qbw Pod= pgpool-67f48f6565-qbw52 in namespace=
cee-podname is DOWN for more than 15min
POD_Down 4857f398a0ca major 12-15T16:40:44 pgpool-67f48f6565-vjt Pod= pgpool-67f48f6565-vjttd in namespace=
cee-podname is DOWN for more than 15min
k8s-pod-not-ready fc65254c2639 critical 12-11T21:07:29 pgpool-67f48f6565-qbw Pod
cee-podname/pgpool-67f48f6565-qbw52 has been in a non-ready state for longer than 1 minute.
k8s-pod-not-ready 008b859e7333 critical 12-11T16:35:49 pgpool-67f48f6565-vjt Pod
cee-podname/pgpool-67f48f6565-vjttd has been in a non-ready state for longer than 1 minute.
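To correlate these alerts with the restart counters reported by Kubernetes, you can also list the affected PODs from the master node. This sketch uses the same filter as the post checks later in this document; the RESTARTS column shows how often each container restarted:
kubectl get pods -A -o wide | egrep 'pgpool|postgres'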
Troubleshoot
From the Kubernetes master node, enter this command:
kubectl describe pods postgres-0 -n <cee-namespace>
A sample output of the POD description is shown here. The output is truncated.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned cee-pod-name-l1/postgres-2
to pod-name-master-3
Normal Pulling 14m kubelet Pulling image "docker.10.192.x.x.nip.io/cee-2020.02.2.i38/
smi-libraries/postgresql/2020.02.2/postgres:1.3.0-946d87d"
Normal Pulled 13m kubelet Successfully pulled image "docker.10.192.x.x.nip.io/cee-2020.02.2.i38/
smi-libraries/postgresql/2020.02.2/postgres:1.3.0-946d87d" in 29.048094722s
Warning Unhealthy 12m kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:09:48]
pod is not ready
Warning Unhealthy 10m kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:11:18]
pod is not ready
Warning Unhealthy 10m kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:11:48]
pod is not ready
Warning Unhealthy 9m49s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:12:18]
pod is not ready
Warning Unhealthy 9m19s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:12:48]
pod is not ready
Warning Unhealthy 8m49s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:13:18]
pod is not ready
Warning Unhealthy 8m19s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:13:48]
pod is not ready
Warning Unhealthy 7m49s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:14:18]
pod is not ready
Warning Unhealthy 7m19s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:14:48]
pod is not ready
Warning BackOff 6m44s kubelet Back-off restarting failed container
or
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 13m default-scheduler 0/5 nodes are available: 2 node(s)
didn't match Pod's node affinity/selector, 3 node(s) didn't find available persistent
volumes to bind.
Normal Scheduled 13m default-scheduler Successfully assigned cee-pod-name-l1/postgres-0
to pod-name-master-1
Warning FailedScheduling 13m default-scheduler 0/5 nodes are available: 2 node(s)
didn't match Pod's node affinity/selector, 3 node(s) didn't find available
persistent volumes to bind.
Normal Pulling 13m kubelet Pulling image "docker.10.192.x.x.nip.io/cee-2020.02.2.i38/
smi-libraries/postgresql/2020.02.2/postgres:1.3.0-946d87d"
Normal Pulled 12m kubelet Successfully pulled image "docker.10.192.x.x.nip.io/
cee-2020.02.2.i38/smi-libraries/postgresql/2020.02.2/postgres:1.3.0-946d87d"
in 43.011763302s
Warning Unhealthy 7m20s kubelet Liveness probe failed: [bin][h][imm] >>>
[2021-10-11 18:09:16] My name is pg-postgres-0
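In addition to the POD description, the logs of the previously terminated pgpool container can help to identify why it restarted. This is a sketch; the POD name and namespace are taken from the alert example and differ in your deployment:
# Show the logs of the pgpool container from before its last restart
kubectl logs pgpool-67f48f6565-vjttd -c pgpool -n cee-podname --previous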
Workaround
Note: This procedure does not cause any downtime for the applications.
Shut Down the CEE
To shut down the CEE, enter these commands from the CEE:
[pod-name-smf-data/podname] cee#
[pod-name-smf-data/podname] cee# config terminal
Entering configuration mode terminal
[pod-name-smf-data/podname] cee(config)# system mode shutdown
[pod-name-smf-data/podname] cee(config)# commit
Commit complete.
Wait for the system to reach 100%.
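To monitor the progress, you can check the system status from the CEE. This is a sketch, on the assumption that the CEE Ops Center supports the show system status command in the same way as other SMI Ops Centers:
[pod-name-smf-data/podname] cee# show system status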
Delete the Contents from the Folders
From the master VIP, connect through SSH to each master VM and delete the contents of these folders: /data/cee-podname/data-postgres-[0-2].
Master 1
cloud-user@pod-name-smf-data-master-1:~$ sudo rm -rf /data/cee-podname/data-postgres-0
Master 2
cloud-user@pod-name-smf-data-master-2:~$ sudo rm -rf /data/cee-podname/data-postgres-1
Master 3
cloud-user@pod-name-smf-data-master-3:~$ sudo rm -rf /data/cee-podname/data-postgres-2
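Optionally, before you restore the CEE, you can confirm on each master VM that the corresponding data-postgres folder is gone. This sketch follows the paths used in the commands above:
cloud-user@pod-name-smf-data-master-1:~$ sudo ls -l /data/cee-podname/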
Restore the CEE
To restore the CEE, enter these commands from the CEE:
[pod-name-smf-data/podname] cee#
[pod-name-smf-data/podname] cee# config terminal
Entering configuration mode terminal
[pod-name-smf-data/podname] cee(config)# system mode running
[pod-name-smf-data/podname] cee(config)# commit
Commit complete.
Wait for the system to reach 100%.
Post Checks
Verify the Kubernetes PODs from the master node.
cloud-user@pod-name-smf-data-master-1:~$ kubectl get pods -A -o wide | egrep 'postgres|pgpool'
All the PODs must display as up and running, without any restarts.
Verify That the Alerts Are Cleared from the CEE
To verify that the alerts are cleared from the CEE, enter this command:
show alerts active summary | include "POD_|k8s-pod-"
Additionally, you can enter this command to ensure that there is one master database and two standby databases:
echo "0----------------------------------";kubectl
exec -it postgres-0 -n $(kubectl get pods -A | grep postgres | awk '{print $1}' | head -1)
-- /usr/local/bin/cluster/healthcheck/is_major_master.sh;echo "1--------------------------
--------";kubectl exec -it postgres-1 -n $(kubectl get pods -A | grep postgres | awk '{print $1}'
| head -1) -- /usr/local/bin/cluster/healthcheck/is_major_master.sh;echo "2---------------
-------------------"; kubectl exec -it postgres-2 -n $(kubectl get pods -A | grep postgres |
awk '{print $1}' | head -1) -- /usr/local/bin/cluster/healthcheck/is_major_master.sh;
An example of the expected output is:
cloud-user@pod-name-smf-data-master-1:~$ echo "0----------------------------------";kubectl
exec -it postgres-0 -n $(kubectl get pods -A | grep postgres | awk '{print $1}' | head -1)
-- /usr/local/bin/cluster/healthcheck/is_major_master.sh;echo "1--------------------------
--------";kubectl exec -it postgres-1 -n $(kubectl get pods -A | grep postgres | awk '{print $1}'
| head -1) -- /usr/local/bin/cluster/healthcheck/is_major_master.sh;echo "2---------------
-------------------"; kubectl exec -it postgres-2 -n $(kubectl get pods -A | grep postgres |
awk '{print $1}' | head -1) -- /usr/local/bin/cluster/healthcheck/is_major_master.sh;
0----------------------------------
[bin][h][imm] >>> [2021-12-15 22:05:18] My name is pg-postgres-0
[bin][h][imm] >>> My state is good.
[bin][h][imm] >>> I'm not a master, nothing else to do!
1----------------------------------
[bin][h][imm] >>> [2021-12-15 22:05:19] My name is pg-postgres-1
[bin][h][imm] >>> My state is good.
[bin][h][imm] >>> I think I'm master. Will ask my neighbors if they agree.
[bin][h][imm] >>> Will ask nodes from PARTNER_NODES list
[bin][h][imm] >>> Checking node pg-postgres-0
[bin][h][imm] >>>>>>>>> Count of references to potential master pg-postgres-1 is 1 now
[bin][h][imm] >>> Checking node pg-postgres-1
[bin][h][imm] >>> Checking node pg-postgres-2
[bin][h][imm] >>>>>>>>> Count of references to potential master pg-postgres-1 is 2 now
[bin][h][imm] >>> Potential masters got references:
[bin][h][imm] >>>>>> Node: pg-postgres-1, references: 2
[bin][h][imm] >>> I have 2/2 incoming reference[s]!
[bin][h][imm] >>>> 2 - Does anyone have more?
[bin][h][imm] >>> Yahoo! I'm real master...so I think!
2----------------------------------
[bin][h][imm] >>> [2021-12-15 22:05:21] My name is pg-postgres-2
[bin][h][imm] >>> My state is good.
[bin][h][imm] >>> I'm not a master, nothing else to do!
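For readability, the same health check can also be written as a short loop. This sketch reuses the namespace detection and the health check script path from the one-line command above:
# Determine the namespace that hosts the postgres PODs
NS=$(kubectl get pods -A | grep postgres | awk '{print $1}' | head -1)
# Run the health check script in each postgres POD
for i in 0 1 2; do
  echo "$i----------------------------------"
  kubectl exec -it postgres-$i -n $NS -- /usr/local/bin/cluster/healthcheck/is_major_master.sh
done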