Introduction
This document describes how to implement the workaround for the Subscriber Microservices Infrastructure (SMI) Common Execution Environment (CEE) POD (pgpool) restart problem.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
- Cisco SMI CEE (Ultra Cloud Core CEE)
- 5G Cloud Native Deployment Platform (CNDP) or SMI Bare Metal (BM) architecture
- Docker and Kubernetes
Components Used
The information in this document is based on these software and hardware versions:
- SMI 2020.02.2.35
- Kubernetes v1.21.0
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Background Information
What Is SMI?
Cisco SMI is a layered stack of cloud technologies and standards that enable microservices-based applications from the Cisco Mobility, Cable, and Broadband Network Gateway (BNG) business units, all of which have similar subscriber management functions and similar data store requirements.
The attributes include:
- A layered cloud stack (technologies and standards) that provides top-to-bottom deployments and also accommodates current customer cloud infrastructures.
- The CEE is shared by all applications for non-application functions (data storage, deployment, configuration, telemetry, and alarms). This provides consistent interaction and experience across all customer touch points and integration points.
- Applications and the CEE are deployed in microservices containers and connected with an intelligent service mesh.
- APIs are exposed for deployment, configuration, and management in order to enable automation.
What Is SMI CEE?
The CEE is a software solution developed to monitor the mobile and cable applications that are deployed on the SMI. The CEE captures information (key metrics) from the applications in a centralized way so that engineers can debug and troubleshoot.
The CEE is the common set of tools that is installed for all the applications. It comes equipped with a dedicated Ops Center, which provides a command line interface (CLI) and APIs to manage the monitoring tools. Only one CEE is available for each cluster.
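As an illustration, the Ops Center CLI is usually reached over SSH once the service IP of the CEE Ops Center is known. This is a minimal sketch based on typical SMI deployments; the namespace placeholder and the SSH port 2024 are assumptions and can differ in your environment:
# Find the CEE Ops Center service (the namespace placeholder is an assumption)
kubectl get svc -n <cee-namespace> | grep ops-center
# Log in to the Ops Center CLI (port 2024 is an assumption based on typical SMI setups)
ssh admin@<ops-center-service-ip> -p 2024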
What Is a CEE POD?
A POD is a process that runs on your Kubernetes cluster. A POD encapsulates a granular unit known as a container. A POD contains one or more containers.
Kubernetes deploys one or more PODs on a single node, which can be a physical or a virtual machine. Each POD has a discrete identity with an internal IP address and port space. However, the containers within a POD can share the storage and network resources. The CEE has a number of PODs with unique functions. Pgpool and Postgres are among these CEE PODs.
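For example, the CEE PODs, pgpool and postgres among them, can be listed from the Kubernetes master node. This is a minimal sketch; replace the namespace placeholder with the CEE namespace of your deployment:
# List all PODs in the CEE namespace together with the nodes that host them
kubectl get pods -n <cee-namespace> -o wide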
What Is the Pgpool POD?
Pgpool manages the Postgres pools for connections, replication, load balancing, and so on. Pgpool is middleware that works between the PostgreSQL servers and a PostgreSQL database client.
What Is the Postgres POD?
Postgres provides a Structured Query Language (SQL) database with redundancy that stores the alerts and the Grafana dashboards.
Problem
The pgpool PODs restart periodically, whereas the postgres PODs run without any issues.
To display the alerts, enter this command:
show alerts active summary | include "POD_|k8s-pod-"
An example of the alerts from the CEE is shown here.
[pod-name-smf-data/podname] cee# show alerts active summary | include "POD_|k8s-pod-"
k8s-pod-crashing-loop 1d9d2b113073 critical 12-15T21:47:39 pod-name-smf-data-mas
Pod cee-podname/grafana-65cbdb9846-krgfq (grafana) is restarting 1.03 times / 5 minutes.
POD_Restarted 04d42efb81de major 12-15T21:45:44 pgpool-67f48f6565-vjt Container=
k8s_pgpool_pgpool-67f48f6565-vjttd_cee-podname_a9f68607-eac4-40a9-86ef-db8176e0a22a_1474 of pod= pgpool-...
POD_Restarted f7657a0505c2 major 12-15T21:45:44 postgres-0 Container=
k8s_postgres_postgres-0_cee-podname_59e0a768-6870-4550-8db3-32e2ab047ce2_1385 of pod= postgres-0 in name...
POD_Restarted 6e57ae945677 major 12-15T21:45:44 alert-logger-d96644d4 Container=
k8s_alert-logger_alert-logger-d96644d4-dsc8h_cee-podname_2143c464-068a-418e-b5dd-ce1075b9360e_2421 of po...
k8s-pod-crashing-loop 5b8e6a207aad critical 12-15T21:45:09 pod-name-smf-data-mas Pod
cee-podname/pgpool-67f48f6565-vjttd (pgpool) is restarting 1.03 times / 5 minutes.
POD_Down 45a6b9bf73dc major 12-15T20:30:44 pgpool-67f48f6565-qbw Pod= pgpool-67f48f6565-qbw52 in namespace=
cee-podname is DOWN for more than 15min
POD_Down 4857f398a0ca major 12-15T16:40:44 pgpool-67f48f6565-vjt Pod= pgpool-67f48f6565-vjttd in namespace=
cee-podname is DOWN for more than 15min
k8s-pod-not-ready fc65254c2639 critical 12-11T21:07:29 pgpool-67f48f6565-qbw Pod
cee-podname/pgpool-67f48f6565-qbw52 has been in a non-ready state for longer than 1 minute.
k8s-pod-not-ready 008b859e7333 critical 12-11T16:35:49 pgpool-67f48f6565-vjt Pod
cee-podname/pgpool-67f48f6565-vjttd has been in a non-ready state for longer than 1 minute.
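To correlate these alerts with the restart counters reported by Kubernetes, you can also list the affected PODs from the master node. This sketch uses the same filter as the post checks later in this document; the RESTARTS column shows how often each container restarted:
kubectl get pods -A -o wide | egrep 'pgpool|postgres'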
Troubleshoot
From the Kubernetes master node, enter this command:
kubectl describe pods postgres-0 -n <cee-namespace>
A sample output of the POD description is shown here. The output is truncated.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned cee-pod-name-l1/postgres-2
to pod-name-master-3
Normal Pulling 14m kubelet Pulling image "docker.10.192.x.x.nip.io/cee-2020.02.2.i38/
smi-libraries/postgresql/2020.02.2/postgres:1.3.0-946d87d"
Normal Pulled 13m kubelet Successfully pulled image "docker.10.192.x.x.nip.io/cee-2020.02.2.i38/
smi-libraries/postgresql/2020.02.2/postgres:1.3.0-946d87d" in 29.048094722s
Warning Unhealthy 12m kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:09:48]
pod is not ready
Warning Unhealthy 10m kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:11:18]
pod is not ready
Warning Unhealthy 10m kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:11:48]
pod is not ready
Warning Unhealthy 9m49s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:12:18]
pod is not ready
Warning Unhealthy 9m19s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:12:48]
pod is not ready
Warning Unhealthy 8m49s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:13:18]
pod is not ready
Warning Unhealthy 8m19s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:13:48]
pod is not ready
Warning Unhealthy 7m49s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:14:18]
pod is not ready
Warning Unhealthy 7m19s kubelet Readiness probe failed: [bin][h][ir] >>> [2021-10-11 18:14:48]
pod is not ready
Warning BackOff 6m44s kubelet Back-off restarting failed container
or
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 13m default-scheduler 0/5 nodes are available: 2 node(s)
didn't match Pod's node affinity/selector, 3 node(s) didn't find available persistent
volumes to bind.
Normal Scheduled 13m default-scheduler Successfully assigned cee-pod-name-l1/postgres-0
to pod-name-master-1
Warning FailedScheduling 13m default-scheduler 0/5 nodes are available: 2 node(s)
didn't match Pod's node affinity/selector, 3 node(s) didn't find available
persistent volumes to bind.
Normal Pulling 13m kubelet Pulling image "docker.10.192.x.x.nip.io/cee-2020.02.2.i38/
smi-libraries/postgresql/2020.02.2/postgres:1.3.0-946d87d"
Normal Pulled 12m kubelet Successfully pulled image "docker.10.192.x.x.nip.io/
cee-2020.02.2.i38/smi-libraries/postgresql/2020.02.2/postgres:1.3.0-946d87d"
in 43.011763302s
Warning Unhealthy 7m20s kubelet Liveness probe failed: [bin][h][imm] >>>
[2021-10-11 18:09:16] My name is pg-postgres-0
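In addition to the POD description, the logs of the previously terminated pgpool container can help to identify why it restarted. This is a sketch; the POD name and namespace are taken from the alert example and differ in your deployment:
# Show the logs of the pgpool container from before its last restart
kubectl logs pgpool-67f48f6565-vjttd -c pgpool -n cee-podname --previous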
Workaround
Note: This procedure does not cause any downtime for the applications.
Shut Down the CEE
To shut down the CEE, enter these commands from the CEE:
[pod-name-smf-data/podname] cee#
[pod-name-smf-data/podname] cee# config terminal
Entering configuration mode terminal
[pod-name-smf-data/podname] cee(config)# system mode shutdown
[pod-name-smf-data/podname] cee(config)# commit
Commit complete.
Wait for the system to reach 100%.
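To monitor the progress, you can check the system status from the CEE. This is a sketch, on the assumption that the CEE Ops Center supports the show system status command in the same way as other SMI Ops Centers:
[pod-name-smf-data/podname] cee# show system status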
Delete the Contents from the Folders
From the master VIP, connect through SSH to each master VM and delete the contents of these folders: /data/cee-podname/data-postgres-[0-2].
Master 1
cloud-user@pod-name-smf-data-master-1:~$ sudo rm -rf /data/cee-podname/data-postgres-0
Master 2
cloud-user@pod-name-smf-data-master-2:~$ sudo rm -rf /data/cee-podname/data-postgres-1
Master 3
cloud-user@pod-name-smf-data-master-3:~$ sudo rm -rf /data/cee-podname/data-postgres-2
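Optionally, before you restore the CEE, you can confirm on each master VM that the corresponding data-postgres folder is gone. This sketch follows the paths used in the commands above:
cloud-user@pod-name-smf-data-master-1:~$ sudo ls -l /data/cee-podname/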
Restore the CEE
To restore the CEE, enter these commands from the CEE:
[pod-name-smf-data/podname] cee#
[pod-name-smf-data/podname] cee# config terminal
Entering configuration mode terminal
[pod-name-smf-data/podname] cee(config)# system mode running
[pod-name-smf-data/podname] cee(config)# commit
Commit complete.
Wait for the system to reach 100%.
Post Checks
Verify the Kubernetes PODs from the master node.
cloud-user@pod-name-smf-data-master-1:~$ kubectl get pods -A -o wide | egrep 'postgres|pgpool'
All the PODs must display as up and running, without any restarts.
Verify That the Alerts Are Cleared from the CEE
To verify that the alerts are cleared from the CEE, enter this command:
show alerts active summary | include "POD_|k8s-pod-"
Additionally, you can enter this command to ensure that there is one master database and two standby databases:
echo "0----------------------------------";kubectl
exec -it postgres-0 -n $(kubectl get pods -A | grep postgres | awk '{print $1}' | head -1)
-- /usr/local/bin/cluster/healthcheck/is_major_master.sh;echo "1--------------------------
--------";kubectl exec -it postgres-1 -n $(kubectl get pods -A | grep postgres | awk '{print $1}'
| head -1) -- /usr/local/bin/cluster/healthcheck/is_major_master.sh;echo "2---------------
-------------------"; kubectl exec -it postgres-2 -n $(kubectl get pods -A | grep postgres |
awk '{print $1}' | head -1) -- /usr/local/bin/cluster/healthcheck/is_major_master.sh;
An example of the expected output is:
cloud-user@pod-name-smf-data-master-1:~$ echo "0----------------------------------";kubectl
exec -it postgres-0 -n $(kubectl get pods -A | grep postgres | awk '{print $1}' | head -1)
-- /usr/local/bin/cluster/healthcheck/is_major_master.sh;echo "1--------------------------
--------";kubectl exec -it postgres-1 -n $(kubectl get pods -A | grep postgres | awk '{print $1}'
| head -1) -- /usr/local/bin/cluster/healthcheck/is_major_master.sh;echo "2---------------
-------------------"; kubectl exec -it postgres-2 -n $(kubectl get pods -A | grep postgres |
awk '{print $1}' | head -1) -- /usr/local/bin/cluster/healthcheck/is_major_master.sh;
0----------------------------------
[bin][h][imm] >>> [2021-12-15 22:05:18] My name is pg-postgres-0
[bin][h][imm] >>> My state is good.
[bin][h][imm] >>> I'm not a master, nothing else to do!
1----------------------------------
[bin][h][imm] >>> [2021-12-15 22:05:19] My name is pg-postgres-1
[bin][h][imm] >>> My state is good.
[bin][h][imm] >>> I think I'm master. Will ask my neighbors if they agree.
[bin][h][imm] >>> Will ask nodes from PARTNER_NODES list
[bin][h][imm] >>> Checking node pg-postgres-0
[bin][h][imm] >>>>>>>>> Count of references to potential master pg-postgres-1 is 1 now
[bin][h][imm] >>> Checking node pg-postgres-1
[bin][h][imm] >>> Checking node pg-postgres-2
[bin][h][imm] >>>>>>>>> Count of references to potential master pg-postgres-1 is 2 now
[bin][h][imm] >>> Potential masters got references:
[bin][h][imm] >>>>>> Node: pg-postgres-1, references: 2
[bin][h][imm] >>> I have 2/2 incoming reference[s]!
[bin][h][imm] >>>> 2 - Does anyone have more?
[bin][h][imm] >>> Yahoo! I'm real master...so I think!
2----------------------------------
[bin][h][imm] >>> [2021-12-15 22:05:21] My name is pg-postgres-2
[bin][h][imm] >>> My state is good.
[bin][h][imm] >>> I'm not a master, nothing else to do!
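For readability, the same health check can also be written as a short loop. This sketch reuses the namespace detection and the health check script path from the one-line command above:
# Determine the namespace that hosts the postgres PODs
NS=$(kubectl get pods -A | grep postgres | awk '{print $1}' | head -1)
# Run the health check script in each postgres POD
for i in 0 1 2; do
  echo "$i----------------------------------"
  kubectl exec -it postgres-$i -n $NS -- /usr/local/bin/cluster/healthcheck/is_major_master.sh
done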