Troubleshoot Sessmgr Stuck in Server Mode in UPs

Updated: September 13, 2023

Document ID: 220882

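Introduction

This document describes the Redundancy Configuration Manager (RCM) and User Plane Function (UPF) issues that lead to the sessmgr server state.

Prerequisites

Requirements

Cisco recommends that you have knowledge of these topics:

Components Used

The information in this document is based on these software and hardware versions: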

RCM-checkpointmgr
UPF-sessmgr

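The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.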

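Background Information

This document also provides a detailed troubleshooting guide for the sessmgr server state issue, which blocks traffic and call processing, together with a section on lab recovery testing.

Basic Overview

RCM and UP Connectivity

As shown in the diagram, the redundancy manager in the RCM (referred to as checkpointmgr) connects directly to the sessmgrs in the UPF for checkpoint tracking.

Redmgrs and Sessmgrs Mapping

1. Each UP has "N" sessmgrs.

2. The RCM has "M" redmgrs, depending on the number of sessmgrs in the UPF.

3. Redmgrs and sessmgrs are mapped 1:1 based on their IDs, so each sessmgr has its own redmgr.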

redmgr and sessmgr Mapping

Note: Redmgr ID (m) = sessmgr instance ID (n) - 1

For example: smgr-1 is mapped to redmgr-0, smgr-2 is mapped to redmgr-1, and

smgr-n is mapped to redmgr-(n-1).

It is important to know the correct redmgr ID, because it determines which logs must be checked.
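The ID relationship in the note above can be expressed as a small helper. This is only a sketch of the arithmetic stated in this document; the function names are hypothetical and are not part of any RCM or UPF tool.

```python
def redmgr_id_for(sessmgr_instance_id: int) -> int:
    """Return the redmgr ID paired with a sessmgr instance (n -> n - 1)."""
    # Assumption taken from the note above: sessmgr instance IDs start at 1.
    if sessmgr_instance_id < 1:
        raise ValueError("sessmgr instance IDs start at 1")
    return sessmgr_instance_id - 1


def redmgr_label(sessmgr_instance_id: int) -> str:
    """Return a label such as 'redmgr-0' for log lookups (hypothetical naming)."""
    return f"redmgr-{redmgr_id_for(sessmgr_instance_id)}"


if __name__ == "__main__":
    for n in (1, 2, 20):
        print(f"smgr-{n} -> {redmgr_label(n)}")
```

A helper like this is mainly useful when you need to pick the right redmgr log for a sessmgr instance that is under investigation.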

Logs Required

RCM logs (command output):

rcm show-statistics checkpointmgr-endpointstats

RCM controller and checkpointmgr logs (refer to this link):

Log collection

UPF:

Command outputs (hidden mode)

show rcm checkpoint statistics verbose
show session subsystem facility sessmgr all debug-info | grep Mode

If any sessmgr is in server state, check the sessmgr instance IDs and the number of sessmgrs:

show task resources facility sessmgr all

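Troubleshoot

Typically, a UPF has 21 sessmgr instances: 20 active sessmgrs and 1 standby instance (although this count can vary with the specific design).

Example:

To identify active sessmgrs that are not functioning, use this command: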

show task resources facility sessmgr all

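In this scenario, an attempt to resolve the issue by restarting the problematic sessmgr, or even sessctrl, does not restore the affected sessmgr.
In addition, the affected sessmgrs are observed to be stuck in server mode instead of the expected client mode, which can be verified with these commands.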

show rcm checkpoint statistics verbose

show rcm checkpoint statistics verbose 
Tuesday August 29 16:27:53 IST 2023
smgr  state  peer   recovery  pre-alloc  chk-point rcvd chk-point sent
inst         conn   records   calls      full   micro   full      micro
----  -----  -----  --------  ---------  -----  -----   --------  ----------
1     Actv   Ready  0         0          0      0       61784891  1041542505
2     Actv   Ready  0         0          0      0       61593942  1047914230
3     Actv   Ready  0         0          0      0       61471304  1031512458
4     Actv   Ready  0         0          0      0       57745529  343772730
5     Actv   Ready  0         0          0      0       57665041  356249384
6     Actv   Ready  0         0          0      0       57722829  353213059
7     Actv   Ready  0         0          0      0       61992022  1044821794
8     Actv   Ready  0         0          0      0       61463665  1043128178

In this output, all connections are in the Actv Ready state, which is the required state.
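When the output covers many sessmgr instances, a short script can list the rows that are not in the Actv Ready state. The sketch below assumes the column order shown in the sample output above (instance ID, state, peer connection as the first three fields); the function name is hypothetical, and the script works on saved command output rather than on the node itself.

```python
from typing import List


def not_ready_instances(output: str) -> List[int]:
    """Return sessmgr instance IDs whose row is not 'Actv Ready'."""
    flagged = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a numeric instance ID; headers and the
        # timestamp line do not, so they are skipped.
        if len(fields) >= 3 and fields[0].isdigit():
            state, peer = fields[1], fields[2]
            if (state, peer) != ("Actv", "Ready"):
                flagged.append(int(fields[0]))
    return flagged


if __name__ == "__main__":
    # Two rows copied from the sample output in this document.
    sample = """\
1 Actv Ready 0 0 0 0 61784891 1041542505
2 Actv Ready 0 0 0 0 61593942 1047914230
"""
    print(not_ready_instances(sample))
```

An empty list means every parsed row is in the required state.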

show session subsystem facility sessmgr all debug-info | grep Mode

[local] # show session subsystem facility sessmgr all debug-info | grep Mode
Tuesday August 29 16:28:56 IST 2023
Mode: UNKNOWN State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE

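Ideally, all sessmgrs are in client mode, as in this output. In the problem scenario, however, they are in server mode, which prevents them from handling traffic.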
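The same check can be scripted against saved output. The sketch below counts the Mode values and reports whether any sessmgr is in server mode. It assumes that server mode appears as the literal token SERVER, by analogy with the CLIENT lines shown above; the document does not show such a line, so adjust the token if your output differs. The function names are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE".
MODE_LINE = re.compile(r"Mode:\s*(\S+)\s+State:\s*(\S+)")


def summarize_modes(output: str) -> Counter:
    """Count how many sessmgrs report each Mode value."""
    return Counter(match.group(1) for match in MODE_LINE.finditer(output))


def has_server_mode(output: str, token: str = "SERVER") -> bool:
    """True if any line reports the assumed server-mode token."""
    return summarize_modes(output)[token] > 0


if __name__ == "__main__":
    sample = """\
Mode: UNKNOWN State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
"""
    print(dict(summarize_modes(sample)))
    print("server mode present:", has_server_mode(sample))
```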

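Sessmgr Goes into Server Mode

To communicate and transfer checkpoints, each session manager (sessmgr) establishes a TCP peer connection with its corresponding redundancy manager (redmgr).
Once the TCP peer connection is established, the redmgr can checkpoint all subscriber contexts from the sessmgr and store them. This allows a seamless switchover, because the checkpoints can be transferred to another User Plane Function (UPF) through its respective sessmgr instances.
It is crucial for a sessmgr to always be in client mode. If a sessmgr is detected in server mode for any reason, its TCP peer connection with the associated redmgr is broken. In this case, checkpointing does not take place.
If a sessmgr in the UPF is in this state and an unplanned switchover to another UPF is performed without regard to the sessmgr state, the same problem occurs on the new UPF. In this case, the sessmgr cannot handle traffic.

Note: In some cases, the checkpointmgr itself waits on a checkpoint that the RCM has initiated and for which a response is expected from the UPF. If no response arrives, the checkpointmgr cannot communicate, which delays completion of the switchover procedure beyond the switchover timer value. In such scenarios, the UP can even get stuck in the PendActive state.

This can be checked in the RCM statistics and in the redmgr logs. In addition, this command shows which checkpointmgr has a problem with which UPF.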

rcm show-statistics checkpointmgr-endpointstats

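A sessmgr can go into server mode locally for multiple reasons, but one of the main reasons is explained here.

Reasons for Sessmgr Going into Server Mode

1. Based on the number of session managers in the User Plane Function (UPF), replicas of the redundancy manager (redmgr) are created and configured in the Redundancy Configuration Manager (RCM). This configuration ensures that each redmgr is connected to one sessmgr instance.

2. Given the 1:1 mapping between redmgr and sessmgr, what happens when a sessmgr instance ID exceeds the number of sessmgrs?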

For example:

Sessmgr instance IDs: 1 to 20
Redmgr IDs: 0 to 19

In this example, suppose that a sessmgr instance ID goes beyond this limit, for instance to 21, 22, 23, 24, or 25. The redmgrs, with IDs 0 to 19, are already mapped to the existing instance IDs and are unaware of the new sessmgr instance IDs 21 to 25 that the UPF created. A sessmgr with one of these instance IDs cannot form a TCP peer connection with an RCM redmgr, so no checkpoint sync takes place. Without checkpoint sync, the sessmgr gets stuck in server mode and does not take any traffic.
Refer to this diagram:

Sessmgr instance-7 and instance-8 both have no TCP peer connection. On the RCM side, redmgr-1 was connected to instance-2 and redmgr-2 was connected to instance-5. Even though the sessmgrs came up with new instance ID values beyond the defined limit, they have no connection back to the redmgrs, which still point to the previous instances over connections that are now broken.
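The scenario above can be checked with a few lines of code once the sessmgr instance IDs and the redmgr count are known. The sketch below applies the mapping stated earlier in this document (redmgr ID = sessmgr instance ID - 1) and reports the instance IDs for which no redmgr exists. The function name and the input values are illustrative assumptions; the IDs would come from the command outputs listed earlier.

```python
from typing import Iterable, List


def unmapped_instances(instance_ids: Iterable[int], redmgr_count: int) -> List[int]:
    """Return sessmgr instance IDs whose expected redmgr ID (n - 1)
    falls outside the range 0 .. redmgr_count - 1."""
    return sorted(n for n in instance_ids if not 0 <= n - 1 < redmgr_count)


if __name__ == "__main__":
    # Example from this document: redmgr IDs 0 to 19 (20 redmgrs),
    # sessmgr instance IDs 1 to 20 plus the out-of-range IDs 21 to 25.
    observed = list(range(1, 21)) + [21, 22, 23, 24, 25]
    print(unmapped_instances(observed, redmgr_count=20))
```

Any instance ID returned by this check has no redmgr to peer with and is therefore a candidate for the stuck server-mode condition.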

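Workaround

The workaround for this issue is to limit the sessmgr instance IDs so that they match the number of sessmgrs in the UPF and the number of redmgrs in the RCM, as reported by the commands mentioned earlier.

Maximum value of sessmgr instance ID = number of checkpointmgrs - 1

Based on this logic, define the number of sessmgrs, including the standby sessmgr.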

task facility sessmgr max <no of max sessmgrs>

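Note: This command requires a node reload to take full effect.

With this command in place, no matter how many times a sessmgr is killed, it always comes up with an instance ID that is less than or equal to the configured maximum sessmgr count. This helps to prevent checkpoint issues with the RCM and keeps the sessmgr from going into server mode for this reason.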

Revision History

Revision	Publish Date	Comments
1.0	13-Sep-2023	Initial Release

Contributed by Cisco Engineers

Bharati Choudhary
Cisco TAC Engineer
Krishna Kishore D V
Cisco Technical Leader

This document applies to these products:

Ultra Gateway Platform with CUPS