Troubleshoot Sessmgr Stuck in Server Mode in UPs

Updated: September 13, 2023

Document ID: 220882

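Introduction

This document describes the Redundancy Configuration Manager (RCM) and User Plane Function (UPF) issues that lead to the sessmgr server state.

Prerequisites

Requirements

Cisco recommends that you have knowledge of these topics:

Components Used

The information in this document is based on these software and hardware versions: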

RCM-checkpointmgr
UPF-sessmgr

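The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.

Background Information

This document also provides a detailed troubleshooting guide for the sessmgr server state issue, which hinders traffic and call processing. Recovery is covered as well, in the lab test section.

Basic Overview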

RCM and UP Connectivity

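As shown in the figure, there is a direct connection between the redundancy manager in RCM (known as checkpointmgr) and the sessmgr in the UPF. This connection is used for checkpoint tracking.

Redmgrs and Sessmgrs Mapping

1. Each UP has "N" sessmgrs.

2. RCM has "M" redmgrs, depending on the number of sessmgrs in the UPF.

3. Redmgrs and sessmgrs are mapped 1:1 based on their IDs, so each sessmgr has its own redmgr.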

redmgr and sessmgr Mapping

Note: Redmgr ID (m) = sessmgr instance ID (n) - 1

For example: smgr-1 is mapped to redmgr-0, smgr-2 is mapped to redmgr-1,

and smgr-n is mapped to redmgr-m, where m = n - 1.

It is important to know the correct redmgr ID, because the logs of the matching redmgr are the ones that need to be checked.
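
The ID mapping described in the note above can be sketched as a small Python helper, which may be convenient for working out which redmgr logs to collect for a given sessmgr. This is an illustrative sketch only, not part of RCM or the UPF software. The function names are hypothetical, and it assumes that sessmgr instance IDs start at 1 and redmgr IDs start at 0, as in the example.

```python
def redmgr_id_for(sessmgr_instance_id: int) -> int:
    """Return the redmgr ID mapped to a sessmgr instance ID (m = n - 1)."""
    # Assumption taken from the example: smgr-1 maps to redmgr-0.
    if sessmgr_instance_id < 1:
        raise ValueError("sessmgr instance IDs are assumed to start at 1")
    return sessmgr_instance_id - 1


def sessmgr_id_for(redmgr_id: int) -> int:
    """Return the sessmgr instance ID mapped to a redmgr ID (n = m + 1)."""
    if redmgr_id < 0:
        raise ValueError("redmgr IDs are assumed to start at 0")
    return redmgr_id + 1


if __name__ == "__main__":
    for n in (1, 2, 20):
        print(f"smgr-{n} is mapped to redmgr-{redmgr_id_for(n)}")
```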

Logs Required

RCM logs - command output:

rcm show-statistics checkpointmgr-endpointstats

RCM controller and checkpointmgr logs (refer to this link):

Log collection

UPF:

Command outputs (hidden mode)

show rcm checkpoint statistics verbose
show session subsystem facility sessmgr all debug-info | grep Mode

If you see any sessmgr in the server state, check the sessmgr instance IDs and the number of sessmgrs:

show task resources facility sessmgr all

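Troubleshoot

Typically, there are 21 sessmgr instances in a UPF: 20 active sessmgrs and 1 standby instance (although this count can vary with the specific design).

Example:

To identify the active sessmgrs that are not functioning, you can use this command:

show task resources facility sessmgr all

In this scenario, an attempt to resolve the issue by restarting the problematic sessmgr, or even by restarting sessctrl, does not recover the affected sessmgrs.
In addition, the affected sessmgrs are seen to be stuck in server mode instead of the expected client mode. This can be verified with the commands provided here.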

show rcm checkpoint statistics verbose

show rcm checkpoint statistics verbose 
Tuesday August 29 16:27:53 IST 2023
smgr  state  peer   recovery  pre-alloc  chk-point rcvd  chk-point sent
inst         conn   records   calls      full    micro   full      micro
----  -----  -----  --------  ---------  -----   -----   --------  ----------
1     Actv   Ready  0         0          0       0       61784891  1041542505
2     Actv   Ready  0         0          0       0       61593942  1047914230
3     Actv   Ready  0         0          0       0       61471304  1031512458
4     Actv   Ready  0         0          0       0       57745529  343772730
5     Actv   Ready  0         0          0       0       57665041  356249384
6     Actv   Ready  0         0          0       0       57722829  353213059
7     Actv   Ready  0         0          0       0       61992022  1044821794
8     Actv   Ready  0         0          0       0       61463665  1043128178

In the output above, all connections are in the Actv Ready state, which is the required state.
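
When a UPF has many sessmgr instances, this check can be scripted against saved command output. The Python sketch below is illustrative only. It assumes the nine-column row layout seen in the sample above, and the helper names are hypothetical. It reports any row whose state and peer connection are not Actv and Ready.

```python
from typing import List, NamedTuple

# Sample rows copied from the output above (shortened to three instances).
SAMPLE = """\
Tuesday August 29 16:27:53 IST 2023
smgr state peer recovery pre-alloc chk-point rcvd chk-point sent
inst conn records calls full micro full micro
---- ------- ----- ------- -------- ----- ----- ----- ----
1 Actv Ready 0 0 0 0 61784891 1041542505
2 Actv Ready 0 0 0 0 61593942 1047914230
3 Actv Ready 0 0 0 0 61471304 1031512458
"""


class SmgrRow(NamedTuple):
    inst: int
    state: str
    peer_conn: str


def parse_checkpoint_rows(output: str) -> List[SmgrRow]:
    """Extract (inst, state, peer conn) from data rows; skip headers and timestamps."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Assumption: a data row has 9 fields and starts with a numeric instance ID.
        if len(fields) == 9 and fields[0].isdigit():
            rows.append(SmgrRow(int(fields[0]), fields[1], fields[2]))
    return rows


def not_actv_ready(rows: List[SmgrRow]) -> List[SmgrRow]:
    """Return the rows whose state and peer connection are not Actv and Ready."""
    return [r for r in rows if (r.state, r.peer_conn) != ("Actv", "Ready")]


if __name__ == "__main__":
    rows = parse_checkpoint_rows(SAMPLE)
    print(f"{len(rows)} sessmgr rows parsed")
    print("rows needing attention:", not_actv_ready(rows))
```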

show session subsystem facility sessmgr all debug-info | grep Mode

[local]# show session subsystem facility sessmgr all debug-info | grep Mode
Tuesday August 29 16:28:56 IST 2023
Mode: UNKNOWN State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE

Here, all sessmgrs should ideally be in client mode. In this issue, however, they are in server mode, which prevents them from handling traffic.
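
The mode check can also be tallied from saved output. The Python sketch below is illustrative only. It assumes each matching line has the `Mode: <mode> State: <state>` form shown above, and the helper names are hypothetical. The position of a line in the grep output does not necessarily equal the sessmgr instance ID, so treat the positions only as a pointer back into the full debug-info output.

```python
import re
from collections import Counter
from typing import List, Tuple

# Sample lines copied from the output above.
SAMPLE = """\
Tuesday August 29 16:28:56 IST 2023
Mode: UNKNOWN State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
"""

# Assumed line format, taken from the sample: "Mode: <mode> State: <state>".
MODE_LINE = re.compile(r"Mode:\s*(\S+)\s+State:\s*(\S+)")


def count_modes(output: str) -> Counter:
    """Count how many lines report each mode."""
    return Counter(m.group(1) for m in MODE_LINE.finditer(output))


def non_client_entries(output: str) -> List[Tuple[int, str]]:
    """Return (1-based position in the output, mode) for lines not in CLIENT mode."""
    modes = [m.group(1) for m in MODE_LINE.finditer(output)]
    return [(i, mode) for i, mode in enumerate(modes, start=1) if mode != "CLIENT"]


if __name__ == "__main__":
    print(dict(count_modes(SAMPLE)))
    print("not in CLIENT mode:", non_client_entries(SAMPLE))
```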

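Sessmgr Enters Server Mode

To facilitate communication and the transfer of checkpoints, each session manager (sessmgr) establishes a TCP peer connection with its corresponding redundancy manager (redmgr).
Once the TCP peer connection is established, the redmgr can checkpoint all subscriber contexts from the sessmgr and save them. This allows a seamless switchover, because the checkpoints can be transferred to another User Plane Function (UPF) along with their respective sessmgr instances.
It is crucial for a sessmgr to always be in CLIENT mode. If a sessmgr is detected in server mode for any reason, it indicates that the TCP peer connection with the associated redmgr is broken. In this case, no checkpointing takes place.
When a sessmgr in a UPF is in this state, an unplanned switchover to another UPF that is performed without regard to the sessmgr state results in the same issue. In this case, the sessmgr cannot process traffic.

Note: In some cases, the checkpointmgr itself waits on checkpoints that RCM has initiated and waits for a response from the UPF. When no response reaches the checkpointmgr, it cannot communicate on its own, which delays completion of the switchover procedure beyond the switchover timer value. As a result, the UP can even become stuck in the PendActive state.

This can be checked in the RCM statistics and the redmgr logs. In addition, this command shows which checkpointmgr has a problem with which UPF: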

rcm show-statistics checkpointmgr-endpointstats

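There can be several reasons for a sessmgr to enter server mode locally, but one of the main reasons is described here.

Reasons for Sessmgr to Enter Server Mode

1. Based on the number of session managers in the User Plane Function (UPF), replicas are created for the redundancy managers (redmgr) and configured in the Redundancy Configuration Manager (RCM). This configuration ensures that each redmgr is connected to one session manager instance.

2. Given the 1:1 mapping between redmgr and sessmgr, what happens when a sessmgr instance ID exceeds the number of sessmgrs?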

For example:

Sessmgr instance IDs: 1 to 20
Redmgr IDs: 0 to 19

In this example, suppose a sessmgr instance ID goes beyond the stated limit, for instance to 21, 22, 23, 24, or 25. The redmgrs (IDs 0 to 19) are already mapped to instance IDs 1 to 20 and are unaware of the new sessmgr instance IDs 21 to 25 created by the UPF. A sessmgr with one of these instance IDs cannot form a TCP peer connection with an RCM redmgr, so no checkpoint sync takes place. Without checkpoint sync, the sessmgr becomes stuck in server mode and does not take any traffic.
Refer to this diagram:

Sessmgr instances 7 and 8 both have no TCP peer connection. On the RCM, redmgr-1 was connected to instance 2 and redmgr-2 was connected to instance 5. Even though the sessmgrs came up with new instance ID values that are beyond the defined limit, they have no connection back to the redmgrs, which still point to the previous instances over connections that are now broken.
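
The condition described in this example can be checked with a short Python sketch once the sessmgr instance IDs have been read from the UPF and the redmgr IDs from RCM. This is illustrative only. The function name is hypothetical, the input IDs are assumed to have been collected by hand, and the rule applied is the redmgr ID = sessmgr instance ID - 1 mapping from this document.

```python
from typing import Iterable, List


def unmapped_sessmgr_ids(sessmgr_ids: Iterable[int],
                         redmgr_ids: Iterable[int]) -> List[int]:
    """Return sessmgr instance IDs that have no matching redmgr (m = n - 1)."""
    known_redmgrs = set(redmgr_ids)
    # A sessmgr with instance ID n needs redmgr n - 1 to form its TCP peer.
    return sorted(n for n in sessmgr_ids if (n - 1) not in known_redmgrs)


if __name__ == "__main__":
    redmgrs = range(0, 20)  # redmgr IDs 0 to 19, as in the example
    # Hypothetical UPF state: two instances came back with IDs beyond the limit.
    sessmgrs = list(range(1, 19)) + [21, 25]
    print("sessmgr instance IDs without a redmgr:",
          unmapped_sessmgr_ids(sessmgrs, redmgrs))
```

Any ID that this check returns is a candidate for a sessmgr stuck in server mode, which can then be confirmed with the commands in the Troubleshoot section.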

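Workaround

The solution to this issue is to limit the sessmgr instance IDs so that they match the number of sessmgrs in the UPF and the number of redmgrs in RCM, as specified here: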

Max value of sessmgr instance ID = no of checkpointmgr – 1

Based on this logic, the number of sessmgrs, including the standby sessmgr, needs to be defined:

task facility sessmgr max <no of max sessmgrs>

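Note: A node reload is required for this command to take full effect.

With this command in place, a sessmgr always gets an instance ID that is less than or equal to the maximum sessmgr count, regardless of how many times the sessmgr is killed. This helps to prevent checkpoint issues in RCM and, as a result, prevents the sessmgr from entering server mode.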

Revision History

Revision	Publish Date	Comments
1.0	13-Sep-2023	Initial Release

Contributed by Cisco Engineers

巴拉蒂·乔杜里
Cisco TAC Engineer
克里希纳·基肖尔D·V
Cisco Technical Leader

This document applies to these products:

Ultra Gateway Platform with CUPS