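Introduction
This document describes the procedure to perform maintenance activities on IP Multimedia Subsystem (IMS) and Data User Plane Function (UPF) nodes.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics: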
- 5G-UPF
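- Redundancy Configuration Manager (RCM)
- Virtual Packet Core (VPC) - Single Instance (SI)
- Kernel-based Virtual Machine (KVM) hypervisor
Components Used
The information in this document is based on these software and hardware versions:
- Subscriber Microservices Infrastructure (SMI) 2020.02.2.35
- StarOS 21.22
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.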
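Background Information
What is UPF?
The User Plane Function (UPF) is one of the Network Functions (NFs) of the 5G Core network (5GC). In the 5G architecture it is responsible for packet routing and forwarding, packet inspection, and QoS handling, and it acts as the external PDU session point of interconnect to the Data Network (DN).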
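What is VPC-SI?
VPC-SI consolidates the operations of a physical Cisco ASR 5500 chassis that runs StarOS into a single Virtual Machine (VM) that can run on Commercial Off-The-Shelf (COTS) servers. Each VPC-SI VM operates as an independent StarOS instance and incorporates the management and session processing capabilities of a physical chassis.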
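What is the KVM Hypervisor?
Kernel-based Virtual Machine (KVM) is an open source virtualization technology built into Linux. Specifically, KVM turns Linux into a hypervisor that allows a host machine to run multiple isolated virtual environments called guests or Virtual Machines (VMs).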
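What is ICSR?
Inter-Chassis Session Recovery (ICSR) is a licensed Cisco feature that requires a separate license. It provides the highest possible availability for continuous call processing without interruption of subscriber services. ICSR allows the operator to configure gateways for redundancy. When a gateway fails, ICSR allows sessions to be transparently routed around the failure, which maintains the user experience. ICSR also preserves session information and state.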
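Problem
Hardware maintenance (for example, after a hardware failure or for a software or firmware upgrade) requires server downtime. Follow this procedure to perform maintenance on UPF bare metal servers and to switch services over gracefully, so that the UPF applications do not suffer unwanted downtime.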
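Maintenance Procedure
UPF nodes are StarOS virtual machines hosted on a KVM hypervisor. One KVM hypervisor hosts two VM instances. The IMS UPF has 1:1 redundancy: each active instance has one standby instance. It uses ICSR and the Service Redundancy Protocol (SRP) to handle redundancy. SRP is used to exchange hello messages between the ICSR chassis. It also exchanges session state information (checkpoint data) between the active and standby chassis. The complete subscriber session information is sent from the active chassis to the standby chassis over the SRP link in the form of Call Recovery Records (CRRs).
Log in to the KVM node and use the KVM virsh command to list the VM instances.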
cloud-user@podname-upf-ims-kvmnode-1:~$ sudo virsh list --all
Id Name State
----------------------------------------------------
1 imsupf01 running
4 imsupf10 running
cloud-user@podname-upf-ims-kvmnode-1:~$
Log in to the UPF instances and check the chassis state.
[local]imsupf10# show srp info
Friday July 22 15:50:24 UTC 2022
Service Redundancy Protocol:
-------------------------------------------------------------------------------
Context: srp
Local Address: 10.x.x.74
Chassis State: Standby
Chassis Mode: Backup
Chassis Priority: 2
Local Tiebreaker: 02-7E-35-53-F9-F1
Route-Modifier: 9
Peer Remote Address: 10.x.x.73
Peer State: Active
Peer Mode: Primary
Peer Priority: 1
Peer Tiebreaker: 02-11-59-73-87-35
Peer Route-Modifier: 8
Last Hello Message received: Fri Jul 22 15:50:21 2022 (3 seconds ago)
Peer Configuration Validation: Complete
Last Peer Configuration Error: None
Last Peer Configuration Event: Fri Jul 22 15:50:22 2022 (2 seconds ago)
Last Validate Switchover Status: None
Connection State: Connected
[local]imsupf01# show srp info
Friday July 22 15:31:20 UTC 2022
Service Redundancy Protocol:
-------------------------------------------------------------------------------
Context: srp
Local Address: 10.x.x.66
Chassis State: Active
Chassis Mode: Backup
Chassis Priority: 2
Local Tiebreaker: 02-7C-1A-62-FA-3C
Route-Modifier: 5
Peer Remote Address: 10.x.x.65
Peer State: Standby
Peer Mode: Primary
Peer Priority: 1
Peer Tiebreaker: 02-87-33-98-6D-08
Peer Route-Modifier: 6
Last Hello Message received: Fri Jul 22 15:31:20 2022 (1 seconds ago)
Peer Configuration Validation: Complete
Last Peer Configuration Error: None
Last Peer Configuration Event: Fri Jul 22 15:20:13 2022 (668 seconds ago)
Last Validate Switchover Status: None
Connection State: Connected
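When this check is repeated across many UPF pairs, the fields of interest in the show srp info output can be extracted with a small script. The sketch below is only an illustration, not a Cisco tool: the function names parse_key_values and srp_pair_is_healthy are hypothetical, and the script assumes the CLI output has already been captured as text in the "Key: Value" layout shown above.

```python
def parse_key_values(cli_output: str) -> dict:
    """Turn 'Key: Value' lines of captured CLI output into a dict.

    Lines without a ': ' separator (timestamps, rulers, headings) are skipped.
    """
    fields = {}
    for line in cli_output.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def srp_pair_is_healthy(fields: dict) -> bool:
    """True when one chassis is Active, the other Standby, and SRP is Connected."""
    states = {fields.get("Chassis State"), fields.get("Peer State")}
    return (states == {"Active", "Standby"}
            and fields.get("Connection State") == "Connected")


# Excerpt of the imsupf10 output shown above.
SRP_SAMPLE = """\
Friday July 22 15:50:24 UTC 2022
Service Redundancy Protocol:
Context: srp
Local Address: 10.x.x.74
Chassis State: Standby
Chassis Mode: Backup
Peer State: Active
Last Hello Message received: Fri Jul 22 15:50:21 2022 (3 seconds ago)
Connection State: Connected
"""

if __name__ == "__main__":
    info = parse_key_values(SRP_SAMPLE)
    print(info["Chassis State"], info["Peer State"], srp_pair_is_healthy(info))
```

The health rule used here (one Active, one Standby, link Connected) is an assumption drawn from the sample outputs; adjust it to the acceptance criteria of your deployment.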
Check that the number of configuration lines is the same on the active and standby nodes of the IMS UPF ICSR pair.
Active node
# show configuration | grep -n -E "^end$"
Thursday July 21 07:30:17 UTC 2022
14960:end
Standby node
# show configuration | grep -n -E "^end$"
Thursday July 21 07:31:02 UTC 2022
14959:end
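The comparison of the two grep results can be scripted. This is a minimal sketch under the assumption that each command output was saved as text; the helper names end_line_number and config_line_difference are hypothetical. It reads the N in the "N:end" line that grep -n prints and reports the difference between the two nodes.

```python
import re


def end_line_number(grep_output: str) -> int:
    """Return N from the 'N:end' line printed by grep -n."""
    for line in grep_output.splitlines():
        match = re.fullmatch(r"(\d+):end", line.strip())
        if match:
            return int(match.group(1))
    raise ValueError("no 'N:end' line found in the output")


def config_line_difference(active_output: str, standby_output: str) -> int:
    """Active line count minus standby line count; 0 means the counts match."""
    return end_line_number(active_output) - end_line_number(standby_output)


ACTIVE_OUT = "Thursday July 21 07:30:17 UTC 2022\n14960:end\n"
STANDBY_OUT = "Thursday July 21 07:31:02 UTC 2022\n14959:end\n"

if __name__ == "__main__":
    diff = config_line_difference(ACTIVE_OUT, STANDBY_OUT)
    print("line counts match" if diff == 0 else f"line counts differ by {diff}")
```

A nonzero result only flags that the configurations differ in length; it does not say which lines differ.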
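Before the SRP switchover, check on the active UPF that all SRP session managers (sessmgrs) are in Active-Connected state and that none are in Pending-Active state.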
[local]imsupf01# show srp checkpoint statistics active
Thursday July 21 07:38:04 UTC 2022
Number of Sessmgrs: 20
Sessmgrs in Active-Connected state: 20
Sessmgrs in Standby-Connected state: 0
Sessmgrs in Pending-Active state: 0
Before the SRP switchover, check on the standby UPF that all SRP sessmgrs are in Standby-Connected state and that none are in Pending-Active state.
[local]imsupf02# show srp checkpoint statistics active
Thursday July 21 07:40:03 UTC 2022
Number of Sessmgrs: 20
Sessmgrs in Active-Connected state: 0
Sessmgrs in Standby-Connected state: 20
Sessmgrs in Pending-Active state: 0
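The two pre-switchover checks above reduce to simple arithmetic on the counters: on the active node every sessmgr must be Active-Connected, on the standby node every sessmgr must be Standby-Connected, and Pending-Active must be zero on both. The sketch below encodes that rule; parse_counts and ready_for_switchover are hypothetical helper names, and the input is assumed to be the captured text of show srp checkpoint statistics active.

```python
def parse_counts(cli_output: str) -> dict:
    """Collect 'Label: <integer>' lines into a dict of ints."""
    counts = {}
    for line in cli_output.splitlines():
        key, _, value = line.partition(":")
        if value.strip().isdigit():  # skips the timestamp and other text lines
            counts[key.strip()] = int(value)
    return counts


def ready_for_switchover(counts: dict, role: str) -> bool:
    """role is 'active' or 'standby', the expected state of the node."""
    connected_key = {
        "active": "Sessmgrs in Active-Connected state",
        "standby": "Sessmgrs in Standby-Connected state",
    }[role]
    total = counts["Number of Sessmgrs"]
    return (total > 0
            and counts[connected_key] == total
            and counts["Sessmgrs in Pending-Active state"] == 0)


ACTIVE_STATS = """\
Thursday July 21 07:38:04 UTC 2022
Number of Sessmgrs: 20
Sessmgrs in Active-Connected state: 20
Sessmgrs in Standby-Connected state: 0
Sessmgrs in Pending-Active state: 0
"""

if __name__ == "__main__":
    print(ready_for_switchover(parse_counts(ACTIVE_STATS), "active"))
```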
If any of the instances is in the Active state, perform these tasks before the switchover:
[upf-ims]# save config /flash/xxx_production.cfg --> Replace xxx with the desired name of the config
[upf-ims]# srp validate-configuration
[upf-ims]# srp validate-switchover
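Before the VMs are shut down, ensure that the active instance is switched to standby so that subscribers are switched over gracefully. If an instance is already in standby state, no action is needed. If an instance is in active state, check the state values in the outputs shown earlier and ensure that the standby instance is ready to take over.
Check the current subscribers in the active UPF instance.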
[local]imsupf01# show subscribers data-rate summary
Friday July 22 16:01:37 UTC 2022
Total Subscribers : 175024
Active : 175024 Dormant : 0
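Recording the subscriber count before the switchover gives a baseline to compare against the count on the new active instance afterwards. A minimal sketch, assuming the command output has been captured as text; subscriber_counts is a hypothetical helper name.

```python
import re


def subscriber_counts(cli_output: str) -> dict:
    """Extract the Total/Active/Dormant counters from the summary output."""
    pattern = r"(Total Subscribers|Active|Dormant)\s*:\s*(\d+)"
    return {label: int(value) for label, value in re.findall(pattern, cli_output)}


SUBSCRIBER_SAMPLE = """\
Friday July 22 16:01:37 UTC 2022
Total Subscribers : 175024
Active : 175024 Dormant : 0
"""

if __name__ == "__main__":
    before = subscriber_counts(SUBSCRIBER_SAMPLE)
    print(before["Total Subscribers"])
```

The count changes continuously on a live node, so a later comparison is expected to be approximate rather than exact.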
Switch the active instance to standby.
[context-name]<hostname># srp initiate-switchover
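Check the state of the former standby instance. It is now active, and the subscriber sessions have moved to this new active instance. Because both VM instances on the server are now in standby state, they can be shut down for server maintenance. Use the given virsh commands to shut down the VM instances and verify the status.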
cloud-user@podname-upf-ims-kvmnode-1:~$ sudo virsh shutdown imsupf01
Domain imsupf01 is being shutdown
cloud-user@podname-upf-ims-kvmnode-1:~$ sudo virsh shutdown imsupf10
Domain imsupf10 is being shutdown
cloud-user@podname-upf-ims-kvmnode-1:~$ sudo virsh list --all
Id Name State
----------------------------------------------------
1 imsupf01 shut off
4 imsupf10 shut off
cloud-user@podname-upf-ims-kvmnode-1:~$
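Before the physical server is powered off, it is worth confirming that every domain reports shut off, because virsh shutdown only requests the shutdown and returns immediately. The sketch below parses captured virsh list --all output; parse_virsh_list and all_shut_off are hypothetical helper names, and the script does not call virsh itself.

```python
def parse_virsh_list(output: str) -> dict:
    """Map domain name -> state from `virsh list --all` output."""
    domains = {}
    for line in output.splitlines():
        parts = line.split(None, 2)
        # Data rows look like "<id> <name> <state>"; virsh prints "-" as the
        # id of a domain that is not running. Header, ruler, and prompt lines
        # do not match this shape and are skipped.
        if len(parts) == 3 and (parts[0].isdigit() or parts[0] == "-"):
            domains[parts[1]] = parts[2].strip()
    return domains


def all_shut_off(domains: dict) -> bool:
    """True only when at least one domain is listed and all are shut off."""
    return bool(domains) and all(state == "shut off" for state in domains.values())


VIRSH_SAMPLE = """\
Id Name State
----------------------------------------------------
1 imsupf01 shut off
4 imsupf10 shut off
"""

if __name__ == "__main__":
    print(all_shut_off(parse_virsh_list(VIRSH_SAMPLE)))
```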
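When the server is brought back after maintenance, the VMs start automatically. The UPF instances remain in standby state. Use the given command to verify.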
[local]imsupf10# show srp info
Friday July 22 15:50:24 UTC 2022
Service Redundancy Protocol:
-------------------------------------------------------------------------------
Context: srp
Local Address: 10.x.x.74
Chassis State: Standby
Chassis Mode: Backup
Chassis Priority: 2
Local Tiebreaker: 02-7E-35-53-F9-F1
Route-Modifier: 9
Peer Remote Address: 10.x.x.73
Peer State: Active
Peer Mode: Primary
Peer Priority: 1
Peer Tiebreaker: 02-11-59-73-87-35
Peer Route-Modifier: 8
Last Hello Message received: Fri Jul 22 15:50:21 2022 (3 seconds ago)
Peer Configuration Validation: Complete
Last Peer Configuration Error: None
Last Peer Configuration Event: Fri Jul 22 15:50:22 2022 (2 seconds ago)
Last Validate Switchover Status: None
Connection State: Connected
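The Data UPF uses RCM with N:M redundancy, where N is the number of active UPFs (fewer than 10) and M is the number of standby UPs in the redundancy group. RCM is a Cisco proprietary node or Network Function (NF) that provides redundancy for StarOS-based User Plane Functions (UPFs). It stores or mirrors all the required session information from all active UPFs. On a switchover trigger, one standby UPF is selected to receive the appropriate session data from a common location. RCM runs on a K3 cluster on a VM. The Ops Center configures the RCM nodes.
The Data UPF nodes are otherwise the same as the IMS UPF nodes. The only difference is that redundancy is managed by RCM.
Check the VM status in the KVM node.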
cloud-user@podname-upf-data-kvmnode-1:~$ sudo virsh list --all
Id Name State
----------------------------------------------------
1 dataupf20 running
2 dataupf11 running
cloud-user@podname-upf-data-kvmnode-1:~$
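After you log in to the UPF instance, check the RCM redundancy state. If the instance is already in standby state, no action is needed. If it is in active state, switch it over gracefully to standby.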
[local]dataupf11# show rcm info
Friday July 22 17:23:17 UTC 2022
Redundancy Configuration Module:
-------------------------------------------------------------------------------
Context: rcm
Bind Address: 10.x.x.75
Chassis State: Active
Session State: SockActive
Route-Modifier: 26
RCM Controller Address: 10.x.x.163
RCM Controller Port: 9200
RCM Controller Connection State: Connected
Ready To Connect: Yes
Management IP Address: 10.x.x.149
Host ID: DATAUPF15
SSH IP Address: 10.x.x.158 (Activated)
SSH IP Installation: Enabled
[local]dataupf11#
Check that all sessmgrs are in Active-Connected state.
[local]dataupf11# show rcm checkpoint statistics active
Thursday July 21 07:47:03 UTC 2022
Number of Sessmgrs: 22
Sessmgrs in Active-Connected state: 22
Sessmgrs in Standby-Connected state: 0
Sessmgrs in Pending-Active state: 0
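Identify the corresponding RCM node from the Customer Information Questionnaire (CIQ) and check the RCM status. Note that an RCM switchover can be performed only from the master node. Ensure that you log in to the master RCM.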
[podname-aio-1/dcrm01] rcm# rcm show-status
message :
{"status":"MASTER"}
[podname-aio-1/dcrm01] rcm#
Use the given command to find the active and standby UPF nodes (output truncated):
[podname-aio-1/dcrm01] rcm# rcm show-statistics controller
message :
{
"keepalive_version": "e7386cb81b1fefc3396dfd1d528e0d2a27de80d5de6a78364caf938a0d2149b6",
"keepalive_timeout": "20s",
"num_groups": 2,
"groups": [
{
"groupid": 1,
"endpoints_configured": 7,
"standby_configured": 1,
"pause_switchover": false,
"active": 6,
"standby": 1,
"endpoints": [
{
"endpoint": "10.x.x.75",
"bfd_status": "STATE_UP",
"upf_registered": true,
"upf_connected": true,
"upf_state_received": "UpfMsgState_Active",
"bfd_state": "BFDState_UP",
"upf_state": "UPFState_Active",
"route_modifier": 26,
"pool_received": true,
"echo_received": 142354,
"management_ip": "10.x.x.149",
"host_id": "DATAUPF15",
"ssh_ip": "10.x.x.158",
"force_nso_registration": false
....
....
{
"endpoint": "10.x.x.77",
"bfd_status": "STATE_UP",
"upf_registered": true,
"upf_connected": true,
"upf_state_received": "UpfMsgState_Standby",
"bfd_state": "BFDState_UP",
"upf_state": "UPFState_Standby",
"route_modifier": 50,
"pool_received": false,
"echo_received": 3673,
"management_ip": "10.x.x.153",
"host_id": "",
"ssh_ip": "10.x.x.186",
"force_nso_registration": false
},
Log in to the standby UPF instance with its management IP and verify the state.
[local]dataupf13# show rcm info
Friday July 22 17:36:04 UTC 2022
Redundancy Configuration Module:
-------------------------------------------------------------------------------
Context: rcm
Bind Address: 10.x.x.77
Chassis State: Standby
Session State: SockStandby
Route-Modifier: 50
RCM Controller Address: 10.x.x.163
RCM Controller Port: 9200
RCM Controller Connection State: Connected
Ready To Connect: Yes
Management IP Address: 10.x.x.153
Host ID:
SSH IP Address: 10.x.x.186 (Activated)
SSH IP Installation: Enabled
[local]dataupf13#
After verification, gracefully switch the active instance over to the standby. Ensure that you use the management IP addresses.
[podname-aio-1/dcrm01] rcm# rcm switchover-mgmt-ip source 10.x.x.149 destination 10.x.x.153
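The source and destination management IPs for this command come from the rcm show-statistics controller output. The sketch below shows one way to pick them out and assemble the command string. It assumes the JSON body has been extracted from the CLI output (without the leading "message :" line) and is complete rather than truncated; management_ips_by_state and switchover_command are hypothetical helper names, and the sample is a trimmed copy of the fields shown above.

```python
import json


def management_ips_by_state(stats: dict, group_id: int) -> dict:
    """Group the management IPs of one redundancy group by UPF state."""
    result = {"active": [], "standby": []}
    for group in stats["groups"]:
        if group["groupid"] != group_id:
            continue
        for endpoint in group["endpoints"]:
            if endpoint["upf_state"] == "UPFState_Active":
                result["active"].append(endpoint["management_ip"])
            elif endpoint["upf_state"] == "UPFState_Standby":
                result["standby"].append(endpoint["management_ip"])
    return result


def switchover_command(source_mgmt_ip: str, destination_mgmt_ip: str) -> str:
    """Build the RCM CLI line; the script does not execute it."""
    return (f"rcm switchover-mgmt-ip source {source_mgmt_ip} "
            f"destination {destination_mgmt_ip}")


# Trimmed sample; addresses are masked exactly as in this document.
CONTROLLER_JSON = """
{"num_groups": 2, "groups": [{"groupid": 1, "endpoints": [
  {"endpoint": "10.x.x.75", "upf_state": "UPFState_Active",
   "management_ip": "10.x.x.149"},
  {"endpoint": "10.x.x.77", "upf_state": "UPFState_Standby",
   "management_ip": "10.x.x.153"}]}]}
"""

if __name__ == "__main__":
    ips = management_ips_by_state(json.loads(CONTROLLER_JSON), group_id=1)
    print(switchover_command(ips["active"][0], ips["standby"][0]))
```

Keeping command construction separate from execution lets an operator review the exact line before it is entered on the master RCM.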
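Note: After the switchover, the session managers of the new active UP can be stuck in SERVER state. If this happens, contact Cisco Technical Support. The sessmgr of the problematic instance must be killed so that it reconnects to RCM with the correct client socket state and recovers. All sessmgrs must be in CLIENT state. Use the given command (in hidden mode) to verify.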
# show session subsystem facility sessmgr all debug-info | grep -E "SessMgr|Mode:"
Thursday July 21 07:56:26 UTC 2022
SessMgr: Instance 5000
Mode: UNKNOWN State: SRP_SESS_STATE_SOCK_ACTIVE
SessMgr Activity Detected: FALSE
SessMgr: Instance 22
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
SessMgr Activity Detected: TRUE
SessMgr: Instance 21
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
SessMgr Activity Detected: TRUE
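With many sessmgr instances, the Mode field is easier to scan with a script than by eye. The sketch below pairs each "SessMgr: Instance N" line with the "Mode:" line that follows it and lists the instances that report SERVER. The helper names sessmgr_modes and stuck_in_server are hypothetical, and the input is assumed to be the captured text of the command above.

```python
import re


def sessmgr_modes(cli_output: str) -> dict:
    """Map sessmgr instance number -> reported Mode."""
    modes = {}
    instance = None
    for line in cli_output.splitlines():
        inst = re.search(r"SessMgr:\s*Instance\s+(\d+)", line)
        if inst:
            instance = int(inst.group(1))
            continue
        mode = re.search(r"Mode:\s*(\S+)", line)
        if mode and instance is not None:
            modes[instance] = mode.group(1)
    return modes


def stuck_in_server(modes: dict) -> list:
    """Instances whose Mode is SERVER, in ascending order."""
    return sorted(i for i, mode in modes.items() if mode == "SERVER")


SESSMGR_SAMPLE = """\
SessMgr: Instance 5000
Mode: UNKNOWN State: SRP_SESS_STATE_SOCK_ACTIVE
SessMgr Activity Detected: FALSE
SessMgr: Instance 22
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
SessMgr Activity Detected: TRUE
"""

if __name__ == "__main__":
    print(stuck_in_server(sessmgr_modes(SESSMGR_SAMPLE)))
```

The script only flags SERVER. How to treat other modes, such as the UNKNOWN reported by instance 5000 in the sample, is left to the operator.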
Check that all sessmgrs are in Actv and Ready state.
# show rcm checkpoint statistics verbose
Thursday July 21 07:52:29 UTC 2022
smgr state peer recovery pre-alloc chk-point rcvd chk-point sent
inst conn records calls full micro full micro
---- ------- ----- ------- -------- ----- ----- ----- ----
1 Actv Ready 0 0 1731 68120 3107912 409200665
2 Actv Ready 0 0 1794 70019 3060062 408647685
3 Actv Ready 0 0 1753 68793 3078531 406227415
4 Actv Ready 0 0 1744 67585 3080952 410218643
5 Actv Ready 0 0 1749 69155 3096067 404944553
6 Actv Ready 0 0 1741 68805 3067392 407133464
7 Actv Ready 0 0 1744 67963 3084023 406772101
8 Actv Ready 0 0 1748 68702 3009558 408073589
9 Actv Ready 0 0 1736 68169 3030624 405679108
10 Actv Ready 0 0 1707 67386 3071592 406000628
11 Actv Ready 0 0 1738 68086 3052899 407991476
12 Actv Ready 0 0 1720 68500 3102045 408803079
13 Actv Ready 0 0 1772 69683 3082235 406426650
14 Actv Ready 0 0 1727 66900 2873736 392352402
15 Actv Ready 0 0 1739 68465 3032395 409603844
16 Actv Ready 0 0 1756 69221 3063447 411445527
17 Actv Ready 0 0 1755 68708 3051573 406333047
18 Actv Ready 0 0 1698 66328 3066983 407320405
19 Actv Ready 0 0 1736 68030 3037073 408215965
20 Actv Ready 0 0 1733 67873 3069116 405634816
21 Actv Ready 0 0 1763 69259 3074942 409802455
22 Actv Ready 0 0 1748 68228 3051222 406470380
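The table can be checked row by row with a short script. This sketch, with the hypothetical helper name not_ready_sessmgrs, treats every line that begins with an instance number as a data row and reports the instances whose state and peer connection columns are anything other than Actv and Ready. It assumes the output was captured as text in the column order shown above.

```python
def not_ready_sessmgrs(cli_output: str) -> list:
    """Return instance numbers whose first two status columns are not Actv/Ready."""
    failing = []
    for line in cli_output.splitlines():
        cols = line.split()
        # Data rows start with the numeric sessmgr instance; headers,
        # rulers, timestamps, and prompt lines do not.
        if len(cols) >= 3 and cols[0].isdigit():
            if (cols[1], cols[2]) != ("Actv", "Ready"):
                failing.append(int(cols[0]))
    return failing


CHECKPOINT_SAMPLE = """\
smgr state peer recovery pre-alloc chk-point rcvd chk-point sent
inst conn records calls full micro full micro
---- ------- ----- ------- -------- ----- ----- ----- ----
1 Actv Ready 0 0 1731 68120 3107912 409200665
2 Actv Ready 0 0 1794 70019 3060062 408647685
"""

if __name__ == "__main__":
    failing = not_ready_sessmgrs(CHECKPOINT_SAMPLE)
    print("all sessmgrs Actv/Ready" if not failing else f"check instances {failing}")
```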
Verify that the subscribers have moved to the former standby, which is now the active instance:
[local]dataupf11# show subscribers data-rate summary
Friday July 22 17:40:18 UTC 2022
Total Subscribers : 62259
Active : 62259 Dormant : 0
When both instances are in standby state, the VMs can be shut down from the KVM node with the virsh command.
cloud-user@podname-upf-data-kvmnode-1:~$ sudo virsh shutdown dataupf20
Domain dataupf20 is being shutdown
cloud-user@podname-upf-data-kvmnode-1:~$ sudo virsh shutdown dataupf11
Domain dataupf11 is being shutdown
cloud-user@podname-upf-data-kvmnode-1:~$ sudo virsh list --all
Id Name State
----------------------------------------------------
1 dataupf20 shut off
4 dataupf11 shut off
cloud-user@podname-upf-data-kvmnode-1:~$
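Once the VMs are shut down, the KVM node (physical server) can be shut down for maintenance. When the activity is complete, power on the server. The VMs start automatically, and the UPF instances come up as standby on their own. Use the given commands to verify.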
cloud-user@podname-upf-data-kvmnode-1:~$ sudo virsh list --all
Id Name State
----------------------------------------------------
1 dataupf20 running
2 dataupf11 running
cloud-user@podname-upf-data-kvmnode-1:~$
[local]dataupf11# show rcm info
Friday July 22 17:36:04 UTC 2022
Redundancy Configuration Module:
-------------------------------------------------------------------------------
Context: rcm
Bind Address: 10.x.x.77
Chassis State: Standby
Session State: SockStandby
Route-Modifier: 50
RCM Controller Address: 10.x.x.163
RCM Controller Port: 9200
RCM Controller Connection State: Connected
Ready To Connect: Yes
Management IP Address: 10.x.x.153
Host ID:
SSH IP Address: 10.x.x.186 (Activated)
SSH IP Installation: Enabled
[local]dataupf13#
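In the RCM node, the RCM controller can still show the standby UPF as Pending Standby. The transition to standby can take 15 to 20 minutes. Use the given command to verify (output truncated):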
[podname-aio-1/dcrm01] rcm# rcm show-statistics controller
message :
{
"keepalive_version": "e7386cb81b1fefc3396dfd1d528e0d2a27de80d5de6a78364caf938a0d2149b6",
"keepalive_timeout": "20s",
"num_groups": 2,
"groups": [
{
"groupid": 1,
"endpoints_configured": 7,
"standby_configured": 1,
"pause_switchover": false,
"active": 6,
"standby": 1,
"endpoints": [
....
....
{
"endpoint": "10.x.x.77",
"bfd_status": "STATE_UP",
"upf_registered": true,
"upf_connected": true,
"upf_state_received": "UpfMsgState_Standby",
"bfd_state": "BFDState_UP",
"upf_state": "UPFState_Standby",
"route_modifier": 50,
"pool_received": false,
"echo_received": 3673,
"management_ip": "10.x.x.153",
"host_id": "",
"ssh_ip": "10.x.x.186",
"force_nso_registration": false
},
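Because the transition can take up to 20 minutes, the check is naturally a polling loop. The sketch below is an illustration only: fetch_stats is a hypothetical callable that must return the parsed rcm show-statistics controller JSON (how that output is collected is outside this document), and the 20 minute timeout and 60 second interval are assumptions based on the timing stated above. The loop succeeds only when upf_state equals UPFState_Standby, the value shown in the output above.

```python
import time


def endpoint_state(stats: dict, endpoint: str):
    """Return the upf_state of one endpoint, or None if it is not listed."""
    for group in stats.get("groups", []):
        for entry in group.get("endpoints", []):
            if entry.get("endpoint") == endpoint:
                return entry.get("upf_state")
    return None


def wait_for_standby(fetch_stats, endpoint, timeout_s=20 * 60,
                     interval_s=60, sleep=time.sleep) -> bool:
    """Poll until the endpoint reports UPFState_Standby or the timeout passes."""
    waited = 0
    while True:
        if endpoint_state(fetch_stats(), endpoint) == "UPFState_Standby":
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


if __name__ == "__main__":
    # Stub in place of a real collector: the endpoint is already standby.
    def fake_fetch():
        return {"groups": [{"groupid": 1, "endpoints": [
            {"endpoint": "10.x.x.77", "upf_state": "UPFState_Standby"}]}]}

    print(wait_for_standby(fake_fetch, "10.x.x.77", sleep=lambda s: None))
```

Passing sleep as a parameter keeps the loop testable without real waiting.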