How it Works

The 5G-UPF deployment leverages the ICSR framework infrastructure for checkpointing and switchover of the UPF node as shown in the following figure. The Active UPF communicates to its dedicated Standby UPF through the Service Redundancy Protocol (SRP) link that is provisioned between the UPFs.

UPF 1:1 Redundancy Using SRP

The Session Management Function (SMF) node does not have the Standby UPF information that is available in the UPF group configuration. Therefore, the SMF is not aware of the UPF redundancy configuration and the switchover event among the UPFs.

The Active UPF communicates to the SMF through the N4 interface address configured in the UPF. The Standby UPF takes over the same N4 address when it transitions to Active during the switchover event. This implies that the N4 interface is SRP-activated and is in line with the existing configuration method, therefore UPF switchover is transparent to the SMF.

UPF 1:1 Redundancy Switchover

To make redundancy fully compliant, it addresses the following dependencies on the SRP-based ICSR in the 5G environment.

  • Configuration Synchronization (or, Replica Configuration on Standby UPF)

  • N4 Association Checkpoint

  • N4 Link Monitoring

Besides the dependencies listed, the UPF implements data collection and checkpoint procedures specific to the UPF node. For example, checkpointing for IP-pool chunks. The UPF integrates these procedures into the existing ICSR checkpointing framework.

Independent Configuration of Standby UPF

After UPF is up with base configuration (for example, services, contexts, interfaces, and so on), the rest of the configuration (for example, ACS and policy-related configuration) is done through Ops-center/Redundancy and Configuration Manager (RCM) POD. This configuration is common for both SMF/UPF policies. For SRP redundancy to work, the Active and Standby UPF has same configuration, except SRP-related configuration with which SRP connections are established between Active and Standby UPF. The RCM configures Active and Standby UPF independently.

BFD Monitor Between Active UP and Standby UP

The Bidirectional Forwarding Detection (BFD) monitors the SRP link between the Active UPF and Standby UPF for a fast failure-detection and switchover. When the Standby UPF detects a BFD failure in this link, it takes over as the Active UPF.

The BFD link can be single-hop or multi-hop.

To configure the BFD monitor, between the Active UP and Standby UP, see Configuring BFD Monitoring Between Active UPF and Standby UPF.

Sample Configuration for Multihop BFD Monitoring

Primary UPF:

config
  context srp
    bfd-protocol
      bfd multihop-peer 1.1.1.1 interval 50 min_rx 50 multiplier 20
    #exit
    service-redundancy-protocol
      monitor bfd context srp 1.1.1.1 chassis-to-chassis
      peer-ip-address 1.1.1.1
      bind address 1.1.0.1
    #exit
    interface srp
      ip address 1.1.0.1 2.3.4.0
    #exit
    ip route static multihop bfd bfd1 1.1.0.1 1.1.1.1
    ip route 1.1.0.1 2.3.4.0 1.1.0.1 srp
  #exit
end

Backup UPF:

config
  context srp
    bfd-protocol
      bfd multihop-peer 1.1.0.1 interval 50 min_rx 50 multiplier 20
    #exit
    service-redundancy-protocol
      monitor bfd context srp 1.1.0.1 chassis-to-chassis
      peer-ip-address 1.1.0.1
      bind address 1.1.0.1
    #exit
    interface srp
      ip address 1.1.0.1 255.255.255.0
    #exit
    ip route static multihop bfd bfd1 1.1.1.1 1.1.0.1 
    ip route 1.1.0.1 255.255.255.0 1.1.1.1 srp
  #exit
End
 

Router between Primary and Backup UPF:

config
  context one
    interface one
      ip address 1.1.0.1 255.255.255.0
    #exit
    interface two
      ip address 1.1.1.1 255.255.255.0
    #exit
  #exit
end
 

Sample Configuration for Single-Hop BFD Monitoring

Primary UPF:

config
  context srp
    bfd-protocol
    #exit
    service-redundancy-protocol
      monitor bfd context srp 1.1.0.1 chassis-to-chassis
      peer-ip-address 1.1.0.1
      bind address 1.1.2.1
    #exit
    interface srp
      ip address 1.1.0.1 255.255.255.0
      bfd interval 50 min_rx 50 multiplier 10
    #exit
    ip route static bfd srp 1.1.2.1
  #exit
end
 

Backup UPF:

config
  context srp
    bfd-protocol
    #exit
    service-redundancy-protocol
      monitor bfd context srp 1.1.1.1 chassis-to-chassis
      peer-ip-address 1.1.2.1
      bind address 1.1.3.1
    #exit
    interface srp
      ip address 1.1.2.1 255.255.255.0
      bfd interval 50 min_rx 50 multiplier 10
    #exit
    ip route static bfd srp 1.1.3.1
  #exit
end
 

VPP Monitor

When SRP VPP monitor is configured, the UPF chassis is SRP Active and if the VPP subsystem fails, then SRP initiates switchover to Standby UPF. Currently, VPP health monitoring is limited to heartbeat mechanism between NPUMgr task and VPP process.

To configure the VPP monitor, see Configuring VPP Monitor on Active UPF and Standby UPF.

Sx/N4 Association Checkpoint

Whenever an Active UPF initiates an Sx/N4 association to SMF, the Standby UPF checkpoints this data. This maintains the association information even after the UPF switchover.

The Sx/N4 heartbeat messages are sent and the Active UPF responds back even after back-to-back UPF switchovers.

Sx/N4 Monitor

It is critical to monitor the Sx/N4 interface between the UPF and SMF. The SRP monitoring is enabled on Sx/N4 interface and the existing Sx/N4 heartbeat mechanism is leveraged to detect the monitor failure. The Sx/N4 module on Active UPF, on detecting the failure, informs the SRP VPNMgr to trigger UPF switchover event so that the Standby UPF takes over.

Note

Sx/N4 monitoring is available only in the UPF.

It is important to ensure that the SMF Sx/N4 heartbeat timeout is higher than the UPF Sx/N4 heartbeat timeout plus UPF ICSR switchover time. This is to ensure that the SMF does not detect the Sx/N4 path failure during a UPF switchover because of the UPF Sx/N4 monitor failure.

The Standby UPF itself has no independent connectivity to the SMF. The Active UPF Sx/N4 context is replicated to the Standby UPF so that it is ready to takeover during SRP switchover. This implies that when the Active UPF has switched over to Standby because of Sx/N4 monitor failure, the new Standby has no way of knowing if the UPF to SMF link is working. To prevent a switchback of the new Standby to Active state again due to Sx/N4 monitor failure in new Active, use the disallow-switchover-on-peer-monitor-fail keyword in the monitor sx CLI command.

After a chassis becomes Standby due to Sx/N4 monitoring failure, the Sx/N4 failure status is not reset even if Sx/N4 up checkpoint is received from the new Active UPF. This is to prevent the new Active to cause an unplanned switchback again due to Sx/N4 monitor failure when the previous cause of switchover itself was Sx/N4 monitor failure. This prevents back-to-back switchovers when SMF is down. The Sx/N4 monitor failure status must be manually reset when the operator is convinced that the network connectivity is normal. To reset, use the new srp reset-sx-fail CLI command (see Resetting Sx/N4 Monitor Failure) in the Standby chassis.

To configure the Sx/N4 monitor, see Configuring Sx/N4 Monitoring on the Active UPF and Standby UPF.

BGP Monitor

Configure BGP peer monitor and peer group monitors for the next-hop routers from UPF (both Gi and Gn side). This is the existing ICSR configuration. BGP may run with BFD assist to detect fast BGP peer failure.

To configure BGP monitoring and flag BPG monitoring failure, see Configuring BGP Status Monitoring Between Each UP and Next-Hop Router.

UPF Session Checkpoints

The Active chassis sends a collection of UPF data as checkpoints to the peer Standby chassis in the following scenarios:

  • New call setup

  • For every state change in the call

  • Periodically for accounting buckets

On receiving these checkpoints, the Standby chassis acts on the data and updates the necessary information either at the call, node, or instance level.

VPN IP Pool Checkpoints

During Sx/N4 Association, the IP pool allocated to each of the UPF is sent by SMF to the respective UPF. The VPNMgr receives this message in the UPF and checkpoints the same information to the Standby UPF when the SRP is configured.

The IP pool information is also sent during the SRP VPNMgr restart and during the SRP link down and up scenarios.

Validation of the presence of IP pool information in the Standby is vital before switchover. If the IP pool information is not present, then route advertisement is not possible. Therefore, traffic does not reach the UPF.

External Audit and PFD Configuration Audit Interaction

External Audit management is done in Active UPF. The Session Manager gets a start and complete notification of the Configuration Audit. The Session Manager does not start the External Audit if Configuration Audit is in progress. If the Configuration Audit start-notification arrives when the External Audit is already underway, then the Session Manager raises a flag such that the External Audit restarts when it completes. Restarting the External Audit is necessary because it does not achieve its purpose if it occurs when Configuration Audit is already underway.