NF Management Failure Handling
The following is a sample configuration of the NRF management endpoints.
product smf# show running-config group nf-mgmt
group nf-mgmt MGM
 nrf-mgmt-group mgmt_group
 locality LOC1
exit
product smf# show running-config group nrf mgmt
group nrf mgmt mgmt_group
 service type nrf nnrf-nfm
  endpoint-profile epprof
   uri-scheme http
   endpoint-name EP1
    priority 2
    primary ip-address ipv4 10.105.227.219
    primary ip-address port 8082
    secondary ip-address ipv4 10.105.227.220
    secondary ip-address port 8082
   exit
   endpoint-name EP2
    priority 10
    primary ip-address ipv4 10.1.227.21
    primary ip-address port 8082
    secondary ip-address ipv4 10.1.227.22
    secondary ip-address port 8082
   exit
  exit
 exit
exit
product smf#
In the sample configuration, EP1 is the higher-priority endpoint name because its priority value is lower than that of EP2 (2 versus 10). On startup, SMF therefore sends the NF registration to the primary ip:port of EP1 [10.105.227.219:8082]. SMF uses the secondary ip:port of EP1 if the primary is down. SMF fails over to EP2 only if all ip:port pairs of EP1 are down.
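The selection order described above can be sketched in Python. This is an illustration only, not the SMF implementation: the `EndpointName` class and the `candidate_order` and `select_target` functions are hypothetical names, and the set of "down" addresses is assumed to be supplied by some separate failure-detection step. The addresses and priorities are copied from the sample configuration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Address = Tuple[str, int]  # (ip, port)


@dataclass(frozen=True)
class EndpointName:
    """Hypothetical model of one endpoint-name entry in the configuration."""
    name: str
    priority: int        # lower value means higher priority
    primary: Address
    secondary: Address


# Values taken from the sample configuration.
ENDPOINTS = [
    EndpointName("EP1", 2, ("10.105.227.219", 8082), ("10.105.227.220", 8082)),
    EndpointName("EP2", 10, ("10.1.227.21", 8082), ("10.1.227.22", 8082)),
]


def candidate_order(endpoints: List[EndpointName]) -> List[Tuple[str, str, Address]]:
    """List (endpoint name, role, address) in the order the text says they are tried."""
    ordered = []
    for ep in sorted(endpoints, key=lambda e: e.priority):
        ordered.append((ep.name, "primary", ep.primary))
        ordered.append((ep.name, "secondary", ep.secondary))
    return ordered


def select_target(endpoints: List[EndpointName],
                  down: Set[Address]) -> Optional[Tuple[str, str, Address]]:
    """Return the first candidate whose address is not marked down, or None."""
    for name, role, addr in candidate_order(endpoints):
        if addr not in down:
            return name, role, addr
    return None
```

With no address marked down, `select_target(ENDPOINTS, set())` picks EP1 primary. Marking both EP1 addresses as down makes it pick EP2 primary, which matches the failover rule in the text.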
On successful registration with EP1 primary, SMF starts the heartbeat with EP1 primary. If EP1 primary goes down, SMF detects this through the missing heartbeat response. On detecting that EP1 primary is down, SMF sends the heartbeat to EP1 secondary (no reregistration). It also periodically sends an NF heartbeat to EP1 primary to detect whether it has recovered.
If SMF detects that both EP1 primary and EP1 secondary are down, SMF fails over to EP2. When SMF fails over to EP2 primary, it sends a reregistration (default behavior). All the endpoints under one endpoint name are assumed to share a database, so reregistration is supported only when the failover crosses endpoint names. In this case, EP1 primary and secondary share one database; EP2 primary and secondary share a separate database. After failover to EP2 primary, SMF sends periodic NF registration only to the primary of EP1 (to detect recovery).
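The choice between heartbeat and registration can be reduced to one rule: compare the endpoint name SMF is currently registered under with the endpoint name of the address it is about to contact. The sketch below assumes that reading of the text; `message_type` is a hypothetical helper, and it models only the default behavior mentioned above.

```python
def message_type(active_name: str, target_name: str) -> str:
    """Return the message SMF is described as sending to a target address.

    active_name: endpoint name SMF is currently registered under.
    target_name: endpoint name of the address being contacted, either as a
                 failover target or as a periodic recovery probe.
    """
    if active_name == target_name:
        # Same endpoint name: the database is assumed to be shared,
        # so a heartbeat is enough and no reregistration is sent.
        return "heartbeat"
    # Different endpoint name: separate database, so SMF registers again.
    return "registration"
```

For example, moving from EP1 primary to EP1 secondary gives `message_type("EP1", "EP1")`, a heartbeat. Moving from EP1 to EP2 primary, or probing EP1 primary while EP2 is active, gives a registration.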
Whenever SMF detects that a higher-priority endpoint name has recovered, it falls back to the recovered ip:port. For example, if the current active NRF endpoint is EP2 primary and SMF detects that EP1 primary has recovered, SMF reregisters with EP1 primary (default behavior) and stops the heartbeat on EP2 primary.
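A minimal sketch of the fallback decision follows. The `PRIORITY` table mirrors the sample configuration, and `fallback_actions` is a hypothetical function name. The sketch covers only the case the text spells out, where the recovered address belongs to a higher-priority endpoint name than the active one. It returns no actions in every other case, including recovery within the same endpoint name, which the text does not detail.

```python
from typing import List, Tuple

# Priority values from the sample configuration (lower value wins).
PRIORITY = {"EP1": 2, "EP2": 10}


def fallback_actions(active: Tuple[str, str],
                     recovered: Tuple[str, str]) -> List[str]:
    """Return the actions taken when a recovered address is detected.

    Both arguments are (endpoint name, role) pairs, for example
    ("EP2", "primary").
    """
    active_name, active_role = active
    recovered_name, recovered_role = recovered
    if PRIORITY[recovered_name] < PRIORITY[active_name]:
        return [
            f"register with {recovered_name} {recovered_role}",
            f"stop heartbeat on {active_name} {active_role}",
        ]
    # Not a higher-priority endpoint name: this sketch takes no action.
    return []
```

Calling `fallback_actions(("EP2", "primary"), ("EP1", "primary"))` reproduces the example in the text: register with EP1 primary, then stop the heartbeat on EP2 primary.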
Within an endpoint name, the NF heartbeat is used to track operational status. Across endpoint names, registration is used to track operational status. A message send timeout, an RPC error, or any of the HTTP response codes 408, 429, 500, 501, 502, and 503 is treated as a failure that causes SMF to move to the next NRF.
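The failure criteria above amount to a simple classification of each heartbeat or registration attempt. The following sketch shows one way to express it. `is_nrf_failure` and its parameters are hypothetical names, and how a timeout or RPC error is actually detected is outside the scope of this example.

```python
from typing import Optional

# HTTP response codes listed in the text as failures.
FAILOVER_STATUS_CODES = frozenset({408, 429, 500, 501, 502, 503})


def is_nrf_failure(status_code: Optional[int] = None,
                   timed_out: bool = False,
                   rpc_error: bool = False) -> bool:
    """Return True if the attempt counts as a failure that triggers a move
    to the next NRF address."""
    if timed_out or rpc_error:
        return True
    return status_code in FAILOVER_STATUS_CODES
```

Under this sketch, a 503 response or a send timeout counts as a failure, while codes that are not in the list, such as 404 or 504, do not.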