Software Upgrade on GR Pairs
This section uses config commit as the reference scenario. The same checklist also applies to other upgrade scenarios.
Checklist
-
Don’t perform config commits on both sites at the same time. Perform config commit on each site separately.
-
Prior to config commit on Rack-1/Site-1, initiate a CLI-based switchover on Rack-1/Site-1 and make sure that Rack-2/Site-2 has Primary ownership of both instances (instance-id 1 and instance-id 2).
-
Perform config commit on Rack-1/Site-1. Wait until config commit is successful and the PODs restart and return to the running state to fetch the latest helm charts (if applicable).
-
Revert the role of Rack-1/Site-1 to be Primary (Switch/Reset roles on both sites).
-
Verify that the roles of Rack-1/Site-1 (Primary) and Rack-2/Site-2 (Standby) are as expected.
-
Repeat the above checklist for Rack-2/Site-2.
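The precondition in the checklist (the peer site must own both instances as Primary, and only one site commits at a time) can be expressed as a small guard. The following Python sketch is illustrative only: the function name `safe_to_commit`, the site labels used as dictionary keys, and the data layout are assumptions made for this example and are not part of the product CLI. It assumes the roles were already collected, for instance by running show role on each site.

```python
# Illustrative sketch; names and data layout are hypothetical, not product API.
INSTANCES = (1, 2)


def safe_to_commit(target_site, roles, commit_in_progress=()):
    """Return True if config commit may start on target_site.

    roles maps site name -> {instance_id: role string}.
    commit_in_progress lists the sites that are currently committing.
    """
    peers = [site for site in roles if site != target_site]
    if len(peers) != 1:
        raise ValueError("expected exactly one peer site in a GR pair")
    peer = peers[0]
    # Never run config commit on both sites at the same time.
    if peer in commit_in_progress:
        return False
    # The peer must hold Primary ownership of both instances.
    return all(roles[peer].get(i) == "PRIMARY" for i in INSTANCES)
```

A caller would evaluate this guard after the CLI-based switchover and before starting config commit, and would repeat the check with the sites swapped when the checklist is run for Rack-2/Site-2.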
Software Upgrade
Rack-1/Site-1 Upgrade when GR is Enabled
-
Verify that roles of both instances on Rack-1/Site-1 are in PRIMARY/STANDBY.
show role instance-id 1
result "PRIMARY"
show role instance-id 2
result "STANDBY"
-
Initiate switch role for both instances on Rack-1/Site-1 to STANDBY with failback-interval of 30 seconds. This step transitions the roles from PRIMARY/STANDBY to STANDBY_ERROR/STANDBY_ERROR.
Note: Heartbeat between both sites should be successful.
geo switch-role instance-id 1 role standby failback-interval 30
geo switch-role instance-id 2 role standby failback-interval 30
-
Verify that roles of both instances have moved to STANDBY_ERROR on Rack-1/Site-1.
show role instance-id 1
result "STANDBY_ERROR"
show role instance-id 2
result "STANDBY_ERROR"
-
Verify that roles of both instances have moved to PRIMARY on Rack-2/Site-2.
show role instance-id 1
result "PRIMARY"
show role instance-id 2
result "PRIMARY"
-
Perform a rolling upgrade or a non-graceful upgrade using system mode shutdown/running, as required, on Rack-1/Site-1.
-
Perform the following steps after the upgrade procedure completes. Perform a health check on Rack-1/Site-1 and ensure that the PODs have come up and Rack-1/Site-1 is healthy.
-
Verify that roles of both instances remain in STANDBY_ERROR mode on Rack-1/Site-1.
show role instance-id 1
result "STANDBY_ERROR"
show role instance-id 2
result "STANDBY_ERROR"
-
Initiate reset role for both instances on Rack-1/Site-1 to STANDBY. This step transitions the roles from STANDBY_ERROR/STANDBY_ERROR to STANDBY/STANDBY.
geo reset-role instance-id 1 role standby
geo reset-role instance-id 2 role standby
-
Verify that the roles of both instances have moved to STANDBY on Rack-1/Site-1.
show role instance-id 1
result "STANDBY"
show role instance-id 2
result "STANDBY"
-
Initiate switch role for instance-id 1 on Rack-2/Site-2 to STANDBY. This step transitions roles of Rack-2/Site-2 from PRIMARY/PRIMARY to STANDBY_ERROR/PRIMARY and Rack-1/Site-1 from STANDBY/STANDBY to PRIMARY/STANDBY.
geo switch-role instance-id 1 role standby failback-interval 30
-
Verify that roles of the instances on Rack-2/Site-2 are in STANDBY_ERROR/PRIMARY.
show role instance-id 1
result "STANDBY_ERROR"
show role instance-id 2
result "PRIMARY"
-
Verify that the roles of both instances on Rack-1/Site-1 are in PRIMARY/STANDBY.
show role instance-id 1
result "PRIMARY"
show role instance-id 2
result "STANDBY"
-
Initiate reset role for instance-id 1 on Rack-2/Site-2 to STANDBY. This step transitions the roles of Rack-2/Site-2 from STANDBY_ERROR/PRIMARY to STANDBY/PRIMARY.
geo reset-role instance-id 1 role standby
-
Verify that roles of both instances on Rack-2/Site-2 are in STANDBY/PRIMARY.
show role instance-id 1
result "STANDBY"
show role instance-id 2
result "PRIMARY"
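The role transitions in the Rack-1/Site-1 procedure above can be traced with a toy state model. The sketch below is a reasoning aid, not product behavior: the class `GRPair` and its method names are hypothetical, and the transition rules are assumptions read directly from the steps (a switch-role to standby moves the issuing site to STANDBY_ERROR and the peer to PRIMARY for that instance; a reset-role to standby moves STANDBY_ERROR to STANDBY).

```python
# Toy model of the role transitions described in the steps above.
# A sketch for reasoning about the sequence, not product behavior.

class GRPair:
    def __init__(self, roles):
        # roles: {site: {instance_id: role}}; copied so the input is not mutated
        self.roles = {site: dict(r) for site, r in roles.items()}

    def _peer(self, site):
        (peer,) = [s for s in self.roles if s != site]
        return peer

    def switch_role_standby(self, site, instance_id):
        """Model 'geo switch-role ... role standby' issued on site."""
        self.roles[site][instance_id] = "STANDBY_ERROR"
        self.roles[self._peer(site)][instance_id] = "PRIMARY"

    def reset_role_standby(self, site, instance_id):
        """Model 'geo reset-role ... role standby' issued on site."""
        if self.roles[site][instance_id] != "STANDBY_ERROR":
            raise ValueError("reset-role is modeled only from STANDBY_ERROR")
        self.roles[site][instance_id] = "STANDBY"

    def state(self, site):
        # Render roles as "instance-1-role/instance-2-role"
        return "/".join(self.roles[site][i] for i in sorted(self.roles[site]))


def simulate_rack1_upgrade():
    r1, r2 = "Rack-1/Site-1", "Rack-2/Site-2"
    pair = GRPair({r1: {1: "PRIMARY", 2: "STANDBY"},
                   r2: {1: "STANDBY", 2: "PRIMARY"}})
    trace = []
    for i in (1, 2):                     # before the upgrade, on Rack-1
        pair.switch_role_standby(r1, i)
    trace.append((pair.state(r1), pair.state(r2)))
    for i in (1, 2):                     # after upgrade and health check
        pair.reset_role_standby(r1, i)
    trace.append((pair.state(r1), pair.state(r2)))
    pair.switch_role_standby(r2, 1)      # hand instance 1 back to Rack-1
    trace.append((pair.state(r1), pair.state(r2)))
    pair.reset_role_standby(r2, 1)       # clear STANDBY_ERROR on Rack-2
    trace.append((pair.state(r1), pair.state(r2)))
    return trace
```

Each entry in the returned trace is a (Rack-1/Site-1, Rack-2/Site-2) pair of role strings that can be compared against the verification steps in the procedure.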
Rack-2/Site-2 Upgrade when GR is Enabled
-
Verify that roles of both instances on Rack-2/Site-2 are in STANDBY/PRIMARY.
show role instance-id 1
result "STANDBY"
show role instance-id 2
result "PRIMARY"
-
Initiate switch role for both instances on Rack-2/Site-2 to STANDBY with failback-interval of 30 seconds. This step transitions the roles from STANDBY/PRIMARY to STANDBY_ERROR/STANDBY_ERROR.
geo switch-role instance-id 1 role standby failback-interval 30
geo switch-role instance-id 2 role standby failback-interval 30
-
Verify that roles of both instances move to STANDBY_ERROR on Rack-2/Site-2.
show role instance-id 1
result "STANDBY_ERROR"
show role instance-id 2
result "STANDBY_ERROR"
-
Verify that roles of both instances move to PRIMARY on Rack-1/Site-1.
show role instance-id 1
result "PRIMARY"
show role instance-id 2
result "PRIMARY"
-
Perform a rolling upgrade or a non-graceful upgrade using system mode shutdown/running, as required, on Rack-2/Site-2.
-
Perform the subsequent steps after the upgrade procedure completes. Perform a health check on Rack-2/Site-2 and ensure that the PODs have come up and Rack-2/Site-2 is healthy.
-
Verify that roles of both the instances remain in STANDBY_ERROR on Rack-2/Site-2.
show role instance-id 1
result "STANDBY_ERROR"
show role instance-id 2
result "STANDBY_ERROR"
-
Initiate reset role for both instances on Rack-2/Site-2 to STANDBY. This step transitions the roles from STANDBY_ERROR/STANDBY_ERROR to STANDBY/STANDBY.
geo reset-role instance-id 1 role standby
geo reset-role instance-id 2 role standby
-
Verify that the roles of both instances move to STANDBY on Rack-2/Site-2.
show role instance-id 1
result "STANDBY"
show role instance-id 2
result "STANDBY"
-
Initiate switch role for instance-id 2 on Rack-1/Site-1 to STANDBY. This step transitions roles of Rack-1/Site-1 from PRIMARY/PRIMARY to PRIMARY/STANDBY_ERROR and Rack-2/Site-2 from STANDBY/STANDBY to STANDBY/PRIMARY.
geo switch-role instance-id 2 role standby failback-interval 30
-
Verify that roles of both instances on Rack-1/Site-1 are in PRIMARY/STANDBY_ERROR.
show role instance-id 1
result "PRIMARY"
show role instance-id 2
result "STANDBY_ERROR"
-
Verify that roles of both instances on Rack-2/Site-2 are in STANDBY/PRIMARY.
show role instance-id 1
result "STANDBY"
show role instance-id 2
result "PRIMARY"
-
Initiate reset role for instance-id 2 on Rack-1/Site-1 to STANDBY. This step transitions the roles of Rack-1/Site-1 from PRIMARY/STANDBY_ERROR to PRIMARY/STANDBY.
geo reset-role instance-id 2 role standby
-
Verify that roles of both the instances on Rack-1/Site-1 are in PRIMARY/STANDBY.
show role instance-id 1
result "PRIMARY"
show role instance-id 2
result "STANDBY"
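The two upgrade procedures are symmetric: the site being upgraded gives up both instances, is upgraded, and is reset to standby; the peer then hands back the one instance that the upgraded site normally owns. The Python sketch below generates that ordered command list for either site. It is a planning aid under stated assumptions: the function name `upgrade_plan` and the `home_instance` parameter are hypothetical, the `<upgrade and health check>` entry is a placeholder and not a CLI command, and the show role verification steps between commands are omitted for brevity.

```python
def upgrade_plan(upgraded_site, peer_site, home_instance, failback_interval=30):
    """Return ordered (site, command) pairs for upgrading one site of a GR pair.

    home_instance is the instance that upgraded_site owns as PRIMARY in
    steady state (1 for Rack-1/Site-1, 2 for Rack-2/Site-2 in this document).
    """
    plan = []
    # 1. Move both instances away from the site that is about to be upgraded.
    for i in (1, 2):
        plan.append((upgraded_site,
                     f"geo switch-role instance-id {i} role standby "
                     f"failback-interval {failback_interval}"))
    # 2. Placeholder for the upgrade itself; not a CLI command.
    plan.append((upgraded_site, "<upgrade and health check>"))
    # 3. Clear STANDBY_ERROR on the upgraded site.
    for i in (1, 2):
        plan.append((upgraded_site,
                     f"geo reset-role instance-id {i} role standby"))
    # 4. On the peer, hand the home instance back and clear its error state.
    plan.append((peer_site,
                 f"geo switch-role instance-id {home_instance} role standby "
                 f"failback-interval {failback_interval}"))
    plan.append((peer_site,
                 f"geo reset-role instance-id {home_instance} role standby"))
    return plan
```

For example, `upgrade_plan("Rack-2/Site-2", "Rack-1/Site-1", home_instance=2)` yields the command order of the Rack-2/Site-2 procedure above, and swapping the sites with `home_instance=1` yields the Rack-1/Site-1 order.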