The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product. Learn more about how Cisco is using Inclusive Language.
This document describes the best practices for running a successful HyperFlex Cluster Upgrade process.
Cisco recommends knowledge of these topics:
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
The decision on the target version is based on the needs of the HyperFlex environment. The purpose of the upgrade is to gain the improvements, fixes, and new features available in the newer software.
Read the HyperFlex release notes to identify information such as new features, newly supported hardware, interoperability between components, guidelines, limitations, security fixes, and resolved caveats.
To check the release notes information, click here.
Before running the HyperFlex Cluster Upgrade, confirm all versions are compatible. Cisco recommends:
Check the Cisco HyperFlex upgrade guides, which provide step-by-step instructions.
The guides provide information about different types of scenarios such as:
Perform health checks before the HyperFlex upgrade in order to correct potential failures and avoid unexpected behavior during the upgrade.
There are two different methods by which these health checks can be performed.
This tool is a utility to perform proactive self-checks on HyperFlex systems to ensure their stability and resiliency.
Hypercheck guide information is found here.
This is the suggested method for pre-checks. It is periodically updated to include new troubleshooting features that easily detect potential misconfigurations.
It is kept up to date with newly discovered caveats that can cause issues during the upgrade process. Intersight HealthCheck guide information can be found here.
Step 1. Log in to Intersight and navigate to Infrastructure Service, then select HyperFlex Clusters, and choose the Cluster.
Examples show a cluster named San_Jose. In the Actions dropdown menu, select Run Health Check.
Note: This example shows health checks performed on a single cluster. You can select and perform health checks on multiple clusters at the same time.
Confirm your cluster and click Next.
The workflow allows you to skip some checks, if desired.
Step 2. Click Start to initiate the pre-check.
Check the progress bar and wait for the HealthCheck task to be completed.
Step 3. Once the HealthCheck task is completed, there are a couple of places where the results can be checked.
The Health Check tab displays the general results. The example is filtered to hide the Passed and Not Run results.
Step 4. Click Affected Nodes to verify the nodes in question.
From the Overview tab, check the Events: Alarms, Requests, and Advisories.
Expand each event for more details.
The example shows Requests expanded; click Run Selected Hypercheck Health Checks Failed.
It displays all successful and failed checks.
Step 5. Click the toggle for Show Additional Details.
Each Invoke Check can be expanded, providing a granular view of what has been checked.
It gives detailed information in JSON format for the Logs, Inputs, and Outputs.
Hypercheck video.
Intersight Health Check video.
Note: Some fixes require Technical Assistance Center (TAC) intervention. Open a case if necessary.
UCS Manager firmware management requires downloading the UCS firmware packages onto the Fabric Interconnect bootflash partition. Check for, and delete, old firmware packages that are no longer in use on the components to avoid filling the Fabric Interconnect bootflash partition with unnecessary files.
Verify the Fabric Interconnect space.
Step 1. Navigate to Equipment, select Fabric Interconnects, and choose a Fabric Interconnect. The example shows Fabric interconnect A (Primary).
Step 2. On the general panel, select Local Storage Information and expand it.
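The downloaded packages can also be reviewed and cleaned up from the UCS Manager CLI. This is a minimal sketch, assuming CLI access to UCS Manager; the package name shown is only an example, and you must confirm a package is unused before deleting it.
UCS-A# scope firmware
UCS-A /firmware # show package <----- lists the firmware packages currently downloaded to the Fabric Interconnects
UCS-A /firmware # delete package ucs-k9-bundle-c-series.4.2.2a.C.bin <----- example name; delete only packages no longer in use
UCS-A /firmware # commit-buffer <----- commits the deletion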
If the upstream switch supports the STP PortFast feature, it is highly suggested to enable it. Enabling the PortFast feature causes a switch port, or a trunk port, to enter the STP forwarding state immediately upon a linkup event, bypassing the listening and learning states.
The PortFast feature is enabled at a port level, and this port can either be a physical or a logical port.
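As a reference, this is a minimal sketch of the equivalent configuration on a Cisco Nexus (NX-OS) upstream switch; the interface is an example, and on many Catalyst IOS switches the equivalent command is spanning-tree portfast trunk.
Nexus(config)# interface Ethernet1/1 <----- example uplink port toward the Fabric Interconnect
Nexus(config-if)# spanning-tree port type edge trunk <----- NX-OS equivalent of PortFast on a trunk port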
Verify on UCSM that there are no faults related to port errors on the uplinks or server ports, in order to avoid undesired failover scenarios.
Step 1. Log in to UCSM and navigate to the Equipment tab, expand Rack-Mounts, and expand Servers. The example shows Server 1.
Step 2. Expand Adapters and then expand NICs.
Step 3. Verify that each Network Interface Card (NIC) is clean, with no faults or errors.
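In addition to the UCSM fault check, the link status of each vmnic can be confirmed from the ESXi shell with this command (a sketch; the vmnic names and count depend on the environment):
[root@esxi1:~] esxcli network nic list <----- confirm every vmnic used by the HyperFlex vSwitches shows Link Status Up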
The Storage Data VLAN needs to be configured on the upstream device to ensure failover works in the event that Fabric Interconnect B is down.
Ensure you have all requirements listed on the HyperFlex installation guide.
Ensure network connectivity flows for both paths on the virtual-machine network interface cards (vmnics).
Note: To perform the Upstream Connectivity Test check this video.
Confirm that the right NIC teaming is correctly configured, based on the UCS policies, with this guide.
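To cross-check the teaming and failover order applied on the ESXi side, the standard vSwitch and port group policies can be listed from the ESXi shell. This is a sketch; the vSwitch and port group names are examples and vary per deployment.
[root@esxi1:~] esxcli network vswitch standard policy failover get -v vswitch-hx-inband-mgmt <----- active/standby uplinks at the vSwitch level (example vSwitch name)
[root@esxi1:~] esxcli network vswitch standard portgroup policy failover get -p "Storage Controller Management Network" <----- per-port-group override (example port group name)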
During an Infrastructure upgrade, wait for ESXi uplinks to come up before rebooting the other Fabric interconnect.
Beginning with Cisco HyperFlex Release 4.0(2a), the Upgrade page displays the last cluster upgrade eligibility test result and the last tested version of the UCS server, HX data platform, and/or ESXi.
To perform the upgrade eligibility test, log in to HX Connect:
Step 1. Select Upgrade > Test Upgrade Eligibility.
Step 2. Select the UCS Server Firmware check box to test the upgrade eligibility of UCS server firmware.
Step 3. Enter the Cisco UCS Manager Fully Qualified Domain Name (FQDN) or IP address, username, and password. In the Current Version field, click Discover to choose the UCS firmware package version that needs to be validated before the upgrade.
Step 4. Select the HX Data Platform check box to test the upgrade eligibility of the HyperFlex Data Platform.
Step 5. Enter the vCenter username and password. Upload the Cisco HyperFlex Data Platform Upgrade Bundle that needs to be validated before the upgrade.
Step 6. Select the ESXi check box to test the upgrade eligibility of ESXi.
Step 7. Enter the vCenter Administrator username and password. Upload the Cisco HyperFlex Custom Image Offline Bundle that needs to be validated before the upgrade.
Step 8. Click Validate.
Step 9. The progress of the upgrade eligibility test is displayed.
Verify passwords for:
Ensure that Virtual machines running on the host can be migrated to another host during the Maintenance Mode operation. If a VM is not able to be migrated, it needs to be powered off. If a VM does not migrate automatically, but can migrate manually, check if there is any problem related to DRS.
Verify that DRS is enabled, and set to fully automated, if licensed for DRS. If DRS is Disabled, manual intervention is required to vMotion the VMs manually when prompted by the upgrade process.
Check the VMware guide for more information.
Confirm vMotion is properly configured to avoid a maintenance mode task that cannot complete.
For further information on vMotion troubleshooting, review here.
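A quick connectivity test is to ping the vMotion VMkernel interface of a peer host from the ESXi shell. This is a sketch; the vmk number and the destination IP address are examples.
[root@esxi1:~] vmkping -I vmk2 -s 1472 192.168.100.22 <----- vmk2 = vMotion VMkernel port (example); use -s 8972 -d when jumbo frames (MTU 9000) are configured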
Step 1. Log in to VMware vCenter, and navigate to Home and Clusters.
Step 2. Click the vCenter Cluster. This example shows a cluster named San_Jose.
Step 3. Select Configure, under Configuration, click VMware EVC and select EDIT.
Step 4. Ensure the EVC Mode is set to Enabled for the processor in use.
Verify if there is any affinity rule created on the Guest VM.
Step 1. Go to the cluster from VMware vCenter.
Step 2. Navigate to Home and Clusters. This example shows a cluster named San_Jose.
Step 3. Select Configure. Under Configuration, select VM/Host Rules, and verify whether any rules have been created.
In HXDP 5.0(x) and later releases, EAM is no longer used on the ESXi hosts to manage the SCVM Network and Datastore.
In releases prior to HXDP 5.0(x), the Network and Datastore need to have the SCVM information.
To verify that the ESXi Agent Manager (EAM) health is normal:
Step 1. Log in to VMware vCenter.
Step 2. Navigate to Home and Clusters, and navigate to each ESXi node.
Step 3. On the VMware vCenter cluster, navigate to Configure, and from Virtual Machines, select Agent VM Settings.
The example shows blank spaces since the example HyperFlex cluster is on 5.0(2c).
If EAM is used, confirm no certificate errors are shown on vCenter.
More EAM information can be found here.
vCenter and ESXi licenses
If upgrading from 6.x to 7.0, ensure you have the new licenses before the upgrade.
After the upgrade, you only have 60 days in Evaluation mode.
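The license currently applied to each ESXi host can be reviewed from vCenter under Licensing, or directly from the ESXi shell with this command (a sketch):
[root@esxi1:~] vim-cmd vimsvc/license --show <----- displays the license key, edition, and features applied to the host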
Failed attempts to log in can cause the ESXi users to be locked.
To verify the hxuser or root user status:
Step 1. Open an SSH session to the ESXi node as root.
Step 2. Run pam_tally2 --user hxuser (or root user).
Step 3. Verify whether the hxuser or root user has been locked.
[root@esxi1:~] pam_tally2 --user hxuser
Login Failures Latest failure From
hxuser 0
[root@esxi1:~] pam_tally2 --user root
Login Failures Latest failure From
root 0
[root@esxi1:~]
To unlock these ESXi users:
Step 1. Run pam_tally2 --user hxuser --reset (or root user).
Step 2. Ensure the Failures count decreases to 0.
[root@esxi1:~] pam_tally2 --user hxuser --reset
Login Failures Latest failure From
hxuser 0
[root@esxi1:~] pam_tally2 --user root --reset
Login Failures Latest failure From
root 0
Increasing the security on the ESXi hosts can require you to enable lockdown mode. This configuration prevents HyperFlex upgrades; lockdown mode must be disabled for a HyperFlex cluster upgrade.
To disable the ESXi lockdown mode:
Step 1. Run SSH directly into the ESXi host as root.
Step 2. Press F2 for Initial Setup.
Step 3. Enter the root credentials to open the DCUI setup.
Step 4. Go to the Configure Lockdown Mode setting and change it to Disabled.
To disable lockdown mode from vCenter:
Step 1. Browse to the host in the vSphere Web Client inventory.
Step 2. Click the Manage tab and click Settings (with 6.7, click the Configure tab).
Step 3. Under System, select Security Profile.
Step 4. In the Lockdown Mode panel, click Edit.
Step 5. Click Lockdown Mode and select Disabled to turn lockdown mode off.
More information about lockdown mode can be found here.
If replication is configured and enabled, it needs to be paused before the upgrade.
Pause replication with the stcli dp schedule pause command, and enable it after the upgrade with the stcli dp schedule resume command.
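Run from the HyperFlex cluster management shell, the pause and resume sequence looks like this sketch:
hxshell:~$ stcli dp schedule pause <----- pause the replication schedule before starting the upgrade
hxshell:~$ stcli dp schedule resume <----- re-enable the replication schedule after the upgrade completes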
Drive failures cause the HyperFlex cluster upgrade to fail. To check the HyperFlex Connect GUI for Blocklisted or Ignored disks:
Step 1. Open the HyperFlex Connect GUI; go to https://<HyperFlex-virtual-ip-address or fqdn>.
Step 2. Go to System Information and then select the System Overview tab.
Step 3. Check for any disk errors.
Disk issues need to be fixed by Cisco TAC.
A motherboard replacement also replaces the previous host UUID with a new ID. If issues occurred during the replacement tasks, the UUID mismatch can cause the HyperFlex upgrade to fail.
Note: Intersight HealthCheck advises on ID mismatches. It is highly recommended to keep the HyperFlex cluster connected to Intersight and to run the HyperFlex cluster HealthCheck.
For a motherboard replacement, compare the stNode UUIDs from the ESXi CLI to ensure the UUID information matches the UUIDs in the HyperFlex cluster.
To collect the UUID:
Step 1. Open an SSH session to the ESXi node as root.
Step 2. Run this command: vim-cmd hostsvc/hostsumm | grep -i uuid | grep -v inst.
Step 3. Collect the UUID information.
[root@esxi2:~] vim-cmd hostsvc/hostsumm | grep -i uuid | grep -v inst
uuid = "1f82077d-6702-214d-8814-e776ffc0f53c", <----- ESXi2 ID
[root@esxi2:~]
[root@esxi2:~]
To get the UUID information on the HyperFlex cluster node:
Step 1. Run SSH into the HyperFlex cluster IP address.
Step 2. Run command stcli cluster info | more.
Step 3. Collect the stNodes IDs.
hxshell:~$ stcli cluster info | more
stNodes:
----------------------------------------
id: c4a24480-e935-6942-93ee-987dc8e9b5d9
type: node
name: esxi1
----------------------------------------
id: 1f82077d-6702-214d-8814-e776ffc0f53c <----- ID for ESXi2
type: node
name: esxi2
----------------------------------------
id: 50a5dc5d-c419-9c48-8914-d91a98d43fe7
type: node
name: esxi3
----------------------------------------
Ensure the stcli cluster info IDs match the information shown on the ESXi nodes.
Verify that vCenter information such as the Datacenter, Cluster, and Datastore names on the HyperFlex cluster matches vCenter. An information mismatch causes the HyperFlex cluster upgrade to fail.
To get the most recent information:
Step 1. Run SSH into the HyperFlex cluster IP as admin.
Step 2. Run stcli cluster info | grep -i vcenter.
Step 3. Collect the registered vCenter information in the cluster.
hxshell:~$ stcli cluster info | grep -i vcenter
vCenterClusterName: vcenter-cluster
vCenterDatacenter: hx-cluster-name
vCenterURL: https://vcenter-url
vCenterDatacenterId: datacenter-name
vCenterClusterId: domain-c5124
vCenterUrl: https://vcenter-url
vCenterVersion: 7.0.2 Build-18455184
Consider that names are case sensitive. If the name and vCenter information from the previous output do not match, a vCenter re-registration is needed.
To re-register the vCenter into the HyperFlex cluster, check the vCenter registration video here.
In order to re-register the vCenter:
Step 1. Run SSH into the cluster IP address as the admin.
Step 2. Run the stcli cluster reregister command.
stcli cluster reregister [-h] --vcenter-datacenter NEWDATACENTER --vcenter-cluster NEWVCENTERCLUSTER --vcenter-url NEWVCENTERURLIP [--vcenter-sso-url NEWVCENTERSSOURL] --vcenter-user NEWVCENTERUSER
hxshell:~$ stcli cluster reregister --vcenter-datacenter MyData-Center --vcenter-cluster Cluster-Name --vcenter-url https://vcenter1-url --vcenter-user <vCenter user>
Reregister StorFS cluster with a new vCenter ...
Enter NEW vCenter Administrator password:
Cluster reregistration with new vCenter succeeded
hxshell:~$
Revision | Publish Date | Comments
---|---|---
2.0 | 30-Sep-2023 | corrected typo
1.0 | 10-Aug-2023 | Initial Release