The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product. Learn more about how Cisco is using Inclusive Language.
This document describes the best practices for running a successful HyperFlex Cluster Upgrade process.
Cisco recommends knowledge of these topics:
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
The decision on the target version is based on the needs of the HyperFlex environment. The purpose of the upgrade is to gain the improvements, fixes, and new features available in the newer software.
Read the HyperFlex release notes to identify information such as new features, newly supported hardware, interoperability between components, guidelines, limitations, security fixes, and resolved caveats.
To check the release notes information, click here.
Before running the HyperFlex Cluster Upgrade, confirm all versions are compatible. Cisco recommends:
Check the Cisco HyperFlex upgrade guides, which provide step-by-step instructions.
The guides provide information about different types of scenarios such as:
Perform health checks before the HyperFlex upgrade in order to correct potential failures and avoid unexpected behavior during the upgrade.
There are two different methods by which these health checks can be performed.
This tool is a utility to perform proactive self-checks on HyperFlex systems to ensure their stability and resiliency.
Hypercheck guide information is found here.
This is the suggested method for pre-checks. It is periodically updated to include new troubleshooting features that easily detect potential misconfigurations.
It is kept up to date with newly discovered caveats that can cause issues during the upgrade process. Intersight HealthCheck guide information can be found here.
Step 1. Log in to Intersight and navigate to Infrastructure Service, then select HyperFlex Clusters, and choose the Cluster.
Examples show a cluster named San_Jose. In the Actions dropdown menu, select Run Health Check.
Note: This example shows health checks performed on a single cluster. You can select and perform health checks on multiple clusters at the same time.
Confirm your cluster and click Next.
The workflow allows you to skip some checks, if desired.
Step 2. Click Start to initiate the pre-check.
Check the progress bar and wait for the HealthCheck task to be completed.
Step 3. Once the HealthCheck task is completed, there are a couple of places where the results can be checked.
The Health Check tab displays the general results. The example is filtered to hide the Passed and Not Run results.
Step 4. Click Affected Nodes to verify the nodes in question.
From the Overview tab, check the Events: Alarms, Requests, and Advisories.
Expand each event for more details.
The example shows Requests expanded; click Run Selected Hypercheck Health Checks Failed.
It displays all successful and failed checks.
Step 5. Click the toggle for Show Additional Details.
Each Invoke Check can be expanded, providing a granular view of what has been checked.
It gives detailed information in JSON format for the Logs, Inputs, and Outputs.
Hypercheck video.
Intersight Health Check video.
Note: Some fixes require Technical Assistance Center (TAC) intervention. Open a case if necessary.
UCS Manager firmware management requires downloading the UCS firmware packages onto the Fabric Interconnect bootflash partition. Check for, and delete, old firmware packages that are no longer in use on the components to avoid filling the Fabric Interconnect bootflash partition with unnecessary files.
Verify the Fabric Interconnect space.
Step 1. Navigate to Equipment, select Fabric Interconnects, and choose a Fabric Interconnect. The example shows Fabric interconnect A (Primary).
Step 2. On the general panel, select Local Storage Information and expand it.
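The downloaded packages can also be reviewed and cleaned up from the UCS Manager CLI. This is a minimal sketch, assuming CLI access to UCS Manager; the package name shown is only an example, and you must confirm a package is unused before deleting it.
UCS-A# scope firmware
UCS-A /firmware # show package <----- lists the firmware packages currently downloaded to the Fabric Interconnects
UCS-A /firmware # delete package ucs-k9-bundle-c-series.4.2.2a.C.bin <----- example name; delete only packages no longer in use
UCS-A /firmware # commit-buffer <----- commits the deletion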
If the upstream switch supports the STP PortFast feature, it is highly suggested to enable it. Enabling the PortFast feature causes a switch port, or a trunk port, to enter the STP forwarding state immediately upon a linkup event, bypassing the listening and learning states.
The PortFast feature is enabled at a port level, and this port can either be a physical or a logical port.
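As a reference, this is a minimal sketch of the equivalent configuration on a Cisco Nexus (NX-OS) upstream switch; the interface is an example, and on many Catalyst IOS switches the equivalent command is spanning-tree portfast trunk.
Nexus(config)# interface Ethernet1/1 <----- example uplink port toward the Fabric Interconnect
Nexus(config-if)# spanning-tree port type edge trunk <----- NX-OS equivalent of PortFast on a trunk port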
Verify on UCSM that there are no faults related to port errors on the uplinks or server ports, in order to avoid undesired failover scenarios.
Step 1. Log in to UCSM and navigate to the Equipment tab, expand Rack-Mounts, and expand Servers. The example shows Server 1.
Step 2. Expand Adapters and then expand NICs.
Step 3. Verify that each Network Interface Card (NIC) is clean, with no faults or errors.
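In addition to the UCSM fault check, the link status of each vmnic can be confirmed from the ESXi shell with this command (a sketch; the vmnic names and count depend on the environment):
[root@esxi1:~] esxcli network nic list <----- confirm every vmnic used by the HyperFlex vSwitches shows Link Status Up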
The Storage Data VLAN needs to be configured on the upstream device to ensure failover works in the event that Fabric Interconnect B is down.
Ensure you have all requirements listed on the HyperFlex installation guide.
Ensure network connectivity flows for both paths on the virtual-machine network interface cards (vmnics).
Note: To perform the Upstream Connectivity Test check this video.
Confirm that the right NIC teaming is correctly configured, based on the UCS policies, with this guide.
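To cross-check the teaming and failover order applied on the ESXi side, the standard vSwitch and port group policies can be listed from the ESXi shell. This is a sketch; the vSwitch and port group names are examples and vary per deployment.
[root@esxi1:~] esxcli network vswitch standard policy failover get -v vswitch-hx-inband-mgmt <----- active/standby uplinks at the vSwitch level (example vSwitch name)
[root@esxi1:~] esxcli network vswitch standard portgroup policy failover get -p "Storage Controller Management Network" <----- per-port-group override (example port group name)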
During an Infrastructure upgrade, wait for ESXi uplinks to come up before rebooting the other Fabric interconnect.
Beginning with Cisco HyperFlex Release 4.0(2a), the Upgrade page displays the last cluster upgrade eligibility test result and the last tested version of the UCS server, HX data platform, and/or ESXi.
To perform the upgrade eligibility test, log in to HX Connect:
Step 1. Select Upgrade > Test Upgrade Eligibility.
Step 2. Select the UCS Server Firmware check box to test the upgrade eligibility of UCS server firmware.
Step 3. Enter the Cisco UCS Manager Fully Qualified Domain Name (FQDN) or IP address, username, and password. In the Current Version field, click Discover to choose the UCS firmware package version that needs to be validated before the upgrade.
Step 4. Select the HX Data Platform check box to test the upgrade eligibility of the HyperFlex Data Platform.
Step 5. Enter the vCenter username and password. Upload the Cisco HyperFlex Data Platform Upgrade Bundle that needs to be validated before the upgrade.
Step 6. Select the ESXi check box to test the upgrade eligibility of ESXi.
Step 7. Enter the vCenter Administrator username and password. Upload the Cisco HyperFlex Custom Image Offline Bundle that needs to be validated before the upgrade.
Step 8. Click Validate.
Step 9. The progress of the upgrade eligibility test is displayed.
Verify passwords for:
Ensure that Virtual machines running on the host can be migrated to another host during the Maintenance Mode operation. If a VM is not able to be migrated, it needs to be powered off. If a VM does not migrate automatically, but can migrate manually, check if there is any problem related to DRS.
Verify that DRS is enabled, and set to fully automated, if licensed for DRS. If DRS is Disabled, manual intervention is required to vMotion the VMs manually when prompted by the upgrade process.
Check the VMware guide for more information.
Confirm vMotion is properly configured to avoid a maintenance mode task that cannot complete.
For further information on vMotion troubleshooting, review here.
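A quick connectivity test is to ping the vMotion VMkernel interface of a peer host from the ESXi shell. This is a sketch; the vmk number and the destination IP address are examples.
[root@esxi1:~] vmkping -I vmk2 -s 1472 192.168.100.22 <----- vmk2 = vMotion VMkernel port (example); use -s 8972 -d when jumbo frames (MTU 9000) are configured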
Step 1. Log in to VMware vCenter, and navigate to Home and Clusters.
Step 2. Click the vCenter Cluster. This example shows a cluster named San_Jose.
Step 3. Select Configure, under Configuration, click VMware EVC and select EDIT.
Step 4. Ensure the EVC Mode is set to Enabled for the processor in use.
Verify if there is any affinity rule created on the Guest VM.
Step 1. Go to the cluster from VMware vCenter.
Step 2. Navigate to Home and Clusters. This example shows a cluster named San_Jose.
Step 3. Select Configure. Under Configuration, select VM/Host Rules, and verify whether any rules have been created.
In HXDP 5.0(x) and later releases, EAM is no longer used on the ESXi hosts to manage the SCVM Network and Datastore.
In releases prior to HXDP 5.0(x), the Network and Datastore need to have the SCVM information.
To verify that the ESXi Agent Manager (EAM) health is normal:
Step 1. Log in to VMware vCenter.
Step 2. Navigate to Home and Clusters, and navigate to each ESXi node.
Step 3. On the VMware vCenter cluster, navigate to Configure, and from Virtual Machines, select Agent VM Settings.
The example shows blank spaces since the example HyperFlex cluster is on 5.0(2c).
If EAM is used, confirm no certificate errors are shown on vCenter.
More EAM information can be found here.
vCenter and ESXi licenses
If upgrading from 6.x to 7.0, ensure you have the new licenses before the upgrade.
After the upgrade, you only have 60 days in Evaluation mode.
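The license currently applied to each ESXi host can be reviewed from vCenter under Licensing, or directly from the ESXi shell with this command (a sketch):
[root@esxi1:~] vim-cmd vimsvc/license --show <----- displays the license key, edition, and features applied to the host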
Failed attempts to log in can cause the ESXi users to be locked.
To verify the hxuser or root user status:
Step 1. Open an SSH session to the ESXi node as root.
Step 2. Run pam_tally2 --user hxuser (or root user).
Step 3. Verify whether the hxuser or root user has been locked.
[root@esxi1:~] pam_tally2 --user hxuser
Login Failures Latest failure From
hxuser 0
[root@esxi1:~] pam_tally2 --user root
Login Failures Latest failure From
root 0
[root@esxi1:~]
To unlock these ESXi users:
Step 1. Run pam_tally2 --user hxuser --reset (or root user).
Step 2. Ensure the Failures count decreases to 0.
[root@esxi1:~] pam_tally2 --user hxuser --reset
Login Failures Latest failure From
hxuser 0
[root@esxi1:~] pam_tally2 --user root --reset
Login Failures Latest failure From
root 0
Increasing the security on the ESXi hosts can require you to enable lockdown mode. This configuration prevents HyperFlex upgrades; lockdown mode must be disabled for a HyperFlex cluster upgrade.
To disable the ESXi lockdown mode:
Step 1. Run SSH directly into the ESXi host as root.
Step 2. Press F2 for Initial Setup.
Step 3. Enter the root credentials to open the DCUI setup.
Step 4. Go to the Configure Lockdown Mode setting and change it to Disabled.
To disable lockdown mode from vCenter:
Step 1. Browse to the host in the vSphere Web Client inventory.
Step 2. Click the Manage tab and click Settings (with 6.7, click the Configure tab).
Step 3. Under System, select Security Profile.
Step 4. In the Lockdown Mode panel, click Edit.
Step 5. Click Lockdown Mode and select Disabled to turn lockdown mode off.
More information about lockdown mode can be found here.
If replication is configured and enabled, it needs to be paused before the upgrade.
Pause replication with the stcli dp schedule pause command, and enable it after the upgrade with the stcli dp schedule resume command.
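Run from the HyperFlex cluster management shell, the pause and resume sequence looks like this sketch:
hxshell:~$ stcli dp schedule pause <----- pause the replication schedule before starting the upgrade
hxshell:~$ stcli dp schedule resume <----- re-enable the replication schedule after the upgrade completes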
Drive failures cause the HyperFlex cluster upgrade to fail. To check the HyperFlex Connect GUI for Blocklisted or Ignored disks:
Step 1. Open the HyperFlex Connect GUI; go to https://<HyperFlex-virtual-ip-address or fqdn>.
Step 2. Go to System Information and then select the System Overview tab.
Step 3. Check for any disk errors.
Disk issues need to be fixed by Cisco TAC.
A motherboard replacement also replaces the previous host UUID with a new ID. If issues occurred during the replacement tasks, the UUID mismatch can cause the HyperFlex upgrade to fail.
Note: Intersight HealthCheck advises on ID mismatches. It is highly recommended to keep the HyperFlex cluster connected to Intersight and to run the HyperFlex cluster HealthCheck.
For a motherboard replacement, compare the stNode UUIDs from the ESXi CLI to ensure the UUID information matches the UUIDs in the HyperFlex cluster.
To collect the UUID:
Step 1. Open an SSH session to the ESXi node as root.
Step 2. Run this command: vim-cmd hostsvc/hostsumm | grep -i uuid | grep -v inst.
Step 3. Collect the UUID information.
[root@esxi2:~] vim-cmd hostsvc/hostsumm | grep -i uuid | grep -v inst
uuid = "1f82077d-6702-214d-8814-e776ffc0f53c", <----- ESXi2 ID
[root@esxi2:~]
[root@esxi2:~]
To get the UUID information on the HyperFlex cluster node:
Step 1. Run SSH into the HyperFlex cluster IP address.
Step 2. Run command stcli cluster info | more.
Step 3. Collect the stNodes IDs.
hxshell:~$ stcli cluster info | more
stNodes:
----------------------------------------
id: c4a24480-e935-6942-93ee-987dc8e9b5d9
type: node
name: esxi1
----------------------------------------
id: 1f82077d-6702-214d-8814-e776ffc0f53c <----- ID for ESXi2
type: node
name: esxi2
----------------------------------------
id: 50a5dc5d-c419-9c48-8914-d91a98d43fe7
type: node
name: esxi3
----------------------------------------
Ensure the stcli cluster info IDs match the information shown on the ESXi nodes.
Verify that vCenter information such as the Datacenter, Cluster, and Datastore names on the HyperFlex cluster matches vCenter. An information mismatch causes the HyperFlex cluster upgrade to fail.
To get the most recent information:
Step 1. Run SSH into the HyperFlex cluster IP as admin.
Step 2. Run stcli cluster info | grep -i vcenter.
Step 3. Collect the registered vCenter information in the cluster.
hxshell:~$ stcli cluster info | grep -i vcenter
vCenterClusterName: vcenter-cluster
vCenterDatacenter: hx-cluster-name
vCenterURL: https://vcenter-url
vCenterDatacenterId: datacenter-name
vCenterClusterId: domain-c5124
vCenterUrl: https://vcenter-url
vCenterVersion: 7.0.2 Build-18455184
Consider that names are case sensitive. If the name and vCenter information from the previous output do not match, a vCenter re-registration is needed.
To re-register the vCenter into the HyperFlex cluster, check the vCenter registration video here.
In order to re-register the vCenter:
Step 1. Run SSH into the cluster IP address as the admin.
Step 2. Run the stcli cluster reregister command.
stcli cluster reregister [-h] --vcenter-datacenter NEWDATACENTER --vcenter-cluster NEWVCENTERCLUSTER --vcenter-url NEWVCENTERURLIP [--vcenter-sso-url NEWVCENTERSSOURL] --vcenter-user NEWVCENTERUSER
hxshell:~$ stcli cluster reregister --vcenter-datacenter MyData-Center --vcenter-cluster Cluster-Name --vcenter-url https://vcenter1-url --vcenter-user <vCenter user>
Reregister StorFS cluster with a new vCenter ...
Enter NEW vCenter Administrator password:
Cluster reregistration with new vCenter succeeded
hxshell:~$
Revision | Publish Date | Comments
---|---|---
2.0 | 30-Sep-2023 | corrected typo
1.0 | 10-Aug-2023 | Initial Release