Software Advisory for CSCvo56350: SW Type: VMware ESXi 6.7 Hypervisor
Software Advisory for CSCvk62990: ESXi 6.0 Hosts Susceptible to PSOD
Updated: May 20, 2019
Dear Cisco Customer,
Cisco engineering has identified an issue with certain components in the VMware ESXi 6.7 hypervisor that may affect your use of this software. Please review the Software Advisory notice here to determine if the issues apply to your environment and the steps required to address the issue.
For more comprehensive information about what is included in this software, refer to the Cisco software Release Notes, available from the Product Selector tool. From this page, select the product you are interested in. Release Notes are under "General Information" on the product page.
Affected Software and Replacement Solution for CSCvo56350
Software Type | Software Affected | Software Solution
VMware ESXi 6.7 Hypervisor | Version: HX 4.0(1) version running the ESXi 6.7 hypervisor version listed below. Affected Images (iso and zip bundles): ESXi 6.7U1 EP06 (build 11675023) | Version: VMware ESXi 6.7 hypervisor version listed below. Replacement Images (iso and zip bundles): ESXi 6.7 U2 EP08 (build 13473784); any future ESXi 6.7 images above build number 13473784
Reason for Advisory:
This software advisory addresses a bug found in the VMware ESXi 6.7 Hypervisor.
CSCvo56350 - PSOD - ESX_Only Upgrade from 6.5 to 6.7: Failed to mount boot tardisks
Related VMware PR 2324122 - ESXi hosts might fail with a purple diagnostic screen while booting with an error for boot tardisks mounting failure
Affected Hardware Platforms:
HX220C-M5SX
HX220C-M5 Edge
HX240C-M5SX
HX240C-M5L
HXAF220c-M5SN All NVMe
HXAF220C-M5SX
HXAF240C-M5SX
Symptom: A software bug exists in the VMware hypervisor that may cause ESXi to PSOD when running the listed ESXi build combined with HyperFlex Data Platform version 4.0. This issue has not been observed with earlier HXDP releases. The issue is acknowledged by VMware (VMware PR 2324122) and fixed in ESXi 6.7 U2 EP08 (build 13473784) and later.
Conditions: The following HyperFlex clusters are susceptible:
· Clusters running HyperFlex Data Platform 4.0(1) and attempting to upgrade to 6.7U1 EP06 from earlier ESXi releases.
· Clusters upgrading to 4.0(1) from a previous HX release while already running ESX 6.7U1 EP06.
Workaround: If you are running HX version 4.0(1), do not attempt to upgrade from 6.0 or 6.5 to the susceptible 6.7 build listed above (EP06). VMware vSphere 6.7 U2 EP08 (build 13473784) has addressed the issue and is supported with HXDP 3.5(2) and 4.0(1). You can safely upgrade to VMware vSphere 6.7 U2 EP08 (build 13473784) once you are on a supported HXDP release.
Clusters already running ESXi 6.7U1 EP06 on 3.5(2) HXDP software should first upgrade ESXi to EP08 before upgrading HXDP to 4.0(1).
Customers not currently on vSphere 6.7 are advised to first upgrade to HyperFlex Data Platform version 4.0(1) and then upgrade to ESXi 6.7U2 EP08 (build 13473784), which remains unaffected by this bug.
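The ordering guidance above can be sketched as a small helper. This is only an illustration of the advisory's two documented cases when the target is HXDP 4.0(1) with ESXi 6.7 U2 EP08; the function name, its arguments, and the return labels are hypothetical and not part of any Cisco or VMware tool.

```python
AFFECTED_BUILD = 11675023  # ESXi 6.7U1 EP06, from the advisory
FIXED_BUILD = 13473784     # ESXi 6.7 U2 EP08, from the advisory


def first_upgrade(esxi_version: str, esxi_build: int) -> str:
    """Return which component to upgrade first: 'esxi', 'hxdp', or 'unknown'.

    Hypothetical helper; assumes the goal is HXDP 4.0(1) on ESXi 6.7 U2 EP08.
    """
    if esxi_version.startswith("6.7"):
        if esxi_build == AFFECTED_BUILD:
            # Already on EP06: move ESXi to EP08 before upgrading HXDP.
            return "esxi"
        if esxi_build >= FIXED_BUILD:
            # ESXi already carries the fix, so only HXDP remains.
            # This branch is an inference, not a statement from the advisory.
            return "hxdp"
        # Other 6.7 builds are not discussed in the advisory.
        return "unknown"
    # Not yet on vSphere 6.7: upgrade HXDP to 4.0(1) first, then ESXi to EP08.
    return "hxdp"


if __name__ == "__main__":
    print(first_upgrade("6.7", 11675023))
    print(first_upgrade("6.5", 1))
```

For any case the helper reports as 'unknown', the advisory gives no guidance, so treat the result as a prompt to consult Cisco TAC rather than as a recommendation.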
More Info: When this issue is encountered, ESXi will PSOD upon reboot after upgrading to 6.7U1 EP06. Because the boot fails, ESXi then reverts to the alternate bootbank and successfully reboots into the original ESXi version that was running before the upgrade was initiated.
Updated: November 15, 2018
Dear Cisco Customer,
Cisco engineering has identified the following software issues with the release that you have selected that may affect your use of this software. Please review the Software Advisory notice here to determine if the issues apply to your environment. You may proceed to download this software if you have no concerns with the issue described.
Affected Software and Replacement Solution for CSCvj66157 and CSCvm66552
Software Type | Software Affected | Software Solution
Cisco HyperFlex Software | Version: For CSCvj66157: 2.5(1b), 2.5(1c), 2.5(1d), 2.6(1a), 2.6(1b), 2.6(1d), 2.6(1e), 3.0(1a), 3.0(1b); for CSCvm66552: 3.0(1c), 3.0(1d), 3.0(1e), 3.0(1h), 3.5(1a) | Version: 3.0(1c) (for CSCvj66157 only), 3.0(1i) (for CSCvm66552)
 | Affected Images, Installer Packages: For CSCvj66157: Cisco-HX-Data-Platform-Installer-v2.5.1b-26284.ova, Cisco-HX-Data-Platform-Installer-v2.5.1c-26345.ova, Cisco-HX-Data-Platform-Installer-v2.5.1d-26363.ova, cisco-HX-Data-Platform-Installer-v2.6.1a-26588.ova, Cisco-HX-Data-Platform-Installer-v2.6.1b-26700.ova, Cisco-HX-Data-Platform-Installer-v2.6.1d-26761.ova, Cisco-HX-Data-Platform-Installer-v2.6.1e-26812.ova, Cisco-HX-Data-Platform-Installer-v3.0.1a-29499-esx.ova, Cisco-HX-Data-Platform-Installer-v3.0.1b-29665-esx.ova; for CSCvm66552: Cisco-HX-Data-Platform-Installer-v3.0.1c-29681-esx.ova, Cisco-HX-Data-Platform-Installer-v3.0.1d-29754-esx.ova, Cisco-HX-Data-Platform-Installer-v3.0.1e-29829-esx.ova, Cisco-HX-Data-Platform-Installer-v3.0.1h-29834-esx.ova, Cisco-HX-Data-Platform-Installer-v3.5.1a-31118-esx.ova | Replacement Image, Installer Package: Cisco-HX-Data-Platform-Installer-v3.0.1c-29681-esx.ova, Cisco-HX-Data-Platform-Installer-v3.0.1i-29885-esx.ova
 | Upgrade Packages (Cisco HyperFlex Data Platform Upgrade Bundle for upgrading existing clusters with previous release): For CSCvj66157: storfs-packages-2.5.1d-26363.tgz, storfs-packages-2.6.1b-26700.tgz, storfs-packages-2.6.1d-26761.tgz, storfs-packages-2.6.1e-26812.tgz, storfs-packages-3.0.1b-29665.tgz; for CSCvm66552: storfs-packages-3.0.1c-29681.tgz, storfs-packages-3.0.1d-29754.tgz, storfs-packages-3.0.1e-29829.tgz, storfs-packages-3.0.1h-29834.tgz, storfs-packages-3.5.1a-31118.tgz | Replacement Image, Upgrade Package: storfs-packages-3.0.1c-29681.tgz, storfs-packages-v3.0.1i-29885.tgz
Reason for Advisory:
This software advisory addresses two software issues.
I. CSCvj66157:
SED drive failure may cause the UCS/HX cluster to go down.
Affected Platforms:
HX Systems running the following drives:
HX-SD38TBM1K9 / UCS-SD38TBM1K9
HX-SD38TBE1NK9 / UCS-SD38TBE1NK9
HX-SD960GBM1K9 / UCS-SD960GBM1K9
HX-SD960GBE1NK9 / UCS-SD960GBE1NK9
HX-M2-240GB / UCS-M2-240GB (This is a Boot SSD, does not have any user data)
Symptom:
HyperFlex will have frequent blacklisted drives and in some cases the HX cluster may go offline.
Due to a drive firmware bug on specific SSDs, SMART attribute 187 (Reported UECC) or SMART attribute 5 (Reallocated_Sector_Ct) will show a non-zero value. This means that data written to the affected block may have failed to read back successfully, resulting in the disk being blacklisted.
Conditions:
Due to a drive firmware bug on specific SSDs, SMART attribute 187 (Reported UECC) or SMART attribute 5 (Reallocated_Sector_Ct) will show a non-zero value. This means that data written to the affected block may have failed to read back successfully, resulting in the disk being blacklisted.
The issue is triggered when the drive is subjected to a workload with few writes and long idle times, which results in drive-level uncorrectable errors. When HX sees repeated drive errors, it blacklists the drive so that it can be proactively replaced. While the HXDP software protects against drive failures, the cluster can still go down when multiple drives on physically separate nodes fail simultaneously.
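The SMART condition described above (a non-zero raw value for attribute 187 or attribute 5) can be checked with a short script. The sketch below assumes the attribute table is available as text in the column layout printed by `smartctl -A`; that layout, the function name, and the sample values are assumptions for illustration and do not come from the advisory.

```python
WATCHED_ATTRIBUTES = {5, 187}  # Reallocated_Sector_Ct and Reported UECC


def flagged_attributes(smart_table: str) -> dict:
    """Return {attribute_id: raw_value} for watched attributes with a non-zero raw value.

    Assumes ten whitespace-separated columns per attribute row, with the
    attribute ID first and the raw value in the tenth column.
    """
    flagged = {}
    for line in smart_table.splitlines():
        fields = line.split()
        # Skip headers and anything that is not an attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        raw = fields[9]
        if attr_id in WATCHED_ATTRIBUTES and raw.isdigit() and int(raw) > 0:
            flagged[attr_id] = int(raw)
    return flagged


# Made-up sample rows, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   099   099   000    Old_age   Always       -       3
"""

if __name__ == "__main__":
    print(flagged_attributes(SAMPLE))
```

An empty result means neither watched attribute is non-zero in the supplied text. It does not by itself show that the drive firmware is unaffected.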
Resolution Path:
For systems with no SED drives:
· Combined upgrade of HXDP 3.0.1c and UCSM 3.2(3e) is required. The upgrade process has to be a combined upgrade and not just a UCSM upgrade.
For systems with SED drives:
To be done under Cisco TAC supervision:
· Upgrade to HXDP version 3.0(1c) or higher. [3.8TB SED SSD drives need 3.0(1i) as described below]
· Upgrade UCSM to version 3.2(3e) or higher.
Note: For systems with only HX-M2-240GB – this is the only required step. Subsequent steps are not required.
· Additional recovery processes may need to be initiated based on drive states in the cluster after the upgrade.
II. CSCvm66552:
SED drive failure may cause the UCS/HX cluster to go down.
Affected Platforms:
HX Systems running the following drives: HX-SD38TBM1K9 / UCS-SD38TBM1K9, HX-SD38TBE1NK9 / UCS-SD38TBE1NK9
Symptom:
HyperFlex will have frequent blacklisted drives and in some cases the HX cluster may go offline.
Due to a drive firmware bug on specific SSDs, SMART attribute 187 (Reported UECC) or SMART attribute 5 (Reallocated_Sector_Ct) will show a non-zero value. This means that data written to the affected block may have failed to read back successfully, resulting in the disk being blacklisted.
Conditions:
Due to a drive firmware bug on specific SSDs, SMART attribute 187 (Reported UECC) or SMART attribute 5 (Reallocated_Sector_Ct) will show a non-zero value. This means that data written to the affected block may have failed to read back successfully, resulting in the disk being blacklisted.
The issue is triggered when the drive is subjected to a specific workload of repeated reads of recently written data, which results in drive-level uncorrectable errors. When HX sees repeated drive errors, it blacklists the drive so that it can be proactively replaced. While the HXDP software protects against drive failures, the cluster can still go down when multiple drives on physically separate nodes fail simultaneously.
Resolution Path:
For systems with no SED drives:
Upgrade to HXDP 3.0.1i is required. The upgrade must be followed by remediation steps. Both the upgrade and the remediation steps are outlined in Field Notice FN 70234 (https://www.cisco.com/c/en/us/support/docs/field-notices/702/fn70234.html).
Updated: October 16, 2018
Dear Cisco Customer,
Cisco engineering has identified an issue with certain components in the VMware ESXi 6.0 hypervisor that may affect your use of this software. Please review the Software Advisory notice here to determine if the issues apply to your environment and the steps required to address the issue.
Affected Software and Replacement Solution for CSCvk62990
Software Type | Software Affected | Software Solution
VMware ESXi 6.0 Hypervisor | Version: HX Data Platform 2.6, 3.0, 3.5 running an ESXi 6.0 hypervisor version listed below or any other VMware 6.0 build. Affected Images (iso and zip bundles): ESXi 6.0.U3patch1 (build 5572656); ESXi 6.0.U3patch6 (build 6921384); ESXi 6.0.U3e (build 7967664) | Version: Patched VMware ESXi 6.0 hypervisor listed below. Replacement Images: Cisco HX Custom Image for ESXi 6.0 EP19 Offline Bundle for Upgrading from prior ESXi versions (build 10719132); Cisco HX Custom Image for ESXi 6.0 U3 EP18 Offline Bundle for Upgrading from prior ESXi versions (build 10474991); Cisco HX Custom Image for ESXi 6.0 U3-9919195 Offline Bundle for Upgrading from prior ESXi versions (build 9919195)
Reason for Advisory:
This software advisory addresses a software driver issue found in the VMware ESXi 6.0 hypervisor.
CSCvk62990 - VMware ESXi 6.0 Hosts Susceptible to PSOD Triggered by xhci Driver Race Condition
Affected Hardware Platforms:
HX220C-M5SX
HX240C-M5SX
HXAF220C-M5SX
HXAF240C-M5SX
HX240C-M5L
Note: This software issue does not affect HyperFlex M4 generation hardware.
Symptom: A software bug exists in the VMware xhci-xhci vmklinux driver that may cause an ESXi host to crash (PSOD) due to a null pointer exception in the driver triggered by a race condition. This VMware bug has been observed sparingly during installation and upgrade activity on HyperFlex clusters running M5 hardware with the ESXi 6.0 hypervisor. All previously posted Cisco customized ESXi 6.0 builds are susceptible, including the latest patch builds available directly from VMware.
Conditions: To be susceptible, the HyperFlex cluster must be running an M5 hardware platform as listed above and must also be running ESXi 6.0. All HyperFlex M4 ESXi 6.0 & 6.5 clusters, M4 & M5 ESXi 6.5 clusters, and M5 Hyper-V 2016 clusters are unaffected.
Workaround: The issue is acknowledged by VMware, and a fix has been tested, released, and posted on Cisco.com for ESXi 6.0. The fixed build can be found in the HyperFlex Downloads section on Cisco.com as ESXi 6.0 U3-9919195. Customers may upgrade to this patch release or upgrade to ESXi 6.5, which uses a different native driver framework that is not susceptible to this defect. Customers are encouraged to upgrade ESXi first, before upgrading to a newer HyperFlex release. Upgrades should be performed using the esxcli software profile commands as documented in the upgrade guide. Other ESXi upgrade mechanisms, including vSphere Update Manager (VUM) and ISO-driven upgrades, should not be used on HyperFlex servers.
More Info: The vulnerable xhci-xhci driver version can be identified by running “esxcli software vib list” on any ESXi shell. Any version of xhci-xhci older than 1.0-3vmw.600.3.104.10280114 does not contain the fix. Note that the version strings look very similar, so compare the full string carefully.
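The version comparison can be scripted against saved output of “esxcli software vib list”. The sketch below assumes each output row starts with the VIB name followed by its version, and it compares versions by their numeric fields in order. That ordering rule, the helper names, and the sample version used in the usage line are assumptions for illustration, not behavior documented by VMware.

```python
import re
from typing import Optional, Tuple

FIXED_XHCI_VERSION = "1.0-3vmw.600.3.104.10280114"  # first fixed version, per the advisory


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a VIB version string into a tuple of its numeric fields (assumed ordering)."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def xhci_version(vib_list_output: str) -> Optional[str]:
    """Return the version column of the xhci-xhci row, or None if the row is absent."""
    for line in vib_list_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "xhci-xhci":
            return fields[1]
    return None


def has_xhci_fix(vib_list_output: str) -> Optional[bool]:
    """True if xhci-xhci is at or above the fixed version, None if the VIB is not listed."""
    version = xhci_version(vib_list_output)
    if version is None:
        return None
    return version_key(version) >= version_key(FIXED_XHCI_VERSION)


if __name__ == "__main__":
    # Hypothetical row with an older, made-up version string.
    print(has_xhci_fix("xhci-xhci  1.0-3vmw.600.3.57.5050593  VMware  VMwareCertified  2018-01-01"))
```

A result of None means the driver row was not found in the supplied text, which should be treated as "unknown" rather than as "not vulnerable".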
If the issue is encountered, it may be identified by a PF Exception 14 PSOD, followed by mention of xhci_endpoint in the stack trace. If this condition is met, a reboot of the host will bring it back into service without any additional user intervention required. VM workload disruption is possible, although Cisco has seen the issue only occur on fresh install prior to workload commission and during upgrade only when all VMs have been live migrated to other hosts as part of the normal upgrade process. Out of an abundance of caution, customers that must remain on ESXi 6.0 are encouraged to upgrade to the ESXi 6.0 U3-9919195 release prior to upgrading HyperFlex clusters.
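The PSOD signature described above (a PF Exception 14 followed by a mention of xhci_endpoint in the stack trace) can be matched in captured crash text with a simple check. This is a minimal sketch: the function name and the sample strings are hypothetical, and the exact wording of a real purple screen may differ from what is assumed here.

```python
def matches_xhci_psod(psod_text: str) -> bool:
    """True if 'PF Exception 14' appears and 'xhci_endpoint' is mentioned after it."""
    start = psod_text.find("PF Exception 14")
    if start == -1:
        return False
    # Only look at text that follows the exception line, as the advisory describes.
    return "xhci_endpoint" in psod_text[start:]


if __name__ == "__main__":
    # Made-up crash text, for illustration only.
    sample = "#PF Exception 14 in world 12345\n0x1234: xhci_endpoint_example+0x10\n"
    print(matches_xhci_psod(sample))
```

A match only suggests that the crash fits the pattern in this advisory. Confirming the root cause still requires checking the installed xhci-xhci driver version.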
This fix will become generally available in the next bugfix release from VMware. However, it is always recommended to download the latest Cisco customized ESXi zip bundles for upgrading HyperFlex clusters directly from Cisco.com.
Note: VMware build 9919195 is a tracking build number provided by VMware that contains additional fixes to misc-drivers and sata-ahci VIBs. The esx-base version in this build remains at 9313334. In all VMware management interfaces, UI, CLI, HX Connect, and Cisco Intersight, the build number will be displayed as 9313334 and not 9919195.