Cisco engineering has identified an issue with Cisco HyperFlex Release 4.0(2c) that may affect your use of this software. Please review the Software Advisory notice here to determine whether the issue applies to your environment and the steps required to address it.
| CDETS | Area of Impact | Date |
|  |  | November 18, 2020 |
|  |  | Updated: January 8, 2021 |
November 18, 2020
Dear Cisco Customer,
Cisco engineering has identified the following software issues with the release that you have selected that may affect your use of this software. Please review the Software Advisory notice here to determine if the issues apply to your environment. You may proceed to download this software if you have no concerns with the issue described.
For more comprehensive information about what is included in this software, refer to the Cisco software Release Notes, available from the Product Selector tool. From this page, select the product you are interested in. Release Notes are under "General Information" on the product page.
Affected Software and Replacement Solution for CSCvw39220
Software Type: HyperFlex Data Platform (HXDP)
Software Affected:
o Version: HX 4.0(2c) (build 35590)
o Affected Images:
· HX Data Platform Installer for VMware ESXi for HX 4.0(2c) (build 35590)
· HX Data Platform Installer for Microsoft Hyper-V for HX 4.0(2c) (build 35590)
· HX Data Platform Upgrade Bundle for HX 4.0(2c) (build 35590)
Software Solution:
o Software Versions:
· HX 4.0(2c) live patched by Cisco TAC
· HX 4.0(2d) (build 35607) or later
o Replacement Images:
· HX Data Platform Installer for VMware ESXi for HX 4.0(2d) (build 35607)
· HX Data Platform Installer for Microsoft Hyper-V for HX 4.0(2d) (build 35607)
· HX Data Platform Upgrade Bundle for HX 4.0(2d) (build 35607)
Reason for Advisory:
An internal scheduled maintenance task that cleans up old system files may fail to finish and keep consuming CPU resources. Because a new instance of the task is started every day, unfinished instances accumulate, and over time their combined CPU consumption may increase access latency to HyperFlex datastores. The maintenance script that exhibits the issue is present on HyperFlex controller VMs running HX 4.0(2c). The issue is triggered when the internal /var/stv filesystem is 50% or more full and the /var/support/ZKTxnLog directory reaches 30% of the size of the partition it resides on.
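The two-part trigger condition can be expressed as a small check. The sketch below is a hypothetical illustration, not a Cisco tool: it assumes the 30% threshold compares the total size of /var/support/ZKTxnLog with the total capacity of the partition holding it, and that both paths are readable on the controller VM. The function names are invented for this example; Hypercheck remains the supported way to detect the issue.

```python
import os
import shutil


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def trigger_condition_met(stv_used_pct: float, zk_dir_bytes: int,
                          zk_partition_bytes: int) -> bool:
    """Return True when both thresholds described in this notice are reached.

    stv_used_pct: percentage of /var/stv in use (0-100).
    zk_dir_bytes / zk_partition_bytes: assumed reading of "size of
    /var/support/ZKTxnLog in proportion to the partition it resides in".
    """
    if zk_partition_bytes <= 0:
        raise ValueError("partition size must be positive")
    zk_ratio = zk_dir_bytes / zk_partition_bytes
    return stv_used_pct >= 50 and zk_ratio >= 0.30


def check_controller_vm(stv_mount: str = "/var/stv",
                        zk_dir: str = "/var/support/ZKTxnLog") -> bool:
    """Hypothetical helper that gathers both measurements on a live system."""
    stv = shutil.disk_usage(stv_mount)
    stv_used_pct = 100.0 * stv.used / stv.total
    # disk_usage on the directory reports the filesystem that contains it.
    zk_partition = shutil.disk_usage(zk_dir)
    return trigger_condition_met(stv_used_pct,
                                 directory_size_bytes(zk_dir),
                                 zk_partition.total)
```

Keeping the threshold logic in a pure function (`trigger_condition_met`) lets it be tested without touching the filesystem; only `check_controller_vm` reads real paths.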
Solution:
For HX 4.0(2c) clusters connected to Intersight with Device Connector 1.0.9-2471 or later, the controller VMs have been live patched and a new version of the maintenance script has been put in place. Device Connector 1.0.9-2471 was released to the Intersight cloud on November 19, 2020, and is planned for the Intersight Connected Virtual Appliance and Private Virtual Appliance releases in December 2020.
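Checking whether a Device Connector is "1.0.9-2471 or later" needs a numeric comparison, because version strings do not sort correctly as plain text ("1.0.10" sorts before "1.0.9"). A minimal sketch, assuming Device Connector versions always use the dotted-numbers-plus-build form shown in this notice; the function names are hypothetical.

```python
MIN_PATCHED_CONNECTOR = "1.0.9-2471"


def parse_connector_version(version: str) -> tuple:
    """Split '1.0.9-2471' into (1, 0, 9, 2471).

    Assumes dotted integers, optionally followed by '-' and a build number.
    """
    release, _sep, build = version.strip().partition("-")
    parts = [int(p) for p in release.split(".")]
    parts.append(int(build) if build else 0)
    return tuple(parts)


def has_live_patch_connector(version: str) -> bool:
    """True if the Device Connector is at or above 1.0.9-2471."""
    return (parse_connector_version(version)
            >= parse_connector_version(MIN_PATCHED_CONNECTOR))
```

Comparing integer tuples orders each component numerically, so a later release such as 1.0.10 correctly ranks above 1.0.9 regardless of its build number.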
For HX 4.0(2c) clusters not connected to Intersight, download and run the latest Cisco HyperFlex Hypercheck health and pre-upgrade check tool (version 4.1), which indicates whether the issue has been triggered on the cluster. If it has, contact Cisco TAC as soon as possible; the cluster can be live patched without an HX Data Platform upgrade. Even if the issue has not been triggered, Cisco recommends applying the live patch; contact Cisco TAC for assistance.
The issue is resolved in the HX 4.0(2d) release. Existing clusters upgrading to HX 4.0(2d) or new clusters installed on HX 4.0(2d) will not be impacted by this issue.
Affected Hardware Platforms:
All HyperFlex converged node platforms.
Symptom:
Multiple instances of the ZKTxnCleanup script appear in ps output.
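The symptom can be checked by counting matching lines in ps output. This is a hypothetical sketch: it assumes the script name "ZKTxnCleanup" appears in each instance's command line, and it skips lines from a `grep` for the script so that a `ps | grep` pipeline is not over-counted. Because a new instance starts daily, more than one concurrent instance suggests that earlier runs never finished.

```python
SCRIPT_NAME = "ZKTxnCleanup"  # assumed to appear in the process command line


def count_cleanup_processes(ps_output: str, script_name: str = SCRIPT_NAME) -> int:
    """Count ps output lines whose command line mentions script_name.

    Lines containing 'grep' are skipped so that the grep process itself
    is not counted when the output comes from 'ps ... | grep ...'.
    """
    count = 0
    for line in ps_output.splitlines():
        if script_name in line and "grep" not in line:
            count += 1
    return count


def looks_affected(ps_output: str) -> bool:
    """More than one concurrent instance suggests earlier runs did not finish."""
    return count_cleanup_processes(ps_output) > 1
```

On a controller VM, the input could come from `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`.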
Conditions:
The issue affects HyperFlex clusters running HX 4.0(2c) and is triggered when the internal /var/stv filesystem is 50% or more full and the /var/support/ZKTxnLog directory reaches 30% of the size of the partition it resides on.
Although the issue is triggered only on some HX 4.0(2c) clusters, Cisco recommends live patching all HX 4.0(2c) clusters using one of the workarounds detailed below.
Workaround:
The workaround options depend on your deployment:
1. If a cluster is connected to Intersight Cloud or an Intersight Connected Virtual Appliance:
a) Verify the Device Connector version of your HX cluster. If the Device Connector version is 1.0.9-2471 or greater, the live patch is applied automatically.
b) To verify the Device Connector version, log in to HX Connect and navigate to “Settings” > “Device Connector”. The Device Connector version number is displayed in the lower left of the dialog.
c) Verify that the patch is working by running Hypercheck (version 4.1 or later). If the “ZK-Cleanup-Script” check passes, the cluster is successfully patched and no further action is required.
It may take up to 24 hours for the new maintenance task to start after the patch is applied. If the “ZK-Cleanup-Script” check still fails after 24 hours, contact Cisco TAC for assistance.
For HX 4.0(2c) clusters that have been live patched by Intersight, it is not necessary to upgrade to HX 4.0(2d) for this issue. If you plan to add or replace a converged node in the cluster, Intersight will automatically apply the patch to the new or replaced node.
2. If a cluster is connected to an Intersight Private Virtual Appliance:
a) Verify the Device Connector version on your HyperFlex controller VMs (see step 1b). If the version is 1.0.9-2471 or later, the live patch is applied automatically.
b) If the Device Connector version is earlier than 1.0.9-2471, upload the latest Intersight appliance bundle for the Intersight Private Virtual Appliance. Refer to the Intersight Virtual Appliance documentation for details.
c) Verify that the patch is working by running Hypercheck (version 4.1 or later). If the “ZK-Cleanup-Script” check passes, the cluster is successfully patched and no further action is required.
It may take up to 24 hours for the new maintenance task to start after the patch is applied. If the “ZK-Cleanup-Script” check still fails after 24 hours, contact Cisco TAC for assistance.
For HX 4.0(2c) clusters that have been live patched by Intersight, it is not necessary to upgrade to HX 4.0(2d) for this issue. If you plan to add or replace a converged node in the cluster, Intersight will automatically apply the patch to the new or replaced node.
3. If a cluster has been running HX 4.0(2c) for more than one day, download and run Cisco HyperFlex Hypercheck (version 4.1 or later).
a) If the “ZK-Cleanup-Script” check fails, the issue has been triggered on the cluster. Contact Cisco TAC as soon as possible to apply a live patch. Even if the issue has not been triggered, Cisco still recommends contacting TAC to apply the live patch.
b) After the patch is applied, verify that it is working by running Hypercheck (version 4.1 or later). If the “ZK-Cleanup-Script” check passes, the cluster is successfully patched and no further action is required.
It may take up to 24 hours for the new maintenance task to start after the patch is applied. If the “ZK-Cleanup-Script” check still fails after 24 hours, contact Cisco TAC for assistance.
For HX 4.0(2c) clusters that have been live patched, it is not necessary to upgrade to HX 4.0(2d) for this issue. If you plan to add or replace a converged node in a patched HX 4.0(2c) cluster, there are additional steps required after the node expansion or replacement. Refer to the More Info section for details.
4. If a cluster is currently running an HX version earlier than HX 4.0(2c) and you plan to upgrade to the HX 4.0.2 release train, or if you plan to deploy a new cluster on the HX 4.0.2 release train, use HX 4.0(2d) or a later recommended HX 4.0.2 patch release. Refer to the HX Data Platform Recommended Release bulletin for the latest information on recommended releases.
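The verification steps in options 1 through 3 share one decision rule: a passing “ZK-Cleanup-Script” check means no further action, and a failing check is only escalated once the 24-hour start-up window has elapsed. The sketch below is hypothetical; it assumes you record when the patch was applied and supply the Hypercheck pass/fail result yourself. It does not run Hypercheck, and the function name and return strings are invented for illustration.

```python
from datetime import datetime, timedelta

# The notice allows up to 24 hours for the patched maintenance task to start.
SETTLE_TIME = timedelta(hours=24)


def next_step(check_passed: bool, patched_at: datetime, checked_at: datetime) -> str:
    """Map a 'ZK-Cleanup-Script' result to the next step described in this notice.

    check_passed: whether the Hypercheck ZK-Cleanup-Script check passed.
    patched_at:   when the live patch was applied.
    checked_at:   when Hypercheck was run.
    """
    if check_passed:
        return "no further action"
    if checked_at - patched_at < SETTLE_TIME:
        # Still inside the 24-hour window: the new task may not have started yet.
        return "wait and re-run Hypercheck"
    return "contact Cisco TAC"
```

A failure reported two hours after patching therefore leads to a re-run rather than an escalation, while the same failure 25 hours after patching leads to a TAC case.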
More Info:
For an HX 4.0(2c) cluster that is live patched by Intersight, when a new converged node is added to the cluster, whether as a node expansion or node replacement, Intersight will automatically apply the patch to the new or replaced node.
For an HX 4.0(2c) cluster that is live patched by Cisco TAC and is not connected to Intersight, when a new converged node is added to the cluster, whether as a node expansion or node replacement, the new node will also need to be patched. Contact Cisco TAC for assistance.
Alternatively, before a converged node expansion, consider connecting the cluster to Intersight and claiming it, or upgrading to HX 4.0(2d) or a later recommended HX release.
Updated: January 8, 2021
Reason for Advisory:
An interoperability issue between HyperFlex and vCenter Server 7.0 Update 1 (U1) has been identified that affects HyperFlex controller VMs managed by vSphere ESX Agent Manager (EAM). When this notice was first published, vCenter and ESXi 7.0 and 7.0 U1 were not supported with HyperFlex, and the notice was issued out of an abundance of caution. HyperFlex Data Platform (HXDP) versions 4.0(2) and 4.5(1) and later now support vCenter Server 7.0 U1c (build 17327517) and later.
Each HyperFlex Data Platform release supports only the ESXi and vCenter versions listed as compatible in its published release notes. vCenter Server 7.0 U1c may be used with HyperFlex 4.0(2) and 4.5(1) and later releases. No vCenter Server 7 version earlier than U1c may be used at any time. For full details, refer to Table 6 in the HyperFlex 4.0 Release Notes and Table 7 in Cisco HX Release 4.5(x) – Software Requirements.
Note: ESXi and vCenter compatibility with the HyperFlex Data Platform are qualified independently and listed separately in the product documentation. This field notice covers only vCenter Server 7.0 interoperability and does not address ESXi 7.0 interoperability.
vCenter Server 7.0 behavior prior to U1c:
Some HyperFlex clusters use a vCenter Server service called EAM, which is responsible for the lifecycle of the HyperFlex controller VMs (named stCtlVM in vCenter).
When existing HyperFlex ESXi hosts are moved from an existing vCenter Server to a new vCenter Server 7.0 U1, the HyperFlex controller VMs on these hosts are powered off and deleted. This issue affects HyperFlex clusters that were initially installed with an HXDP version earlier than 4.0(1a). By default, clusters initially deployed on a version earlier than 4.0(1a) continue to use the EAM service, even after they are upgraded to HXDP 4.0(1a) or later.
Example scenarios are listed here:
o HyperFlex cluster initially deployed on 2.6(1a) and then upgraded to 4.0(2c).
· Although the cluster has been upgraded to a version later than 4.0(1a), it was initially deployed on a version earlier than 4.0(1a) and is therefore susceptible to this issue.
o HyperFlex cluster initially deployed on 4.0(1b).
· This cluster is not impacted because it was initially deployed on version 4.0(1a) or later.
o HyperFlex cluster initially deployed on 3.5(2a), upgraded to 4.0(2c), and then manually modified to remove the EAM configuration in accordance with Cisco documentation.
· This cluster is still susceptible; the manual removal procedure does not resolve the problem.
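The examples above follow a single rule: only the version a cluster was first deployed on matters. A hypothetical sketch that encodes this rule, assuming HXDP versions always take the form major.minor(maintenance[letter]) as in the examples; it is an illustration rather than a Cisco tool and does not inspect a live cluster.

```python
import re

# Hypothetical pattern for HXDP versions such as "2.6(1a)", "4.0(2c)" or "4.5(1)".
_HX_VERSION = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$")

# Clusters first deployed on this version or later are not affected.
EAM_FREE_FROM = "4.0(1a)"


def parse_hx_version(version: str) -> tuple:
    """Turn '4.0(2c)' into (4, 0, 2, 'c') so versions compare in release order."""
    match = _HX_VERSION.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized HX version: {version!r}")
    major, minor, maintenance, letter = match.groups()
    return int(major), int(minor), int(maintenance), letter


def is_susceptible(initial_version: str) -> bool:
    """Return True if a cluster first deployed on initial_version is exposed.

    Only the initial deployment version is considered: per this notice, neither
    a later upgrade nor the manual EAM removal procedure clears the exposure.
    """
    return parse_hx_version(initial_version) < parse_hx_version(EAM_FREE_FROM)
```

Applied to the scenarios above, clusters first deployed on 2.6(1a) or 3.5(2a) are reported as susceptible and a cluster first deployed on 4.0(1b) is not, whatever version they run today.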
If a susceptible cluster is removed from the current vCenter and the corresponding ESXi hosts are added to a new vCenter 7.0 U1 instance, the controller VMs will be permanently deleted. As a result, the cluster becomes unavailable and, in some cases, the cluster data becomes unrecoverable.
Affected Hardware Platforms:
All HyperFlex converged nodes except for new hardware that requires HXDP version 4.0(1a) or later.
Symptom:
HyperFlex controller VMs can suddenly power off and be deleted from disk by the EAM service in vCenter when the ESXi hosts that run them are added to vCenter Server 7.0 U1 or 7.0 U1a. This results in a loss of cluster availability, and in some cases the HyperFlex storage cluster can become unrecoverable.
Workaround:
VMware changed the default EAM behavior in vCenter Server 7.0 U1c and later so that orphaned VMs other than vCLS VMs are no longer cleaned up automatically. New and upgraded vCenter Server installations running 7.0 U1c or later no longer encounter this interoperability issue with HyperFlex Data Platform controller VMs.
When vCenter Server is used with HyperFlex, never manually configure it to automatically clean up all orphaned VMs. For further detail, refer to VMware KB 81352 at https://kb.vmware.com/s/article/81352.
vCenter Server 7.0 prior to the U1c release should never be used with Cisco HyperFlex.
vCenter Server | EAM Behavior | HyperFlex Support Stance
All 6.x releases | No EAM interop issue | Supported per HyperFlex release notes
7.0 (15952498) | No EAM interop issue; unqualified by Cisco | Unsupported
7.0a (16189094) | No EAM interop issue; unqualified by Cisco | Unsupported
7.0b (16386292) | No EAM interop issue; unqualified by Cisco | Unsupported
7.0c (16620007) | No EAM interop issue; unqualified by Cisco | Unsupported
7.0d (16749653) | No EAM interop issue; unqualified by Cisco | Unsupported
7.0 U1 (16860138) | EAM cleanup results in interoperability issue with HyperFlex | Unsupported
7.0 U1a (17004997) | EAM cleanup results in interoperability issue with HyperFlex | Unsupported
7.0 U1c (17327517) and later | Issue resolved | Supported per HyperFlex release notes
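The vCenter support table can be encoded as a lookup keyed on major version and build number. A hypothetical sketch: the build numbers come from the table, but the assumption that every later 7.0 build has a larger build number than 7.0 U1c (17327517) is ours, and any pre-U1c 7.0 build not in the table is treated as unsupported because the notice forbids all 7.0 releases before U1c.

```python
# Build numbers taken from the vCenter Server 7.0 rows of the table.
UNQUALIFIED_7_BUILDS = {15952498, 16189094, 16386292, 16620007, 16749653}  # 7.0 to 7.0d
EAM_ISSUE_7_BUILDS = {16860138, 17004997}  # 7.0 U1, 7.0 U1a
FIRST_SUPPORTED_7_BUILD = 17327517  # 7.0 U1c


def support_stance(major: int, build: int) -> str:
    """Look up the HyperFlex support stance for a vCenter Server major version and build."""
    if major == 6:
        return "supported per HyperFlex release notes"
    if major != 7:
        return "not covered by this notice"
    if build >= FIRST_SUPPORTED_7_BUILD:
        # Assumes later 7.0 builds carry larger build numbers than 7.0 U1c.
        return "supported per HyperFlex release notes"
    if build in EAM_ISSUE_7_BUILDS:
        return "unsupported: EAM cleanup interop issue"
    if build in UNQUALIFIED_7_BUILDS:
        return "unsupported: unqualified by Cisco"
    # Any other pre-U1c 7.0 build is absent from the table and still forbidden.
    return "unsupported: pre-U1c build not listed"
```

Even where a 6.x release is reported as supported, the HyperFlex release notes for the specific HXDP version remain the authority on compatibility.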
Note: Manual removal of the EAM configuration in accordance with the documented procedure on Cisco.com will not prevent this issue.
More Information:
See Cisco Field Notice FN70620.
If you require further assistance or have any questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods:
· By opening a service request on Cisco.com
· By email or telephone