THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.
Revision | Publish Date | Comments |
---|---|---|
1.0 | 02-Feb-21 | Initial Release |
1.1 | 16-Feb-21 | Updated the Problem Symptom, Workaround/Solution, and How to Identify Affected Products Sections |
2.0 | 22-Apr-21 | Updated the Products Affected, Defect Information, Problem Description, Problem Symptom, Workaround/Solution, and How to Identify Affected Products Sections |
2.1 | 01-Jun-21 | Add information on how to use serial numbers to identify units that may potentially be impacted (still requires secondary validation) |

Affected Product ID | Comments |
---|---|
UCS-FI-M-6324= | |
UCS-FI-M-6324 | |
CIT3-FI-M-6324 | |
UCS-FI-6332 | |
UCS-FI-6332= | Part Alternate |
UCS-FI-6332-U | |
HX-FI-6332 | |
UCS-MAFI-6332 | |
HX-UC-FI6332 | |
HX-D-FI6332 | |
UCS-R2F-FI-6332 | |
UCS-FI-6332-16UP= | Part Alternate |
UCS-FI-6332-16UP-U | |
UCS-FI-6332-16UP | |
HX-FI-6332-16UP | |
UCS-NL-FI6332-16UP | |
HX-NL-FI6332-16UP | |

Defect ID | Headline |
---|---|
CSCvw51222 | UCS-FI-M-6324 - 500IT SSD hangs after 3.2 years power on hours |
CSCvw93034 | UCS-FI-6332 - 500IT SSD hangs after 3.2 years power on hours |
Because of a flaw in the Solid State Drive (SSD) firmware, the SSD will no longer respond after approximately 3.2 years of operation. A power-cycle of the system will allow the drive to operate for another six weeks before it again ceases to respond.
After 28,224 hours (~3.2 years) of accumulated Power On Hours (POHs), a memory buffer overrun condition occurs that triggers the firmware event. This causes the drive to become unresponsive until it is power-cycled. No data loss occurs when the memory buffer overrun firmware event occurs. A power-cycle restores normal operation of the drive. The drive then continues to operate normally for 1,008 additional accumulated POHs (six weeks), at which time it becomes unresponsive again. Another power-cycle of the drive restarts the 1,008-hour window.
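The hour counts above can be converted into years and weeks with a short calculation. This is a minimal sketch of that arithmetic only. The helper name `hours_before_first_hang` and the 365.25-day year are assumptions made for illustration and are not part of this field notice.

```python
# Thresholds quoted in this field notice, in accumulated Power On Hours (POH).
FIRST_HANG_POH = 28_224   # POH at which the SSD first stops responding
RECURRENCE_POH = 1_008    # POH window that each power-cycle grants afterwards

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365.25    # assumption: average year length


def hours_before_first_hang(current_poh: int) -> int:
    """Hypothetical helper: POH left before the first hang (0 if already past it)."""
    return max(FIRST_HANG_POH - current_poh, 0)


if __name__ == "__main__":
    years = FIRST_HANG_POH / HOURS_PER_DAY / DAYS_PER_YEAR
    weeks = RECURRENCE_POH / HOURS_PER_DAY / 7
    print(f"first hang after about {years:.1f} years")   # about 3.2 years
    print(f"recurrence window: {weeks:.0f} weeks")       # 6 weeks
    print(hours_before_first_hang(20_000))               # 8224
```

Once a drive has passed 28,224 POHs, the calculation no longer predicts the next hang, because the 1,008-hour window restarts at each power-cycle rather than at a fixed POH value.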
UCS-FI-M-6324, UCS-FI-6332, and UCS-FI-6332-16UP with certain SSD models installed will reboot or management processes will crash after 28,224 POHs (~3.2 years).
Depending on the management process that crashes, the system might reboot or Unified Computing System (UCS) Manager might become inaccessible. The SSD's firmware must be reset in order to make it operational again, but from there on the same condition will occur every six weeks of additional POHs.
To reset the SSD firmware, manually power-cycle the Fabric Interconnect (FI) by pulling and reinserting the power cables.
Workaround
Manually power-cycle the system to recover temporarily from this problem. The failure will reappear after 1,008 hours (six weeks) of operation. To reset the SSD firmware, pull and reinsert the power cables so that the FI is fully power-cycled. A simple reboot of the FI does not reset the SSD firmware.
Solution
To prevent this issue and the corresponding disruption to the network and operations, Cisco recommends that you upgrade the SSD firmware proactively, before the SSD accumulates 28,224 POHs. Refer to the How to Identify Affected Products section and follow the firmware upgrade procedure accordingly.
If the system has already been impacted, the SSD firmware upgrade will permanently resolve this defect and prevent future recurrence.
There are two options to upgrade the firmware:
UCS-FI-M-6324
UCS-FI-6332 and UCS-FI-6332-16UP
See Cisco UCS Manager Firmware Management Guide, Release 4.1 for instructions on how to upgrade your firmware.
Via Intersight
Fabric interconnects that have been claimed in Intersight with Advantage Licensing will benefit from the ability to view affected devices directly within Intersight. Users can click the advisories button in the top right corner, select ‘View All’ at the bottom of the list, navigate to Field Notices and view any affected devices. An example of the Field Notice in Intersight is below:
UCS-FI-M-6324
The SSD installed in the FI and its firmware version can be determined from the Cisco NX-OS CLI. If the SSD model is Micron_M500IT* and the firmware version is NOT CZ03.00 or later, then a firmware upgrade is required.
SSH to the FI VIP and enter these commands:
connect nxos a
show system internal file /proc/scsi/scsi
exit
connect nxos b
show system internal file /proc/scsi/scsi
MINI-FI-B(nxos)# show system internal file /proc/scsi/scsi
Attached devices:
Host: scsi4 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: Micron_M500IT_MT Rev: CZ01
  Type:   Direct-Access                    ANSI SCSI revision: 05
Note: On the first boot after an upgrade, the show system internal file /proc/scsi/scsi command still shows the old firmware. This file is only updated during the boot process. A second reboot is required after the upgrade process in order to use this method.
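When several fabric interconnects must be checked, the model and revision can be pulled out of saved command output with a small script. The following is a minimal sketch, assuming the output has the layout shown above. The helper name `parse_scsi_devices` and the regular expression are illustrative and are not part of any Cisco tool.

```python
import re

# Sample text modelled on the command output shown in this field notice.
SAMPLE = """\
Attached devices:
Host: scsi4 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: Micron_M500IT_MT Rev: CZ01
  Type:   Direct-Access                    ANSI SCSI revision: 05
"""

# The model runs up to the "Rev:" label; the revision is one whitespace-free token.
DEVICE_RE = re.compile(r"Model:\s*(?P<model>.+?)\s+Rev:\s*(?P<rev>\S+)")


def parse_scsi_devices(text: str) -> list:
    """Hypothetical helper: return a (model, rev) tuple for each attached device."""
    return [(m["model"], m["rev"]) for m in DEVICE_RE.finditer(text)]


if __name__ == "__main__":
    for model, rev in parse_scsi_devices(SAMPLE):
        print(model, rev)   # Micron_M500IT_MT CZ01
```

The script only reads text that was already collected from the CLI; it does not connect to the FI.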
UCS-FI-6332 and UCS-FI-6332-16UP
The SSD installed in the FI and its firmware version can be determined from the Cisco NX-OS CLI. If the SSD model is Micron_M500IT* and the firmware version is NOT MC03 or MU05 or later, then a firmware upgrade is required.
SSH to the FI VIP and enter these commands:
connect nxos a
show system internal file /proc/scsi/scsi
exit
connect nxos b
show system internal file /proc/scsi/scsi
3GFI-FI-B(nxos)# show system internal file /proc/scsi/scsi
Attached devices:
Host: scsi4 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: Micron_M500IT_MT Rev: MC02
  Type:   Direct-Access                    ANSI SCSI revision: 05
Note: On the first boot after an upgrade, the show system internal file /proc/scsi/scsi command still shows the old firmware. This file is only updated during the boot process. A second reboot is required after the upgrade process in order to use this method.
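The model and revision rules for both platforms can be written as one decision function. This is a sketch under stated assumptions: that a revision string is a two-letter family prefix followed by a number, and that "or later" means an equal or higher number within the same family. The notice does not define how revisions are ordered, so treat this interpretation, the `MIN_FIXED` table layout, and the function name `needs_upgrade` as illustrative.

```python
import re

# Minimum fixed revisions per platform, taken from this field notice.
MIN_FIXED = {
    "UCS-FI-M-6324": {"CZ": 3},
    "UCS-FI-6332": {"MC": 3, "MU": 5},  # also applies to UCS-FI-6332-16UP
}

# Assumption: revision = two-letter family prefix + number (for example "MC02").
REV_RE = re.compile(r"^([A-Z]{2})(\d+)")


def needs_upgrade(platform: str, model: str, rev: str) -> bool:
    """Hypothetical helper: True if the SSD firmware should be upgraded."""
    if not model.startswith("Micron_M500IT"):
        return False  # only Micron_M500IT* drives are covered by the rule
    match = REV_RE.match(rev)
    if match is None or match[1] not in MIN_FIXED[platform]:
        # Unknown revision family: do not guess, ask for a manual check.
        raise ValueError(f"unrecognized revision {rev!r}; check manually")
    return int(match[2]) < MIN_FIXED[platform][match[1]]


if __name__ == "__main__":
    print(needs_upgrade("UCS-FI-M-6324", "Micron_M500IT_MT", "CZ01"))  # True
    print(needs_upgrade("UCS-FI-6332", "Micron_M500IT_MT", "MC02"))    # True
    print(needs_upgrade("UCS-FI-6332", "Micron_M500IT_MT", "MU05"))    # False
```

Raising an error for an unrecognized revision family is a deliberate choice in this sketch, so that an unexpected firmware string is reviewed by a person instead of being silently classified.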
Current POHs Check
If you currently run firmware that is later than Version 3.1(3a) or 3.2(1d), then the current POH count can be reviewed from a technical support file. See Visual Guide to Collect UCS Tech Support Files - B, C and S Series for more information.
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 51074
Local Time is: Fri Apr 9 01:57:40 2021 EDT
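The POH count is the raw value in the last column of the Power_On_Hours line. The following is a minimal sketch for extracting it from saved text and comparing it with the 28,224-hour threshold. It assumes the attribute line has the ten-column layout shown above with a plain integer raw value. The helper name `power_on_hours` is hypothetical.

```python
from typing import Optional

# Attribute line as shown in this field notice.
SAMPLE_LINE = "9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 51074"

FIRST_HANG_POH = 28_224  # threshold quoted in this field notice


def power_on_hours(smart_output: str) -> Optional[int]:
    """Hypothetical helper: return the raw Power_On_Hours value, or None if absent."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Assumption: ten whitespace-separated columns, raw value last.
        if len(fields) >= 10 and fields[1] == "Power_On_Hours":
            return int(fields[-1])
    return None


if __name__ == "__main__":
    poh = power_on_hours(SAMPLE_LINE)
    print(poh)                     # 51074
    print(poh >= FIRST_HANG_POH)   # True: this drive is past the threshold
```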
SSD Update Verification
UCS-FI-M-6324 - If the SSD model is Micron_M500IT* and the firmware version is CZ03 or later, the firmware has been successfully upgraded.
UCS-FI-6332 and UCS-FI-6332-16UP - If the SSD model is Micron_M500IT* and the firmware version is MC03 or MU05 or later, the firmware has been successfully upgraded.
smartctl -a /dev/sda
Note: On the first boot after an upgrade, the show system internal file /proc/scsi/scsi command still shows the old firmware. This file is only updated during the boot process. A second reboot is required after the upgrade process in order to use this method.
This field notice lets you determine whether a device's serial number is impacted by this issue. To verify your serial number(s), enter them in the Serial Number Validation tool at https://snvui.cisco.com/snv/FN72028.
If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC).
My Notifications—Set up a profile to receive email updates about reliability, safety, network security, and end-of-sale issues for the Cisco products you specify.