THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.
Revision | Publish Date | Comments |
---|---|---|
1.0 | 20-May-21 | Initial Release |
2.0 | 21-May-21 | Updated the Workaround/Solution and How to Identify Affected Products Sections |
2.1 | 08-Jun-21 | Updated the Title and Defect Information and Workaround/Solution Sections |
Affected Product ID | Comments |
---|---|
NCS-5011= | Part Alternate |
NCS-5011 | |
NCS-5011-32F | |
NCS-5011-32H | |
Defect ID | Headline |
---|---|
CSCvx67966 | Onboard SSD Upgrade: Adding a new feature to update onboard SSDs to a newer version in NCS5K devices |
CSCvy05695 | Umbrella SMU for onboard SSD Upgrade on NCS5011 |
Because of a flaw in Solid State Drive (SSD) firmware, the SSD drive becomes unresponsive after approximately 3.2 years of accumulated operation. After the first unresponsive event, each time the system is power cycled it allows the drive to operate for another 1008 hours (approximately 6 weeks) before the drive becomes unresponsive again.
After approximately 3.2 years (28,224 accumulated Power On Hours (POH)), a memory buffer overrun condition occurs that triggers the firmware event in the SSD. This causes the drive to become unresponsive until the system is power cycled.
No data loss occurs when the memory buffer overrun firmware event occurs.
A power cycle restores normal operation of the drive. The drive continues to operate normally for approximately 6 weeks (1008 additional accumulated POH), at which time the drive again becomes unresponsive. A subsequent power cycle re-initiates the 1008-hour window.
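The hour figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming a 365-day year and a 7-day week. The constant and function names are illustrative and do not come from any Cisco tool.

```python
# Thresholds quoted in this field notice.
FIRST_LOCKUP_POH = 28_224   # accumulated Power On Hours before the first lockup
REPEAT_WINDOW_POH = 1_008   # additional hours gained by each power cycle

HOURS_PER_YEAR = 24 * 365   # assumption: non-leap year
HOURS_PER_WEEK = 24 * 7


def poh_to_years(hours: int) -> float:
    """Convert accumulated power-on hours to years of continuous operation."""
    return hours / HOURS_PER_YEAR


def poh_to_weeks(hours: int) -> float:
    """Convert accumulated power-on hours to weeks of continuous operation."""
    return hours / HOURS_PER_WEEK


if __name__ == "__main__":
    print(f"First lockup after about {poh_to_years(FIRST_LOCKUP_POH):.1f} years")
    print(f"Each power cycle buys about {poh_to_weeks(REPEAT_WINDOW_POH):.0f} weeks")
```

Under these assumptions, 28,224 hours rounds to 3.2 years and 1008 hours is exactly 6 weeks, which matches the approximations used in this notice.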
After 3.2 years of operation, the router behaviour is unpredictable as the SSD locks up.
The following is one example of the symptoms seen when an SSD is in the locked state. Use this command in order to check the log for read-only file system errors:
RP/0/RP0/CPU0:ios#show logging | inc Read-only
Here is a sample output when this condition occurs on affected products:
Day MMM X HH:MM:SS.048 UTC
start_backing_thread:bind: Read-only file system
start_backing_thread:bind: Read-only file system
ctrace_enable_configuration(): inotify_add_watch() failed fd 11 ctrl_file_name /var/log/ctrace/show_logging/xr_ds_capi_info/ctrace.ctrl. No such file or directory (2)
ctrace_enable_configuration2 failed with error 0x5
ctrace_enable_configuration(): inotify_add_watch() failed fd 12 ctrl_file_name /var/log/ctrace/show_logging/xr_ds_capi_error/ctrace.ctrl. No such file or directory (2)
ctrace_enable_configuration2 failed with error 0x5
ctrace_enable_configuration(): inotify_add_watch() failed fd 13 ctrl_file_name /var/log/ctrace/show_logging/xr_ds_capi_conn/ctrace.ctrl. No such file or directory (2)
ctrace_enable_configuration2 failed with error 0x5
ctrace_enable_configuration(): inotify_add_watch() failed fd 14 ctrl_file_name /var/log/ctrace/show_logging/xr_ds_capi_msc/ctrace.ctrl. No such file or directory (2)
ctrace_enable_configuration2 failed with error 0x5
start_backing_thread:bind: Read-only file system
RP/0/RP0/CPU0:Day MMM X HH:MM:SS.418 UTC: syslog_dev[117]: syslog_infra_hm[141] PID-23708: ctrace abort handler: unable to open trace file /var/log/ctrace/_pkg_bin_logger/xr_ds_capi_msc/ctrace_0.trc (Read-only file system)))ctrace abort handler: unable to open trace file /var/log/ctrace/_pkg_bin_logger/xr_ds_capi_conn/ctrace_0.trc (Read-only file system))ctrace abort handler: unable to open trace file /var/log/ctrace/_pkg_bin_logger/xr_ds_capi_error/ctrace_0.trc (Read-only file system)
RP/0/RP0/CPU0:Day MMM X HH:MM:SS.418 UTC: syslog_dev[117]: syslog_infra_hm[141] PID-23708: ctrace abort handler: unable to open trace file /var/log/ctrace/_pkg_bin_logger/xr_ds_capi_info/ctrace_0.trc (Read-only file system))
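If the output of `show logging` has been collected to a text file, the same filter can be applied offline. The following is a minimal sketch that only looks for the "Read-only file system" string shown in the sample above. The function name is hypothetical, and a match is a symptom of this issue rather than proof of it.

```python
from typing import List

# Marker string taken from the sample output in this notice.
READ_ONLY_MARKER = "Read-only file system"


def read_only_lines(log_text: str) -> List[str]:
    """Return every log line that reports a read-only file system."""
    return [line for line in log_text.splitlines() if READ_ONLY_MARKER in line]


if __name__ == "__main__":
    sample = (
        "start_backing_thread:bind: Read-only file system\n"
        "ctrace_enable_configuration2 failed with error 0x5\n"
        "start_backing_thread:bind: Read-only file system\n"
    )
    for line in read_only_lines(sample):
        print(line)
```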
Workaround: Power cycle the system in order to recover from this problem temporarily. However, the failure recurs after another 1008 hours of operation.
Solution: Install the correct Software Maintenance Update (SMU) in order to upgrade the SSD firmware.
In order to prevent the occurrence of this issue and avoid disruption to the network and operations, Cisco recommends that you upgrade the firmware of the SSD proactively before the accumulated Power On Hours reach 28,224.
Refer to the "How to Identify Affected Products" section. If your system is impacted, SMU installation followed by the SSD firmware upgrade permanently resolves this defect.
Note: Because a firmware update resolves the issue, an RMA to return the product is not recommended.
Refer to this table in order to determine the correct SMU for your device.
IOS XR Release | SMU Link | SMU Install Impact |
---|---|---|
IOS XR Release 6.5.3 | ncs5k-sysadmin-6.5.3.CSCvy05695.tar | Reload |
IOS XR Release 6.6.3 | ncs5k-sysadmin-6.6.3.CSCvy05695.tar | Reload |
IOS XR Release 7.0.1 | ncs5k-sysadmin-7.0.1.CSCvy05695.tar | Reload |
IOS XR Release 7.1.1 | ncs5k-sysadmin-7.1.1.CSCvy05695.tar | Reload |
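For scripted inventory checks, the table above can be expressed as a simple lookup. This is a sketch only: the dictionary repeats the four rows of the table, and the function name is hypothetical. A release that is not in the table returns `None`, because this notice does not list an SMU for it.

```python
from typing import Optional

# SMU file names copied from the table in this notice, keyed by IOS XR release.
SMU_BY_RELEASE = {
    "6.5.3": "ncs5k-sysadmin-6.5.3.CSCvy05695.tar",
    "6.6.3": "ncs5k-sysadmin-6.6.3.CSCvy05695.tar",
    "7.0.1": "ncs5k-sysadmin-7.0.1.CSCvy05695.tar",
    "7.1.1": "ncs5k-sysadmin-7.1.1.CSCvy05695.tar",
}


def smu_for_release(release: str) -> Optional[str]:
    """Return the SMU file name for an IOS XR release, or None if not listed."""
    return SMU_BY_RELEASE.get(release.strip())


if __name__ == "__main__":
    print(smu_for_release("7.0.1"))
    print(smu_for_release("7.3.1"))  # not in the table, so None
```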
After installation of the SMU, use this command in order to check whether the firmware upgrade was successful:
admin show smart-monitor location all | inc "Location|Device Model|Firmware Version"
Expected output:
RP/0/RP0/CPU0:ios#admin show smart-monitor location all | inc "Location|Device Model|Firmware Version"
Day MMM DD HH:MM:SS.672 UTC
Location: 0/RP0
Device Model: Micron_M500IT_MTFDDAT064SBD
Firmware Version: MC03.00
Ensure that the firmware is updated to the fixed version, as shown in the table in the "How to Identify Affected Products" section.
For a more detailed FAQ on the SSD issue, refer to Solid State Drive Issue on Certain Products - Software Update Required to Avoid Failure and Customer Impact.
Cisco has identified the product serial numbers that were shipped with an affected firmware version. Refer to the "Serial Number Validation" section in order to determine whether your product is potentially affected. If the product is listed as "Affected", check the firmware version in order to determine whether it has already been upgraded.
Use this command in order to check the firmware version:
admin show smart-monitor location all | inc "Location|Device Model|Firmware Version"
Expected output:
RP/0/RP0/CPU0:ios#admin show smart-monitor location all | inc "Location|Device Model|Firmware Version"
Day MMM DD HH:MM:SS.672 UTC
Location: 0/RP0
Device Model: Micron_M500IT_MTFDDAT064SBD
Firmware Version: MC02.00
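When this output is collected from many routers, it can help to turn it into structured records. The sketch below assumes the three field labels appear exactly as in the sample above (`Location:`, `Device Model:`, `Firmware Version:`) and that each value contains no spaces. The function name is hypothetical.

```python
import re
from typing import Dict, List

# Matches "<label>: <value>" for the three labels shown in the sample output.
FIELD_RE = re.compile(r"(Location|Device Model|Firmware Version):\s*(\S+)")


def parse_smart_monitor(output: str) -> List[Dict[str, str]]:
    """Group Location / Device Model / Firmware Version values per location."""
    records: List[Dict[str, str]] = []
    current = None
    for match in FIELD_RE.finditer(output):
        key, value = match.groups()
        if key == "Location":
            # Each "Location:" label starts a new record.
            current = {"Location": value}
            records.append(current)
        elif current is not None:
            current[key] = value
    return records


if __name__ == "__main__":
    sample = (
        "Location: 0/RP0\n"
        "Device Model: Micron_M500IT_MTFDDAT064SBD\n"
        "Firmware Version: MC02.00\n"
    )
    print(parse_smart_monitor(sample))
```

Because the pattern requires a colon after each label, the echoed command line with its quoted `inc` filter is ignored.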
Refer to this table in order to check impacted firmware versions.
Device Model | Impacted Firmware Version | Fixed Firmware Version |
---|---|---|
Micron_M500IT_MTFDDAT064SBD | MC02.00 / MU01.00 | MC03.00 / MU05.00 or higher |
Micron_M500IT_MTFDDAT064MBD | MU01.00 | MU05.00 or higher |
Note: The firmware version of Micron_M500IT_MTFDDAT064SBD can be either MC02.00 or MU01.00. It will upgrade to MC03.00 or MU05.00, respectively.
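The table above can be applied mechanically once the device model and firmware version are known. The following is a minimal sketch that flags only the exact impacted versions listed in the table. It does not attempt to compare version strings, and it returns `None` for a model that this notice does not list. All names are illustrative.

```python
from typing import Optional

# Impacted firmware versions per device model, copied from the table above.
IMPACTED_FIRMWARE = {
    "Micron_M500IT_MTFDDAT064SBD": {"MC02.00", "MU01.00"},
    "Micron_M500IT_MTFDDAT064MBD": {"MU01.00"},
}


def is_impacted(model: str, firmware: str) -> Optional[bool]:
    """True if the firmware is listed as impacted, False if not,
    None if the device model does not appear in the table."""
    versions = IMPACTED_FIRMWARE.get(model)
    if versions is None:
        return None
    return firmware in versions


if __name__ == "__main__":
    print(is_impacted("Micron_M500IT_MTFDDAT064SBD", "MC02.00"))  # impacted
    print(is_impacted("Micron_M500IT_MTFDDAT064SBD", "MC03.00"))  # fixed version
```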
If the firmware is impacted, upgrade the firmware version with the recommended SMU before the system reaches 28,224 hours of operation. Use this command in order to check Power On Hours:
sysadmin-vm:0_RP0# show smart-monitor location 0/RP0 | inc Power_On_Hours
Sample output:
sysadmin-vm:0_RP0# show smart-monitor location 0/RP0 | inc Power_On_Hours
DDD MMM DD HH:MM:SS.538 UTC+00:00
9 Power_On_Hours 0x0032 100 100 001 Old_age Always - 17849
In the previous example, the SSD issue will occur after another 10,375 hours of operation (28,224 minus 17,849).
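The same subtraction can be scripted. The sketch below assumes the Power On Hours value is the last whitespace-separated field on the `Power_On_Hours` line, as in the sample output above. The function names are hypothetical.

```python
from typing import Optional

FIRST_LOCKUP_POH = 28_224  # threshold stated in this notice


def power_on_hours(output: str) -> Optional[int]:
    """Extract the raw Power_On_Hours value from show smart-monitor output."""
    for line in output.splitlines():
        fields = line.split()
        # The echoed command also contains "Power_On_Hours" but does not end
        # in a number, so it is skipped by the isdigit() check.
        if "Power_On_Hours" in fields and fields[-1].isdigit():
            return int(fields[-1])
    return None


def hours_before_first_lockup(poh: int) -> int:
    """Hours remaining before the 28,224-hour threshold (0 if already passed)."""
    return max(0, FIRST_LOCKUP_POH - poh)


if __name__ == "__main__":
    sample = (
        "sysadmin-vm:0_RP0# show smart-monitor location 0/RP0 | inc Power_On_Hours\n"
        "9 Power_On_Hours 0x0032 100 100 001 Old_age Always - 17849\n"
    )
    poh = power_on_hours(sample)
    if poh is not None:
        print(hours_before_first_lockup(poh))
```

A drive that has already passed the threshold reports 0 here. That case follows the 1008-hour cycle described earlier in this notice and is not modeled by this sketch.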
This field notice provides the ability to determine if the serial number(s) of a device is impacted by this issue. In order to verify your serial number(s), enter it in the Serial Number Validation tool at https://snvui.cisco.com/snv/FN72161.
If you require further assistance, or if you have any further questions regarding this field notice, contact the Cisco Technical Assistance Center (TAC).