THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.
Revision | Publish Date | Comments |
---|---|---|
1.0 | 17-Apr-13 | Initial Release |
10.0 | 13-Oct-17 | Migration to new field notice system |
Affected Product ID | Comments |
---|---|
B250M2-BUN1 | |
B250M2-BUN2 | |
N20-B6625-2 | |
N20-B6625-2-UPG | |
N20-B6625-2= | |
N20-B6625-2D | |
UCSB-DBUN-B250-104 | |
UCSB-DBUN-B250-105 | |
Defect ID | Headline |
---|---|
CSCvf34445 | Dummy defect for field notices that were created with out defects |
Cisco UCS B250 M2 blade servers experience intermittent uncorrectable ECC errors due to marginal voltage regulator settings.
During the investigation of a field failure on a B250 M2 blade, it was discovered that there was an oscillation on the 1.5V power rail that is used to power the DDR3 DIMMs. On the failing system, under heavy load, the amplitude of this oscillation increased to the point where the 1.5V power rail was out of spec and an ECC memory error occurred.
The root cause was found to be marginal values programmed into the digital compensation loop inside the voltage regulator, which allow this oscillation to occur.
The fix is to reduce the gain of the compensation loop to increase the stability, and thus reduce the oscillations. Since this is a digital voltage regulator, this is done entirely in firmware, and no change to the circuit board or components is required.
The UCS B250 M2 blade shows Degraded status, and uncorrectable ECC errors are visible in the SEL log. When caused by this issue, the intermittent uncorrectable ECC errors occur mostly when the system is under heavy load. The uncorrectable errors are often preceded by correctable errors, but that is not always the case.
Example blade in degraded status:
Example SEL Log entries (note line f):
e | 11/23/2011 23:50:15 | CIMC | Memory DDR3_P1_A4_ECC #0x95 | | read 240 correctable ECC errors on Dimm 4 | Asserted
f | 11/23/2011 23:50:16 | BIOS | Memory #0x02 | Uncorrectable ECC/other uncorrectable memory error | RUN, Rank: 1, DIMM Socket: 5, Channel: A, Socket: 0, DIMM: A5 | Asserted
10 | 11/23/2011 23:50:17 | CIMC | Entity presence BIOS_POST_CMPLT #0x50 | Device Absent | Asserted
11 | 11/23/2011 23:50:17 | CIMC | Entity presence BIOS_POST_CMPLT #0x50 | Device Present | Deasserted
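When reviewing an exported SEL log for this symptom, it can help to pick out the ECC entries programmatically. The following is a minimal sketch only, assuming the log is available as text in the pipe-separated layout shown above. The function names `classify_sel_line` and `ecc_events` are hypothetical helpers, not part of any Cisco tool, and the keyword matching is based solely on the sample entries in this notice.

```python
from datetime import datetime

# Sample taken from the SEL entries shown in this notice.
SEL_SAMPLE = """\
e | 11/23/2011 23:50:15 | CIMC | Memory DDR3_P1_A4_ECC #0x95 | | read 240 correctable ECC errors on Dimm 4 | Asserted
f | 11/23/2011 23:50:16 | BIOS | Memory #0x02 | Uncorrectable ECC/other uncorrectable memory error | RUN, Rank: 1, DIMM Socket: 5, Channel: A, Socket: 0, DIMM: A5 | Asserted
10 | 11/23/2011 23:50:17 | CIMC | Entity presence BIOS_POST_CMPLT #0x50 | Device Absent | Asserted
11 | 11/23/2011 23:50:17 | CIMC | Entity presence BIOS_POST_CMPLT #0x50 | Device Present | Deasserted
"""


def classify_sel_line(line):
    """Return (record_id, timestamp, kind) for an ECC entry, else None.

    Assumption: fields are separated by '|' and the second field is a
    timestamp in MM/DD/YYYY HH:MM:SS form, as in the sample above.
    """
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 4:
        return None
    record_id = fields[0]
    timestamp = datetime.strptime(fields[1], "%m/%d/%Y %H:%M:%S")
    text = " ".join(fields[3:]).lower()
    # Check "uncorrectable" first because it contains "correctable".
    if "uncorrectable ecc" in text:
        kind = "uncorrectable"
    elif "correctable ecc" in text:
        kind = "correctable"
    else:
        return None
    return record_id, timestamp, kind


def ecc_events(sel_text):
    """Collect all ECC entries from a block of SEL log text."""
    events = []
    for line in sel_text.splitlines():
        if not line.strip():
            continue
        event = classify_sel_line(line)
        if event is not None:
            events.append(event)
    return events


if __name__ == "__main__":
    for record_id, timestamp, kind in ecc_events(SEL_SAMPLE):
        print(record_id, timestamp.isoformat(), kind)
```

Run against the sample, this reports entry e as correctable and entry f as uncorrectable. Real SEL exports may use other wording, so the keyword checks should be adjusted to match the actual log.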
The issue is resolved in the following Cisco UCS B-Series software releases or higher:
- UCS B-Series software release 2.0(1w)
- UCS B-Series software release 1.4(3u). Note: 1.4(4d) is the minimum recommended 1.4 release because of other severe issues in 1.4(3u). See the release notes for more detail.
- UCS B-Series software release 1.3(1y)
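For an inventory of many domains, the "or higher" check against the releases listed above could be scripted. The sketch below is illustrative only and rests on assumptions that are not stated in this notice: that a release string has the form `major.minor(maintenance + patch letters)`, that releases within one train order by maintenance number and then by patch letters, that trains newer than 2.0 contain the fix, and that trains older than 1.3 do not. The names `parse_release` and `contains_fix` are hypothetical. Confirm any result against the release notes.

```python
import re

# First fixed release per train, taken from the list above.
# Key: (major, minor) train. Value: (maintenance number, patch letters).
FIRST_FIXED = {
    (1, 3): (1, "y"),
    (1, 4): (3, "u"),
    (2, 0): (1, "w"),
}

_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$")


def parse_release(text):
    """Split a string such as '2.0(1w)' into ((2, 0), (1, 'w'))."""
    match = _RELEASE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {text!r}")
    major, minor, maint, patch = match.groups()
    return (int(major), int(minor)), (int(maint), patch)


def contains_fix(text):
    """Return True/False, or None when the train is not covered here.

    Assumption: ordering within a train is (maintenance, patch letters),
    with patch letters compared alphabetically.
    """
    train, build = parse_release(text)
    if train in FIRST_FIXED:
        return build >= FIRST_FIXED[train]
    if train > max(FIRST_FIXED):
        return True   # assumed: later trains carry the fix forward
    if train < min(FIRST_FIXED):
        return False  # assumed: trains before 1.3 never received the fix
    return None       # unlisted train between known ones: check manually


if __name__ == "__main__":
    for release in ("2.0(1w)", "2.0(1q)", "1.4(4d)", "1.4(3t)"):
        print(release, contains_fix(release))
```

The check only indicates whether a bundle should include the fixed firmware. It does not replace verifying the Board Controller version that is actually running on the blade.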
The fix is in Board Controller firmware 111026-111026, which is included in the releases listed above. The Board Controller version is not visible in UCSM unless the UCSM version is at or above one of those releases.
UCS software can be downloaded from Cisco.com from the following location:
www.cisco.com > Support > (search for 'UCS') > Unified Computing System (UCS) Infrastructure Software Bundle
UCS software upgrade instructions can be found at the following location:
http://www.cisco.com/en/US/products/ps10281/prod_installation_guides_list.html
BIOS update instructions can be found at the following location:
http://www.cisco.com/en/US/products/ps10280/products_configuration_example09186a0080af4547.shtml
The board controller update is required to reprogram the voltage regulator, and a CIMC update is required before the board controller update can be performed. Complete the CIMC update successfully before attempting the board controller update. The CIMC update does not require a reset of the blade server, but the board controller firmware update does. It is recommended to perform the board controller update as part of the host firmware update package.
The fix for this issue is in the Board Controller firmware of the blade. The fix is already in place if the Board Controller version is 111026-111026 or higher. This should be checked carefully because the Board Controller and CIMC Controller are updated separately.
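The "111026-111026 or higher" check can be expressed as a simple comparison. The sketch below assumes, without confirmation from this notice, that a Board Controller version is two numeric fields joined by a hyphen and that "higher" means a greater value when the fields are compared left to right. The helper names are hypothetical, and the version string must still be read from UCS Manager.

```python
# Minimum Board Controller version that contains the fix, per this notice.
MIN_FIXED_BOARD_CONTROLLER = "111026-111026"


def parse_board_controller_version(text):
    """Turn '111026-111026' into (111026, 111026).

    Assumption: the version is exactly two numeric fields joined by '-'.
    """
    parts = text.strip().split("-")
    if len(parts) != 2 or not all(part.isdigit() for part in parts):
        raise ValueError(f"unrecognized Board Controller version: {text!r}")
    return tuple(int(part) for part in parts)


def has_voltage_regulator_fix(version):
    """True if the version is at or above the minimum fixed version."""
    minimum = parse_board_controller_version(MIN_FIXED_BOARD_CONTROLLER)
    return parse_board_controller_version(version) >= minimum


if __name__ == "__main__":
    # The second and third values are made-up examples for illustration.
    for version in ("111026-111026", "110308-110308", "120101-120101"):
        print(version, has_voltage_regulator_fix(version))
```

Because the Board Controller and CIMC Controller are updated separately, pass only the Board Controller version to this check.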
The running Board Controller and CIMC Controller versions can be identified by logging into UCS Manager and drilling down to the status of the UCS B250 M2 blade server and clicking the "Installed Firmware" tab.
Equipment > Chassis > Servers > (server in question)
Example screen shot of server status and firmware level:
Units reworked with new firmware prior to ECO E107438 are marked with Deviation number D123126.
If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC).