This chapter provides information about the faults that may be raised and reported in the CIMC Web UI.
This chapter includes the following sections:
•System Event Log-Related Faults
Fault Code: F0409
Message:
Thermal condition on chassis [id] cause: [thermalStateQualifier]
Explanation:
This fault occurs under the following condition:
•If a component within a chassis is operating outside the safe thermal operating range.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Check the temperature readings for the IOM and ensure they are within the recommended thermal safe operating range.
Step 2 If the fault reports a "Thermal Sensor threshold crossing in IOM" error for one or both IOMs, check if thermal faults have been raised against that IOM. Those faults include details of the thermal condition.
Step 3 If the fault reports a "Missing or Faulty Fan" error, check on the status of that fan. If it needs replacement, create a tech-support file for the chassis and contact Cisco TAC.
Step 4 If the above actions did not resolve the issue and the condition persists, create a tech-support file for the chassis and contact Cisco TAC.
Fault Details:
Severity: major
Cause: thermal-problem
mibFaultCode: 409
mibFaultName: fltEquipmentChassisThermalThresholdCritical
moClass: equipment:Chassis
Type: environmental
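In the Fault Details blocks in this chapter, the mibFaultCode is the numeric part of the fault code with leading zeros dropped (for example, F0409 and 409). The following Python sketch illustrates that relationship. The function name is hypothetical and is not part of any Cisco tool.

```python
import re


def mib_fault_code(fault_code: str) -> int:
    """Return the numeric mibFaultCode for a fault code such as 'F0409'.

    Assumes the 'F' + digits pattern seen in this chapter.
    """
    match = re.fullmatch(r"F(\d+)", fault_code.strip())
    if match is None:
        raise ValueError(f"not a fault code: {fault_code!r}")
    # int() drops the leading zeros: '0409' becomes 409.
    return int(match.group(1))
```

For example, mib_fault_code("F0409") returns 409, which matches the mibFaultCode listed above.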
Fault Code: F0410
Message:
Thermal condition on chassis [id] cause: [thermalStateQualifier]
Explanation:
This fault occurs under the following condition:
•If a component within a chassis is operating outside the safe thermal operating range.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Check the temperature readings for the IOM and ensure they are within the recommended thermal safe operating range.
Step 2 If the fault reports a "Thermal Sensor threshold crossing in IOM" error for one or both IOMs, check if thermal faults have been raised against that IOM. Those faults include details of the thermal condition.
Step 3 If the fault reports a "Missing or Faulty Fan" error, check on the status of that fan. If it needs replacement, create a tech-support file for the chassis and contact Cisco TAC.
Step 4 If the above actions did not resolve the issue and the condition persists, create a tech-support file for the chassis and contact Cisco TAC.
Fault Details:
Severity: minor
Cause: thermal-problem
mibFaultCode: 410
mibFaultName: fltEquipmentChassisThermalThresholdNonCritical
moClass: equipment:Chassis
Type: environmental
Fault Code: F0411
Message:
Thermal condition on chassis [id] cause: [thermalStateQualifier]
Explanation:
This fault occurs under the following condition:
•If a component within a chassis is operating outside the safe thermal operating range.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Check the temperature readings for the IOM and ensure they are within the recommended thermal safe operating range.
Step 2 If the fault reports a "Thermal Sensor threshold crossing in IOM" error for one or both IOMs, check if thermal faults have been raised against that IOM. Those faults include details of the thermal condition.
Step 3 If the fault reports a "Missing or Faulty Fan" error, check on the status of that fan. If it needs replacement, create a tech-support file for the chassis and contact Cisco TAC.
Step 4 If the above actions did not resolve the issue and the condition persists, create a tech-support file for the chassis and contact Cisco TAC.
Fault Details:
Severity: critical
Cause: thermal-problem
mibFaultCode: 411
mibFaultName: fltEquipmentChassisThermalThresholdNonRecoverable
moClass: equipment:Chassis
Type: environmental
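The three chassis thermal faults above share the same message and differ mainly in severity. When several faults are active at once, it can help to triage them from most to least severe. The sketch below assumes the usual ordering info < warning < minor < major < critical. The function name and the (code, severity) tuple layout are illustrative only.

```python
# Assumed severity ordering, lowest to highest.
SEVERITY_RANK = {"info": 1, "warning": 2, "minor": 3, "major": 4, "critical": 5}


def most_severe_first(faults):
    """Sort (fault_code, severity) pairs so the most severe comes first."""
    return sorted(faults, key=lambda fault: SEVERITY_RANK[fault[1]], reverse=True)


# Severities as listed for the chassis thermal faults in this chapter.
chassis_faults = [("F0409", "major"), ("F0410", "minor"), ("F0411", "critical")]
ordered = most_severe_first(chassis_faults)
```

With the input shown, F0411 (critical) sorts first, followed by F0409 (major) and F0410 (minor).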
Fault Code: F0371
Message:
Fan [id] in Fan Module: [operability]Fan [id] in Fan Module [tray]-[id] under server [id] operability: [operability]
Explanation:
This fault occurs when one or more fans in a fan module are not operational, but at least one fan is operational.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Review the product specifications to determine the temperature operating range of the fan module.
Step 2 Review the Cisco UCS Site Preparation Guide and ensure the fan module has adequate airflow, including front and back clearance.
Step 3 Verify that the air flows are not obstructed.
Step 4 Verify that the site cooling system is operating properly.
Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.
Step 6 Replace the faulty fan modules.
Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: minor
Cause: equipment-degraded
mibFaultCode: 371
mibFaultName: fltEquipmentFanDegraded
moClass: equipment:Fan
Type: equipment
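Each Fault Details block in this chapter is a list of "key: value" lines, so it can be read into a dictionary for scripting. The following is a minimal sketch under that assumption. The function name is hypothetical, and the split is on the first colon only so that values such as equipment:Fan stay intact.

```python
def parse_fault_details(text: str) -> dict:
    """Parse 'key: value' lines into a dict, skipping lines with no value."""
    details = {}
    for line in text.splitlines():
        # partition() splits on the first colon only.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            details[key.strip()] = value.strip()
    return details


f0371 = parse_fault_details(
    "Severity: minor\n"
    "Cause: equipment-degraded\n"
    "mibFaultCode: 371\n"
    "mibFaultName: fltEquipmentFanDegraded\n"
    "moClass: equipment:Fan\n"
    "Type: equipment"
)
```

Values are kept as strings, so f0371["mibFaultCode"] is "371" rather than the integer 371.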
Fault Code: F0373
Message:
Fan [id] in Fan Module: [operability]Fan [id] in Fan Module [tray]-[id] under server [id] operability: [operability]
Explanation:
This fault occurs if a fan is not operational.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Remove the fan module and reinstall it. Remove only one fan module at a time.
Step 2 Replace the fan module with a different fan module.
Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: equipment-inoperable
mibFaultCode: 373
mibFaultName: fltEquipmentFanInoperable
moClass: equipment:Fan
Type: equipment
Fault Code: F0377
Message:
[presence]Fan module [tray]-[id] in server [id] presence:
Explanation:
This fault occurs if a fan module slot is not equipped or a fan module has been removed from its slot.
Recommended Action:
If you see this fault, take the following actions:
Step 1 If the reported slot is empty, insert a fan module into the slot.
Step 2 If the reported slot contains a fan module, remove and reinsert the fan module.
Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: warning
Cause: equipment-missing
mibFaultCode: 377
mibFaultName: fltEquipmentFanModuleMissing
moClass: equipment:FanModule
Type: equipment
Fault Code: F0395
Message:
[perf]Fan [id] in Fan Module [tray]-[id] under server [id] speed: [perf]
Explanation:
This fault occurs when the fan speed reading from the fan controller does not match the desired fan speed and is outside of the normal operating range. This can indicate a problem with a fan or with the reading from the fan controller.
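To illustrate the mismatch described above, the following sketch computes how far a measured fan speed deviates from the desired speed. It covers only the comparison against the desired speed, not the normal operating range check. The function names and the 20 percent tolerance are assumptions for illustration. The actual thresholds used by the fan controller are not stated in this chapter.

```python
def fan_speed_deviation(measured_rpm: float, desired_rpm: float) -> float:
    """Return the relative deviation of the measured speed from the desired speed."""
    if desired_rpm <= 0:
        raise ValueError("desired_rpm must be positive")
    return abs(measured_rpm - desired_rpm) / desired_rpm


def speed_mismatch(measured_rpm: float, desired_rpm: float,
                   tolerance: float = 0.2) -> bool:
    """True if the deviation exceeds the tolerance (hypothetical value)."""
    return fan_speed_deviation(measured_rpm, desired_rpm) > tolerance
```

A brief mismatch can also come from the reading itself, which is why Step 1 below is to monitor the fan before reseating or replacing it.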
Recommended Action:
If you see this fault, take the following actions:
Step 1 Monitor the fan status.
Step 2 If the problem persists for a long period of time or if other fans do not show the same problem, reseat the fan.
Step 3 Replace the fan module.
Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: minor
Cause: performance-problem
mibFaultCode: 395
mibFaultName: fltEquipmentFanPerfThresholdNonCritical
moClass: equipment
Fault Code: F0396
Message:
[perf]Fan [id] in Fan Module [tray]-[id] under server [id] speed: [perf]
Explanation:
This fault occurs when the fan speed read from the fan controller does not match the desired fan speed, has exceeded the critical threshold, and the fan is at risk of failure. This can indicate a problem with a fan or with the reading from the fan controller.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Monitor the fan status.
Step 2 If the problem persists for a long period of time or if other fans do not show the same problem, reseat the fan.
Step 3 If the above actions did not resolve the issue, create a tech-support file for the chassis and contact Cisco TAC.
Fault Details:
Severity: warning
Cause: performance-problem
mibFaultCode: 396
mibFaultName: fltEquipmentFanPerfThresholdCritical
moClass: equipment:
Fault Code: F0397
Message:
[perf]Fan [id] in Fan Module [tray]-[id] under server [id] speed: [perf]
Explanation:
This fault occurs when the fan speed read from the fan controller has far exceeded the desired fan speed. It frequently indicates that the fan has failed.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Replace the fan.
Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: performance-problem
mibFaultCode: 397
mibFaultName: fltEquipmentFanPerfThresholdNonRecoverable
moClass: equipment:Fan
Type: equipment
Fault Code: F0434
Message:
[presence]Fan [id] in Fan Module [tray]-[id] under server [id] presence: [presence]
Explanation:
This fault occurs in the unlikely event that a fan in a fan module cannot be detected.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Insert/reinsert the fan module in the slot that is reporting the issue.
Step 2 Replace the fan module with a different fan module, if available.
Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: warning
Cause: equipment-missing
mibFaultCode: 434
mibFaultName: fltEquipmentFanMissing
moClass: equipment:Fan
Type: equipment
Fault Code: F0376
Message:
[side] IOM [chassisId]/[id] is removed.
Explanation:
This fault typically occurs because an I/O module is removed from the chassis. For a standalone configuration, the chassis associated with the I/O module loses network connectivity. This is a critical fault because it can result in the loss of network connectivity and disrupt data traffic through the I/O module.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Reseat or reinsert the I/O module.
Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: critical
Cause: equipment-removed
mibFaultCode: 376
mibFaultName: fltEquipmentIOCardRemoved
moClass: equipment:IOCard
Type: equipment
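The Message fields in this chapter use bracketed placeholders such as [side], [chassisId], and [id] that are filled in when the fault is raised. The sketch below shows one way to expand such a template when testing log filters or alert rules. The function name and the sample values are hypothetical.

```python
import re


def render_message(template: str, values: dict) -> str:
    """Replace [name] placeholders with values; leave unknown ones unchanged."""
    def fill(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))
    return re.sub(r"\[(\w+)\]", fill, template)


# Sample values are made up for illustration.
example = render_message(
    "[side] IOM [chassisId]/[id] is removed.",
    {"side": "left", "chassisId": 1, "id": 2},
)
```

Placeholders without a supplied value are left in place, which makes missing fields easy to spot in the rendered text.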
Fault Code: F0379
Message:
[side] IOM [chassisId]/[id] operState: [operState]
Explanation:
This fault occurs when there is a thermal problem on an I/O module. Be aware of the following possible contributing factors:
•Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.
•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).
Recommended Action:
If you see this fault, take the following actions:
Step 1 Review the product specifications to determine the temperature operating range of the I/O module.
Step 2 Review the Cisco UCS Site Preparation Guide to ensure the I/O modules have adequate airflow, including front and back clearance.
Step 3 Verify that the air flows on the Cisco UCS chassis are not obstructed.
Step 4 Verify that the site cooling system is operating properly.
Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.
Step 6 Replace faulty I/O modules.
Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: thermal-problem
mibFaultCode: 379
mibFaultName: fltEquipmentIOCardThermalProblem
moClass: equipment:IOCard
Type: environmental
Fault Code: F0729
Message:
[side] IOM [chassisId]/[id] ([switchId]) temperature: [thermal]
Explanation:
This fault occurs when the temperature of an I/O module has exceeded a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:
•Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.
•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).
•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.
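The temperature limits in the list above can be checked with simple arithmetic. The following sketch converts the Celsius values to Fahrenheit and tests an inlet reading against the 10C to 35C range. The constant and function names are illustrative, and the values are taken from the text above.

```python
INLET_MIN_C = 10.0     # 50F, lower inlet air limit from the text
INLET_MAX_C = 35.0     # 95F, upper inlet air limit from the text
CPU_OFFLINE_C = 82.0   # 179.6F, CPU sensor limit from the text


def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def inlet_in_range(celsius: float) -> bool:
    """True if the inlet air temperature is within the recommended range."""
    return INLET_MIN_C <= celsius <= INLET_MAX_C
```

For example, inlet_in_range(22.0) is True, while inlet_in_range(40.0) is False and suggests checking airflow and site cooling as described in the steps that follow.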
Recommended Action:
If you see this fault, take the following actions:
Step 1 Review the product specifications to determine the temperature operating range of the I/O module.
Step 2 Verify that the air flows on the Cisco UCS chassis and I/O module are not obstructed.
Step 3 Verify that the site cooling system is operating properly.
Step 4 Power off unused rack servers.
Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.
Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: minor
Cause: thermal-problem
mibFaultCode: 729
mibFaultName: fltEquipmentIOCardThermalThresholdNonCritical
moClass: equipment:IOCard
Type: environmental
Fault Code: F0730
Message:
[side] IOM [chassisId]/[id] ([switchId]) temperature: [thermal]
Explanation:
This fault occurs when the temperature of an I/O module has exceeded a critical threshold value. Be aware of the following possible contributing factors:
•Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.
•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).
•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Review the product specifications to determine the temperature operating range of the I/O module.
Step 2 Verify that the site cooling system is operating properly.
Step 3 Power off unused rack servers.
Step 4 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.
Step 5 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: thermal-problem
mibFaultCode: 730
mibFaultName: fltEquipmentIOCardThermalThresholdCritical
moClass: equipment:IOCard
Type: environmental
Fault Code: F0731
Message:
[side] IOM [chassisId]/[id] temperature: [thermal]
Explanation:
This fault occurs when the temperature of an I/O module has been out of the operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:
•Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.
•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).
•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Review the product specifications to determine the temperature operating range of the I/O module.
Step 2 Verify that the air flows on the Cisco UCS chassis and I/O module are not obstructed.
Step 3 Verify that the site cooling system is operating properly.
Step 4 Power off unused rack servers.
Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.
Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: critical
Cause: thermal-problem
mibFaultCode: 731
mibFaultName: fltEquipmentIOCardThermalThresholdNonRecoverable
moClass: equipment:IOCard
Type: environmental
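The three I/O module temperature faults above (F0729, F0730, and F0731) correspond to successive threshold levels: non-critical, critical, and non-recoverable. The sketch below illustrates that layering. The threshold values are placeholders chosen for the example, not Cisco values. Actual sensor thresholds depend on the platform.

```python
from typing import Optional

# Hypothetical thresholds in degrees C, for illustration only.
NON_CRITICAL_C = 70.0
CRITICAL_C = 80.0
NON_RECOVERABLE_C = 90.0


def iom_thermal_fault(temp_c: float) -> Optional[str]:
    """Return the fault code for the highest threshold exceeded, or None."""
    if temp_c > NON_RECOVERABLE_C:
        return "F0731"
    if temp_c > CRITICAL_C:
        return "F0730"
    if temp_c > NON_CRITICAL_C:
        return "F0729"
    return None
```

The checks run from the highest threshold down so that a reading above several thresholds maps to a single fault code.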
Fault Code: F0184
Message:
DIMM [location] on server [chassisId]/[slotId] operability: [operability]DIMM [location] on server [id] operability: [operability]
Explanation:
This fault occurs when a DIMM is in a degraded operability state. This state typically occurs when an excessive number of correctable ECC errors are reported on the DIMM by the server BIOS.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Monitor the DIMM for further ECC errors. If the high number of errors persists, there is a high possibility of the DIMM becoming inoperable.
Step 2 If the DIMM becomes inoperable, replace the DIMM.
Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: warning
Cause: equipment-degraded
mibFaultCode: 184
mibFaultName: fltMemoryUnitDegraded
moClass: memory:Unit
Type: equipment
Fault Code: F0185
Message:
DIMM [location] on server [chassisId]/[slotId] operability: [operability]DIMM [location] on server [id] operability: [operability]
Explanation:
This fault typically occurs because an above-threshold number of correctable or uncorrectable errors has occurred on a DIMM. The DIMM may be inoperable.
Recommended Action:
If you see this fault, take the following actions:
Step 1 If the SEL is enabled, review the SEL statistics on the DIMM to determine which threshold was crossed.
Step 2 If necessary, replace the DIMM.
Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: equipment-inoperable
mibFaultCode: 185
mibFaultName: fltMemoryUnitInoperable
moClass: memory:Unit
Fault Code: F0186
Message:
DIMM [location] on server [chassisId]/[slotId] temperature: [thermal]DIMM [location] on server [id] temperature: [thermal]
Explanation:
This fault occurs when the temperature of a memory unit on a rack server exceeds a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:
•Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.
•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).
•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Review the product specifications to determine the temperature operating range of the server.
Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.
Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.
Step 4 Verify that the site cooling system is operating properly.
Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.
Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: minor
Cause: thermal-problem
mibFaultCode: 186
mibFaultName: fltMemoryUnitThermalThresholdNonCritical
moClass: memory:Unit
Type: environmental
Fault Code: F0187
Message:
DIMM [location] on server [chassisId]/[slotId] temperature: [thermal]DIMM [location] on server [id] temperature: [thermal]
Explanation:
This fault occurs when the temperature of a memory unit on a rack server exceeds a critical threshold value. Be aware of the following possible contributing factors:
•Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.
•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).
•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Review the product specifications to determine the temperature operating range of the server.
Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.
Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.
Step 4 Verify that the site cooling system is operating properly.
Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.
Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: warning
Cause: thermal-problem
mibFaultCode: 187
mibFaultName: fltMemoryUnitThermalThresholdCritical
moClass: memory:Unit
Type: environmental
Fault Code: F0188
Message:
DIMM [location] on server [chassisId]/[slotId] temperature: [thermal]DIMM [location] on server [id] temperature: [thermal]
Explanation:
This fault occurs when the temperature of a memory unit on a rack server has been out of the operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:
•Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.
•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).
•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Review the product specifications to determine the temperature operating range of the server.
Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.
Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.
Step 4 Verify that the site cooling system is operating properly.
Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.
Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: thermal-problem
mibFaultCode: 188
mibFaultName: fltMemoryUnitThermalThresholdNonRecoverable
moClass: memory:Unit
Type: environmental
Fault Code: F0190
Message:
Memory array [id] on server [chassisId]/[slotId] voltage: [voltage]Memory array [id] on server [id] voltage: [voltage]
Explanation:
This fault occurs when the memory array voltage exceeds the specified hardware voltage rating.
Recommended Action:
If you see this fault, take the following actions:
Step 1 If the SEL is enabled, look at the SEL statistics on the DIMM to determine which threshold was crossed.
Step 2 Monitor the memory array for further degradation.
Step 3 Replace the power supply.
Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: voltage-problem
mibFaultCode: 190
mibFaultName: fltMemoryArrayVoltageThresholdCritical
moClass: memory:Array
Fault Code: F0191
Message:
Memory array [id] on server [chassisId]/[slotId] voltage: [voltage]Memory array [id] on server [id] voltage: [voltage]
Explanation:
This fault occurs when the memory array voltage exceeds the specified hardware voltage rating and the memory hardware may be damaged or at risk of damage.
Recommended Action:
If you see this fault, take the following actions:
Step 1 If the SEL is enabled, review the SEL statistics on the DIMM to determine which threshold was crossed.
Step 2 Monitor the memory array for further degradation.
Step 3 Replace the power supply.
Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: critical
Cause: voltage-problem
mibFaultCode: 191
mibFaultName: fltMemoryArrayVoltageThresholdNonRecoverable
moClass: memory:Array
Type: environmental
Fault Code: F0502
Message:
DIMM [location] on server [chassisId]/[slotId] has an invalid FRUDIMM [location] on server [id] has an invalid FRU
Explanation:
This fault typically occurs when a sensor has detected an unsupported DIMM in the server. For example, the model, vendor, or revision is not recognized.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Verify that the DIMM is supported on the server configuration.
Step 2 If the above action did not resolve the issue, you may have unsupported DIMMs or DIMM configuration in the server. Contact Cisco TAC.
Fault Details:
Severity: warning
Cause: identity-unestablishable
mibFaultCode: 502
mibFaultName: fltMemoryUnitIdentityUnestablishable
moClass: memory:Unit
Type: equipment
Fault Code: F0174
Message:
Processor [id] on server [chassisId]/[slotId] operability: [operability]
Explanation:
This fault occurs in the event the processor encounters a catastrophic error or has exceeded pre-set thermal/power thresholds.
Recommended Action:
If you see this fault, take the following actions:
Step 1 If the probable cause is a thermal problem, verify that airflow to the server is not obstructed and that the server is adequately ventilated. If possible, check that the heat sink is properly seated on the processor.
Step 2 If the probable cause is equipment inoperable, contact Cisco TAC for further instructions.
Step 3 If the probable cause is a power or voltage problem, check whether the issue is resolved with an alternate power supply. If it is not, contact Cisco TAC.
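The steps above branch on the probable cause reported with the fault. A runbook script could encode that branching as a simple lookup, as in the following sketch. The dictionary keys, the function name, and the fallback text are assumptions made for illustration. They are not identifiers reported by CIMC.

```python
# Hypothetical cause keys mapped to the actions described in the steps above.
PROCESSOR_ACTIONS = {
    "thermal": "Check airflow, ventilation, and heat sink seating.",
    "equipment-inoperable": "Contact Cisco TAC for further instructions.",
    "power-or-voltage": "Try an alternate power supply; if unresolved, contact Cisco TAC.",
}

# Assumed fallback for causes the steps do not cover.
DEFAULT_ACTION = "Create a tech-support file and contact Cisco TAC."


def processor_action(probable_cause: str) -> str:
    """Return the suggested action for a probable cause (illustrative only)."""
    return PROCESSOR_ACTIONS.get(probable_cause, DEFAULT_ACTION)
```

Keeping the mapping in data rather than in nested conditionals makes it easier to update if the recommended actions change.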
Fault Details:
Severity: major
Cause: equipment-inoperable
mibFaultCode: 174
mibFaultName: fltProcessorUnitInoperable
moClass: processor:Unit
Type: equipment
Fault Code: F0175
Message:
Processor [id] on server [chassisId]/[slotId] temperature: [thermal]Processor [id] on server [id] temperature: [thermal]
Explanation:
This fault occurs when the processor temperature on a rack server exceeds a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:
•Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.
•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).
•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Review the product specifications to determine the temperature operating range of the server.
Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.
Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.
Step 4 Verify that the site cooling system is operating properly.
Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.
Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: minor
Cause: thermal-problem
mibFaultCode: 175
mibFaultName: fltProcessorUnitThermalNonCritical
moClass: processor:Unit
Type: environmental
Fault Code: F0176
Message:
Processor [id] on server [chassisId]/[slotId] temperature: [thermal]Processor [id] on server [id] temperature: [thermal]
Explanation:
This fault occurs when the processor temperature on a rack server exceeds a critical threshold value. Be aware of the following possible contributing factors:
•Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.
•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).
•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Review the product specifications to determine the temperature operating range of the server.
Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.
Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.
Step 4 Verify that the site cooling system is operating properly.
Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.
Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: critical
Cause: thermal-problem
mibFaultCode: 176
mibFaultName: fltProcessorUnitThermalThresholdCritical
moClass: processor:Unit
Type: environmental
Fault Code: F0177
Message:
Processor [id] on server [chassisId]/[slotId] temperature: [thermal]Processor [id] on server [id] temperature: [thermal]
Explanation:
This fault occurs when the processor temperature on a rack server has been out of the operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:
•Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.
•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).
•If sensors on a CPU reach 179.6F (82C), the system will take that CPU offline.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Review the product specifications to determine the temperature operating range of the server.
Step 2 Review the Cisco UCS Site Preparation Guide to ensure the servers have adequate airflow, including front and back clearance.
Step 3 Verify that the air flows on the Cisco UCS chassis or rack server are not obstructed.
Step 4 Verify that the site cooling system is operating properly.
Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.
Step 6 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: non-recoverable
Cause: thermal-problem
mibFaultCode: 177
mibFaultName: fltProcessorUnitThermalThresholdNonRecoverable
moClass: processor:Unit
Type: environmental
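In the entries of this chapter, the numeric part of the fault code matches the mibFaultCode (for example, F0177 and 177). A minimal Python sketch of that mapping follows. The function names are hypothetical, and the four-digit zero padding is inferred from the codes listed here, not from a published rule.

```python
import re


def fault_code_to_mib(fault_code: str) -> int:
    """Return the numeric mibFaultCode for a fault code such as 'F0177'."""
    match = re.fullmatch(r"F(\d{4})", fault_code.strip())
    if match is None:
        raise ValueError(f"unexpected fault code format: {fault_code!r}")
    return int(match.group(1))


def mib_to_fault_code(mib_code: int) -> str:
    """Return the fault code string for a numeric mibFaultCode (assumed 4 digits)."""
    return f"F{mib_code:04d}"
```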
Fault Code: F0842
Message:
Processor [id] on server [chassisId]/[slotId] operState: [operState]
Processor [id] on server [id] operState: [operState]
Explanation:
This fault occurs in the unlikely event that a processor is disabled.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Remove the server from the chassis and reinsert it.
Step 2 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: info
Cause: equipment-disabled
mibFaultCode: 842
mibFaultName: fltProcessorUnitDisabled
moClass: processor:Unit
Type: environmental
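Fault messages in this chapter are written as templates with bracketed placeholders such as [id] and [operState]. The Python sketch below shows one possible way to recover the placeholder values from a rendered message. The helper name is hypothetical, the sample value "disabled" is invented for illustration, and the sketch assumes that placeholder values contain no whitespace.

```python
import re

PLACEHOLDER = re.compile(r"\[(\w+)\]")


def extract_fields(template: str, message: str):
    """Match a rendered message against a template and return (name, value) pairs.

    Returns None when the message does not fit the template.
    """
    names = PLACEHOLDER.findall(template)
    # re.split with one capture group alternates literal text and placeholder names.
    parts = PLACEHOLDER.split(template)
    pattern = "".join(
        re.escape(part) if index % 2 == 0 else r"(\S+)"
        for index, part in enumerate(parts)
    )
    match = re.fullmatch(pattern, message)
    if match is None:
        return None
    return list(zip(names, match.groups()))
```

Pairs are returned as a list, not a dictionary, because a template can use the same placeholder name (for example, [id]) more than once.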
Fault Code: F0374
Message:
Power supply [id] in server [id] operability: [operability]
Explanation:
This fault typically occurs when either the power supply unit is offline or the input/output voltage is out of range.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Verify that the power cord is properly connected to the PSU and the power source.
Step 2 Verify that the power source is 220 volts.
Step 3 Remove the PSU and reinstall it.
Step 4 Replace the PSU.
Step 5 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: equipment-inoperable
mibFaultCode: 374
mibFaultName: fltEquipmentPsuInoperable
moClass: equipment:Psu
Type: equipment
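Each Fault Details block in this chapter is a list of "name: value" lines. If these blocks need to be processed by a script, a sketch like the following Python function could turn one block into a dictionary. The function name is hypothetical. Only the first colon on a line is treated as the separator, so values such as equipment:Psu stay intact.

```python
def parse_fault_details(text: str) -> dict:
    """Parse 'name: value' lines into a dict, skipping lines with no value."""
    details = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        # A heading such as 'Fault Details:' has a colon but no value; skip it.
        if separator and value.strip():
            details[key.strip()] = value.strip()
    return details
```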
Fault Code: F0381
Message:
Power supply [id] in server [id] temperature: [thermal]
Explanation:
This fault occurs when the temperature of a PSU module has exceeded a non-critical threshold value, but is still below the critical threshold. Be aware of the following possible contributing factors:
•Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.
•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).
Recommended Action:
If you see this fault, take the following actions:
Step 1 Review the product specifications to determine the temperature operating range of the PSU module.
Step 2 Review the Cisco UCS Site Preparation Guide to ensure the PSU modules have adequate airflow, including front and back clearance.
Step 3 Verify that the air flows are not obstructed.
Step 4 Verify that the site cooling system is operating properly.
Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.
Step 6 Replace faulty PSU modules.
Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: minor
Cause: thermal-problem
mibFaultCode: 381
mibFaultName: fltEquipmentPsuThermalThresholdNonCritical
moClass: equipment:Psu
Type: environmental
Fault Code: F0383
Message:
Power supply [id] in server [id] temperature: [thermal]
Explanation:
This fault occurs when the temperature of a PSU module has exceeded a critical threshold value. Be aware of the following possible contributing factors:
•Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.
•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).
Recommended Action:
If you see this fault, take the following actions:
Step 1 Review the product specifications to determine the temperature operating range of the PSU module.
Step 2 Review the Cisco UCS Site Preparation Guide to ensure the PSU modules have adequate airflow, including front and back clearance.
Step 3 Verify that the air flows are not obstructed.
Step 4 Verify that the site cooling system is operating properly.
Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.
Step 6 Replace faulty PSU modules.
Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: warning
Cause: thermal-problem
mibFaultCode: 383
mibFaultName: fltEquipmentPsuThermalThresholdCritical
moClass: equipment:Psu
Type: environmental
Fault Code: F0378
Message:
Power supply [id] in server [id] presence: [presence]
Explanation:
This fault typically occurs when either the power supply module is missing or the input power to the server is absent.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Check to see if the power supply is connected to a power source.
Step 2 If the PSU is physically present in the slot, remove and then reinsert it.
Step 3 If the PSU is not physically present in the slot, insert a new PSU.
Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: warning
Cause: equipment-missing
mibFaultCode: 378
mibFaultName: fltEquipmentPsuMissing
moClass: equipment:Psu
Type: equipment
Fault Code: F0385
Message:
Power supply [id] in server [id] temperature: [thermal]
Explanation:
This fault occurs when the temperature of a PSU module has been out of operating range, and the issue is not recoverable. Be aware of the following possible contributing factors:
•Temperature extremes can cause Cisco UCS equipment to operate at reduced efficiency and cause a variety of problems, including early degradation, failure of chips, and failure of equipment. In addition, extreme temperature fluctuations can cause CPUs to become loose in their sockets.
•Cisco UCS equipment should operate in an environment that provides an inlet air temperature not colder than 50F (10C) nor hotter than 95F (35C).
Recommended Action:
If you see this fault, take the following actions:
Step 1 Review the product specifications to determine the temperature operating range of the PSU module.
Step 2 Review the Cisco UCS Site Preparation Guide to ensure the PSU modules have adequate airflow, including front and back clearance.
Step 3 Verify that the air flows are not obstructed.
Step 4 Verify that the site cooling system is operating properly.
Step 5 Clean the installation site at regular intervals to avoid buildup of dust and debris, which can cause a system to overheat.
Step 6 Replace faulty PSU modules.
Step 7 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: thermal-problem
mibFaultCode: 385
mibFaultName: fltEquipmentPsuThermalThresholdNonRecoverable
moClass: equipment:Psu
Type: environmental
Fault Code: F0389
Message:
Power supply [id] in server [id] voltage: [voltage]
Explanation:
This fault occurs when the PSU voltage has exceeded the specified hardware voltage rating.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Remove and reseat the PSU.
Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: warning
Cause: voltage-problem
mibFaultCode: 389
mibFaultName: fltEquipmentPsuVoltageThresholdCritical
moClass: equipment:Psu
Type: environmental
Fault Code: F0391
Message:
Power supply [id] in server [id] voltage: [voltage]
Explanation:
This fault occurs when the PSU voltage has exceeded the specified hardware voltage rating and PSU hardware may have been damaged as a result or may be at risk of being damaged.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Remove and reseat the PSU.
Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: voltage-problem
mibFaultCode: 391
mibFaultName: fltEquipmentPsuVoltageThresholdNonRecoverable
moClass: equipment:Psu
Type: environmental
Fault Code: F0392
Message:
Power supply [id] in server [id] output power: [perf]
Explanation:
This fault is raised as a warning if the current output of the PSU in a rack server does not match the desired output value.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Monitor the PSU status.
Step 2 If possible, remove and reseat the PSU.
Step 3 If the above action did not resolve the issue, create a tech-support file for the chassis, and contact Cisco TAC.
Fault Details:
Severity: minor
Cause: power-problem
mibFaultCode: 392
mibFaultName: fltEquipmentPsuPerfThresholdNonCritical
moClass: equipment:Psu
Type: equipment
Fault Code: F0393
Message:
Power supply [id] in server [id] output power: [perf]
Explanation:
This fault is raised as a warning if the current output of the PSU in a rack server does not match the desired output value.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Monitor the PSU status.
Step 2 If possible, remove and reseat the PSU.
Step 3 If the above action did not resolve the issue, create a tech-support file for the chassis, and contact Cisco TAC.
Fault Details:
Severity: warning
Cause: power-problem
mibFaultCode: 393
mibFaultName: fltEquipmentPsuPerfThresholdCritical
moClass: equipment:Psu
Type: equipment
Fault Code: F0394
Message:
Power supply [id] in server [id] output power: [perf]
Explanation:
This fault is raised as a warning if the current output of the PSU in a rack server does not match the desired output value.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Monitor the PSU status.
Step 2 If possible, remove and reseat the PSU.
Step 3 If the above action did not resolve the issue, create a tech-support file for the chassis, and contact Cisco TAC.
Fault Details:
Severity: major
Cause: power-problem
mibFaultCode: 394
mibFaultName: fltEquipmentPsuPerfThresholdNonRecoverable
moClass: equipment:Psu
Type: equipment
Fault Code: F0407
Message:
Power supply [id] on chassis [id] has a malformed FRU
Power supply [id] on server [id] has a malformed FRU
Explanation:
This fault typically occurs when the FRU information for a power supply unit is corrupted or malformed.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Verify the vendor specification for the power supply.
Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: critical
Cause: fru-problem
mibFaultCode: 407
mibFaultName: fltEquipmentPsuIdentity
moClass: equipment:Psu
Type: equipment
Fault Code: F0743
Message:
Chassis [id] was configured for redundancy, but running in a non-redundant configuration.
Explanation:
This fault typically occurs when chassis power redundancy has failed.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Consider adding more PSUs to the chassis.
Step 2 Replace any non-functional PSUs.
Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: psu-redundancy-fail
mibFaultCode: 743
mibFaultName: fltPowerChassisMemberChassisPsuRedundanceFailure
moClass: power:ChassisMember
Type: environmental
Fault Code: F0882
Message:
Power supply [id] on chassis [id] has exceeded its power threshold
Power supply [id] on server [id] has exceeded its power threshold.
Explanation:
This fault occurs when a power supply unit is drawing too much current.
Recommended Action:
If you see this fault, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: critical
Cause: power-problem
mibFaultCode: 882
mibFaultName: fltEquipmentPsuPowerThreshold
moClass: equipment:Psu
Type: equipment
Fault Code: F0883
Message:
Power supply [id] on chassis [id] has disconnected cable or bad input voltage
Power supply [id] on server [id] has disconnected cable or bad input voltage.
Explanation:
This fault occurs when a power cable is disconnected or input voltage is incorrect.
Recommended Action:
If you see this fault, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: critical
Cause: power-problem
mibFaultCode: 883
mibFaultName: fltEquipmentPsuInputError
moClass: equipment:Psu
Type: equipment
Fault Code: F0310
Message:
Motherboard of server [chassisId]/[slotId] (service profile: [assignedToDn]) power: [operPower]
Motherboard of server [id] (service profile: [assignedToDn]) power: [operPower]
Explanation:
This fault typically occurs when the server power sensors have detected a problem.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Reseat or replace the power supply.
Step 2 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: power-problem
mibFaultCode: 310
mibFaultName: fltComputeBoardPowerError
moClass: compute:Board
Type: environmental
Fault Code: F0313
Message:
Server [id] (service profile: [assignedToDn]) BIOS failed power-on self test
Server [chassisId]/[slotId] (service profile: [assignedToDn]) BIOS failed power-on self test.
Explanation:
This fault typically occurs when the server has encountered a diagnostic failure.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Connect to the CIMC Web UI and use the KVM to record where the POST failure occurred.
Step 2 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: critical
Cause: equipment-inoperable
mibFaultCode: 313
mibFaultName: fltComputePhysicalBiosPostTimeout
moClass: compute:Physical
Type: equipment
Fault Code: F0424
Message:
Possible loss of CMOS settings: CMOS battery voltage on server [chassisId]/[slotId] is [cmosVoltage]
Possible loss of CMOS settings: CMOS battery voltage on server [id] is [cmosVoltage]
Explanation:
This fault is raised when the CMOS battery voltage has dropped below the normal operating range. This could impact the clock and other CMOS settings.
Recommended Action:
If you see this fault, replace the battery.
Fault Details:
Severity: critical
Cause: voltage-problem
mibFaultCode: 424
mibFaultName: fltComputeBoardCmosVoltageThresholdCritical
moClass: compute:Board
Type: environmental
Fault Code: F0425
Message:
Possible loss of CMOS settings: CMOS battery voltage on server [chassisId]/[slotId] is [cmosVoltage]
Possible loss of CMOS settings: CMOS battery voltage on server [id] is [cmosVoltage]
Explanation:
This fault is raised when the CMOS battery voltage has dropped quite low and is unlikely to recover. This impacts the clock and other CMOS settings.
Recommended Action:
If you see this fault, replace the battery.
Fault Details:
Severity: major
Cause: voltage-problem
mibFaultCode: 425
mibFaultName: fltComputeBoardCmosVoltageThresholdNonRecoverable
moClass: compute:Board
Type: environmental
Fault Code: F0538
Message:
IO Hub on server [chassisId]/[slotId] temperature: [thermal]
Explanation:
This fault is raised when the IO controller temperature is outside the upper or lower non-critical threshold.
Recommended Action:
If you see this fault, monitor other environmental events related to this server and ensure the temperature ranges are within recommended ranges.
Fault Details:
Severity: minor
Cause: thermal-problem
mibFaultCode: 538
mibFaultName: fltComputeIOHubThermalNonCritical
moClass: compute:IOHub
Type: environmental
Fault Code: F0539
Message:
IO Hub on server [chassisId]/[slotId] temperature: [thermal]
Explanation:
This fault is raised when the IO controller temperature is outside the upper or lower critical threshold.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Monitor other environmental events related to the server and ensure the temperature ranges are within recommended ranges.
Step 2 Consider turning off the server for a while if possible.
Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: thermal-problem
mibFaultCode: 539
mibFaultName: fltComputeIOHubThermalThresholdCritical
moClass: compute:IOHub
Type: environmental
Fault Code: F0540
Message:
IO Hub on server [chassisId]/[slotId] temperature: [thermal]
Explanation:
This fault is raised when the IO controller temperature is outside the recoverable range of operation.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Shut down the server immediately.
Step 2 Create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: critical
Cause: thermal-problem
mibFaultCode: 540
mibFaultName: fltComputeIOHubThermalThresholdNonRecoverable
moClass: compute:IOHub
Type: environmental
Fault Code: F0517
Message:
Server [id] POST or diagnostic failure
Server [chassisId]/[slotId] POST or diagnostic failure.
Explanation:
This fault typically occurs when the server has encountered a diagnostic failure or an error during POST.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Check the POST result for the server.
Step 2 Reboot the server.
Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: equipment-problem
mibFaultCode: 517
mibFaultName: fltComputePhysicalPostFailure
moClass: compute:Physical
Type: server
Fault Code: F0868
Message:
Motherboard of server [id] power: [power]
Explanation:
This fault typically occurs when the power sensors on a server detect a problem.
Recommended Action:
If you see this fault, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: critical
Cause: power-problem
mibFaultCode: 868
mibFaultName: fltComputeBoardPowerFail
moClass: compute:Board
Type: environmental
Fault Code: F0869
Message:
Motherboard of server [chassisId]/[slotId] (service profile: [assignedToDn]) thermal: [thermal]
Motherboard of server [id] (service profile: [assignedToDn]) thermal: [thermal]
Explanation:
This fault typically occurs when the motherboard thermal sensors on a server detect a problem.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Verify that the server fans are working properly.
Step 2 Wait for 24 hours to see if the problem resolves itself.
Step 3 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: thermal-problem
mibFaultCode: 869
mibFaultName: fltComputeBoardThermalProblem
moClass: compute:Board
Type: environmental
Fault Code: F0920
Message:
"sys/rack-unit-1/board"
Explanation:
This fault typically occurs when one or more motherboard input voltages have exceeded upper critical thresholds.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Reseat or replace the power supply.
Step 2 If the issue persists, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: voltage-problem
mibFaultCode: 920
mibFaultName: fltComputeBoardMotherBoardVoltageUpperThresholdCritical
moClass: compute:Board
Type: environmental
Fault Code: F1040
Message:
"sys/rack-unit-1/board"
Explanation:
This fault typically occurs when the motherboard power consumption exceeds certain threshold limits. When this happens, the power usage sensors on a server detect a problem.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Contact Cisco TAC.
Fault Details:
Severity: warning
Cause: power-problem
mibFaultCode: 1040
mibFaultName: fltComputeBoardPowerUsageProblem
moClass: compute:Board
Type: environmental
Fault Code: F0918
Message:
"sys/rack-unit-1/board"
Explanation:
This fault typically occurs when one or more motherboard input voltages have become too high and are unlikely to recover.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Contact Cisco TAC.
Fault Details:
Severity: critical
Cause: voltage-problem
mibFaultCode: 918
mibFaultName: fltComputeBoardMotherBoardVoltageThresholdUpperNonRecoverable
moClass: compute:Board
Type: environmental
Fault Code: F0919
Message:
"sys/rack-unit-1/board"
Explanation:
This fault typically occurs when one or more motherboard input voltages have dropped too low and are unlikely to recover.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Contact Cisco TAC.
Fault Details:
Severity: critical
Cause: voltage-problem
mibFaultCode: 919
mibFaultName: fltComputeBoardMotherBoardVoltageThresholdLowerNonRecoverable
moClass: compute:Board
Type: environmental
Fault Code: F0921
Message:
"sys/rack-unit-1/board"
Explanation:
This fault typically occurs when one or more motherboard input voltages have crossed lower critical thresholds.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Reseat or replace the power supply.
Step 2 If the issue persists, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: voltage-problem
mibFaultCode: 921
mibFaultName: fltComputeBoardMotherBoardVoltageLowerThresholdCritical
moClass: compute:Board
Type: environmental
Fault Code: F2500
Message:
"sys/rack-unit-1/board/memarray-%d/mem-%d"
Explanation:
This fault indicates that the memory DIMM has crossed a non-critical threshold of reported ECC errors.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Continue to monitor the ECC errors reported by the memory DIMM. If they exceed non-recoverable thresholds, replace the memory DIMM.
Step 2 Monitor the server for temperature/voltage thresholds.
Fault Details:
Severity: minor
Cause: equipment-degraded
mibFaultCode: 2500
mibFaultName: fltMemoryUnitECCThresholdNonCritical
moClass: memory:Unit
Type: equipment
Fault Code: F2501
Message:
"sys/rack-unit-1/board/memarray-%d/mem-%d"
Explanation:
This fault indicates that the memory DIMM has crossed a critical threshold of reported ECC errors.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Continue to monitor the ECC errors reported by the memory DIMM. If they exceed non-recoverable thresholds, replace the memory DIMM.
Step 2 Monitor the server for temperature/voltage thresholds.
Fault Details:
Severity: warning
Cause: equipment-degraded
mibFaultCode: 2501
mibFaultName: fltMemoryUnitECCThresholdCritical
moClass: memory:Unit
Type: equipment
Fault Code: F2502
Message:
"sys/rack-unit-1/board/memarray-%d/mem-%d"
Explanation:
This fault indicates that the memory DIMM has crossed a non-recoverable threshold of reported ECC errors.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Replace the memory DIMM.
Fault Details:
Severity: major
Cause: equipment-inoperable
mibFaultCode: 2502
mibFaultName: fltMemoryUnitECCThresholdNonRecoverable
moClass: memory:Unit
Type: equipment
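The three ECC threshold faults above (F2500, F2501, and F2502) escalate from monitoring to replacement. The Python sketch below records that escalation as a lookup table. The table and function names are hypothetical, and the severities and actions are copied from the three entries above, with "monitor" and "replace" used as shorthand for the recommended actions.

```python
# Hypothetical lookup built from the three ECC threshold entries above.
# Each value is (severity, shorthand for the recommended action).
ECC_FAULTS = {
    "F2500": ("minor", "monitor"),
    "F2501": ("warning", "monitor"),
    "F2502": ("major", "replace"),
}


def dimm_action(fault_code: str) -> str:
    """Return 'monitor' or 'replace' for an ECC threshold fault code."""
    try:
        return ECC_FAULTS[fault_code][1]
    except KeyError:
        raise ValueError(f"not an ECC threshold fault: {fault_code}") from None
```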
Fault Code: F0181
Message:
Local disk [id] on server [chassisId]/[slotId] operability: [operability]
Local disk [id] on server [id] operability: [operability]
Explanation:
This fault occurs when the local disk has become inoperable or has been removed while the server was in use.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Insert the disk in a supported slot.
Step 2 Remove and reinsert the local disk.
Step 3 Replace the disk, if an additional disk is available.
Step 4 If the above actions did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: equipment-inoperable
mibFaultCode: 181
mibFaultName: fltStorageLocalDiskInoperable
moClass: storage:LocalDisk
Type: equipment
Fault Code: F0531
Message:
RAID Battery on server [chassisId]/[slotId] operability: [operability]
RAID Battery on server [id] operability: [operability]
Explanation:
This fault occurs when the RAID battery voltage is below the normal operating range.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Replace the RAID battery.
Step 2 If the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: major
Cause: equipment-inoperable
mibFaultCode: 531
mibFaultName: fltStorageRaidBatteryInoperable
moClass: storage:RaidBattery
Type: equipment
Fault Code: F0978
Message:
"sys/rack-unit-1/board/storage-%s-ctlr-%d/pd-%d"
Explanation:
This fault indicates a physical disk copyback failure. This fault could indicate a physical drive problem or an issue with the RAID configuration.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Replace the physical drive and check to see if the issue is resolved after a rebuild.
Step 2 Reseat or replace the storage controller.
Step 3 Check configuration options for the storage controller in the MegaRAID ROM configuration page.
Fault Details:
Severity: warning
Cause: equipment-offline
mibFaultCode: 978
mibFaultName: fltStorageLocalDiskCopybackFailed
moClass: storage:LocalDisk
Type: equipment
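Several storage and memory faults report a distinguished name written as a printf-style pattern, such as "sys/rack-unit-1/board/storage-%s-ctlr-%d/pd-%d". As a rough sketch, the Python function below converts such a pattern into a regular expression so that the controller type and indexes can be pulled out of a concrete name. The function name and the sample name "storage-SAS-ctlr-1" are hypothetical. The sketch assumes that %d stands for a decimal number and that %s never contains a slash.

```python
import re


def dn_pattern_to_regex(pattern: str) -> re.Pattern:
    """Convert a printf-style DN pattern (%s, %d) into a compiled regex."""
    pieces = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("%d", i):
            pieces.append(r"(\d+)")       # decimal index
            i += 2
        elif pattern.startswith("%s", i):
            pieces.append(r"([^/]+?)")    # shortest run without a slash
            i += 2
        else:
            pieces.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(pieces))
```

A match against a full name returns the captured fields in pattern order, for example the controller type, controller index, and physical drive index.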
Fault Code: F0969
Message:
"sys/rack-unit-1/board/storage-%s-ctlr-%d/raid-battery-%d"
Explanation:
This fault indicates a controller battery backup unit failure.
Recommended Action:
If you see this fault, take the following action:
Step 1 Reseat or replace the battery backup unit on the storage controller.
Fault Details:
Severity: warning
Cause: equipment-degraded
mibFaultCode: 969
mibFaultName: fltStorageRaidBatteryDegraded
moClass: storage:RaidBattery
Type: equipment
Fault Code: F0970
Message:
"sys/rack-unit-1/board/storage-%s-ctlr-%d/raid-battery-%d"
Explanation:
This fault indicates that a controller battery relearn was aborted.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Restart the relearn process for the battery backup unit.
Step 2 Reseat or replace the battery backup unit.
Step 3 Replace the battery backup unit if it has exceeded 100 relearn cycles.
Fault Details:
Severity: info
Cause: equipment-degraded
mibFaultCode: 970
mibFaultName: fltStorageRaidBatteryRelearnAborted
moClass: storage:RaidBattery
Type: equipment
Fault Code: F0971
Message:
"sys/rack-unit-1/board/storage-%s-ctlr-%d/raid-battery-%d"
Explanation:
This fault indicates a controller battery relearn failure.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Restart the relearn process for the battery backup unit.
Step 2 Reseat or replace the battery backup unit.
Step 3 Replace the battery backup unit if it has exceeded 100 relearn cycles.
Fault Details:
Severity: warning
Cause: equipment-degraded
mibFaultCode: 971
mibFaultName: fltStorageRaidBatteryRelearnFailed
moClass: storage:RaidBattery
Type: equipment
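The recommended actions for the two relearn faults above (F0970 and F0971) can be read as a short decision sequence. The Python sketch below is one possible encoding. The function name, arguments, and return strings are hypothetical, and checking the 100-cycle limit first is a choice made for this sketch, not an order stated in the steps.

```python
RELEARN_CYCLE_LIMIT = 100  # from Step 3 of the relearn faults above


def bbu_next_step(relearn_cycles: int, relearn_retried: bool, reseated: bool) -> str:
    """Suggest the next action for a battery backup unit relearn fault."""
    # A unit past the cycle limit is replaced regardless of the other steps.
    if relearn_cycles > RELEARN_CYCLE_LIMIT:
        return "replace"
    if not relearn_retried:
        return "restart-relearn"
    if not reseated:
        return "reseat"
    return "replace"
```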
Fault Code: F0982
Message:
"sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"
Explanation:
This fault indicates a consistency check failure with the virtual drive.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Initiate a consistency check on the virtual drive.
Step 2 Replace any faulty physical drives.
Fault Details:
Severity: warning
Cause: equipment-degraded
mibFaultCode: 982
mibFaultName: fltStorageVirtualDriveConsistencyCheckFailed
moClass: storage:VirtualDrive
Type: equipment
Fault Code: F1008
Message:
"sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"
Explanation:
This fault indicates a recoverable error with the virtual drive.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Initiate a consistency check on the virtual drive.
Step 2 Replace any faulty physical drives.
Fault Details:
Severity: warning
Cause: equipment-degraded
mibFaultCode: 1008
mibFaultName: fltStorageVirtualDriveDegraded
moClass: storage:VirtualDrive
Type: equipment
Fault Code: F1007
Message:
"sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"
Explanation:
This fault indicates a non-recoverable error with the virtual drive.
Recommended Action:
If you see this fault, take the following actions:
Step 1 If the data on the drive is accessible, back up and recreate the virtual drive.
Step 2 Replace any faulty physical drives.
Step 3 Check for controller errors in the MegaRAID ROM page logs.
Fault Details:
Severity: major
Cause: equipment-inoperable
mibFaultCode: 1007
mibFaultName: fltStorageVirtualDriveInoperable
moClass: storage:VirtualDrive
Type: equipment
Fault Code: F0981
Message:
"sys/rack-unit-1/board/storage-%s-ctlr-%d/vd-%d"
Explanation:
This fault indicates a failure in the reconstruction process of the virtual drive.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Restart the reconstruction process.
Fault Details:
Severity: warning
Cause: equipment-degraded
mibFaultCode: 981
mibFaultName: fltStorageVirtualDriveReconstructionFailed
moClass: storage:VirtualDrive
Type: equipment
Fault Code: F0976
Message:
"sys/rack-unit-1/board/storage-%s-ctlr-%d"
Explanation:
This fault indicates a non-recoverable storage controller failure. This happens when the storage system cannot contact the controller for a period of time, after which it gives up, and raises this fault.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Reseat or replace the storage controller.
Fault Details:
Severity: warning
Cause: equipment-inoperable
mibFaultCode: 976
mibFaultName: fltStorageControllerInoperable
moClass: storage:Controller
Type: equipment
Fault Code: F1003
Message:
"sys/rack-unit-1/board/storage-%s-ctlr-%d"
Explanation:
This fault indicates that the review of the storage system for potential physical disk errors has failed.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Initiate a consistency check on the virtual drive.
Step 2 Replace any faulty physical drives.
Fault Details:
Severity: warning
Cause: equipment-inoperable
mibFaultCode: 1003
mibFaultName: fltStorageControllerPatrolReadFailed
moClass: storage:Controller
Type: equipment
Fault Code: F0461
Message:
Log capacity on Management Controller on server [id] is [capacity]
Explanation:
This fault typically occurs because Cisco Integrated Management Controller (CIMC) has detected that the system event log (SEL) on the server is almost full. The available capacity in the log is very low. This is an info-level fault and can be ignored if you do not want to clear the SEL at this time.
Recommended Action:
If you see this fault, you can clear the SEL, if desired.
Fault Details:
Severity: info
Cause: log-capacity
mibFaultCode: 461
mibFaultName: fltSysdebugMEpLogMEpLogVeryLow
moClass: sysdebug:MEpLog
Type: operational
Fault Code: F0462
Message:
Log capacity on Management Controller on server [id] is [capacity]
Explanation:
This fault typically occurs because CIMC could not transfer the SEL file to the location specified in the SEL policy. This is an info-level fault and can be ignored if you do not want to clear the SEL at this time.
Recommended Action:
If you see this fault, take the following actions:
Step 1 Verify the configuration of the SEL policy to ensure that the location, user, and password provided are correct.
Step 2 If you do want to transfer and clear the SEL and the above action did not resolve the issue, create a tech-support file and contact Cisco TAC.
Fault Details:
Severity: info
Cause: log-capacity
mibFaultCode: 462
mibFaultName: fltSysdebugMEpLogMEpLogFull
moClass: sysdebug:MEpLog
Type: operational