Monitoring Hardware
This chapter includes the following sections:
- Monitoring Fan Modules
- Monitoring Management Interfaces
- Local Storage Monitoring
- Graphics Card Server Support
- Managing Transportable Flash Module and Supercapacitor
- TPM Monitoring
Monitoring Fan Modules
The following example displays information about the fan modules in chassis 1:
    UCS-A# scope chassis 1
    UCS-A /chassis # show environment fan
    Chassis 1:
        Overall Status: Power Problem
        Operability: Operable
        Power State: Redundancy Failed
        Thermal Status: Upper Non Recoverable
        Tray 1 Module 1:
            Threshold Status: OK
            Overall Status: Operable
            Operability: Operable
            Power State: On
            Thermal Status: OK
            Voltage Status: N/A
            Fan Module Stats:
                Ambient Temp (C): 25.000000
            Fan 1:
                Threshold Status: OK
                Overall Status: Operable
                Operability: Operable
                Power State: On
                Thermal Status: OK
                Voltage Status: N/A
            Fan 2:
                Threshold Status: OK
                Overall Status: Operable
                Operability: Operable
                Power State: On
                Thermal Status: OK
                Voltage Status: N/A
        Tray 1 Module 2:
            Threshold Status: OK
            Overall Status: Operable
            Operability: Operable
            Power State: On
            Thermal Status: OK
            Voltage Status: N/A
            Fan Module Stats:
                Ambient Temp (C): 24.000000
            Fan 1:
                Threshold Status: OK
                Overall Status: Operable
                Operability: Operable
                Power State: On
                Thermal Status: OK
                Voltage Status: N/A
            Fan 2:
                Threshold Status: OK
                Overall Status: Operable
                Operability: Operable
                Power State: On
                Thermal Status: OK
                Voltage Status: N/A
The following example displays information about fan module 2 in chassis 1:
    UCS-A# scope chassis 1
    UCS-A /chassis # scope fan-module 1 2
    UCS-A /chassis/fan-module # show detail
    Fan Module:
        Tray: 1
        Module: 2
        Overall Status: Operable
        Operability: Operable
        Threshold Status: OK
        Power State: On
        Presence: Equipped
        Thermal Status: OK
        Product Name: Fan Module for UCS 5108 Blade Server Chassis
        PID: N20-FAN5
        VID: V01
        Vendor: Cisco Systems Inc
        Serial (SN): NWG14350B6N
        HW Revision: 0
        Mfg Date: 1997-04-01T08:41:00.000
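For scripted health checks, detail output like the fan module example can be reduced to key/value pairs. The following is a minimal sketch, assuming the output has been captured as text with one `Key: Value` field per line. The names `parse_detail`, `unhealthy_fields`, and `EXPECTED` are hypothetical and are not part of Cisco UCS Manager; the "healthy" values are taken from this sample only.

```python
# Hypothetical helper: turn "Key: Value" CLI detail lines into a dict.
def parse_detail(text):
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so values such as timestamps survive.
        key, _, value = line.partition(":")
        value = value.strip()
        if value:  # skip section headers such as "Fan Module:"
            fields[key.strip()] = value
    return fields


# Assumption: these are the values a healthy module reports in the sample.
EXPECTED = {
    "Overall Status": "Operable",
    "Operability": "Operable",
    "Threshold Status": "OK",
    "Thermal Status": "OK",
}


def unhealthy_fields(fields):
    """Return the monitored fields whose value differs from EXPECTED."""
    return {k: fields.get(k) for k, want in EXPECTED.items() if fields.get(k) != want}


SAMPLE = """\
Fan Module:
    Tray: 1
    Module: 2
    Overall Status: Operable
    Operability: Operable
    Threshold Status: OK
    Power State: On
    Presence: Equipped
    Thermal Status: OK
    Mfg Date: 1997-04-01T08:41:00.000
"""

if __name__ == "__main__":
    print(unhealthy_fields(parse_detail(SAMPLE)))
```

An empty result means every monitored field matched the expected value; any other result lists the fields to investigate.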
Monitoring Management Interfaces
Management Interfaces Monitoring Policy
This policy defines how the mgmt0 Ethernet interface on the fabric interconnect should be monitored. If Cisco UCS detects a management interface failure, a failure report is generated. If the configured number of failure reports is reached, the system assumes that the management interface is unavailable and generates a fault. By default, the management interfaces monitoring policy is enabled.
If the affected management interface belongs to a fabric interconnect that is the managing instance, Cisco UCS confirms that the subordinate fabric interconnect's status is up and that there are no current failure reports logged against it, and then modifies the managing instance for the endpoints.
If the affected fabric interconnect is currently the primary in a high availability setup, a failover of the management plane is triggered. The data plane is not affected by this failover.
You can set the following properties related to monitoring the management interface:
- Type of mechanism used to monitor the management interface.
- Interval at which the management interface's status is monitored.
- Maximum number of monitoring attempts that can fail before the system assumes that the management interface is unavailable and generates a fault message.
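The relationship between failure reports and the fault can be sketched as a simple counter. This is an illustrative model only, not Cisco's implementation: the class name `MgmtInterfaceMonitor` is hypothetical, and the assumption that a successful poll clears the accumulated failure reports is not stated in this chapter.

```python
class MgmtInterfaceMonitor:
    """Toy model: raise a fault once enough failure reports accumulate."""

    def __init__(self, max_fail_reports):
        self.max_fail_reports = max_fail_reports
        self.fail_reports = 0

    @property
    def fault_active(self):
        # The fault is raised when the configured number of reports is reached.
        return self.fail_reports >= self.max_fail_reports

    def record_poll(self, interface_ok):
        if interface_ok:
            # Assumption: a healthy poll resets the failure count.
            self.fail_reports = 0
        else:
            self.fail_reports += 1
        return self.fault_active


if __name__ == "__main__":
    monitor = MgmtInterfaceMonitor(max_fail_reports=3)
    # One good poll followed by three consecutive failures.
    print([monitor.record_poll(ok) for ok in (True, False, False, False)])
    # prints [False, False, False, True]
```

Under this model, the time to detect an outage grows with both the polling interval and the number of failure reports allowed.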
Configuring the Management Interfaces Monitoring Policy
Step | Command or Action | Purpose |
---|---|---|
Step 1 | UCS-A# scope monitoring | Enters monitoring mode. |
Step 2 | UCS-A /monitoring # set mgmt-if-mon-policy admin-state {enabled | disabled} | Enables or disables the management interfaces monitoring policy. |
Step 3 | UCS-A /monitoring # set mgmt-if-mon-policy poll-interval | Specifies the number of seconds that the system waits between data recordings. Enter an integer between 90 and 300. |
Step 4 | UCS-A /monitoring # set mgmt-if-mon-policy max-fail-reports num-mon-attempts | Specifies the maximum number of monitoring attempts that can fail before the system assumes that the management interface is unavailable and generates a fault message. Enter an integer between 2 and 5. |
Step 5 | UCS-A /monitoring # set mgmt-if-mon-policy monitor-mechanism {mii-status | ping-arp-targets | ping-gateway} | Specifies the monitoring mechanism that the system uses. |
Step 6 | If you selected mii-status as your monitoring mechanism, configure its properties, such as mii-retry-count and mii-retry-interval (both appear in the example that follows this procedure). | |
Step 7 | If you selected ping-arp-targets as your monitoring mechanism, configure the properties that apply to that mechanism. | |
Step 8 | If you selected ping-gateway as your monitoring mechanism, configure the properties that apply to that mechanism. | |
Step 9 | UCS-A /monitoring # commit-buffer | Commits the transaction to the system configuration. |
The following example configures the management interfaces monitoring policy to use the Media Independent Interface (MII) monitoring mechanism and commits the transaction:
    UCS-A# scope monitoring
    UCS-A /monitoring # set mgmt-if-mon-policy admin-state enabled
    UCS-A /monitoring* # set mgmt-if-mon-policy poll-interval 250
    UCS-A /monitoring* # set mgmt-if-mon-policy max-fail-reports 2
    UCS-A /monitoring* # set mgmt-if-mon-policy monitor-mechanism mii-status
    UCS-A /monitoring* # set mgmt-if-mon-policy mii-retry-count 3
    UCS-A /monitoring* # set mgmt-if-mon-policy mii-retry-interval 7
    UCS-A /monitoring* # commit-buffer
    UCS-A /monitoring #
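When the policy is applied from a script, it can help to validate the values against the documented ranges before any command is sent. The sketch below only builds the command strings shown in this section; `build_policy_commands` is a hypothetical helper, and how the strings reach the fabric interconnect (for example, over an SSH session) is left out. Mechanism-specific properties are not validated because their ranges are not given here.

```python
MECHANISMS = ("mii-status", "ping-arp-targets", "ping-gateway")


def build_policy_commands(poll_interval, max_fail_reports, mechanism, enabled=True):
    """Return the CLI command sequence for the monitoring policy."""
    # Ranges come from the procedure in this section.
    if not 90 <= poll_interval <= 300:
        raise ValueError("poll-interval must be an integer between 90 and 300")
    if not 2 <= max_fail_reports <= 5:
        raise ValueError("max-fail-reports must be an integer between 2 and 5")
    if mechanism not in MECHANISMS:
        raise ValueError(f"monitor-mechanism must be one of {MECHANISMS}")

    prefix = "set mgmt-if-mon-policy"
    state = "enabled" if enabled else "disabled"
    return [
        "scope monitoring",
        f"{prefix} admin-state {state}",
        f"{prefix} poll-interval {poll_interval}",
        f"{prefix} max-fail-reports {max_fail_reports}",
        f"{prefix} monitor-mechanism {mechanism}",
        "commit-buffer",
    ]


if __name__ == "__main__":
    for command in build_policy_commands(250, 2, "mii-status"):
        print(command)
```

Rejecting out-of-range values up front avoids leaving a partially configured, uncommitted transaction on the system.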
Local Storage Monitoring
Local storage monitoring in Cisco UCS provides status information on local storage that is physically attached to a blade or rack server. This includes RAID controllers, physical drives and drive groups, virtual drives, RAID controller batteries (BBU), Transportable Flash Modules (TFM) and supercapacitors, FlexFlash controllers, and SD cards.
Cisco UCS Manager communicates directly with the LSI MegaRAID controllers and FlexFlash controllers using an out-of-band (OOB) interface, which enables real-time updates. Some of the information that is displayed includes:
- RAID controller status and rebuild rate.
- The drive state, power state, link speed, operability, and firmware version of physical drives.
- The drive state, operability, strip size, access policies, drive cache, and health of virtual drives.
- The operability of a BBU, whether it is a supercap or battery, and information about the TFM. LSI storage controllers use a Transportable Flash Module (TFM) powered by a supercapacitor to provide RAID cache protection.
- Information on SD cards and FlexFlash controllers, including RAID health and RAID state, card health, and operability.
- Information on operations that are running on the storage component, such as rebuild, initialization, and relearning.
  Note | After a CIMC reboot or build upgrade, the status, start time, and end time of operations running on the storage component might not be displayed correctly. |
- Detailed fault information for all local storage components.
  Note | All faults are displayed on the Faults tab. |
- Support for Local Storage Monitoring
- Prerequisites for Local Storage Monitoring
- Legacy Disk Drive Monitoring
- Flash Life Wear Level Monitoring
- Viewing Flash Life Status
- Viewing the Status of Local Storage Components
- Viewing the Status of a Disk Drive
- Viewing RAID Controller Operations
Support for Local Storage Monitoring
The type of monitoring supported depends upon the Cisco UCS server.
Supported Cisco UCS Servers for Local Storage Monitoring
Through Cisco UCS Manager, you can monitor local storage components for the following servers:
- Cisco UCS B200 M3 blade server
- Cisco UCS B420 M3 blade server
- Cisco UCS B22 M3 blade server
- Cisco UCS B200 M4 blade server
- Cisco UCS B260 M4 blade server
- Cisco UCS B460 M4 blade server
- Cisco UCS C460 M2 rack server
- Cisco UCS C420 M3 rack server
- Cisco UCS C260 M2 rack server
- Cisco UCS C240 M3 rack server
- Cisco UCS C220 M3 rack server
- Cisco UCS C24 M3 rack server
- Cisco UCS C22 M3 rack server
- Cisco UCS C220 M4 rack server
- Cisco UCS C240 M4 rack server
- Cisco UCS C460 M4 rack server
Note | Not all servers support all local storage components. For Cisco UCS rack servers, the onboard SATA RAID 0/1 controller integrated on the motherboard is not supported. |
Supported Cisco UCS Servers for Legacy Disk Drive Monitoring
Only legacy disk drive monitoring is supported through Cisco UCS Manager for the following servers:
- Cisco UCS B200 M1/M2 blade server
- Cisco UCS B250 M1/M2 blade server
Note | In order for Cisco UCS Manager to monitor the disk drives, the 1064E storage controller must have a firmware level contained in a UCS bundle with a package version of 2.0(1) or higher. |
Prerequisites for Local Storage Monitoring
These prerequisites must be met for local storage monitoring or legacy disk drive monitoring to provide useful status information:
Legacy Disk Drive Monitoring
Note | The following information is applicable only for B200 M1/M2 and B250 M1/M2 blade servers. |
The legacy disk drive monitoring for Cisco UCS provides Cisco UCS Manager with blade-resident disk drive status for supported blade servers in a Cisco UCS domain. Disk drive monitoring provides a unidirectional fault signal from the LSI firmware to Cisco UCS Manager to provide status information.
The following server and firmware components gather, send, and aggregate information about the disk drive status in a server:
- Physical presence sensor: Determines whether the disk drive is inserted in the server drive bay.
- Physical fault sensor: Determines the operability status reported by the LSI storage controller firmware for the disk drive.
- IPMI disk drive fault and presence sensors: Send the sensor results to Cisco UCS Manager.
- Disk drive fault LED control and associated IPMI sensors: Control disk drive fault LED states (on/off) and relay the states to Cisco UCS Manager.
Flash Life Wear Level Monitoring
Flash life wear level monitoring enables you to monitor the life span of solid state drives. You can view both the percentage of the flash life remaining, and the flash life status. Wear level monitoring is supported on the Fusion IO mezzanine card with the following Cisco UCS blade servers:
- Cisco UCS B22 M3 blade server
- Cisco UCS B200 M3 blade server
- Cisco UCS B420 M3 blade server
- Cisco UCS B200 M4 blade server
- Cisco UCS B260 M4 blade server
- Cisco UCS B460 M4 blade server
Note | Wear level monitoring requires the following:
|
Viewing Flash Life Status
The following example shows how to display the flash life status for server 3:
    UCS-A# scope server 1/3
    UCS-A /chassis/server # show raid-controller detail expand
    RAID Controller:
        ID: 1
        Type: FLASH
        PCI Addr: 131:00.0
        Vendor: Cisco Systems Inc
        Model: UCSC-F-FIO-1205M
        Serial: 1315D2B52
        HW Rev: FLASH
        Raid Support: No
        OOB Interface Supported: No
        Rebuild Rate: N/A
        Controller Status: Unknown
        Flash Life:
            Flash Percentage: N/A
            FLash Status: Error(244)
    UCS-A /chassis/server #
Viewing the Status of Local Storage Components
The following example shows how to display the local disk status for server 2:
    UCS-A# scope server 1/2
    UCS-A /chassis/server # show inventory storage
    Server 1/2:
        Name:
        User Label:
        Equipped PID: UCSB-B200-M3
        Equipped VID: V01
        Equipped Serial (SN): FCH16207KXG
        Slot Status: Equipped
        Acknowledged Product Name: Cisco UCS B200 M3
        Acknowledged PID: UCSB-B200-M3
        Acknowledged VID: V01
        Acknowledged Serial (SN): FCH16207KXG
        Acknowledged Memory (MB): 98304
        Acknowledged Effective Memory (MB): 98304
        Acknowledged Cores: 12
        Acknowledged Adapters: 1
        Motherboard:
            Product Name: Cisco UCS B200 M3
            PID: UCSB-B200-M3
            VID: V01
            Vendor: Cisco Systems Inc
            Serial (SN): FCH16207KXG
            HW Revision: 0
            RAID Controller 1:
                Type: SAS
                Vendor: LSI Logic Symbios Logic
                Model: LSI MegaRAID SAS 2004 ROMB
                Serial: LSIROMB-0
                HW Revision: B2
                PCI Addr: 01:00.0
                Raid Support: RAID0, RAID1
                OOB Interface Supported: Yes
                Rebuild Rate: 31
                Controller Status: Optimal
                Local Disk 1:
                    Product Name: 146GB 6Gb SAS 10K RPM SFF HDD/hot plug/drive sled mounted
                    PID: A03-D146GA2
                    VID: V01
                    Vendor: SEAGATE
                    Model: ST9146803SS
                    Vendor Description: Seagate Technology LLC
                    Serial: 3SD31S4X
                    HW Rev: 0
                    Block Size: 512
                    Blocks: 285155328
                    Operability: Operable
                    Oper Qualifier Reason: N/A
                    Presence: Equipped
                    Size (MB): 139236
                    Drive State: Online
                    Power State: Active
                    Link Speed: 6 Gbps
                    Device Type: HDD
                Local Disk 2:
                    Product Name: 600G AL12SE SAS Hard Disk Drive
                    PID: A03-D600GA2
                    VID: V01
                    Vendor: TOSHIBA
                    Model: MBF2600RC
                    Vendor Description: Toshiba Corporation
                    Serial: EA00PB109T4A
                    HW Rev: 0
                    Block Size: 512
                    Blocks: 1169920000
                    Operability: Operable
                    Oper Qualifier Reason: N/A
                    Presence: Equipped
                    Size (MB): 571250
                    Drive State: Online
                    Power State: Active
                    Link Speed: 6 Gbps
                    Device Type: HDD
                Local Disk Config Definition:
                    Mode: RAID 1 Mirrored
                    Description:
                    Protect Configuration: No
                Virtual Drive 0:
                    Type: RAID 1 Mirrored
                    Block Size: 512
                    Blocks: 285155328
                    Operability: Operable
                    Presence: Equipped
                    Size (MB): 139236
                    Lifecycle: Allocated
                    Drive State: Optimal
                    Strip Size (KB): 64
                    Access Policy: Read Write
                    Read Policy: Normal
                    Configured Write Cache Policy: Write Through
                    Actual Write Cache Policy: Write Through
                    IO Policy: Direct
                    Drive Cache: No Change
                    Bootable: False
    UCS-A /chassis/server #
Viewing the Status of a Disk Drive
Step | Command or Action | Purpose |
---|---|---|
Step 1 | UCS-A# scope chassis chassis-num | Enters chassis mode for the specified chassis. |
Step 2 | UCS-A /chassis # scope server server-num | Enters chassis server mode for the specified server. |
Step 3 | UCS-A /chassis/server # scope raid-controller raid-contr-id {sas | sata} | Enters RAID controller mode for the specified RAID controller. |
Step 4 | UCS-A /chassis/server/raid-controller # show local-disk [local-disk-id | detail | expand] | Displays the status of the disk drive. |
The following example shows the status of a disk drive:
    UCS-A# scope chassis 1
    UCS-A /chassis # scope server 6
    UCS-A /chassis/server # scope raid-controller 1 sas
    UCS-A /chassis/server/raid-controller # show local-disk 1
    Local Disk:
        ID: 1
        Block Size: 512
        Blocks: 60545024
        Size (MB): 29563
        Operability: Operable
        Presence: Equipped
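The sample values suggest that Size (MB) equals the block count multiplied by the block size, expressed in units of 2^20 bytes. This is an inference from the figures shown in this chapter, not a documented formula, and `size_mb` is a hypothetical helper name. A short sketch that checks the relationship against the three disks that appear in the examples:

```python
def size_mb(blocks, block_size):
    """Capacity in MB, where 1 MB is taken to be 2**20 bytes (assumption)."""
    return blocks * block_size // (1024 * 1024)


# (blocks, block size, Size (MB) as reported in the sample output)
SAMPLES = [
    (60545024, 512, 29563),     # show local-disk 1
    (285155328, 512, 139236),   # Local Disk 1 in show inventory storage
    (1169920000, 512, 571250),  # Local Disk 2 in show inventory storage
]

if __name__ == "__main__":
    for blocks, block_size, reported in SAMPLES:
        print(blocks, size_mb(blocks, block_size), reported)
```

A mismatch between the computed and reported size on a real system would be worth investigating before relying on either figure for capacity planning.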
Viewing RAID Controller Operations
The following example shows how to display the RAID controller operations for server 3:
    UCS-A# scope server 1/3
    UCS-A /chassis/server # show raid-controller operation
        Name: Rebuild
        Affected Object: sys/chassis-1/blade-3/board/storage-SAS-1/disk-1
        State: In Progress
        Progress: 4
        Start Time: 2013-11-05T12:02:10.000
        End Time: N/A
    UCS-A /chassis/server #
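The Affected Object value identifies the chassis, blade, controller, and disk that an operation applies to. The sketch below splits such a value into its parts. The pattern is inferred from the single blade-server example in this section, so other server types may use a different form; `parse_disk_dn` is a hypothetical helper name.

```python
import re

# Pattern inferred from one sample value; treat it as an assumption.
DISK_DN = re.compile(
    r"sys/chassis-(?P<chassis>\d+)/blade-(?P<blade>\d+)"
    r"/board/storage-(?P<type>[A-Za-z]+)-(?P<controller>\d+)"
    r"/disk-(?P<disk>\d+)"
)


def parse_disk_dn(dn):
    """Return the components of a disk object name, or None if it does not match."""
    match = DISK_DN.fullmatch(dn)
    if match is None:
        return None
    parts = match.groupdict()
    # Numeric components are converted to int; the controller type stays text.
    return {k: (v if k == "type" else int(v)) for k, v in parts.items()}


if __name__ == "__main__":
    print(parse_disk_dn("sys/chassis-1/blade-3/board/storage-SAS-1/disk-1"))
```

Returning `None` for an unrecognized value keeps the caller from acting on a partially parsed object name.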
Graphics Card Server Support
With Cisco UCS Manager, you can view the properties for certain graphics cards and controllers. Graphics cards are supported on the following servers:
Note | Certain NVIDIA Graphics Processing Units (GPUs) do not support Error Correcting Code (ECC) and vGPU together. Cisco recommends that you refer to the release notes published by NVIDIA for the respective GPU to determine whether it supports ECC and vGPU together. |
Viewing Graphics Card Properties
The following example shows how to display the graphics card properties on server 1:
    UCS-A# scope server 1
    UCS-A /server # show graphics-card
    Graphics Card:
        ID  Slot Id    Is Supported Firmware Version
        --- ---------- ------------ ----------------
        1   5          Yes          80.07.6D.00.13|2401.0502.00.02
    UCS-A /server # show graphics-card detail
    Graphics Card:
        ID: 1
        Slot Id: 5
        Is Supported: Yes
        Vendor: nVidia Corporation
        Model: Nvidia GRID K1 P2401-502
        Serial: NA
        Firmware Version: 80.07.6D.00.13|2401.0502.00.02
    UCS-A /server #
Viewing Graphics Controller Properties
Step | Command or Action | Purpose |
---|---|---|
Step 1 | UCS-A# scope server blade-id | Enters server mode for the specified server. |
Step 2 | UCS-A /server # scope graphics-card card-id | Enters graphics card mode for the specified graphics card. |
Step 3 | UCS-A /server/graphics-card # show graphics-controller detail | Displays information about the graphics controllers. |
The following example shows how to display the graphics controller properties for graphics card 1 on server 1:
    UCS-A# scope server 1
    UCS-A /server # scope graphics-card 1
    UCS-A /server/graphics-card # show graphics-controller detail
    Graphics Controller:
        ID: 1
        Pci Address: 07:00.0
        ID: 2
        Pci Address: 08:00.0
    UCS-A /server/graphics-card #
Managing Transportable Flash Module and Supercapacitor
LSI storage controllers use a Transportable Flash Module (TFM) powered by a supercapacitor to provide RAID cache protection. With Cisco UCS Manager, you can monitor these components to determine the status of the battery backup unit (BBU). The BBU operability status can be one of the following:
- Operable: The BBU is functioning successfully.
- Inoperable: The TFM or BBU is missing, or the BBU has failed and needs to be replaced.
- Degraded: The BBU is predicted to fail.
TFM and supercap functionality is supported beginning with Cisco UCS Manager Release 2.1(2).
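A monitoring script might map each operability status to a follow-up action. The sketch below is illustrative: the three status strings come from the list above, while `BbuOperability`, `recommended_action`, and the wording of the actions are hypothetical.

```python
from enum import Enum


class BbuOperability(Enum):
    OPERABLE = "Operable"
    INOPERABLE = "Inoperable"
    DEGRADED = "Degraded"


def recommended_action(status_text):
    """Map a BBU operability string to a suggested follow-up (illustrative)."""
    # Enum lookup by value raises ValueError for an unrecognized status.
    status = BbuOperability(status_text)
    if status is BbuOperability.OPERABLE:
        return "no action needed"
    if status is BbuOperability.DEGRADED:
        return "plan replacement: the BBU is predicted to fail"
    return "check that the TFM and BBU are present; replace the BBU if it has failed"


if __name__ == "__main__":
    for text in ("Operable", "Degraded", "Inoperable"):
        print(text, "->", recommended_action(text))
```

Raising an error on an unknown status, rather than treating it as healthy, makes unexpected values visible to whoever runs the script.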
TFM and Supercap Guidelines and Limitations
TFM and Supercap Limitations
- The CIMC sensors for TFM and supercap on the Cisco UCS B420 M3 blade server are not polled by Cisco UCS Manager.
- If the TFM and supercap are not installed on the Cisco UCS B420 M3 blade server, or are installed and then removed from the blade server, no faults are generated.
- If the TFM is not installed on the Cisco UCS B420 M3 blade server, but the supercap is installed, Cisco UCS Manager reports the entire BBU system as absent. You should physically check whether both the TFM and supercap are present on the blade server.
Supported Cisco UCS Servers for TFM and Supercap
The following Cisco UCS servers support TFM and supercap:
Monitoring RAID Battery Status
This procedure applies only to Cisco UCS servers that support RAID configuration and TFM. If the BBU has failed or is predicted to fail, you should replace the unit as soon as possible.
Step | Command or Action | Purpose |
---|---|---|
Step 1 | UCS-A# scope chassis chassis-num | Enters chassis mode for the specified chassis. |
Step 2 | UCS-A /chassis # scope server server-num | Enters chassis server mode for the specified server. |
Step 3 | UCS-A /chassis/server # scope raid-controller raid-contr-id {flash | sas | sata | sd | unknown} | Enters RAID controller mode for the specified RAID controller. |
Step 4 | UCS-A /chassis/server/raid-controller # show raid-battery expand | Displays the RAID battery status. |
This example shows how to view information on the battery backup unit of a server:
    UCS-A# scope chassis 1
    UCS-A /chassis # scope server 3
    UCS-A /chassis/server # scope raid-controller 1 sas
    UCS-A /chassis/server/raid-controller # show raid-battery expand
    RAID Battery:
        Battery Type: Supercap
        Presence: Equipped
        Operability: Operable
        Oper Qualifier Reason:
        Vendor: LSI
        Model: SuperCaP
        Serial: 0
        Capacity Percentage: Full
        Battery Temperature (C): 54.000000
        Transportable Flash Module:
            Presence: Equipped
            Vendor: Cisco Systems Inc
            Model: UCSB-RAID-1GBFM
            Serial: FCH164279W6
TPM Monitoring
Trusted Platform Module (TPM) is included on all Cisco UCS M3 blade and rack-mount servers. Operating systems can use TPM to enable encryption. For example, Microsoft's BitLocker Drive Encryption uses the TPM on Cisco UCS servers to store encryption keys.
Cisco UCS Manager enables monitoring of TPM, including whether TPM is present, enabled, or activated.
Viewing TPM Properties
Step | Command or Action | Purpose |
---|---|---|
Step 1 | UCS-A# scope server chassis-id / server-id | Enters chassis server mode for the specified server. |
Step 2 | UCS-A /chassis/server # scope tpm tpm-id | Enters TPM mode for the specified TPM ID. |
Step 3 | UCS-A /chassis/server/tpm # show | Displays the TPM properties. |
Step 4 | UCS-A /chassis/server/tpm # show detail | Displays detailed TPM properties. |
The following example shows how to display the TPM properties for blade 3 in chassis 1:
    UCS-A# scope server 1/3
    UCS-A /chassis/server # scope tpm 1
    UCS-A /chassis/server/tpm # show
    Trusted Platform Module:
        Presence: Equipped
        Enabled Status: Enabled
        Active Status: Activated
        Ownership: Unowned
    UCS-A /chassis/server/tpm # show detail
    Trusted Platform Module:
        Enabled Status: Enabled
        Active Status: Activated
        Ownership: Unowned
        Tpm Revision: 1
        Model: UCSX-TPM1-001
        Vendor: Cisco Systems Inc
        Serial: FCH16167DBJ
    UCS-A /chassis/server/tpm #
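Before enabling an operating system feature that depends on the TPM, a script could confirm that the module is present, enabled, and activated. The sketch below assumes the `show` output has been captured as text with one field per line. The helper names and the `REQUIRED` table are hypothetical, and the required values are simply those shown in the sample; the value `Disabled` in the usage lines is also an assumed example.

```python
SHOW_OUTPUT = """\
Trusted Platform Module:
    Presence: Equipped
    Enabled Status: Enabled
    Active Status: Activated
    Ownership: Unowned
"""

# Assumption: these sample values are what a usable TPM reports.
REQUIRED = {
    "Presence": "Equipped",
    "Enabled Status": "Enabled",
    "Active Status": "Activated",
}


def parse_fields(text):
    """Collect non-empty 'Key: Value' pairs from the captured output."""
    pairs = (line.split(":", 1) for line in text.splitlines() if ":" in line)
    return {key.strip(): value.strip() for key, value in pairs if value.strip()}


def tpm_problems(fields):
    """List every required property that is missing or has another value."""
    return [
        f"{key}: expected {want}, found {fields.get(key, 'missing')}"
        for key, want in REQUIRED.items()
        if fields.get(key) != want
    ]


if __name__ == "__main__":
    fields = parse_fields(SHOW_OUTPUT)
    print(tpm_problems(fields))  # empty list when all checks pass
    print(tpm_problems({**fields, "Enabled Status": "Disabled"}))
```

Reporting each failed property separately makes it clear whether the module is absent or merely needs to be enabled or activated.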