Objective
On specific types of Application Policy Infrastructure Controller (APIC) solid state drives (SSDs), the drive's endurance wears out under high usage. This leads to slow SSD writes, and the SSD can eventually become read-only. A degraded SSD can also cause CPU spikes in APIC services.
Field Notice FN-64329 recommends replacing all APIC SSDs with product ID APIC-SD120G0KS2-EV or APIC-SD120GBKS4-EV, regardless of percent utilized, with a new enterprise-level SSD, part number UCS-SD200G12S3-EP.
This document explains how to identify the APIC SSD product ID and how to replace the SSD on the APICs affected by the field notice.
It supplements the existing SSD replacement documents listed below:
Cisco APIC SSD Replacement Release 3.x and Earlier
Cisco APIC SSD Replacement Release 4.x and Later
Common Symptoms
Starting with ACI Release 2.3, the APIC also raises a fault to warn you when an SSD is approaching its endurance limit:
F2730: fltEqptStorageWearout-Warning
F2731: fltEqptStorageWearout-Major
F2732: fltEqptStorageWearout-Critical
Example:
Fault F2730: "Storage unit /dev/sdb on Node x mounted at /dev/sdb has x% life remaining" (the fault also reports the SSD serial number).
Fault F2730
This SSD endurance issue affects two SSD types, with product IDs APIC-SD120G0KS2-EV and APIC-SD120GBKS4-EV.
Cisco recommends that you replace these SSDs, regardless of percent utilized, with a new enterprise-level SSD.
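If you prefer to look for these faults programmatically instead of in the GUI, the following Python sketch shows one possible approach. It assumes the APIC REST API class query `/api/node/class/faultInst.json` with a `query-target-filter`, and the usual `{"imdata": [...]}` response shape. The helper names and the sample response are hypothetical. The life-remaining regex assumes the fault description follows the example text above. Authenticating (for example with `POST /api/aaaLogin.json`) and sending the request are left to your HTTP client.

```python
import re

# Wear-out fault codes listed above, mapped to their severity.
WEAROUT_CODES = {"F2730": "warning", "F2731": "major", "F2732": "critical"}

# Assumption: the description contains "<n>% life remaining", as in the example fault text.
LIFE_RE = re.compile(r"(\d+)% life remaining")


def wearout_query_path() -> str:
    """Build a faultInst class-query path filtered to the wear-out codes.

    The filter is returned unencoded; let your HTTP client URL-encode it.
    """
    terms = ",".join(f'eq(faultInst.code,"{code}")' for code in sorted(WEAROUT_CODES))
    return f"/api/node/class/faultInst.json?query-target-filter=or({terms})"


def parse_wearout_faults(response: dict) -> list[dict]:
    """Extract code, severity, DN and remaining life from a faultInst response."""
    results = []
    for item in response.get("imdata", []):
        attrs = item.get("faultInst", {}).get("attributes", {})
        code = attrs.get("code")
        if code not in WEAROUT_CODES:
            continue  # ignore unrelated faults
        match = LIFE_RE.search(attrs.get("descr", ""))
        results.append({
            "code": code,
            "level": WEAROUT_CODES[code],
            "dn": attrs.get("dn", ""),
            "life_remaining_pct": int(match.group(1)) if match else None,
        })
    return results


if __name__ == "__main__":
    # Hypothetical response modeled on the example fault text above.
    sample = {"imdata": [{"faultInst": {"attributes": {
        "code": "F2730",
        "dn": "example-dn",
        "descr": "Storage unit /dev/sdb on Node 1 mounted at /dev/sdb has 20% life remaining",
    }}}]}
    print(wearout_query_path())
    print(parse_wearout_faults(sample))
```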
Are Your APIC SSDs Affected? How to Check
To identify whether the APIC SSD product ID is affected by the field notice, get the SSD serial number (SN) from the Cisco IMC (CIMC) GUI.
For CIMC 3.0(3) or newer
Log in to Cisco IMC GUI.
a - Expand the CIMC menu with Toggle Navigation (top left corner), and choose Storage > Cisco 12G SAS Modular Raid Controller.
b - Click Physical Drive Info.
c - On the left side, under Physical Drives, select PD-1 (it should be the SSD).
d - Under General, Media Type should be SSD.
e - Under Inquiry Data, copy the Drive Serial Number.
f - Paste the SSD serial number into the following website and check whether it matches an affected product ID:
https://cway.cisco.com/sncheck/
g - You can also check Percentage Life Left on this screen to see the SSD usage.
Cisco IMC 3.0(4d)
Or
For CIMC releases prior to 3.0(3)
Log in to Cisco IMC GUI.
a - Choose Storage > Cisco UCSC RAID SAS 200xx.
b - Click Physical Drive Info.
c - Select the SSD from the Physical Drives list.
d - Under Inquiry Data, copy the Drive Serial Number.
e - Paste the SSD serial number into the following website and check whether it matches an affected product ID:
https://cway.cisco.com/sncheck/
Cisco IMC 2.0(9c)
2 - If the APIC SSD serial number matches affected product ID APIC-SD120G0KS2-EV or APIC-SD120GBKS4-EV, open a TAC case and provide the APIC SSD serial number and CDETS CSCvc84794.
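The two Cisco IMC release thresholds used in this document, and the affected product IDs, can be captured in a small helper. The thresholds are 3.0(3), which decides which CIMC GUI path above applies, and 2.0(9c), the minimum release in the checklist below. This hypothetical Python sketch assumes that Cisco IMC release strings follow the `major.minor(maintenance[letter])` pattern seen in this document, such as 3.0(4d). It also assumes you already obtained the product ID from the serial number check site. It does not query CIMC or the website.

```python
import re

# Assumed release format: major.minor(maintenance[patch letters]), e.g. "3.0(4d)" or "3.0(3)".
_CIMC_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$")

# Product IDs named in FN-64329; replace regardless of percent utilized.
AFFECTED_PIDS = frozenset({"APIC-SD120G0KS2-EV", "APIC-SD120GBKS4-EV"})


def parse_cimc_version(text: str) -> tuple[int, int, int, str]:
    """Turn "3.0(4d)" into (3, 0, 4, "d") so releases compare as tuples."""
    match = _CIMC_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized Cisco IMC release: {text!r}")
    major, minor, maint, patch = match.groups()
    return int(major), int(minor), int(maint), patch


def cimc_gui_procedure(version: str) -> str:
    """Pick which CIMC GUI path in this document applies to the given release."""
    if parse_cimc_version(version) >= parse_cimc_version("3.0(3)"):
        return "3.0(3) or newer"
    return "prior to 3.0(3)"


def needs_cimc_upgrade(version: str) -> bool:
    """True if the release is earlier than 2.0(9c) and must be upgraded first."""
    return parse_cimc_version(version) < parse_cimc_version("2.0(9c)")


def is_affected_pid(pid: str) -> bool:
    """True if the SSD product ID is one of the two affected by FN-64329."""
    return pid.strip().upper() in AFFECTED_PIDS
```

An empty patch letter sorts before any letter, so 3.0(3) compares as older than a hypothetical 3.0(3a).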
Checklist Prior to SSD Replacement
1. If your Cisco IMC release is earlier than 2.0(9c), you must upgrade the Cisco IMC software before replacing the solid-state drive (SSD). Refer to the release notes of the target Cisco IMC release to determine the recommended upgrade path from your current release to the target release. The ACI release notes list the recommended Cisco IMC release for each ACI release. To perform the upgrade, follow the instructions in the current version of the Cisco Host Upgrade Utility (HUU) User Guide.
2. In the Cisco IMC BIOS, verify that the Trusted Platform Module (TPM) state is set to "Enabled." Using the KVM console to access the BIOS settings, you can view and configure the TPM state under Advanced > Trusted Computing > TPM State.
APIC BIOS via Cisco IMC KVM
Note: APIC will fail to boot if the TPM state is "Disabled."
3. Obtain an ACI APIC .iso image from the Cisco Software Download site.
4. Perform this procedure only when at least one APIC in the cluster has a healthy SSD and the cluster is Fully Fit. If the SSDs in all the APIC controllers in the cluster have failed, open a case with the Cisco Technical Assistance Center (TAC). The snapshot below is from a cluster in which all APICs are Fully Fit.
APIC GUI 4.1(2g)
5. After the SSD replacement, the APIC must be configured again, and you will need the following information (it is used in SSD Replacement Procedure, Step 4-d):
- Fabric name
- Number of controllers
- Controller ID
- IP address pool for tunnel endpoint addresses (TEP)
- IP address pool for bridge domain multicast address (GIPO)
- Management interface speed/duplex mode
- VLAN ID for infrastructure network
- IPv4/IPv6 addresses for the out-of-band management
- IPv4/IPv6 addresses of the default gateway
- Strong password check
To find these values, see Technote of the day: How to find what configuration values were used during the setup of APIC1?
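Writing down the values from item 5 before you decommission the APIC makes Step 4-d less error-prone. The following Python sketch is a hypothetical record for those values, with a few basic sanity checks: the controller ID is within the controller count, the TEP pool is a valid network, the GIPO pool is a multicast range, the VLAN ID is in the 1-4094 range, and the gateway is inside the out-of-band subnet. These checks are illustrative assumptions, not the APIC setup script's own validation rules. The example values are placeholders.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class ApicSetupValues:
    """Hypothetical record of the setup values listed in checklist item 5."""
    fabric_name: str
    controller_count: int
    controller_id: int
    tep_pool: str            # e.g. "10.0.0.0/16"
    gipo_pool: str           # e.g. "225.0.0.0/15"
    infra_vlan: int
    oob_address: str         # out-of-band address with prefix, IPv4 or IPv6
    oob_gateway: str
    mgmt_speed_duplex: str = "auto"
    strong_password_check: bool = True

    def problems(self) -> list[str]:
        """Return a list of human-readable issues; empty means the checks passed."""
        issues = []
        if not self.fabric_name:
            issues.append("fabric name is empty")
        if not 1 <= self.controller_id <= self.controller_count:
            issues.append("controller ID must be between 1 and the number of controllers")
        try:
            ipaddress.ip_network(self.tep_pool)  # strict: host bits must be zero
        except ValueError:
            issues.append("TEP pool is not a valid network")
        try:
            if not ipaddress.ip_network(self.gipo_pool).is_multicast:
                issues.append("GIPO pool is not a multicast range")
        except ValueError:
            issues.append("GIPO pool is not a valid network")
        if not 1 <= self.infra_vlan <= 4094:
            issues.append("infrastructure VLAN ID must be between 1 and 4094")
        try:
            iface = ipaddress.ip_interface(self.oob_address)
            if ipaddress.ip_address(self.oob_gateway) not in iface.network:
                issues.append("default gateway is outside the out-of-band subnet")
        except ValueError:
            issues.append("out-of-band address or gateway is invalid")
        return issues


if __name__ == "__main__":
    values = ApicSetupValues("fabric1", 3, 2, "10.0.0.0/16", "225.0.0.0/15",
                             3967, "192.0.2.11/24", "192.0.2.1")
    print(values.problems())
```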
SSD Replacement Procedure
Step 1
From another APIC in the cluster, decommission the APIC whose SSD is to be replaced.
a - On the menu bar, choose System > Controllers.
b - In the Navigation pane, expand Controllers > apic_controller_name > Cluster as Seen by Node. For apic_controller_name, specify an APIC controller that is not being decommissioned.
c - In the Work pane, verify that the Health State in the Active Controllers summary table indicates the cluster is Fully Fit before continuing.
d - In the same Work pane, select the controller to be decommissioned and click Actions > Decommission.
e - Click Yes. The decommissioned controller displays Unregistered in the Operational State column. The controller is then taken out of service and is no longer visible in the Work pane.
APIC GUI 4.1(2g)
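You can also check Step 1-c through the REST API before you decommission. This Python sketch makes several assumptions. It assumes the `infraWiNode` class, queried with `GET /api/node/class/infraWiNode.json`, whose objects describe each controller as seen by each APIC. It assumes those objects carry `id` and `health` attributes. It also assumes that `fully-fit` is the health value the GUI shows as Fully Fit. Verify the attribute names against your APIC release. The helper names are hypothetical, and fetching the response is left to your own authenticated HTTP session.

```python
def unfit_controllers(response: dict) -> list[str]:
    """Return IDs of controllers that any APIC reports as not fully fit."""
    unfit = set()
    for item in response.get("imdata", []):
        attrs = item.get("infraWiNode", {}).get("attributes", {})
        if attrs.get("health") != "fully-fit":
            unfit.add(attrs.get("id", "unknown"))
    return sorted(unfit)


def safe_to_decommission(response: dict) -> bool:
    """True only if the response lists controllers and every view is fully fit."""
    return bool(response.get("imdata")) and not unfit_controllers(response)
```

An empty response is treated as not safe, so a failed or unauthenticated query never gives a false "go".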
Step 2
Physically remove the old SSD, if any, and add the new SSD.
Step 3
In the Cisco IMC, create a RAID volume using the newly installed SSD.
For CIMC release 3.0(3) or newer
a - Log in to Cisco IMC.
b - Expand the CIMC menu with Toggle Navigation (top left corner), and choose Storage > Cisco 12G SAS Modular Raid Controller.
Cisco IMC 3.0(4d)
c - Click Clear Foreign Config (if selectable), and then click OK.
d - Click Create Virtual Drive from Unused Physical Drives
Cisco IMC 3.0(4d)
e - For RAID Level, select 0 from the drop-down list.
f - Under Create Drive Groups, select the physical drive and move it to Drive Groups.
g - Under Virtual Drive Properties, click Create Virtual Drive.
Cisco CIMC 3.0(4d)
h - Still under Storage > Cisco 12G SAS Modular Raid Controller, select Virtual Drive Info.
i - Select the virtual drive whose RAID Level is RAID 0, click Initialize, choose Fast Initialize from the drop-down list, and then click Initialize VD.
Cisco CIMC 3.0(4d)
For CIMC releases prior to 3.0(3)
a - Log in to Cisco IMC.
b - Choose Storage > Physical Drive. Select the newly added physical drive.
c - Choose Storage > Controller Drive Info, and click Clear Foreign Config (if selectable).
d - Click OK.
e - Choose Storage > Controller Drive Info, and click Create Virtual Drive from Unused Physical Drives.
Cisco IMC 2.0(9c)
f - Select 0 from the Raid Level drop-down list.
g - Click Create Virtual Drive.
Cisco IMC 2.0(9c)
h - Select the newly created virtual drive and click Initialize.
i - From the Initialize Type drop-down list, select Fast Initialize.
Cisco IMC 2.0(9c)
Step 4
In the Cisco IMC, install the APIC image using the virtual media. In this step, the SSD is partitioned and the APIC software is installed on the HDD.
NOTE: For a fresh install of Cisco APIC Release 4.x or later, see the Cisco APIC Installation, Upgrade, and Downgrade Guide.
a - Mount the APIC .iso image using the Cisco IMC vMedia functionality.
b - Boot or power cycle the APIC controller.
Cisco IMC 3.0(4d)
c - During the boot process, press F6 to select Cisco vKVM-Mapped vDVD as the one-time boot device. You may be prompted for the BIOS password; the default password is 'password'.
Cisco IMC 3.0(4d)
d - During the initial bring-up, a configuration script runs. Follow the onscreen instructions to configure the initial settings of the APIC software, using the information you collected in the checklist before starting. If you need to recover those values, see Technote of the day: How to find what configuration values were used during the setup of APIC1?
Cisco IMC 3.0(4d)
e - After the installation completes, unmap the virtual media mount.
Cisco IMC 3.0(4d)
Step 5
From an APIC in the cluster, commission the decommissioned APIC.
a - Select any other APIC that is part of the cluster. From the menu bar, choose System > Controllers.
b - In the Navigation pane, expand Controllers > apic_controller_name > Cluster as Seen by Node. For the apic_controller_name, specify any active controller that is part of the cluster.
c - From the Work pane, click the decommissioned controller that displays Unregistered in the Operational State column.
d - From the Work pane, click Actions > Commission.
e - In the Confirmation dialog box, click Yes.
APIC GUI 4.1(2g)
The commissioned controller displays the Health state as Fully-fit and the operational state as Available. The controller should now be visible in the Work pane.
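To confirm the result of Step 5 without watching the GUI, you can poll the same `infraWiNode` data until every APIC reports the recommissioned controller as fully fit and available. This hypothetical Python sketch assumes `infraWiNode` objects with `id`, `health` (`fully-fit`), and `operSt` (`available`) attributes. `fetch` stands in for your own authenticated GET of `/api/node/class/infraWiNode.json`. The default timeout and polling interval are arbitrary assumptions.

```python
import time
from typing import Callable


def controller_ready(response: dict, node_id: str) -> bool:
    """True if every APIC's view of node_id shows fully-fit health and available state."""
    views = [
        item["infraWiNode"]["attributes"]
        for item in response.get("imdata", [])
        if item.get("infraWiNode", {}).get("attributes", {}).get("id") == node_id
    ]
    return bool(views) and all(
        v.get("health") == "fully-fit" and v.get("operSt") == "available" for v in views
    )


def wait_until_commissioned(
    fetch: Callable[[], dict],
    node_id: str,
    timeout_s: float = 1800.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll fetch() until node_id is ready; return False if timeout_s elapses first."""
    deadline = clock() + timeout_s
    while True:
        if controller_ready(fetch(), node_id):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting. In normal use, only `fetch` and `node_id` need to be supplied.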
Field Notices / Bug references
Field Notice: FN - 64329 - APIC SSD Degradation After High Percent Utilization of Solid State Drive - Hardware Upgrade Available
APIC SSD Degradation After High Percent Utilization of Solid State Drive | Fault F2730