Puncturing bad block on PD - Punctured array information


Updated: January 23, 2019

Document ID: 211054


Introduction

How do Punctured Blocks Happen?

Punctured Block Symptoms

Evidence of a Punctured Block

Possible Remediation

Preventing Punctured Blocks

Introduction

This document explains what a Punctured Block on a hard drive is, how a Punctured Block occurs, and the steps to remediate it.

What is a Punctured Block?

When a Patrol Read or a Rebuild operation encounters a media error on the source drive, it punctures a block on the target drive to prevent the use of the data with the invalid parity. Any subsequent read operation to the punctured block completes, but with an error. Consequently, the puncturing of a block prevents any invalid parity generation later while using this block.
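To make the behavior described in the quote concrete, here is a minimal Python sketch of puncture semantics. It assumes a toy model in which each drive is a list of blocks and media errors are a set of unreadable LBAs, and it simplifies the rebuild to a one-to-one copy from a source drive to a target drive. The names `ToyDrive`, `MediaError`, and `rebuild` are hypothetical; a real controller tracks punctures in its firmware.

```python
class MediaError(Exception):
    """Toy stand-in for a SCSI medium error returned by a drive."""


class ToyDrive:
    """Hypothetical drive model: one byte string per block, plus bad and punctured LBAs."""

    def __init__(self, blocks, bad_lbas=()):
        self.blocks = list(blocks)
        self.bad_lbas = set(bad_lbas)   # physical media errors on this drive
        self.punctured = set()          # blocks deliberately marked as unreadable

    def read(self, lba):
        # Both a real media error and a puncture make the read complete with an error.
        if lba in self.bad_lbas or lba in self.punctured:
            raise MediaError(f"read error at LBA {lba:#x}")
        return self.blocks[lba]

    def write(self, lba, data):
        self.blocks[lba] = data


def rebuild(source, target):
    """Copy source to target block by block, puncturing blocks the source cannot supply."""
    for lba in range(len(source.blocks)):
        try:
            target.write(lba, source.read(lba))
        except MediaError:
            # The correct data is unrecoverable, so flag the target block instead of
            # leaving stale contents that later parity calculations would trust.
            target.punctured.add(lba)


src = ToyDrive([bytes([i]) for i in range(8)], bad_lbas={5})
dst = ToyDrive([b"\x00"] * 8)
rebuild(src, dst)
print(sorted(dst.punctured))  # [5]
print(dst.read(4))            # b'\x04'
```

Reading LBA 4 from the target succeeds, while reading LBA 5 raises `MediaError` even though the target disk itself has no physical defect there; that is the "completes, but with an error" behavior the guide describes.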

Source: 12Gb/s MegaRAID® SAS Software User Guide, Rev. F, August 2014

How do Punctured Blocks Happen?

In RAID 5, data and parity are striped across all the member disks. If one of the drives goes bad, its data can be rebuilt by calculating parity across the remaining drives. Several things can cause a puncture, but it usually starts with a degraded RAID that has a single failed drive plus another member drive with many medium errors or in a Predictive Failure state.
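The parity arithmetic behind this can be sketched in Python. This assumes a simplified stripe with the parity block in a fixed position (real RAID 5 rotates parity across members), and `xor_blocks` and `rebuild_member` are hypothetical helpers. A single failed member is recoverable, but if another member also has a medium error in the same stripe, there is nothing left to rebuild from, which is the point at which a controller punctures the block.

```python
def xor_blocks(blocks):
    """XOR equal-length byte strings together, as RAID 5 does to compute parity."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def rebuild_member(stripe, failed, unreadable=()):
    """Reconstruct stripe[failed] from the other members of one stripe.

    stripe     -- one block per member drive (data blocks followed by the parity block)
    failed     -- index of the failed member
    unreadable -- indexes of surviving members that return a medium error in this stripe
    Returns the rebuilt block, or None when the stripe cannot be recovered.
    """
    survivors = [i for i in range(len(stripe)) if i != failed]
    if any(i in unreadable for i in survivors):
        # A second member is missing from this stripe, so single parity is not
        # enough; this is where a controller would puncture the rebuilt block.
        return None
    return xor_blocks([stripe[i] for i in survivors])


data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
stripe = data + [xor_blocks(data)]
print(rebuild_member(stripe, failed=1) == data[1])       # True
print(rebuild_member(stripe, failed=1, unreadable={2}))  # None
```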

The following article walks through a scenario that shows how an array can become punctured:

http://www.theprojectbot.com/what-is-a-punctured-raid-array

The key takeaway is that when a hard disk is replaced without first checking the other disks, bad logical blocks (medium errors) on those disks can be carried over during the rebuild, and any of the other disks may then show up as failed.

A punctured block can exist on multiple drives even though only one drive is officially marked as failed. The punctures can then be replicated to replacement disks, further compounding the issue.

Punctured Block Symptoms

The server may report multiple hard drive failures. Simply replacing the hard drive will NOT fix the issue. In addition, I/O performance may be degraded.

Evidence of a Punctured Block

The logs may contain entries similar to the lines below.

6:2014 Jul 27 00:36:06:BMC:storage:-: SLOT-5: Unexpected sense: PD 0c(e0x12/s5) Path 500000e11986c502, CDB: 28 00 0e 71 66 e7 00 00 19 00, Sense: 3/11/01
6:2014 Jul 27 00:36:06:BMC:storage:-: SLOT-5: Unexpected sense: PD 13(e0x12/s7) Path 50000395083063f6, CDB: 28 00 0e 71 66 eb 00 00 15 00, Sense: 3/11/14

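In the output above, e0x12/s5 refers to the drive in enclosure 0x12, slot 5 (HDD5), and e0x12/s7 to the drive in slot 7 (HDD7). The sense data is reported as sense key/ASC/ASCQ; the following link explains how to interpret it (for example, Sense: 3/11/14):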

http://en.wikipedia.org/wiki/Key_Code_Qualifier

Sense key 3 denotes a MEDIUM ERROR, so both entries indicate medium errors on the drives.
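When there are many such entries, decoding them by hand is tedious. The following Python sketch pulls the drive location, the LBA, and the decoded sense data out of one line. It assumes the log layout seen in the two sample lines (other firmware versions may differ), that opcode 0x28 is a SCSI READ(10) CDB with the LBA in bytes 2 to 5 and the transfer length in bytes 7 to 8, and it includes only a small subset of the SCSI sense-key and ASC/ASCQ tables. `parse_sense_line` is a hypothetical helper, not a Cisco or LSI tool.

```python
import re

# Subset of SCSI sense keys.
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY", 0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND",
}

# Subset of (ASC, ASCQ) descriptions relevant to read errors.
ASC_ASCQ = {
    (0x11, 0x00): "Unrecovered read error",
    (0x11, 0x01): "Read retries exhausted",
    (0x11, 0x14): "Read error - LBA marked bad by application client",
}

# Pattern inferred from the sample log lines only.
LINE_RE = re.compile(
    r"PD (?P<pd>[0-9a-f]+)\(e(?P<encl>0x[0-9a-f]+)/s(?P<slot>\d+)\).*?"
    r"CDB: (?P<cdb>[0-9a-f ]+), Sense: (?P<key>[0-9a-f]+)/(?P<asc>[0-9a-f]+)/(?P<ascq>[0-9a-f]+)",
    re.IGNORECASE,
)


def parse_sense_line(line):
    """Return drive location, decoded sense data, and READ(10) LBA/length, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    cdb = bytes.fromhex(m["cdb"].strip())
    key, asc, ascq = (int(m[g], 16) for g in ("key", "asc", "ascq"))
    info = {
        "pd": m["pd"],
        "enclosure": m["encl"],
        "slot": int(m["slot"]),
        "sense_key": SENSE_KEYS.get(key, f"sense key {key:#x}"),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"{asc:02x}/{ascq:02x}"),
    }
    if len(cdb) == 10 and cdb[0] == 0x28:  # READ(10)
        info["lba"] = int.from_bytes(cdb[2:6], "big")
        info["blocks"] = int.from_bytes(cdb[7:9], "big")
    return info


sample = ("6:2014 Jul 27 00:36:06:BMC:storage:-: SLOT-5: Unexpected sense: PD 0c(e0x12/s5) "
          "Path 500000e11986c502, CDB: 28 00 0e 71 66 e7 00 00 19 00, Sense: 3/11/01")
info = parse_sense_line(sample)
print(info["slot"], hex(info["lba"]), info["blocks"], info["sense_key"])
# 5 0xe7166e7 25 MEDIUM ERROR
```

Note that the LBA decoded from the first entry's CDB, 0xe7166e7, is the same address reported in the puncture event for PD 0c(e0x12/s5).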

The following events may also be present in the logs:

1:2014 Jul 16 10:42:43:BMC:storage:-: SLOT-5: Unrecoverable medium error during recovery on PD 0c(e0x12/s5) at e7166e7
1:2014 Jul 16 10:42:43:BMC:storage:-: SLOT-5: Puncturing bad block on PD 0c(e0x12/s5) at e7166e7
1:2014 Jul 19 03:46:22:BMC:storage:-: SLOT-5: Consistency Check detected uncorrectable multiple medium errors (PD 13(e0x12/s7) at e7166d9 on (null))
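To see at a glance which drives are involved, the events can be grouped per drive. This Python sketch assumes the event wording matches the three sample lines exactly; other controller firmware may phrase these events differently, and `summarize_events` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Event phrases copied from the sample log lines; other firmware may word them differently.
EVENT_RE = re.compile(
    r"(?P<event>Unrecoverable medium error|Puncturing bad block"
    r"|uncorrectable multiple medium errors)"
    r".*?PD [0-9a-f]+\(e(?P<encl>0x[0-9a-f]+)/s(?P<slot>\d+)\) at (?P<lba>[0-9a-f]+)",
    re.IGNORECASE,
)


def summarize_events(lines):
    """Map each drive (e<enclosure>/s<slot>) to its medium-error and puncture events."""
    summary = defaultdict(list)
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            drive = f"e{m['encl']}/s{m['slot']}"
            summary[drive].append((m["event"], int(m["lba"], 16)))
    return dict(summary)


log = [
    "1:2014 Jul 16 10:42:43:BMC:storage:-: SLOT-5: Unrecoverable medium error during "
    "recovery on PD 0c(e0x12/s5) at e7166e7",
    "1:2014 Jul 16 10:42:43:BMC:storage:-: SLOT-5: Puncturing bad block on PD 0c(e0x12/s5) at e7166e7",
    "1:2014 Jul 19 03:46:22:BMC:storage:-: SLOT-5: Consistency Check detected uncorrectable "
    "multiple medium errors (PD 13(e0x12/s7) at e7166d9 on (null))",
]
for drive, events in summarize_events(log).items():
    print(drive, [(name, hex(lba)) for name, lba in events])
# e0x12/s5 [('Unrecoverable medium error', '0xe7166e7'), ('Puncturing bad block', '0xe7166e7')]
# e0x12/s7 [('uncorrectable multiple medium errors', '0xe7166d9')]
```

Medium errors reported on more than one slot, as here (slots 5 and 7), match the pattern described earlier in which more than one member drive has bad blocks.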

Possible Remediation

Whenever punctured blocks appear, backing up the data is highly recommended. When presented with the messages shown above, the natural inclination is to find the failing hard drive and replace it; however, bad logical blocks may already be spread across the array. Although failed or failing hard drives may have been the cause, punctured blocks can only be resolved by re-creating the affected virtual drive(s).

Create a data backup.
Erase the RAID array configuration.
Create a new array from scratch.
Note: While creating the VD (Virtual Drive), select FULL/SLOW initialization instead of FAST initialization.
Reinstall the operating system.
Restore the data backup.

Note: Replacing hard drives will NOT fix punctured blocks by itself. If a drive has failed, it should still be replaced, but the RAID array must also be re-created as described above to clear the punctures.

Preventing Punctured Blocks

Monitor RAIDs and the health of their member drives.
Prior to replacing any hard drives, review controller logs.
Ensure Patrol Reads and Consistency Checks are enabled and running (see bug CSCul22968).
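The "review before replacing" advice can be expressed as a simple pre-replacement check. This Python sketch assumes you have already collected, for each member drive, its medium error count and whether it reports a Predictive Failure state (for example, from the controller's drive details or event log). `MemberDrive` and `pre_replacement_check` are hypothetical names, and the values in the usage are illustrative, not taken from a real controller.

```python
from dataclasses import dataclass


@dataclass
class MemberDrive:
    """Hypothetical health record for one member of a RAID virtual drive."""
    slot: str                  # location, e.g. "e0x12/s7"
    media_errors: int          # medium error count collected from the controller
    predictive_failure: bool   # True if the drive reports a Predictive Failure state


def pre_replacement_check(failed_slot, members):
    """Return the other members whose condition makes a simple drive swap risky."""
    return [
        d.slot
        for d in members
        if d.slot != failed_slot and (d.media_errors > 0 or d.predictive_failure)
    ]


# Hypothetical array with illustrative values.
members = [
    MemberDrive("e0x12/s5", media_errors=12, predictive_failure=False),
    MemberDrive("e0x12/s6", media_errors=0, predictive_failure=False),
    MemberDrive("e0x12/s7", media_errors=3, predictive_failure=False),
    MemberDrive("e0x12/s8", media_errors=0, predictive_failure=True),
]
print(pre_replacement_check("e0x12/s5", members))  # ['e0x12/s7', 'e0x12/s8']
```

A non-empty result is a signal to check the controller logs for punctures before swapping the drive, since a rebuild onto a new disk can replicate existing punctures.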

Contributed by Cisco Engineers

Zaira Vega
TAC Engineer
Kevin McCabe
Technical Leader


This Document Applies to These Products

UCS C-Series Rack Servers
