ABSTRACT
An Analysis of Data Corruption in the Storage Stack
L. Bairavasundaram, G. Goodson, B. Schroeder, A. Arpaci-Dusseau, R. Arpaci-Dusseau.
6th USENIX Conference on File and Storage Technologies (FAST 2008).
An important threat to reliable storage of data is silent
data corruption. In order to develop suitable protection
mechanisms against data corruption, it is essential to understand
its characteristics. In this paper, we present the
first large-scale study of data corruption. We analyze corruption
instances recorded in production storage systems
containing a total of 1.53 million disk drives, over a period
of 41 months. We study three classes of corruption:
checksum mismatches, identity discrepancies, and parity
inconsistencies. We focus on checksum mismatches
because they occur most often.
We find more than 400,000 instances of checksum
mismatches over the 41-month period. We find many
interesting trends among these instances including: (i)
nearline disks (and their adapters) develop checksum
mismatches an order of magnitude more often than enterprise-class
disk drives, (ii) checksum mismatches within
the same disk are not independent events and they show
high spatial and temporal locality, and (iii) checksum
mismatches across different disks in the same storage
system are not independent. We use our observations to
derive lessons for corruption-proof system design.








