PDSI Talks & Publications
Journals
Failure Tolerance in Petascale Computers. Garth Gibson, Bianca Schroeder, Joan Digney. CTWatch Quarterly, vol. 3 no. 4. Volume on Software Enabling Technologies for Petascale Science. November 2007. www.ctwatch.org
PDF
Understanding Failures in Petascale Computers.
Bianca Schroeder, Garth A. Gibson.
SciDAC 2007. Journal of Physics: Conference Series 78 (2007) 012022.
Abstract / PDF / Permanent JPCS Link
All 100 open access volumes of the Journal of Physics: Conference Series (JPCS) are available via the journal home page: http://herald.iop.org/JPCS_home/m294/crk//link/1520
Understanding Disk Failure Rates: What does an MTTF of 1,000,000 hours mean to you? Bianca Schroeder, Garth A. Gibson. ACM Transactions on Storage (TOS), Volume 3 Issue 3, October 2007.
A Replicated File System for Grid Computing. Jiaying Zhang and Peter Honeyman. Concurrency and Computation: Practice and Experience, 2007; 00:1–7.
Abstract / PDF
Data Management: the Victorian Era Child of the 21st Century. Farber R. PNNL-SA-53343, Pacific Northwest National Laboratory, Richland, WA. Published in Scientific Computing, vol. 24 no. 4, March 2007.
HTML
Balancing Computation and Experiment. Farber R. PNNL-SA-54125, Pacific Northwest National Laboratory, Richland, WA. Published in Innovation: America's Journal of Technology Commercialization, vol. 5 no. 24, April/May 2007.
HTML
Early Experiences on the Journey Towards Self-* Storage. Michael Abd-El-Malek, William V. Courtright II, Chuck Cranor, Gregory R. Ganger, James Hendricks, Andrew J. Klosterman, Michael Mesnier, Manish Prasad, Brandon Salmon, Raja R. Sambasivan, Shafeeq Sinnamohideen, John D. Strunk, Eno Thereska, Matthew Wachs, Jay J. Wylie. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, September 2006.
Abstract / PDF
Conferences
On Application-level Approaches to Avoiding TCP Throughput Collapse in Cluster-Based Storage Systems. E. Krevat, V. Vasudevan, A. Phanishayee, D. Andersen, G. Ganger, G. Gibson, S. Seshan. Proceedings of the 2nd International Petascale Data Storage Workshop (PDSW '07), held in conjunction with Supercomputing '07. November 11, 2007, Reno, NV.
Abstract / PDF
GIGA+: Scalable Directories for Shared File Systems. Swapnil V. Patil, Garth A. Gibson, Sam Lang, Milo Polte. Proceedings of the 2nd International Petascale Data Storage Workshop (PDSW '07), held in conjunction with Supercomputing '07. November 11, 2007, Reno, NV.
Abstract / PDF
Accelerating Reed-Solomon Coding in RAID Systems with GPUs. Matthew Curry (University of Alabama at Birmingham, USA); Lee Ward (Sandia National Laboratories, USA); Tony Skjellum (University of Alabama at Birmingham, USA); Ron Brightwell (Sandia National Laboratories, USA). 22nd IEEE International Parallel and Distributed Processing Symposium, April 14-18, 2008, Miami, FL.
Abstract / PDF
An Analysis of Data Corruption in the Storage Stack. L. Bairavasundaram, G. Goodson, B. Schroeder, A. Arpaci-Dusseau, R. Arpaci-Dusseau. 6th USENIX Conference on File and Storage Technologies (FAST 2008).
Abstract / PDF
Scalable Security for Petascale Parallel File Systems. Andrew Leung, Ethan L. Miller, and Stephanie Jones. SC '07, Reno, NV, November 2007.
Abstract / PDF
POTSHARDS: Secure Long-Term Storage Without Encryption. Mark W. Storer, Kevin Greenan, Ethan L. Miller, Kaladhar Voruganti. Proceedings of the 2007 USENIX Technical Conference, June 2007.
Abstract / PDF
PRIMS: Making NVRAM Suitable for Extremely Reliable Storage. Kevin Greenan, Ethan L. Miller. Proceedings of the 3rd Workshop on Hot Topics in System Dependability (HotDep '07), June 2007.
Abstract / PDF
Direct-pNFS: Scalable, Transparent, and Versatile Access to Parallel File Systems. Dean Hildebrand, Peter Honeyman. Proc. 16th IEEE International Symp. on High Performance Distributed Computing (HPDC 2007), Monterey. June 2007.
Abstract / PDF
Modeling the Relative Fitness of Storage. Michael P. Mesnier, Matthew Wachs, Raja R. Sambasivan, Alice X. Zheng, Gregory R. Ganger. SIGMETRICS '07, June 12-16, 2007, San Diego, California, USA. ACM. Awarded Best Paper.
Abstract / PDF
Hierarchical Replication Control in a Global File System. Jiaying Zhang and Peter Honeyman. Proc. 7th IEEE International Symp. on Cluster Computing and the Grid (CCGrid07), Rio de Janeiro. May 2007.
Abstract / PDF
pNFS and Linux: Working towards a Heterogeneous Future. Dean Hildebrand, Peter Honeyman, and W.A. (Andy) Adamson. Proc. 8th LCI International Conf. on High-Performance Clustered Computing, South Lake Tahoe. May 2007.
Abstract / PDF
Fingerpointing Correlated Failures in Replicated Systems. Soila Pertet, Rajeev Gandhi and Priya Narasimhan. USENIX Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SysML), Cambridge, MA. April 2007.
Abstract / PDF
MultiMap: Preserving Disk Locality for Multidimensional Datasets. Minglong Shao, Steven W. Schlosser, Stratos Papadomanolakis, Jiri Schindler, Anastassia Ailamaki, Gregory R. Ganger. IEEE 23rd International Conference on Data Engineering (ICDE 2007), Istanbul, Turkey, April 2007. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-05-102, March 2005.
Abstract / PDF
The Computer Failure Data Repository. Bianca Schroeder, Garth Gibson. Invited contribution to the Workshop on Reliability Analysis of System Failure Data (RAF'07) MSR Cambridge, UK, March 2007.
Abstract / PDF
//TRACE: Parallel Trace Replay with Approximate Causal Events. Michael Mesnier, Matthew Wachs, Raja R. Sambasivan, Julio Lopez, James Hendricks, Gregory R. Ganger. Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST '07), February 13-16, 2007, San Jose, CA. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-06-108, September 2006.
Abstract / PDF
Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You? Bianca Schroeder, Garth A. Gibson. Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST '07), February 13-16, 2007, San Jose, CA. Best Paper Award.
Abstract / PDF
A Large Scale Study of Failures in High-performance-computing Systems. Bianca Schroeder, Garth Gibson. International Symposium on Dependable Systems and Networks (DSN 2006). IEEE Transactions on Dependable and Secure Computing (TDSC).
Abstract / PDF
Argon: Performance Insulation for Shared Storage Servers. Matthew Wachs, Michael Abd-El-Malek, Eno Thereska, Gregory R. Ganger. Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST '07), February 13-16, 2007, San Jose, CA.
Abstract / PDF
Towards Fingerpointing in the Emulab Dynamic Distributed System. Michael P. Kasick, Priya Narasimhan, Kevin Atkinson, Jay Lepreau. Proceedings of the 3rd USENIX Workshop on Real, Large Distributed Systems (WORLDS '06), Seattle, WA. Nov. 5, 2006.
Abstract / PDF
NFSv4 Replication for Grid Storage Middleware. Jiaying Zhang and Peter Honeyman. Proc. 4th International Workshop on Middleware for Grid Computing, Melbourne. November 2006.
Abstract / PDF
Ceph: A Scalable, High-Performance Distributed File System. Sage Weil, Scott A. Brandt, Ethan L. Miller, Darrell D. E. Long, Carlos Maltzahn. Proceedings of the 7th Conference on Operating Systems Design and Implementation (OSDI '06), November 2006.
Abstract / PDF
CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data. Sage Weil, Scott A. Brandt, Ethan L. Miller, Carlos Maltzahn. Proceedings of SC '06, November 2006.
Abstract / PDF
Reliability Mechanisms for File Systems Using Non-Volatile Memory as a Metadata Store. Kevin Greenan, Ethan L. Miller. Proceedings of the 6th ACM & IEEE Conference on Embedded Software (EMSOFT '06), October 2006, pages 178-187.
Abstract / PDF
Scalable Security for Large, High Performance Storage Systems. Andrew Leung, Ethan L. Miller. Proceedings of the 2nd ACM Workshop on Storage Security and Survivability (StorageSS 2006), October 2006.
Abstract / PDF
Other
Network Transparency in Wide Area Collaborations. Jiaying Zhang. Ph.D. Dissertation, University of Michigan, Ann Arbor, May 2007.
Abstract / PDF
Distributed Access to Parallel File Systems. Dean Hildebrand. Ph.D. Dissertation, University of Michigan, Ann Arbor, February 2007.
Abstract / PDF
Posters
PDSI Shared Information Resources for HEC Storage. PDSI PIs. ASCR PI meeting, March 31, 2008, Denver, CO.
PDF
PDSI Data Releases and Repositories. PDSI PIs. 6th USENIX Conference on File and Storage Technologies (FAST '08). Feb. 26-29, 2008. San Jose, CA.
PDF
Talks
Petascale Data Storage Institute - Access Methods. Garth Gibson, Carnegie Mellon University. SDM-PDSI Mini Workshop. Nov 30, 2007. Seattle, WA.
PDF [495K]
Performance Challenges for Extreme Scale Computing. John T. Daly, Los Alamos National Lab. SDI Seminar, Carnegie Mellon University.
PDF [4.6M]
Understanding Failure in Petascale Computers. Garth Gibson (joint work with Bianca Schroeder). 2007 SciDAC Conference, June 25, Boston, MA.
PDF [899K]
Design and Expressions of a Scalable Supercomputer. Lee Ward. Sandia’s MPP, 10/31/2006.
PDF [673K]








