PDSI Talks & Publications
NOTE: PDSW '08 papers now published with IEEE Xplore
Journals
Pergamum: Energy-efficient Archival Storage with Disk Instead of Tape. Storer, Mark W., Kevin Greenan, Ethan L. Miller, Kaladhar Voruganti. The USENIX Magazine 33(3), June 2008.
PDF
A Replicated File System for Grid Computing. Zhang, Jiaying and Peter Honeyman. Concurrency and Computation: Practice and Experience 20:9 (June 2008), pp. 1113–1130. DOI 10.1002/cpe.v20:9
Failure Tolerance in Petascale Computers. Garth Gibson, Bianca Schroeder, Joan Digney. CTWatch Quarterly, vol. 3 no. 4. Volume on Software Enabling Technologies for Petascale Science. November 2007. www.ctwatch.org
PDF
Understanding Failures in Petascale Computers. Bianca Schroeder, Garth A. Gibson. SciDAC 2007. Journal of Physics: Conference Series 78 (2007) 012022.
Abstract / PDF / Permanent JPCS Link
All 100 open access volumes of the Journal of Physics Conference Series (JPCS) are available via the journal home page: http://herald.iop.org/JPCS_home/m294/crk//link/1520
Understanding Disk Failure Rates: What does an MTTF of 1,000,000 hours mean to you? Bianca Schroeder, Garth A. Gibson. ACM Transactions on Storage (TOS), Volume 3 Issue 3, October 2007.
A Replicated File System for Grid Computing. Jiaying Zhang and Peter Honeyman. Concurrency and Computation: Practice and Experience, 2007; 00:1–7.
Abstract / PDF
Data Management: the Victorian Era Child of the 21st Century. Farber R., PNNL-SA-53343, Pacific Northwest National Laboratory, Richland, WA. Published in Scientific Computing, vol. 24 no. 4, March 2007.
HTML
Balancing Computation and Experiment. Farber R. PNNL-SA-54125, Pacific Northwest National Laboratory, Richland, WA. Published in Innovation: America's Journal of Technology Commercialization, vol. 5 no. 24, April/May 2007.
HTML
Early Experiences on the Journey Towards Self-* Storage. Michael Abd-El-Malek, William V. Courtright II, Chuck Cranor, Gregory R. Ganger, James Hendricks, Andrew J. Klosterman, Michael Mesnier, Manish Prasad, Brandon Salmon, Raja R. Sambasivan, Shafeeq Sinnamohideen, John D. Strunk, Eno Thereska, Matthew Wachs, Jay J. Wylie. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, September 2006.
Abstract / PDF
Conferences
Mixing Hadoop and HPC Workloads on Parallel Filesystems
Esteban Molina-Estolano, Carlos Maltzahn, Scott Brandt, University of California Santa Cruz,
Maya Gokhale, John May, Lawrence Livermore National Lab,
John Bent, Los Alamos National Lab
PDF
DiskReduce: RAID for Data-Intensive Scalable Computing
Bin Fan, Wittawat Tantisiriroj, Lin Xiao, Garth Gibson. 4th Petascale Data Storage Workshop held in conjunction with Supercomputing '09, November 15, 2009. Portland, Oregon. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-112, November 2009.
Abstract / PDF [304K]
...And eat it too: High read performance in write-optimized HPC I/O middleware file formats
Milo Polte, Jay Lofstead, John Bent, Garth Gibson, Scott A. Klasky, Qing Liu, Manish Parashar, Norbert Podhorszki, Karsten Schwan, Meghan Wingate, Matthew Wolf. 4th Petascale Data Storage Workshop held in conjunction with Supercomputing '09, November 15, 2009. Portland, Oregon. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-09-111, November 2009.
Abstract / PDF [388K]
Fusing Data Management Services with File Systems
Scott Brandt, Carlos Maltzahn, Neoklis Polyzotis, Wang-Chiew Tan, University of California, Santa Cruz
PDF
In Search of an API for Scalable File Systems: Under the table or above it? Swapnil Patil, Garth A. Gibson, Gregory R. Ganger, Julio Lopez, Milo Polte, Wittawat Tantisiriroj, and Lin Xiao. USENIX HotCloud Workshop 2009. June 2009, San Diego, CA.
PDF [260K]
Fast Log-based Concurrent Writing of Checkpoints. Milo Polte, Jiri Simsa, Wittawat Tantisiriroj, Garth Gibson, Shobhit Dayal, Mikhail Chainani, Dilip Kumar Uppugandla. Proceedings of the 3rd Petascale Data Storage Workshop held in conjunction with Supercomputing '08, November 17, 2008, Austin, TX.
Abstract / PDF [262K]
Comparing Performance of Solid State Devices and Mechanical Disks. Milo Polte, Jiri Simsa, Garth Gibson. Proceedings of the 3rd Petascale Data Storage Workshop held in conjunction with Supercomputing '08, November 17, 2008, Austin, TX.
Abstract / PDF [99K]
Scalable Full-Text Search for Petascale File Systems. Leung, Andrew W. and Ethan L. Miller, University of California, Santa Cruz. 3rd Petascale Data Storage Workshop held in conjunction with SC08, Austin, TX. Nov. 17, 2008.
PDF
Introducing Map-Reduce to High End Computing. Mackey, Grant, Saba Sehrish, John Bent, Jun Wang, University of Central Florida and Los Alamos National Laboratory. 3rd Petascale Data Storage Workshop held in conjunction with SC08, Austin, TX. Nov. 17, 2008.
PDF
Logan: Automatic Management for Evolvable, Large-Scale, Archival Storage. Storer, Mark W., Kevin M. Greenan, Ian F. Adams, Ethan L. Miller, Darrell D. E. Long, Kaladhar Voruganti, University of California, Santa Cruz. 3rd Petascale Data Storage Workshop held in conjunction with SC08, Austin, TX. Nov. 17, 2008.
PDF
Arbitrary Dimension Reed-Solomon Coding and Decoding for Extended RAID on GPUs. Curry, Matthew L., H. Lee Ward, Anthony Skjellum, and Ron Brightwell, University of Alabama at Birmingham and Sandia National Laboratory. 3rd Petascale Data Storage Workshop held in conjunction with SC08, Austin, TX. Nov. 17, 2008.
PDF
An Analysis of Data Corruption in the Storage Stack. Bairavasundaram, L., G. Goodson, B. Schroeder, A. Arpaci-Dusseau, R. Arpaci-Dusseau. 6th Usenix Conference on File and Storage Technologies (FAST 2008).
PDF
Holistic Evaluation of Lightweight Operating Systems using the PERCU Method. Kramer, William T.C., Yun (Helen) He, Jonathan Carter, Joseph Glenski, Lynn Rippe, Nicholas Cardo. LBNL Technical Report Number TBD, July 2008.
PDF
Reliability of XOR-based erasure codes on heterogeneous devices. Greenan, Kevin, Ethan L. Miller, Jay Wylie. Proceedings of the 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2008), June 2008.
PDF
Measurement and Analysis of Large-Scale Network File System Workloads. Leung, Andrew, Shankar Pasupathy, Garth Goodson, Ethan L. Miller. Proceedings of the 2008 USENIX Technical Conference, June 2008.
PDF
Franklin: User Experiences. He, Yun (Helen), William T.C. Kramer, Jonathan Carter, Nicholas Cardo. Proceedings of the Cray User Group 2008, Helsinki, Finland, May 5-8, 2008.
Performance and Availability Tradeoffs in Replicated File Systems. Zhang, Jiaying and Peter Honeyman. Proc. International Workshop on Resiliency in High Performance Computing (RESILIENCE 2008), in conjunction with the 8th IEEE International Symposium on Cluster Computing and Grid (CCGRID 2008), Lyon (May 2008).
PDF (TR Version - CITI Technical Report 07-3)
NERSC 2016—Extreme Computation and Data for Science. Kramer, William. Proceedings of the Cray User Group 2008, Helsinki, Finland, May 5-8, 2008
Personalized Interactive Faceted Search. Koren, Jonathan, Yi Zhang, Xue Liu. Proceedings of the 17th International Conference on the World Wide Web (WWW 2008), April 2008.
PDF
Using Utility to Provision Storage Systems. John D. Strunk, Eno Thereska, Christos Faloutsos, Gregory R. Ganger. 6th USENIX Conference on File and Storage Technologies (FAST '08). Feb. 26-29, 2008. San Jose, CA. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-07-106, September 2007.
Abstract / PDF [310K]
Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems. Amar Phanishayee, Elie Krevat, Vijay Vasudevan, David G. Andersen, Gregory R. Ganger, Garth A. Gibson, Srinivasan Seshan. 6th USENIX Conference on File and Storage Technologies (FAST '08). Feb. 26-29, 2008. San Jose, CA. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-07-105, September 2007.
Abstract / PDF [374K]
Pergamum: Replacing Tape with Energy Efficient, Reliable, Disk-Based Archival Storage. Storer, Mark W., Kevin Greenan, Ethan L. Miller, Kaladhar Voruganti. Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST '08), February 2008, pages 1-16.
PDF
Accelerating Reed-Solomon Coding in RAID Systems with GPUs. Matthew Curry (University of Alabama at Birmingham, USA); Lee Ward (Sandia National Laboratories, USA); Tony Skjellum (University of Alabama at Birmingham, USA); Ron Brightwell (Sandia National Laboratories, USA). 22nd IEEE International Parallel and Distributed Processing Symposium, April 14-18, 2008, Miami, FL.
Abstract / PDF
Modeling the Impact of Checkpoints on Next-Generation Systems. Oldfield, R.A., S. Arunagiri, P.J. Teller, S. Seelam, M.R. Varela, R. Riesen, P.C. Roth. IEEE Conference on Mass Storage Systems and Technologies, San Diego, California, Nov. 2007.
On Application-level Approaches to Avoiding TCP Throughput Collapse in Cluster-Based Storage Systems. E. Krevat, V. Vasudevan, A. Phanishayee, D. Andersen, G. Ganger, G. Gibson, S. Seshan. Proceedings of the 2nd International Petascale Data Storage Workshop (PDSW '07) held in conjunction with Supercomputing '07. November 11, 2007, Reno, NV.
Abstract / PDF
GIGA+: Scalable Directories for Shared File Systems. Swapnil V. Patil, Garth A. Gibson, Sam Lang, Milo Polte. Proceedings of the 2nd International Petascale Data Storage Workshop (PDSW '07) held in conjunction with Supercomputing '07. November 11, 2007, Reno, NV.
Abstract / PDF
RADOS: A Fast, Scalable, and Reliable Storage Service for Petabyte-scale Storage Clusters. Weil, Sage, Andrew Leung, Scott A. Brandt, Carlos Maltzahn. Proceedings of the ACM Petascale Data Storage Workshop 2007 (PDSW 07), November 2007.
PDF
Scalable Security for Petascale Parallel File Systems. Andrew Leung, Ethan L. Miller, and Stephanie Jones. SC '07, Reno, NV, November 2007.
PDF
Searching and Navigating Petabyte Scale File Systems Based on Facets. Koren, Jonathan, Yi Zhang, Sasha Ames, Andrew Leung, Carlos Maltzahn, Ethan L. Miller. Proceedings of the 2007 ACM Petascale Data Storage Workshop (PDSW 07), November 2007.
PDF
Characterizing the I/O Behavior of Scientific Applications on the Cray XT. Roth, P.C. 2007 Petascale Data Storage Workshop, co-located with SC07, Reno, Nevada, Nov. 2007.
PDF
Disaster Recovery Codes: Increasing Reliability with Large-Stripe Error Correction Codes. Greenan, Kevin, Ethan L. Miller, Thomas Schwarz, Darrell D. E. Long. Proceedings of the 3rd International Workshop on Storage Security and Survivability (StorageSS 2007), held in conjunction with the 14th ACM Conference on Computer and Communications Security (CCS 2007), October 2007.
PDF
POTSHARDS: Secure Long-Term Storage Without Encryption. Mark W. Storer, Kevin Greenan, Ethan L. Miller, Kaladhar Voruganti. Proceedings of the 2007 USENIX Technical Conference, June 2007.
Abstract / PDF
Categorizing and Differencing System Behaviours. Raja R. Sambasivan, Alice X. Zheng, Eno Thereska, Gregory R. Ganger. Second Workshop on Hot Topics in Autonomic Computing. June 15, 2007. Jacksonville, FL.
Abstract / PDF [120K]
PRIMS: Making NVRAM Suitable for Extremely Reliable Storage. Kevin Greenan, Ethan L. Miller. Proceedings of the 3rd Workshop on Hot Topics in System Dependability (HotDep '07), June 2007.
Abstract / PDF
Direct-pNFS: Scalable, Transparent, and Versatile Access to Parallel File Systems. Dean Hildebrand, Peter Honeyman. Proc. 16th IEEE International Symp. on High Performance Distributed Computing (HPDC 2007), Monterey. June 2007.
Abstract / PDF
Modeling the Relative Fitness of Storage. Michael P. Mesnier, Matthew Wachs, Raja R. Sambasivan, Alice X. Zheng, Gregory R. Ganger. SIGMETRICS'07, June 12-16, 2007, San Diego, California, USA. ACM. Awarded Best Paper.
Abstract / PDF
Hierarchical Replication Control in a Global File System. Jiaying Zhang and Peter Honeyman. Proc. 7th IEEE International Symp. on Cluster Computing and the Grid (CCGrid07), Rio de Janeiro. May 2007.
Abstract / PDF
pNFS and Linux: Working towards a Heterogeneous Future. Dean Hildebrand, Peter Honeyman, and W.A. (Andy) Adamson. Proc. 8th LCI International Conf. on High-Performance Clustered Computing, South Lake Tahoe. May 2007.
Abstract / PDF
Fingerpointing Correlated Failures in Replicated Systems. Soila Pertet, Rajeev Gandhi and Priya Narasimhan. USENIX Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SysML), Cambridge, MA. April 2007.
Abstract / PDF
MultiMap: Preserving Disk Locality for Multidimensional Datasets. Minglong Shao, Steven W. Schlosser, Stratos Papadomanolakis, Jiri Schindler, Anastassia Ailamaki, Gregory R. Ganger. IEEE 23rd International Conference on Data Engineering (ICDE 2007), Istanbul, Turkey, April 2007. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-05-102, March 2005.
Abstract / PDF
The Computer Failure Data Repository. Bianca Schroeder, Garth Gibson. Invited contribution to the Workshop on Reliability Analysis of System Failure Data (RAF'07) MSR Cambridge, UK, March 2007.
Abstract / PDF
//TRACE: Parallel Trace Replay with Approximate Causal Events. Michael Mesnier, Matthew Wachs, Raja R. Sambasivan, Julio Lopez, James Hendricks, Gregory R. Ganger. Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST '07), February 13-16, 2007, San Jose, CA. Supersedes Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-06-108, September 2006.
Abstract / PDF
Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You? Bianca Schroeder, Garth A. Gibson. Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST '07), February 13-16, 2007, San Jose, CA. Best Paper Award.
Abstract / PDF
A Large Scale Study of Failures in High-performance-computing Systems. Bianca Schroeder, Garth Gibson. International Symposium on Dependable Systems and Networks (DSN 2006). IEEE Transactions on Dependable and Secure Computing (TDSC).
Abstract / PDF
Argon: Performance Insulation for Shared Storage Servers. Matthew Wachs, Michael Abd-El-Malek, Eno Thereska, Gregory R. Ganger. Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST '07), February 13-16, 2007, San Jose, CA.
Abstract / PDF
Towards Fingerpointing in the Emulab Dynamic Distributed System. Michael P. Kasick, Priya Narasimhan, Kevin Atkinson, Jay Lepreau. Proceedings of the 3rd USENIX Workshop on Real, Large Distributed Systems (WORLDS '06), Seattle, WA. Nov. 5, 2006.
Abstract / PDF
NFSv4 Replication for Grid Storage Middleware. Jiaying Zhang and Peter Honeyman. Proc. 4th International Workshop on Middleware for Grid Computing, Melbourne. November 2006.
Abstract / PDF
Ceph: A Scalable, High-Performance Distributed File System. Sage Weil, Scott A. Brandt, Ethan L. Miller, Darrell D. E. Long, Carlos Maltzahn. Proceedings of the 7th Conference on Operating Systems Design and Implementation (OSDI '06), November 2006.
Abstract / PDF
CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data. Sage Weil, Scott A. Brandt, Ethan L. Miller, Carlos Maltzahn. Proceedings of SC '06, November 2006.
Abstract / PDF
Reliability Mechanisms for File Systems Using Non-Volatile Memory as a Metadata Store. Kevin Greenan, Ethan L. Miller. Proceedings of the 6th ACM & IEEE Conference on Embedded Software (EMSOFT '06), October 2006, pages 178-187.
Abstract / PDF
Scalable Security for Large, High Performance Storage Systems. Andrew Leung, Ethan L. Miller. Proceedings of the 2nd ACM Workshop on Storage Security and Survivability (StorageSS 2006), October 2006.
Abstract / PDF
Technical Reports & Dissertations
PLFS: A Checkpoint Filesystem for Parallel Applications. John Bent, Garth Gibson, Gary Grider, Ben McClelland, Paul Nowoczynski, James Nunez, Milo Polte, Meghan Wingate. LANL Technical Release LA-UR 09-02117, April 2009.
Abstract / PDF [415K]
Data-intensive file systems for Internet services: A rose by any other name ... Wittawat Tantisiriroj, Swapnil Patil, Garth Gibson. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-08-114, October 2008.
Abstract / PDF [350K]
Co-scheduling of Disk Head Time in Cluster-based Storage. Matthew Wachs, Gregory R. Ganger. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-08-113, October 2008.
Abstract / PDF [1M]
GIGA+: Scalable Directories for Shared File Systems. Swapnil Patil, Garth Gibson. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-08-110, October 2008.
Abstract / PDF [400K]
Characterizing HEC Storage Systems at Rest. Shobhit Dayal. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-08-109, July 2008.
Abstract / PDF [603K]
User Level Implementation of Scalable Directories (GIGA+). Hase, Sanket, Aditya Jayaraman, Vinay K. Perneti, Sundararaman Sridharan, Swapnil V. Patil, Milo Polte, Garth A. Gibson. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-08-107, May 2008.
Abstract / PDF [1.67M]
File System Virtual Appliances: Third-party File System Implementations without the Pain. Michael Abd-El-Malek, Matthew Wachs, James Cipar, Gregory R. Ganger, Garth A. Gibson, Michael K. Reiter. Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-08-106, May 2008.
Abstract / PDF [508K]
MPP2 Syslog Data (2006-2008). Brown, DML, and GR Smith. 2008. PNNL-SA-61371 Pacific Northwest National Laboratory, Richland, WA.
Final MPP2 Failure Data. Brown, DML. 2008. PNNL-17833, Pacific Northwest National Laboratory, Richland, WA.
Back to the Future: The Return of Massively Parallel Systems. Farber, RM. 2008. PNNL-SA-59874, Pacific Northwest National Laboratory, Richland, WA.
HTML
Storage in Transition. Farber, RM. 2008. PNNL-SA-59313, Pacific Northwest National Laboratory, Richland, WA.
HTML
I/O Tracing on Catamount. Klundt, Ruth, Marlow Weston, Lee Ward. SAND2008-3684.
PDF
Reliability Results of NERSC Systems. Mokhtarani, Akbar, Jason Hick, William T.C. Kramer. LBNL Report LBNL-430E.
PDF
On Modeling the Relative Fitness of Storage. Michael P. Mesnier. Carnegie Mellon University, Dept. ECE Ph.D. Dissertation CMU-PDL-07-108, December 19, 2007.
Abstract / PDF [1.16M]
Statistical breakdown of MSCF production file systems in Oct 2007. Felix, EJ. 2007. PNNL-17013, Pacific Northwest National Laboratory, Richland, WA.
Network Transparency in Wide Area Collaborations. Jiaying Zhang. Ph.D. Dissertation, University of Michigan, Ann Arbor, May 2007.
Abstract / PDF
Distributed Access to Parallel File Systems. Dean Hildebrand. Ph.D. Dissertation, University of Michigan, Ann Arbor, February 2007.
Abstract / PDF
Posters
Petascale Data Management: Guided by Measurement. Garth Gibson, PDSI PIs. June 2008, Washington, D.C.
PDF
PDSI Shared Information Resources for HEC Storage. PDSI PIs. ASCR PI meeting, March 31, 2008, Denver, CO.
PDF
PDSI Data Releases and Repositories. PDSI PIs. 6th USENIX Conference on File and Storage Technologies (FAST '08). Feb. 26-29, 2008. San Jose, CA.
PDF
Talks
Directions for Shingled-Write and TDMR System Architectures: Synergies with Solid-State Disks.
Garth Gibson, CMU. IEEE International Magnetics Conference 2009 (Intermag'09), Sacramento, CA, May 4-8, 2009.
PDF
Storage for Petascale Computing. John Bent, Los Alamos National Lab, Wednesday March 25, 2009
PDF
High End Computing File System and I/O R&D Gaps Roadmap.
James Nunez, Los Alamos National Lab.
HEC FSIO R&D Conference/HECURA FSIO PI Meeting '08, Arlington, VA. Aug 3 - Aug 6, 2008.
PDF
Sandia I/O Traces. Lee Ward, SNL.
HEC FSIO R&D Conference/HECURA FSIO PI Meeting '08, Arlington, VA. Aug 3 - Aug 6, 2008.
PDF
Update on LANL Data and Information Availability. John Bent, LANL.
HEC FSIO R&D Conference/HECURA FSIO PI Meeting '08, Arlington, VA. Aug 3 - Aug 6, 2008.
PDF
PNNL Petascale Data Storage Institute: fsstats - Data release update. Evan Felix, PNNL.
HEC FSIO R&D Conference/HECURA FSIO PI Meeting '08, Arlington, VA. Aug 3 - Aug 6, 2008.
PDF
GIGA+: Scalable Directories for Shared File Systems. Garth Gibson, Carnegie Mellon University.
HEC FSIO R&D Conference/HECURA FSIO PI Meeting '08, Arlington, VA. Aug 3 - Aug 6, 2008.
PDF
Highly Scalable Metadata Search and Indexing. Ethan Miller, UC Santa Cruz.
HEC FSIO R&D Conference/HECURA FSIO PI Meeting '08, Arlington, VA. Aug 3 - Aug 6, 2008.
PDF
End-to-End Performance Management for Large, Distributed Storage. Scott Brandt, UC Santa Cruz.
HEC FSIO R&D Conference/HECURA FSIO PI Meeting '08, Arlington, VA. Aug 3 - Aug 6, 2008.
PDF
Performance insulation and predictability for shared cluster storage. Greg Ganger, Carnegie Mellon University.
HEC FSIO R&D Conference/HECURA FSIO PI Meeting '08, Arlington, VA. Aug 3 - Aug 6, 2008.
PDF
Towards Automated Problem Analysis of Large-Scale Storage Systems.
Priya Narasimhan, Carnegie Mellon University.
HEC FSIO R&D Conference/HECURA FSIO PI Meeting '08, Arlington, VA. Aug 3 - Aug 6, 2008.
PDF
SciDAC PDSI Update.
Garth Gibson, Carnegie Mellon University.
HEC FSIO R&D Conference/HECURA FSIO PI Meeting '08, Arlington, VA. Aug 3 - Aug 6, 2008.
PDF
Failure in Supercomputers and Supercomputer Storage. Garth Gibson, Carnegie Mellon University. NSF/DOE Expedition Workshop/Toward Scalable Data Management. June 10, 2008. Washington, D.C.
PDF [3.1M] / MP3 [2.7M]
Abstract: The largest computer systems have entered the era of Peta operations per second and will climb to Exa operations per second over the next decade, largely on the strength of more cores per chip and more chips per system. The inevitable consequence of increasing component counts is more parts that can fail, higher failure rates, more concurrent failures and more effort devoted to coping with and recovering from failures -- a key role for storage systems. In this talk I will review historical data on failure rates in supercomputers to project future failure rates, review growing limitations on traditional fault tolerance strategies for supercomputers based on high-speed checkpointing to parallel storage systems, and address the increasing failure issues in storage components.
Petascale Data Storage Institute - Access Methods. Garth Gibson, Carnegie Mellon University. SDM-PDSI Mini Workshop. Nov 30, 2007. Seattle, WA
PDF [495K]
Performance Challenges for Extreme Scale Computing. John T. Daly, Los Alamos National Lab. SDI Seminar, Carnegie Mellon University.
PDF [4.6M]
Understanding Failure in Petascale Computers. Garth Gibson (Joint work with Bianca Schroeder).
2007 SciDAC Conference, June 25, Boston MA.
PDF [899K]
Design and Expressions of a Scalable Supercomputer. Lee Ward. Sandia’s MPP, 10/31/2006.
PDF [673K]








