4th Petascale Data Storage Workshop
Supercomputing '09

Held in conjunction with SC09 and sponsored by the DOE SciDAC Petascale Data Storage Institute (PDSI)

Session Chair: Garth Gibson, CMU

Sunday, November 15, 2009
9:00 a.m. - 5:30 p.m.
Room A106, Oregon Convention Center, Portland, Oregon

SC09 Workshop Web Page


Petascale Data Storage Workshops


Petascale computing infrastructures make petascale demands on information storage capacity, performance, concurrency, reliability, availability, and manageability. This one-day workshop focuses on the data storage problems and emerging solutions found in petascale scientific computing environments, with special attention to issues in which community collaboration can be crucial: problem identification, workload capture, solution interoperability, standards with community buy-in, and shared tools. The workshop seeks contributions on relevant topics, including but not limited to: performance and benchmarking results and tools, failure tolerance problems and solutions, APIs for high performance features, parallel file systems, high bandwidth storage architectures, wide area file systems, metadata intensive workloads, autonomics for HPC storage, virtualization for storage systems, data-intensive and cloud storage, archival storage advances, and resource management innovations.

Submitted extended abstracts (up to 5 pages, due Sept. 18, 2009) will be peer reviewed for presentation and publication on www.pdsi-scidac.org and in the ACM Digital Library.


All papers presented at this workshop are also online at the ACM Digital Library (table of contents of the proceedings).

8:55am - 9:00am
Welcome - Garth Gibson, Workshop Chair
9:00am - 10:00am
SESSION 1: Data-Intensive Cluster Storage
Mixing Hadoop and HPC Workloads on Parallel Filesystems
Esteban Molina-Estolano, Carlos Maltzahn, Scott Brandt, University of California Santa Cruz, Maya Gokhale, John May, Lawrence Livermore Nat Lab, John Bent, Los Alamos Nat Lab
Paper | Slides

DiskReduce: RAID for Data-Intensive Scalable Computing
Bin Fan, Wittawat Tantisiriroj, Lin Xiao, Garth Gibson, Carnegie Mellon University
Paper | Slides
10:00am - 10:30am
POSTER SESSION 1 - List of participants and links to posters
10:30am - 12:30pm
SESSION 2: Patterns in Petascale Storage Access
Data Layout Optimization for Petascale File Systems
Xian-He Sun, Yong Chen, Yanlong Yin, Illinois Institute of Technology
Paper | Slides

Case Studies in Storage Access by Loosely Coupled Petascale Applications
Justin M. Wozniak, Michael Wilde, Argonne Nat Lab
Paper | Slides

...And eat it too: High read performance in write-optimized HPC I/O middleware file formats
Milo Polte, Garth Gibson, Carnegie Mellon University, Jay Lofstead, Karsten Schwan, Matthew Wolf, Georgia Institute of Technology, John Bent, Meghan Wingate, Los Alamos Nat Lab, Scott A. Klasky, Qing Liu, Norbert Podhorszki, Oak Ridge Nat Lab, Manish Parashar, Rutgers University
Paper | Slides

Scalable I/O Tracing and Analysis
Karthik Vijayakumar, Frank Mueller, Xiaosong Ma, North Carolina State University, Philip C. Roth, Oak Ridge Nat Lab
Paper | Slides

12:30pm - 2:00pm
2:00pm - 3:00pm
SESSION 3: Integrating Enterprise Storage Features
pNFS, POSIX, and MPI-IO: A Tale of Three Semantics
Dean Hildebrand, Roger Haskin, IBM Almaden Research Center, Arifa Nisar, Northwestern University
Paper | Slides

Uncovering Errors: The Cost of Detecting Silent Data Corruption
Sumit Narayan, John A. Chandy, University of Connecticut, Samuel Lang, Philip Carns, Robert Ross, Argonne Nat Lab
Paper | Slides
3:00pm - 3:30pm
POSTER SESSION 2 - List of participants and links to posters
3:30pm - 4:30pm
SESSION 4: Integrating Databases
Fusing Data Management Services with File Systems
Scott Brandt, Carlos Maltzahn, Neoklis Polyzotis, Wang-Chiew Tan, University of California, Santa Cruz
Paper | Slides

Using the Active Storage Fabrics Model to Address Petascale Storage Challenges
Blake G. Fitch, Aleksandr Rayshubskiy, Michael C. Pitman, Robert S. Germain, IBM T.J. Watson Research Center, T.J. Christopher Ward, IBM Software Group Hursley Park
Paper | Slides
4:30pm - 5:00pm
Short Announcements (sign up onsite) & Town Hall Meeting
5:00pm - 5:30pm
POSTER SESSION 3 - List of participants and links to posters

Garth A. Gibson, Carnegie Mellon University and Panasas Inc.
Darrell Long, University of California, Santa Cruz
Peter Honeyman, University of Michigan, Ann Arbor, Center for Information Technology Integration
Gary A. Grider, Los Alamos National Laboratory
John Shalf, National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory
Philip C. Roth, Oak Ridge National Laboratory
Evan J. Felix, Pacific Northwest National Laboratory
Lee Ward, Sandia National Laboratories
Rob Ross, Argonne National Laboratory
Karsten Schwan, Georgia Institute of Technology
William T. C. Kramer, National Center for Supercomputing Applications

Other Workshops & Panels of Interest at SC09

pNFS: Parallel Storage Client and Server Development Panel Update

Primary Session Leader:
Joshua Konkle (NetApp)

Birds-of-a-Feather Session Tuesday, 05:30PM - 07:00PM
Room E143-144

This panel will appeal to virtual data center managers, database server administrators, and those seeking a fundamental understanding of pNFS. It will cover the four key reasons to start working with NFSv4 today and explain the storage layouts for parallel NFS: NFSv4.1 Files, Blocks, and T10 OSD Objects. We’ll engage the panel with a series of questions related to client technology, storage layouts, data management, virtualization, databases, geoscience, video production, finite element analysis, MPI-IO client/file locking integration, data-intensive searching, etc. You’ll have an opportunity to ask detailed questions about technology and about the plans of panel participants from SNIA’s NFS Special Interest Group, which is part of the Ethernet Storage Forum.

Last updated 2010-04-15 | © 2011 Carnegie Mellon University