Reading group, Data-Centric Parallel File Systems

Content Warning: Systems Paper

Best practices and lessons learned from deploying and operating large-scale data-centric parallel file systems.


This is a review/state-of-the-practice paper from ORNL on their experiences deploying and operating huge parallel file systems.


  • Data must be shared across the computing resources in an HPC system.
  • Data-centric models of PFS
  • ORNL started research into DCPFS in 2005

A central challenge of the model is to retain data sharing flexibility while maintaining performance. This is apparently no small task. (Duh.)

  • Spider systems used for storage
  • Spider I: 240 GB/s I/O bandwidth, 10 PB total storage
  • Spider II: 1 TB/s I/O bandwidth, 32 PB total storage
  • both built on Lustre

Design Principles

  • Data islands:
    • No need to transfer data to another system after simulation
  • IO models:
    • High bandwidth
    • High checkpoint storage usage
    • High read / latency constrained


  • Talk to your users and customers to understand their workloads
  • Model or otherwise understand your I/O patterns



Nothing of personal interest here? RFPs are hard and you should let your PI make them? Computers are expensive? File systems are expensive?

  • SSU - scalable system unit


Run benchmarks: sequential, random, and other I/O patterns that fit your organization’s needs.
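The paper doesn’t prescribe a benchmark tool, so here’s a minimal sketch of the idea: time sequential versus random reads at a 1 MiB transfer size. All function names, the file size, and the block size are my choices, not the paper’s.

```python
import os
import random
import tempfile
import time

BLOCK = 1 << 20        # 1 MiB per transfer
NBLOCKS = 64           # 64 MiB test file -- tiny, purely illustrative

def make_test_file(path):
    """Write NBLOCKS blocks so both read patterns touch the same data."""
    with open(path, "wb") as f:
        for _ in range(NBLOCKS):
            f.write(b"\0" * BLOCK)

def sequential_read(path):
    """Return read bandwidth (bytes/s) for a front-to-back scan."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(BLOCK):
            pass
    return NBLOCKS * BLOCK / (time.perf_counter() - start)

def random_read(path):
    """Return read bandwidth (bytes/s) for the same blocks in shuffled order."""
    order = list(range(NBLOCKS))
    random.shuffle(order)
    start = time.perf_counter()
    with open(path, "rb") as f:
        for i in order:
            f.seek(i * BLOCK)
            f.read(BLOCK)
    return NBLOCKS * BLOCK / (time.perf_counter() - start)

if __name__ == "__main__":
    path = tempfile.mkstemp()[1]
    make_test_file(path)
    print(f"sequential: {sequential_read(path) / 1e6:.0f} MB/s")
    print(f"random:     {random_read(path) / 1e6:.0f} MB/s")
    os.remove(path)
```

Caveat: on a file this small you are mostly measuring the page cache. Real PFS benchmarking uses purpose-built tools (e.g. IOR) run at scale, and has to defeat client-side caching.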


Short notes: It’s complicated.

Centralize infrastructure services between disparate systems, harden them, and retain centralized control and security.

  • Diskless nodes are great! Use them.
  • Script away all the painful imaging
  • Use ramdisks for some node files
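A minimal sketch of the ramdisk point (the path and size are assumptions, not from the paper): back a node-local path with tmpfs so that mutable state on a diskless node lives in RAM.

```
# /etc/fstab on a diskless node -- illustrative path and size
tmpfs  /var/log  tmpfs  size=512m,mode=0755  0  0
```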


  • It’s a big deal
  • Store log data in a database for ease of retrieval
  • granularity?
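For the logs-in-a-database point, a minimal sqlite3 sketch. The schema, column names, and source tags are my assumptions, not the paper’s; the `ts` column is where the granularity question gets decided.

```python
import sqlite3

def open_log_db(path=":memory:"):
    """Open (or create) a log database with a time index for range queries."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS logs (
            ts      REAL NOT NULL,   -- epoch seconds: pick your granularity here
            host    TEXT NOT NULL,
            source  TEXT NOT NULL,   -- e.g. 'lustre', 'syslog' (illustrative tags)
            message TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS logs_ts ON logs (ts);
    """)
    return db

def record(db, ts, host, source, message):
    db.execute("INSERT INTO logs VALUES (?, ?, ?, ?)", (ts, host, source, message))

def logs_between(db, t0, t1, host=None):
    """Time-range retrieval, optionally narrowed to one host."""
    q = "SELECT ts, host, source, message FROM logs WHERE ts BETWEEN ? AND ?"
    args = [t0, t1]
    if host is not None:
        q += " AND host = ?"
        args.append(host)
    return db.execute(q + " ORDER BY ts", args).fetchall()
```

At ORNL scale you’d want a heavier store than SQLite, but the point stands: an indexed timestamp makes "what happened on that OSS between 02:00 and 02:15" a cheap query instead of a grep across nodes.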


  • Develop with vendor and software roadmaps in mind
  • Segregate extreme loads from other stuff using namespaces and other system tools

Storage and IO tuning

  • Identify slow disks quickly and replace them.
  • Data locality optimizations
  • torus model used for router placement
  • congestion, logical and physical layout, etc.
  • scaling tests
    • critical! they are expensive but worthwhile
    • identified an optimal 1 MB I/O transfer size
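The paper reports *that* ORNL identifies and replaces slow disks quickly, not how. One hedged way to operationalize it: flag any disk whose mean service latency is a fleet-wide outlier. The formula and threshold below are my assumptions, purely illustrative.

```python
import statistics

def slow_disks(latencies, threshold=3.0):
    """Flag disks whose mean service latency is a fleet-wide outlier.

    `latencies` maps a disk id to recent latency samples (ms). A disk is
    flagged when its mean exceeds the fleet median by more than `threshold`
    median absolute deviations (MAD) -- an illustrative policy, not ORNL's.
    """
    means = {disk: statistics.fmean(v) for disk, v in latencies.items()}
    med = statistics.median(means.values())
    mad = statistics.median(abs(m - med) for m in means.values())
    if mad == 0:
        return []  # fleet is uniform: nothing stands out
    return sorted(d for d, m in means.items() if (m - med) / mad > threshold)
```

MAD-based flagging is robust to the outliers themselves: one pathologically slow disk can’t drag the baseline up the way it would with a mean/stdev rule.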

Higher level services

“The real performance of a data centric PFS is what users observe. Expose infrastructure details to users in an easy-to-use programmable fashion to achieve higher performance for advanced users.”

Scalable Linux tools for use on huge systems:

  • cp
  • find
  • tar

These were replaced with parallel versions through several vendor and lab efforts.
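The underlying point is that serial `cp`, `find`, and `tar` don’t scale to trees this size. A toy sketch of the idea behind a parallel `cp`, using a thread pool with one file per task (names are mine; the real efforts shipped purpose-built MPI-based tools):

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def parallel_cp(src_dir, dst_dir, workers=8):
    """Copy a directory tree with a thread pool, one file per task.

    Threads suffice here because copying is I/O bound, not CPU bound;
    the production tools additionally spread work across many nodes.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    files = [p for p in src_dir.rglob("*") if p.is_file()]

    def copy_one(src):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy data and metadata
        return dst

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, files))
```

Even this toy version captures the two tricks that matter: walk the tree once to build a flat work list, then let independent workers drain it.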