Parallelization Patterns for HDF5 I/O in C++
The Question (Stefano Salvadè, Feb 19, 2018)
Goal: write analysis results in parallel to multiple HDF5 files—one per stream/process. The application is in C#, calling into a C/C++ API with HDF5 and MPI.
Current thought: typically one launches with `mpiexec -n x Program.exe`, but spawning processes at runtime via `MPI_Comm_spawn()` seems clunky. Is there a more elegant way to run parallel I/O functions within the same program? Also, is one process needed per write action (whether to separate files or a single shared file)?
Steven Varga’s Take—Less PHDF5, More Pragmatism
Parallel HDF5 (PHDF5) shines in setups with parallel file systems and true distributed environments—think HPC clusters with coordinated I/O capabilities.
But in simpler contexts—e.g., a single machine or cloud instance—PHDF5 often imposes unused complexity and file-system limitations (filters unsupported, extra boilerplate, etc.).
Instead, Stefano could:
1. Use separate HDF5 files per process, even in RAM or temp storage
2. Aggregate later, e.g. via:
- copying into one file, or
- using HDF5 external links (`H5Lcreate_external`) to compose them into a single logical container
The aggregation step could run as a separate batch job after the main MPI job finishes.
If you do have real parallel I/O infrastructure, then yes—PHDF5 gives benefits. But often, simple is better.
— Steve
Summary Table
Scenario | Recommended Approach | Reasoning |
---|---|---|
N streams → N separate files (no shared file) | Serial HDF5 per process | Simplicity, no PHDF5 overhead, independent files |
Need to combine results later | Aggregate post-run (external file driver or merge scripts) | Keeps initial write simple; flexible downstream processing |
True parallel I/O on a parallel file system | Use PHDF5 with MPI | Efficient coordinated I/O, but more complexity and system requirements |
When to Use What?
- Use PHDF5 when:
  - You're in a high-performance cluster environment
  - The file system supports parallel write throughput
  - You benefit from collective operations and synchronized metadata handling
- Stick with serial HDF5 per process when:
  - You're on a single system or cloud VM
  - You want to avoid complexity in your write path
  - You can afford a merge or collector step after the run
Wrap-Up Thoughts
Stefano’s “elegant parallel output within a single program” goal doesn’t necessarily require MPI-spawned processes or PHDF5. Often the simplest is best: spawn OS-level processes writing to their own files, then merge or link them later.
This keeps performance high, complexity low, and coordination overhead manageable.