I/O Performance with H5CPP: 687 MB/s for Small Matrices
🧪 Problem
High-performance computing projects often involve thousands of small matrices—each one carrying results, measurements, or partial data.
If you try to naively write each matrix as an individual dataset with the HDF5 C API, things quickly slow down.
We ran an experiment to compare: - Time to generate 4000 small matrices - Time to write them to disk - Dataset creation overhead
All using H5CPP + Armadillo.
⚙️ Setup
We generated 4000 random arma::mat
matrices and wrote each one into a dedicated HDF5 dataset using H5CPP’s type-safe API:
arma::mat X = arma::randu<arma::mat>(20, 20); // ~3.2 KB each
for (size_t i = 0; i < 4000; i++) {
h5::ds_t ds = h5::create<arma::mat>(fd, "/dataset/" + std::to_string(i),
h5::current_dims{20,20}, h5::chunk{20,20} | h5::gzip{6});
h5::write(ds, X);
}
HDF5 File: container.h5
Hardware: SSD rated at 500 MB/s
Library: HDF5 1.10.x + H5CPP + Armadillo
📈 Results
CREATING 4000 h5::ds_t cost: 0.215 s (≈ 18,555 dataset/sec)
GENERATING 4000 matrices cost: 30.93 s (≈ 129 matrix/sec)
WRITING 4000 matrices cost: 1.466 s (≈ 2728 matrix/sec)
THROUGHPUT: 687.61 MB/s
Total I/O payload: ~1.0 GB Effective bandwidth: ~687 MB/s
That’s over 90% of theoretical SSD write speed, with zero manual buffer management.
🔍 What Makes This Fast?
- Chunked layout: We write with
h5::chunk{20,20}
, which matches the matrix shape - Compression: With
h5::gzip{6}
we get file size savings without hurting speed - No reopening datasets: Handles persist across writes
- RAII-powered batching: We defer closing until scope exits
🧪 Compared to C API?
The HDF5 C API gives you:
H5Dcreate2(...)
H5Dwrite(...)
H5Dclose(...)
...all for each matrix. That introduces latency, boilerplate, and opens the door to error-prone lifecycle handling.
✨ Takeaways
Task | Time | Rate |
---|---|---|
Dataset creation | 0.215s | 18,555 datasets/s |
Matrix generation | 30.9s | 129 matrices/s |
Matrix write (I/O) | 1.46s | 687 MB/s |
- 🧠 Most of the time was spent generating matrices, not writing them.
- 🚀 H5CPP I/O speed rivaled raw
fwrite()
— but with structured layout and compression.
📌 Conclusion
If you’re writing small matrices to HDF5 at scale, H5CPP gives you raw performance + clean syntax.
No memory leaks, no handle juggling, no wasted cycles.