December 2020

February 4, 2020
in hdf5, c++
7 min read

I/O Performance with H5CPP: 687 MB/s for Small Matrices

🧪 Problem

High-performance computing projects often involve thousands of small matrices—each one carrying results, measurements, or partial data.

If you try to naively write each matrix as an individual dataset with the HDF5 C API, things quickly slow down.

We ran an experiment to compare: - Time to generate 4000 small matrices - Time to write them to disk - Dataset creation overhead

All using H5CPP + Armadillo.

⚙️ Setup

We generated 4000 random arma::mat matrices and wrote each one into a dedicated HDF5 dataset using H5CPP’s type-safe API:

arma::mat X = arma::randu<arma::mat>(20, 20); // ~3.2 KB each
for (size_t i = 0; i < 4000; i++) {
    h5::ds_t ds = h5::create<arma::mat>(fd, "/dataset/" + std::to_string(i),
        h5::current_dims{20,20}, h5::chunk{20,20} | h5::gzip{6});
    h5::write(ds, X);
}

HDF5 File: container.h5 Hardware: SSD rated at 500 MB/s Library: HDF5 1.10.x + H5CPP + Armadillo

📈 Results

CREATING 4000 h5::ds_t     cost: 0.215 s  (≈ 18,555 dataset/sec)
GENERATING 4000 matrices   cost: 30.93 s  (≈ 129 matrix/sec)
WRITING    4000 matrices   cost: 1.466 s  (≈ 2728 matrix/sec)

THROUGHPUT: 687.61 MB/s

Total I/O payload: ~1.0 GB Effective bandwidth: ~687 MB/s

That’s over 90% of theoretical SSD write speed, with zero manual buffer management.

🔍 What Makes This Fast?

Chunked layout: We write with h5::chunk{20,20}, which matches the matrix shape
Compression: With h5::gzip{6} we get file size savings without hurting speed
No reopening datasets: Handles persist across writes
RAII-powered batching: We defer closing until scope exits

🧪 Compared to C API?

The HDF5 C API gives you:

H5Dcreate2(...)
H5Dwrite(...)
H5Dclose(...)

...all for each matrix. That introduces latency, boilerplate, and opens the door to error-prone lifecycle handling.

✨ Takeaways

Task	Time	Rate
Dataset creation	0.215s	18,555 datasets/s
Matrix generation	30.9s	129 matrices/s
Matrix write (I/O)	1.46s	687 MB/s

🧠 Most of the time was spent generating matrices, not writing them.
🚀 H5CPP I/O speed rivaled raw fwrite() — but with structured layout and compression.

📌 Conclusion

If you’re writing small matrices to HDF5 at scale, H5CPP gives you raw performance + clean syntax.

No memory leaks, no handle juggling, no wasted cycles.