"What if you could treat HDF5 compound datasets like simple C++ structs and just… write your simulation code? Forget the messy HDF5 C API. Write clean, modern C++. Let the compiler do the rest."
One of our HPC applications writes simulation results to HDF5 files using compound types—essentially structured records with fields like temperature, density, and multidimensional vectors such as velocity and magnetic fields. A simplified dump of the file looks like this:
HDF5 "test.h5" {
GROUP "/" {
   ATTRIBUTE "GRID_DIMENSIONS" { H5T_STD_I32LE (3) }
   ATTRIBUTE "X_AXIS" { H5T_IEEE_F64LE (3) }
   ATTRIBUTE "Z_AXIS" { H5T_IEEE_F64LE (70) }
   GROUP "Module" {
      DATASET "g_data" {
         DATATYPE H5T_COMPOUND {
            H5T_IEEE_F64LE "temp";
            H5T_IEEE_F64LE "density";
            H5T_ARRAY { [3] H5T_IEEE_F64LE } "B";
            H5T_ARRAY { [3] H5T_IEEE_F64LE } "V";
            H5T_ARRAY { [20] H5T_IEEE_F64LE } "dm";
            H5T_ARRAY { [9] H5T_IEEE_F64LE } "jkq";
         }
         DATASPACE SIMPLE { (70, 3, 3) }
      }
   }
}
}
Now imagine wanting to selectively extract the "density" field into its own dataset—for visualization, analysis, or postprocessing. Can we filter compound dataset fields like this? Better yet, can we work with this structure using modern C++?
Let’s take a practical dive.
Start by expressing your compound layout as a plain C++ POD type (wrapping it in a namespace keeps your code and the generated descriptors organized):
#ifndef H5TEST_STRUCT_01
#define H5TEST_STRUCT_01
namespace sn {
    struct record_t {
        double temp;
        double density;
        double B[3];
        double V[3];
        double dm[20];
        double jkq[9];
    };
}
#endif
You’ll use this structure as the basis for both reading and writing your data.
With H5CPP, you don’t need to write any boilerplate for type descriptors or deal with the HDF5 C API. Just:
#include <iostream>
#include <vector>
#include "struct.h"
#include <h5cpp/core>
#include "generated.h" // generated by the h5cpp toolchain
#include <h5cpp/io>
int main() {
    h5::fd_t fd = h5::create("test.h5", H5F_ACC_TRUNC);
    // Create the dataset
    h5::create<sn::record_t>(fd, "/Module/g_data", h5::max_dims{70,3,3});
    // Write sample records so there is something to read back
    std::vector<sn::record_t> data(70 * 3 * 3);
    for (size_t i = 0; i < data.size(); ++i)
        data[i].temp = 300.0 + i;
    h5::write(fd, "/Module/g_data", data);
    // Read the entire dataset back into memory
    auto dataset = h5::read<std::vector<sn::record_t>>(fd, "/Module/g_data");
    // Loop over the records and print the temperature field
    for (const auto& rec : dataset)
        std::cerr << rec.temp << " ";
    std::cerr << std::endl;
}
No type registration? It’s handled for you.
Use the H5CPP compiler to generate the type description (generated.h):
h5cpp struct.cpp -- -std=c++17 -I/usr/include
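A typical two-step build then looks like this (paths, library flags, and the `-Dgenerated.h` output name are illustrative; adjust them to your installation):

```shell
# 1) generate the HDF5 type descriptors from the POD declaration
h5cpp struct.cpp -- -std=c++17 -I/usr/include -Dgenerated.h
# 2) compile the application against the serial HDF5 C library
g++ -std=c++17 main.cpp -lhdf5 -lz -ldl -lm -o struct
```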
It emits something like:
template<> hid_t inline register_struct<sn::record_t>(){
    // define compound type layout with H5Tinsert() calls
    // ...
    H5Tinsert(ct_00, "density", HOFFSET(sn::record_t, density), H5T_NATIVE_DOUBLE);
    // ...
    return ct_00;
}
This magic glues your record_t to an HDF5 COMPOUND type with nested arrays.
So can we filter out a single field? With HDF5 1.10+ you can use virtual datasets (VDS) to reference subsets of existing datasets, but there are some caveats:
- You cannot directly extract a field from a compound dataset as a virtual dataset without writing a filter or preprocessor.
- H5CPP doesn't (yet) offer field projections, but you can write a thin wrapper that converts std::vector<record_t> into std::vector<double> by extracting the field manually.
A future enhancement could be writing VDS definitions that reference memory offsets within compound datasets, but for now, you can batch process like this:
std::vector<double> densities;
densities.reserve(dataset.size()); // one allocation up front
for (const auto& rec : dataset)
    densities.push_back(rec.density);
Simple, fast, cache-friendly.
Compile and run the program, then inspect the resulting file:
./struct
h5dump -pH test.h5
h5dump shows the full compound structure, chunk layout, and member offsets exactly as designed.
Working with compound HDF5 types doesn’t have to mean writing hundreds of lines of C code. With H5CPP:
- You declare a C++ struct that mirrors your HDF5 layout.
- The compiler auto-generates the type description.
- Your code stays clean, idiomatic, and testable.
And when you want to extract a field like "density"? You don’t need virtual datasets—you just write C++.
“If you can write a struct, you can store structured data.”