Zero-Cost C++ Structs to HDF5 Compound Types with H5CPP
The Setup: From HPC POD Structs to HDF5 COMPOUNDs
You're handed an HDF5 file with compound data like this:
DATASET "g_data" {
    DATATYPE H5T_COMPOUND {
        H5T_IEEE_F64LE "temp";
        H5T_IEEE_F64LE "density";
        H5T_ARRAY { [3] H5T_IEEE_F64LE } "B";
        H5T_ARRAY { [3] H5T_IEEE_F64LE } "V";
        H5T_ARRAY { [20] H5T_IEEE_F64LE } "dm";
        H5T_ARRAY { [9] H5T_IEEE_F64LE } "jkq";
    }
    DATASPACE SIMPLE { ( 70, 3, 3 ) / ( 70, 3, 3 ) }
}
To work with it from C++, describe the same record once as a plain-old-data struct:
namespace sn {
    struct record_t {
        double temp;
        double density;
        double B[3];
        double V[3];
        double dm[20];
        double jkq[9];
    };
}
That's it.
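Because the compound type is built from `sizeof` and member offsets, it can help to pin the struct layout down at compile time. The checks below are a minimal sketch, not part of H5CPP; they assume an 8-byte `double` and a compiler that inserts no padding between `double` members, which holds on common platforms but is worth verifying on yours.

```cpp
#include <cstddef>
#include <type_traits>

namespace sn {
    struct record_t {
        double temp;
        double density;
        double B[3];
        double V[3];
        double dm[20];
        double jkq[9];
    };
}

// A standard-layout, trivially copyable struct can be described safely
// by (offset, type) pairs, which is what a compound type needs.
static_assert(std::is_standard_layout<sn::record_t>::value,
              "record_t must be standard layout");
static_assert(std::is_trivially_copyable<sn::record_t>::value,
              "record_t must be trivially copyable");

// 1 + 1 + 3 + 3 + 20 + 9 = 37 doubles; assumes no padding is inserted.
static_assert(sizeof(sn::record_t) == 37 * sizeof(double),
              "unexpected padding in record_t");

// Offsets in units of double: temp 0, density 1, B 2, V 5, dm 8, jkq 28.
static_assert(offsetof(sn::record_t, density) == 1 * sizeof(double), "density");
static_assert(offsetof(sn::record_t, B)       == 2 * sizeof(double), "B");
static_assert(offsetof(sn::record_t, V)       == 5 * sizeof(double), "V");
static_assert(offsetof(sn::record_t, dm)      == 8 * sizeof(double), "dm");
static_assert(offsetof(sn::record_t, jkq)     == 28 * sizeof(double), "jkq");
```

If someone later adds a field or changes a type, the build fails here instead of producing a file whose layout silently disagrees with the struct.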
Write Code as If You Had a Magic HDF5 Interface
No manual H5Tinsert, no dataspace juggling:
#include <iostream>  // std::cerr
#include <vector>
#include "struct.h"
#include <h5cpp/core>
#include "generated.h" // <-- auto-generated with h5cpp compiler
#include <h5cpp/io>

int main(){
    // create (or truncate) the file, then a 70 x 3 x 3 dataset of records
    h5::fd_t fd = h5::create("test.h5", H5F_ACC_TRUNC);
    h5::create<sn::record_t>(fd, "/Module/g_data", h5::max_dims{70, 3, 3});

    // read the whole dataset back into a std::vector
    auto dataset = h5::read<std::vector<sn::record_t>>(fd, "/Module/g_data");
    for (const auto& rec : dataset)
        std::cerr << rec.temp << " ";
    std::cerr << std::endl;
}
Compile and run:
h5cpp struct.cpp -- -std=c++17 -I/usr/include -Dgenerated.h
g++ -std=c++17 -c struct.cpp -o struct.o
g++ struct.o -lhdf5 -lz -ldl -lm -o struct
./struct
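The read above hands back a flat `std::vector`, while the dataspace is 70 x 3 x 3. Assuming the records arrive in HDF5's row-major (C) order, which is how HDF5 lays out dataset elements but is worth confirming for your H5CPP version, a small index helper recovers the 3-D position. The names `flat_index`, `N0`, `N1`, `N2` and `total_records` are hypothetical, introduced only for this sketch.

```cpp
#include <cstddef>

// Extents taken from the dump: DATASPACE SIMPLE { ( 70, 3, 3 ) }
constexpr std::size_t N0 = 70, N1 = 3, N2 = 3;

// Number of records in the flat vector, under the assumption above.
constexpr std::size_t total_records = N0 * N1 * N2;

// Hypothetical helper: row-major position of element (i, j, k).
// The last index varies fastest, as in a C array a[70][3][3].
constexpr std::size_t flat_index(std::size_t i, std::size_t j, std::size_t k) {
    return (i * N1 + j) * N2 + k;
}
```

With that in place, `dataset[flat_index(i, j, k)].temp` addresses the record at position (i, j, k) of the original dataspace.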
Under the Hood: Codegen Output
H5CPP automatically generates a minimal type descriptor that registers your struct with the HDF5 type system:
template<> hid_t inline register_struct<sn::record_t>(){
    // array declarations omitted for brevity
    hid_t ct = H5Tcreate(H5T_COMPOUND, sizeof(sn::record_t));
    H5Tinsert(ct, "temp", HOFFSET(sn::record_t, temp), H5T_NATIVE_DOUBLE);
    ...
    return ct;
}
Everything is cleaned up (H5Tclose(...)) to avoid resource leaks.
Can I Extract One Field From a Compound?
"Can I create a virtual dataset from just the 'density' field in g_data?"
Short answer: yes, but not easily.
HDF5 virtual datasets (VDS) work best with entire datasets, not fields of compound types. However, a workaround is to:
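- Read only the density field in software: a hyperslab selection picks which elements to read, and a memory compound type containing just density picks the field, since HDF5 matches compound members by name.
- Or, repack the data using H5CPP into a derived dataset containing only the fields you want.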
Future H5CPP support might wrap this more cleanly, but currently, you'll want to do the extraction in-memory.
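The in-memory extraction can be sketched in plain C++ once the records are in a `std::vector`. This is a minimal illustration, not an H5CPP facility: `extract_field` is a hypothetical helper, and it handles scalar members such as `density` only, not array members like `B`.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

namespace sn {
    struct record_t {
        double temp;
        double density;
        double B[3];
        double V[3];
        double dm[20];
        double jkq[9];
    };
}

// Hypothetical helper: copy one scalar member out of every record
// into a contiguous vector, selected by pointer-to-member.
template <typename Record, typename Field>
std::vector<Field> extract_field(const std::vector<Record>& records,
                                 Field Record::* member) {
    std::vector<Field> out;
    out.reserve(records.size());
    std::transform(records.begin(), records.end(), std::back_inserter(out),
                   [member](const Record& r) { return r.*member; });
    return out;
}
```

Calling `extract_field(dataset, &sn::record_t::density)` yields a `std::vector<double>` that can then be written out as an ordinary dataset of doubles, which is the "derived dataset" route from the list above.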
Summary
- Describe your data in C++ once.
- H5CPP generates the boilerplate and builds your type descriptors.
- You can now read, write, and manipulate rich compound datasets without touching the C API.
- Clean syntax, high performance, zero leaks.
Ready to ditch boilerplate and focus on real work?