Extracting Insight from HDF5 Compound Datasets Using H5CPP
"What if you could treat HDF5 compound datasets like simple C++ structs and just… write your simulation code? Forget the messy HDF5 C API. Write clean, modern C++. Let the compiler do the rest."
🚀 The Problem
One of our HPC applications writes simulation results to HDF5 files using compound types—essentially structured records with fields like temperature, density, and multidimensional vectors such as velocity and magnetic fields. A simplified dump of the file looks like this:
HDF5 "test.h5" {
GROUP "/" {
   ATTRIBUTE "GRID_DIMENSIONS" { H5T_STD_I32LE (3) }
   ATTRIBUTE "X_AXIS" { H5T_IEEE_F64LE (3) }
   ATTRIBUTE "Z_AXIS" { H5T_IEEE_F64LE (70) }
   GROUP "Module" {
      DATASET "g_data" {
         DATATYPE H5T_COMPOUND {
            H5T_IEEE_F64LE "temp";
            H5T_IEEE_F64LE "density";
            H5T_ARRAY { [3] H5T_IEEE_F64LE } "B";
            H5T_ARRAY { [3] H5T_IEEE_F64LE } "V";
            H5T_ARRAY { [20] H5T_IEEE_F64LE } "dm";
            H5T_ARRAY { [9] H5T_IEEE_F64LE } "jkq";
         }
         DATASPACE SIMPLE { (70, 3, 3) }
      }
   }
}
}
Now imagine wanting to selectively extract the "density" field into its own dataset—for visualization, analysis, or postprocessing. Can we filter compound dataset fields like this? Better yet, can we work with this structure using modern C++?
Let’s take a practical dive.
🔧 Step 1: Define the POD Struct
Start by expressing your compound layout as a C++ POD type (we recommend wrapping it in a namespace for organization):
#ifndef H5TEST_STRUCT_01
#define H5TEST_STRUCT_01
namespace sn {
    struct record_t {
        double temp;
        double density;
        double B[3];
        double V[3];
        double dm[20];
        double jkq[9];
    };
}
#endif
You’ll use this structure as the basis for both reading and writing your data.
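Because the generated descriptor computes field offsets with HOFFSET (an offsetof-style macro), the struct must be standard-layout and trivially copyable. A compile-time sanity check along these lines can catch accidental padding or virtual members early (a sketch, with record_t reproduced inline so the snippet is self-contained):

```cpp
#include <type_traits>

namespace sn {
    struct record_t {      // same layout as struct.h
        double temp;
        double density;
        double B[3];
        double V[3];
        double dm[20];
        double jkq[9];
    };
}

// HDF5 raw I/O copies the struct byte-for-byte, so these must hold:
static_assert(std::is_standard_layout<sn::record_t>::value,
              "record_t: HOFFSET requires standard layout");
static_assert(std::is_trivially_copyable<sn::record_t>::value,
              "record_t: raw HDF5 I/O requires trivial copyability");
// 37 doubles total; no padding expected since every member is 8-byte aligned:
static_assert(sizeof(sn::record_t) == 37 * sizeof(double),
              "record_t: unexpected padding");
```

If any assertion fires, the on-disk layout and the in-memory layout would disagree, and reads would silently scramble fields.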
✨ Step 2: Write the Code Like You Mean It
With H5CPP, you don’t need to write any boilerplate for type descriptors or deal with the HDF5 C API. Just:
#include <iostream>
#include <vector>
#include "struct.h"
#include <h5cpp/core>
#include "generated.h" // generated by the h5cpp toolchain
#include <h5cpp/io>
int main() {
    h5::fd_t fd = h5::create("test.h5", H5F_ACC_TRUNC);
    // Create the dataset
    h5::create<sn::record_t>(fd, "/Module/g_data", h5::max_dims{70,3,3});
    // Read entire dataset into memory
    auto dataset = h5::read<std::vector<sn::record_t>>(fd, "/Module/g_data");
    // Loop over records and print temperature
    for (const auto& rec : dataset)
        std::cerr << rec.temp << " ";
    std::cerr << std::endl;
}
No type registration? It’s handled for you.
🧠 Step 3: Generate the Type Descriptor
Use the H5CPP compiler to generate the type description (generated.h):
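The exact invocation depends on your installation; the LLVM-based h5cpp compiler is typically pointed at the translation unit that uses the struct, with your usual compiler flags after the `--` separator (paths and flags below are illustrative, not prescriptive):

```shell
# Parse the source that uses sn::record_t and emit the type descriptor
h5cpp main.cpp -- -std=c++17 -I. -Dgenerated.h
```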
It emits something like:
template<> hid_t inline register_struct<sn::record_t>(){
    // define compound type layout with H5Tinsert() calls
    // ...
    H5Tinsert(ct_00, "density", HOFFSET(sn::record_t, density), H5T_NATIVE_DOUBLE);
    // ...
    return ct_00;
}
This magic glues your record_t to an HDF5 COMPOUND type with nested arrays.
🧪 Bonus: Can We Slice Fields From Compound Datasets?
Yes, with HDF5 1.10+ you can use virtual datasets (VDS) to reference subsets of existing datasets, but there are some caveats:
- You cannot directly extract a field from a compound dataset as a virtual dataset without writing a filter or preprocessor.
- H5CPP doesn't (yet) offer field projections, but you can write a thin wrapper to convert std::vector<record_t> into std::vector<double> by extracting the field manually.
A future enhancement could be writing VDS definitions that reference memory offsets within compound datasets, but for now, you can batch process like this:
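The code for that batch pass is missing from the post; a minimal sketch using only the standard library (the name extract_density is illustrative, and sn::record_t is reproduced inline to keep the snippet self-contained):

```cpp
#include <vector>

namespace sn {
    struct record_t {             // mirrors the compound layout above
        double temp, density;
        double B[3], V[3], dm[20], jkq[9];
    };
}

// One linear pass over the records: contiguous reads, contiguous writes.
inline std::vector<double>
extract_density(const std::vector<sn::record_t>& recs) {
    std::vector<double> out;
    out.reserve(recs.size());     // single allocation up front
    for (const auto& r : recs)
        out.push_back(r.density);
    return out;
}
```

The resulting std::vector<double> is a plain contiguous buffer, ready to be written out as its own dataset.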
Simple, fast, cache-friendly.
📈 Output and Verification
Once compiled and run, dump the resulting file (for example with h5dump) and you'll see the full compound structure, chunk layout, and memory offsets exactly as designed.
🧩 Conclusion
Working with compound HDF5 types doesn’t have to mean writing hundreds of lines of C code. With H5CPP:
- You declare a C++ struct that mirrors your HDF5 layout.
- The compiler auto-generates the type description.
- Your code stays clean, idiomatic, and testable.
And when you want to extract a field like "density"? You don't need virtual datasets—you just write C++.
“If you can write a struct, you can store structured data.”