
December 2019

Zero-Cost C++ Structs to HDF5 Compound Types with H5CPP

🧬 The Setup: From HPC POD Structs to HDF5 COMPOUNDs

You're handed an HDF5 file with compound data like this:

DATASET "g_data" {
  DATATYPE  H5T_COMPOUND {
    H5T_IEEE_F64LE "temp";
    H5T_IEEE_F64LE "density";
    H5T_ARRAY { [3] H5T_IEEE_F64LE } "B";
    H5T_ARRAY { [3] H5T_IEEE_F64LE } "V";
    H5T_ARRAY { [20] H5T_IEEE_F64LE } "dm";
    H5T_ARRAY { [9] H5T_IEEE_F64LE } "jkq";
  }
  DATASPACE SIMPLE { ( 70, 3, 3 ) / ( 70, 3, 3 ) }
}

With H5CPP, you don't need to touch the C API. Just describe this with a plain-old C++ struct:

namespace sn {
    struct record_t {
        double temp;
        double density;
        double B[3];
        double V[3];
        double dm[20];
        double jkq[9];
    };
}

That’s it.
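A mapping like this relies on the struct being a trivially copyable, standard-layout type with no surprise padding. Below is a minimal sketch of compile-time checks that confirm this, assuming an 8-byte `double` as on common platforms. The checks are illustrative and are not part of H5CPP:

```cpp
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_standard_layout_v, std::is_trivially_copyable_v

namespace sn {
    struct record_t {
        double temp;
        double density;
        double B[3];
        double V[3];
        double dm[20];
        double jkq[9];
    };
}

// The struct must be a plain data carrier for a byte-for-byte compound mapping.
static_assert(std::is_standard_layout_v<sn::record_t>, "offsetof needs standard layout");
static_assert(std::is_trivially_copyable_v<sn::record_t>, "records are copied as raw bytes");

// 1 + 1 + 3 + 3 + 20 + 9 = 37 doubles, so no padding is expected between members.
static_assert(sizeof(sn::record_t) == 37 * sizeof(double), "unexpected padding");
static_assert(offsetof(sn::record_t, density) == 1 * sizeof(double), "density follows temp");
static_assert(offsetof(sn::record_t, jkq) == 28 * sizeof(double), "jkq follows dm");
```

If any of these assertions fails on your platform, the in-memory layout no longer lines up with the six-member compound in the dump, and that is worth knowing before any I/O happens.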

🧰 Write Code as If You Had a Magic HDF5 Interface

No manual H5Tinsert, no dataspace juggling:

#include <iostream>
#include <vector>
#include "struct.h"
#include <h5cpp/core>
#include "generated.h"  // <-- auto-generated with h5cpp compiler
#include <h5cpp/io>

int main(){
    h5::fd_t fd = h5::create("test.h5", H5F_ACC_TRUNC);
    h5::create<sn::record_t>(fd, "/Module/g_data", h5::max_dims{70, 3, 3});

    auto dataset = h5::read<std::vector<sn::record_t>>(fd, "/Module/g_data");
    for (const auto& rec : dataset)
        std::cerr << rec.temp << " ";
    std::cerr << std::endl;
}

Compile and run:

h5cpp struct.cpp -- -std=c++17 -I/usr/include -Dgenerated.h
g++ -std=c++17 -c struct.cpp -o struct.o
g++ struct.o -lhdf5 -lz -ldl -lm -o struct
./struct

🧠 Under the Hood: Codegen Output

H5CPP automatically generates a minimal type descriptor that registers your struct with the HDF5 type system:

template<> hid_t inline register_struct<sn::record_t>(){
    // array declarations omitted for brevity
    hid_t ct = H5Tcreate(H5T_COMPOUND, sizeof(sn::record_t));
    H5Tinsert(ct, "temp", HOFFSET(sn::record_t, temp), H5T_NATIVE_DOUBLE);
    ...
    return ct;
}

Intermediate handles, such as the array types, are closed with H5Tclose(...) before the function returns, to avoid resource leaks.

📚 Can I Extract One Field From a Compound?

"Can I create a virtual dataset from just the 'density' field in g_data?"

Short answer: yes, but not easily.

HDF5 virtual datasets (VDS) work best with entire datasets, not fields of compound types. However, a workaround is to:

  • Read through a memory datatype that contains only the density member, so HDF5 transfers just that field. (Hyperslab selections pick elements, not fields.)
  • Or, repack the data using H5CPP into a derived dataset containing only the fields you want.

Future H5CPP support might wrap this more cleanly, but currently, you'll want to do the extraction in-memory.


✅ Summary

  • Describe your data in C++ once.
  • H5CPP generates the boilerplate and builds your type descriptors.
  • You can now read, write, and manipulate rich compound datasets without touching the C API.
  • Clean syntax, high performance, zero leaks.

Ready to ditch boilerplate and focus on real work?

Extracting Insight from HDF5 Compound Datasets Using H5CPP

"What if you could treat HDF5 compound datasets like simple C++ structs and just… write your simulation code? Forget the messy HDF5 C API. Write clean, modern C++. Let the compiler do the rest."

🚀 The Problem

One of our HPC applications writes simulation results to HDF5 files using compound types—essentially structured records with fields like temperature, density, and multidimensional vectors such as velocity and magnetic fields. A simplified dump of the file looks like this:

HDF5 "test.h5" {
  GROUP "/" {
    ATTRIBUTE "GRID_DIMENSIONS" { H5T_STD_I32LE (3) }
    ATTRIBUTE "X_AXIS"          { H5T_IEEE_F64LE (3) }
    ATTRIBUTE "Z_AXIS"          { H5T_IEEE_F64LE (70) }

    GROUP "Module" {
      DATASET "g_data" {
        DATATYPE H5T_COMPOUND {
          H5T_IEEE_F64LE "temp";
          H5T_IEEE_F64LE "density";
          H5T_ARRAY { [3] H5T_IEEE_F64LE } "B";
          H5T_ARRAY { [3] H5T_IEEE_F64LE } "V";
          H5T_ARRAY { [20] H5T_IEEE_F64LE } "dm";
          H5T_ARRAY { [9] H5T_IEEE_F64LE } "jkq";
        }
        DATASPACE SIMPLE { (70, 3, 3) }
      }
    }
  }
}

Now imagine wanting to selectively extract the "density" field into its own dataset—for visualization, analysis, or postprocessing. Can we filter compound dataset fields like this? Better yet, can we work with this structure using modern C++?

Let’s take a practical dive.

🔧 Step 1: Define the POD Struct

Start by expressing your compound layout as a C++ POD type (we recommend wrapping it in a namespace for organization):

#ifndef H5TEST_STRUCT_01
#define H5TEST_STRUCT_01

namespace sn {
    struct record_t {
        double temp;
        double density;
        double B[3];
        double V[3];
        double dm[20];
        double jkq[9];
    };
}
#endif

You’ll use this structure as the basis for both reading and writing your data.

✨ Step 2: Write the Code Like You Mean It

With H5CPP, you don’t need to write any boilerplate for type descriptors or deal with the HDF5 C API. Just:

#include <iostream>
#include <vector>
#include "struct.h"
#include <h5cpp/core>
#include "generated.h" // generated by the h5cpp toolchain
#include <h5cpp/io>

int main() {
    h5::fd_t fd = h5::create("test.h5", H5F_ACC_TRUNC);

    // Create the dataset
    h5::create<sn::record_t>(fd, "/Module/g_data", h5::max_dims{70,3,3});

    // Read entire dataset into memory
    auto dataset = h5::read<std::vector<sn::record_t>>(fd, "/Module/g_data");

    // Loop over records and print temperature
    for (const auto& rec : dataset)
        std::cerr << rec.temp << " ";
    std::cerr << std::endl;
}

No type registration? It’s handled for you.
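The dataspace is (70, 3, 3), while `std::vector` is flat. Here is a minimal sketch of the index arithmetic, assuming the vector holds all 630 records in HDF5's row-major (C) order. `flat_index` and `record_count` are hypothetical names for illustration, not part of H5CPP:

```cpp
#include <cstddef>  // std::size_t

// Extents of the g_data dataspace shown in the dump: (70, 3, 3).
constexpr std::size_t N0 = 70, N1 = 3, N2 = 3;

// Hypothetical helper: position of element (i, j, k) in a flat, row-major buffer.
constexpr std::size_t flat_index(std::size_t i, std::size_t j, std::size_t k) {
    return (i * N1 + j) * N2 + k;
}

// Total number of records a full read of the dataset would return.
constexpr std::size_t record_count = N0 * N1 * N2;
```

Under that assumption, `dataset[flat_index(i, j, k)].temp` would address the temperature at grid position (i, j, k).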

🧠 Step 3: Generate the Type Descriptor

Use the H5CPP compiler to generate the type description (generated.h):

h5cpp struct.cpp -- -std=c++17 -I/usr/include -Dgenerated.h

It emits something like:

template<> hid_t inline register_struct<sn::record_t>(){
    // define compound type layout with H5Tinsert() calls
    // ...
    H5Tinsert(ct_00, "density", HOFFSET(sn::record_t, density), H5T_NATIVE_DOUBLE);
    // ...
    return ct_00;
}

This magic glues your record_t to an HDF5 COMPOUND type with nested arrays.
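HOFFSET in the HDF5 C API is a thin wrapper over the standard `offsetof` macro, so the layout the generated code registers can be inspected without HDF5 at all. The sketch below lists the name, byte offset, and element count of each member. `member_t`, `members`, and `total_doubles` are hypothetical names used only for illustration:

```cpp
#include <cstddef>  // offsetof, std::size_t

namespace sn {
    struct record_t {
        double temp;
        double density;
        double B[3];
        double V[3];
        double dm[20];
        double jkq[9];
    };
}

// Hypothetical descriptor: one row per member of the compound type.
struct member_t {
    const char* name;
    std::size_t offset;  // same value HOFFSET(sn::record_t, <member>) would yield
    std::size_t count;   // number of doubles (1 for scalar members)
};

constexpr member_t members[] = {
    {"temp",    offsetof(sn::record_t, temp),    1},
    {"density", offsetof(sn::record_t, density), 1},
    {"B",       offsetof(sn::record_t, B),       3},
    {"V",       offsetof(sn::record_t, V),       3},
    {"dm",      offsetof(sn::record_t, dm),      20},
    {"jkq",     offsetof(sn::record_t, jkq),     9},
};

// Sum of element counts; equals sizeof(record_t) / sizeof(double) when there is no padding.
constexpr std::size_t total_doubles() {
    std::size_t n = 0;
    for (const auto& m : members) n += m.count;
    return n;
}
```

Printing this table next to the `h5dump` output is a quick way to check that the struct and the file agree member by member.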

🧪 Bonus: Can We Slice Fields From Compound Datasets?

With HDF5 1.10+ you can use virtual datasets (VDS) to reference subsets of existing datasets, but field-level slicing comes with caveats:

  • You cannot directly extract a field from a compound dataset as a virtual dataset without writing a filter or preprocessor.
  • H5CPP doesn't (yet) offer field projections, but you can write a thin wrapper to convert std::vector<record_t> into std::vector<double> by extracting the field manually.

A future enhancement could be writing VDS definitions that reference memory offsets within compound datasets, but for now, you can batch process like this:

std::vector<double> densities;
densities.reserve(dataset.size());  // avoid repeated reallocation
for (const auto& rec : dataset)
    densities.push_back(rec.density);

Simple, fast, cache-friendly.
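The thin wrapper mentioned above can be made generic with a pointer to member, so the same helper works for any scalar field. This is a sketch under that assumption. `project` is a hypothetical helper, not an H5CPP function, and array members such as `B` would need a different return type:

```cpp
#include <algorithm>  // std::transform
#include <cassert>    // assert
#include <iterator>   // std::back_inserter
#include <vector>

namespace sn {
    struct record_t {
        double temp;
        double density;
        double B[3];
        double V[3];
        double dm[20];
        double jkq[9];
    };
}

// Hypothetical helper: copy one scalar member out of every record.
template <class Record, class Field>
std::vector<Field> project(const std::vector<Record>& records, Field Record::*member) {
    assert(member != nullptr);      // a null member pointer cannot be dereferenced
    std::vector<Field> out;
    out.reserve(records.size());    // one allocation up front
    std::transform(records.begin(), records.end(), std::back_inserter(out),
                   [member](const Record& rec) { return rec.*member; });
    return out;
}
```

A call such as `project(dataset, &sn::record_t::density)` would then yield a `std::vector<double>` that can be written out as its own dataset.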

📈 Output and Verification

Once compiled, run the program and inspect the resulting file:

./struct
h5dump -pH test.h5

The header dump shows the full compound datatype, the dataspace, and the storage layout of the dataset.

🧩 Conclusion

Working with compound HDF5 types doesn’t have to mean writing hundreds of lines of C code. With H5CPP:

  • You declare a C++ struct that mirrors your HDF5 layout.
  • The compiler auto-generates the type description.
  • Your code stays clean, idiomatic, and testable.

And when you want to extract a field like "density"? You don’t need virtual datasets—you just write C++.

“If you can write a struct, you can store structured data.”