Extendable Datasets in HDF5—Simplified with H5CPP

The OP’s Context

The original poster (OP) wanted to create an HDF5 dataset that can grow over time, such as an event stream or a log of records, and to do so through a clean C++ interface. Writing a custom solution was an option, but the OP was looking for something both performant and maintainable.

H5CPP to the Rescue (Steven Varga, Mar 2 2023)

“Probably you want to roll your own, which is a good thing—but in case you're looking for a performant solution:”

#include <iostream>
#include <string>
#include <vector>
#include "struct.h"     // assumed file name: must define sn::record_t before the H5CPP headers
#include <h5cpp/core>
#include "generated.h"  // your record type mapping; keep it between <h5cpp/core> and <h5cpp/io>
#include <h5cpp/io>

int main() {
  auto fd = h5::create("test.h5", H5F_ACC_TRUNC);
  { // Create an extendable dataset with your POD struct
    auto ds = h5::create<sn::record_t>(
      fd, "/path/dataset", h5::max_dims{H5S_UNLIMITED});
    // Assign vector-of-strings attribute
    ds["attribute"] = {"first","second","...","last"};
    // Convert dataset to packet-table and append records.
    // The explicit h5::pt_t type matters: `auto pt = ds;` would only copy the h5::ds_t handle.
    h5::pt_t pt = ds;
    for (int i = 0; i < 3; ++i) {
      h5::append(pt, sn::record_t{
        1.0 * i, 2.0 * i,
        {1,2,3,4,5}, {11,12,13,14,15}
      });
    }
  } // ds and pt go out of scope here, before the dataset is reopened
  { // Read back the dataset
    auto ds = h5::open(fd, "/path/dataset");
    auto attribute = h5::aread<std::vector<std::string>>(ds, "attribute");
    std::cout << attribute << std::endl;
    // Dump field A of every record
    for (auto rec : h5::read<std::vector<sn::record_t>>(ds, "/path/dataset"))
      std::cerr << rec.A << " ";
    std::cerr << std::endl;
  }
}
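
The example uses sn::record_t without showing its definition. As a minimal sketch, the record could look like the struct below. Only the field name A appears in the original code. The names B, idx and values, all field types, and the helper make_record are assumptions made for illustration, chosen so that the initializer used in the append loop compiles.

```cpp
#include <cassert>
#include <type_traits>

namespace sn {
    // Hypothetical layout. Only `A` is named in the original example;
    // every other field name and all field types are assumptions.
    struct record_t {
        double A;
        double B;
        int    idx[5];
        int    values[5];
    };
}

// A record written to HDF5 as a compound type should be plain data:
// no virtual functions, no owning pointers, no std::string members.
static_assert(std::is_standard_layout<sn::record_t>::value,
              "record_t must have standard layout");
static_assert(std::is_trivially_copyable<sn::record_t>::value,
              "record_t must be trivially copyable");

// Hypothetical helper: builds the i-th record exactly as the append loop does.
inline sn::record_t make_record(int i) {
    return sn::record_t{1.0 * i, 2.0 * i, {1, 2, 3, 4, 5}, {11, 12, 13, 14, 15}};
}

// Small runtime check that aggregate initialization fills the fields in order.
inline void check_make_record() {
    const sn::record_t r = make_record(3);
    assert(r.A == 3.0 && r.B == 6.0);
    assert(r.idx[0] == 1 && r.values[4] == 15);
}
```

Keeping the record a flat aggregate of arithmetic types and fixed-size arrays is the design choice that lets a single brace initializer fill it and lets the type mapping in generated.h describe it field by field.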

Why This Works So Nicely

| Feature | Benefit |
|---|---|
| H5S_UNLIMITED max dims | Enables a truly extendable dataset |
| sn::record_t POD mapping | Compact and expressive schema definitions |
| h5::append(...) API | Simple, zero-boilerplate appends |
| Packet-table behind the scenes | Efficient I/O under the hood |
| Vector attribute support | Seamless metadata attachment and retrieval |
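
To give a feel for why a packet-table style append can be efficient, here is a toy model written with the standard library only. It is not H5CPP's implementation and uses no HDF5 calls. It assumes that records are collected in a buffer the size of one chunk and handed to storage once per full chunk. The class chunked_appender and all of its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a chunked, growable dataset. `storage_` plays the role
// of the file; `writes_` counts how often the "file" is touched.
template <class T>
class chunked_appender {
public:
    explicit chunked_appender(std::size_t chunk_size)
        : chunk_size_(chunk_size ? chunk_size : 1) {  // avoid a zero-sized chunk
        buffer_.reserve(chunk_size_);
    }

    // Buffer one record; write only when a whole chunk has accumulated.
    void append(const T& rec) {
        buffer_.push_back(rec);
        if (buffer_.size() == chunk_size_) flush();
    }

    // Write whatever is buffered (possibly a partial chunk) as one operation.
    void flush() {
        if (buffer_.empty()) return;
        storage_.insert(storage_.end(), buffer_.begin(), buffer_.end());
        buffer_.clear();
        ++writes_;
    }

    const std::vector<T>& storage() const { return storage_; }
    std::size_t writes()  const { return writes_; }
    std::size_t pending() const { return buffer_.size(); }

private:
    std::size_t    chunk_size_;
    std::vector<T> buffer_;
    std::vector<T> storage_;
    std::size_t    writes_ = 0;
};
```

With a chunk size of 4, appending 10 records triggers two full-chunk writes and leaves two records pending until flush() is called. Batching like this turns many small appends into a few larger writes, which is the general idea behind the "efficient I/O" row in the table.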

TL;DR

Creating appendable, extendable datasets in C++ is no longer boilerplate-heavy. H5CPP gives you:

  • C++ templates for structure mapping
  • Clean append logic with h5::append()
  • Flexible storage with unlimited dataspace
  • Convenient metadata via attributes