Extendable Datasets in HDF5—Simplified with H5CPP
The OP’s Context
The original poster wanted an HDF5 dataset that can grow over time (event streams or log records, for example) behind a clean C++ interface. Writing their own solution was an option, but they were looking for something performant and maintainable.
H5CPP to the Rescue (Steven Varga, Mar 2 2023)
“Probably you want to roll your own, which is a good thing—but in case you're looking for a performant solution:”
    #include <iostream>
    #include <string>
    #include <vector>

    // sn::record_t must already be declared at this point (its header is not shown in the post)
    #include <h5cpp/core>
    #include "generated.h" // your record type mapping
    #include <h5cpp/io>

    int main() {
        auto fd = h5::create("test.h5", H5F_ACC_TRUNC);
        { // Create an extendable dataset with your POD struct
            auto ds = h5::create<sn::record_t>(
                fd, "/path/dataset", h5::max_dims{H5S_UNLIMITED});
            // Assign vector-of-strings attribute
            ds["attribute"] = {"first", "second", "...", "last"};
            // Convert dataset to packet-table and append records;
            // the explicit h5::pt_t type is what triggers the conversion
            h5::pt_t pt = ds;
            for (int i = 0; i < 3; ++i) {
                h5::append(pt, sn::record_t{
                    1.0 * i, 2.0 * i,
                    {1, 2, 3, 4, 5}, {11, 12, 13, 14, 15}
                });
            }
        } // pt goes out of scope here, which flushes any buffered records
        { // Read back the dataset
            auto ds = h5::open(fd, "/path/dataset");
            auto attribute = h5::aread<std::vector<std::string>>(ds, "attribute");
            std::cout << attribute << std::endl;
            // Dump field A of every record; the dataset is already open, so no path is needed
            for (const auto& rec : h5::read<std::vector<sn::record_t>>(ds))
                std::cerr << rec.A << " ";
            std::cerr << std::endl;
        }
    }
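The listing relies on sn::record_t, whose definition the post does not show. As a rough sketch, assuming the layout implied by the initializer (two doubles followed by two five-element integer arrays), the struct might look like the following. Only the member name A appears in the post; B, idx, val, and the helpers make_record and self_check are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <type_traits>

namespace sn {
    // Hypothetical layout inferred from the initializer in the post.
    // Only the member name A appears in the original; B, idx and val are invented.
    struct record_t {
        double A;
        double B;
        int    idx[5];
        int    val[5];
    };
}

// A record that maps onto a fixed-size compound type should be a plain
// aggregate: no pointers, no virtual functions, no std::string members.
static_assert(std::is_trivially_copyable<sn::record_t>::value,
              "record must be copyable byte for byte");
static_assert(std::is_standard_layout<sn::record_t>::value,
              "record must have a predictable member layout");

// Builds the i-th record the same way the append loop in the post does.
inline sn::record_t make_record(int i) {
    return sn::record_t{1.0 * i, 2.0 * i, {1, 2, 3, 4, 5}, {11, 12, 13, 14, 15}};
}

// Small self-check: the record for i == 1 carries A == 1.0 and B == 2.0.
inline void self_check() {
    const sn::record_t r = make_record(1);
    assert(r.A == 1.0 && r.B == 2.0);
    assert(r.idx[0] == 1 && r.val[4] == 15);
}
```

The two static_assert lines fail at compile time if someone later adds a member that cannot be copied as raw bytes, which is a cheap guard for any struct meant to be stored as a fixed-size record.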
Why This Works So Nicely
Feature | Benefit |
---|---|
H5S_UNLIMITED max dims | Enables a truly extendable dataset |
sn::record_t POD mapping | Compact and expressive schema definitions |
h5::append(...) API | Simple, zero-boilerplate appends |
Packet-table behind the scenes | Efficient I/O under the hood |
Vector attribute support | Seamless metadata attachment and retrieval |
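To give a feel for what the packet-table row describes, here is a toy model in plain C++. It is not H5CPP's actual implementation: records are staged in a chunk-sized buffer and moved to storage one chunk at a time. The class chunk_buffer, the function count_writes, and the std::vector standing in for the file are all assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a packet table. `storage` plays the role of the file.
template <class T>
class chunk_buffer {
public:
    chunk_buffer(std::vector<T>& storage, std::size_t chunk_size)
        : storage_(storage), chunk_size_(chunk_size) {
        assert(chunk_size > 0);
        pending_.reserve(chunk_size);
    }
    // Whatever is still pending is written out on destruction.
    ~chunk_buffer() { flush(); }

    chunk_buffer(const chunk_buffer&) = delete;
    chunk_buffer& operator=(const chunk_buffer&) = delete;

    // Stage one record; write a whole chunk once the buffer is full.
    void append(const T& record) {
        pending_.push_back(record);
        if (pending_.size() == chunk_size_) flush();
    }

    // Move all staged records to storage in a single write.
    void flush() {
        if (pending_.empty()) return;
        storage_.insert(storage_.end(), pending_.begin(), pending_.end());
        pending_.clear();
        ++writes_;
    }

    std::size_t writes() const { return writes_; }

private:
    std::vector<T>& storage_;
    std::vector<T>  pending_;
    std::size_t     chunk_size_;
    std::size_t     writes_ = 0;
};

// Appends the integers 0..n-1 through a buffer of the given chunk size and
// returns how many write operations reached storage.
inline std::size_t count_writes(std::size_t n, std::size_t chunk_size,
                                std::vector<int>& storage) {
    chunk_buffer<int> pt(storage, chunk_size);
    for (std::size_t i = 0; i < n; ++i) pt.append(static_cast<int>(i));
    pt.flush(); // write the final, partially filled chunk
    return pt.writes();
}
```

With a chunk size of 4, appending 10 records reaches storage in three writes (4, 4, and 2 records) instead of ten, which is the kind of saving the table row alludes to.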
TL;DR
Creating appendable, extendable datasets in C++ is no longer boilerplate-heavy. H5CPP gives you:
- C++ templates for structure mapping
- Clean append logic with h5::append()
- Flexible storage with unlimited dataspace
- Convenient metadata via attributes