

Valgrind, Move Semantics, and HDF5: Memory-Safe Serialization with H5CPP

Introduction

Memory leaks? Invalid writes? Not in our house. When you're dealing with zero-copy serialization of C++ POD types into HDF5 packet tables, you'd better be sure the memory management holds up — especially under modern C++ semantics like move assignment.

In this short entry, we explore how h5cpp handles move semantics under stress, and how tools like Valgrind can validate correct cleanup when working with dynamically generated HDF5 compound types.

The Test: Moving Packet Tables

The test project from this commit uses:

  • a generated struct (packet_t) via generated.h and struct.h
  • a simple dataset append test using packettable.cpp
  • a Makefile that wires the H5CPP build chain to produce and inspect the binary

The core test is simple:

packet_t a, b;
b = std::move(a);  // move assignment

Followed by an append operation into an HDF5 Packet Table. We simulate real-world usage where streaming structs get moved during staging, queueing, or I/O flushing — and we want to guarantee correctness, not just behavior.
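For orientation, here is a minimal sketch of what such a test might look like with H5CPP. The file name, dataset name, and chunk size are illustrative, not the literal contents of packettable.cpp:

#include <utility>
#include "struct.h"        // defines packet_t
#include <h5cpp/core>
#include "generated.h"     // produced by the h5cpp compiler
#include <h5cpp/io>

int main(){
    h5::fd_t fd = h5::create("packettable.h5", H5F_ACC_TRUNC);
    h5::pt_t pt = h5::create<packet_t>(fd, "stream",
            h5::max_dims{H5S_UNLIMITED}, h5::chunk{1024});

    packet_t a{}, b{};
    b = std::move(a);       // move assignment under test
    h5::append(pt, b);      // append the moved-into object into the packet table
}   // RAII releases the packet table and file handle here, which Valgrind then verifies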

Memory Safety: Valgrind Results

Compiled with HDF5 debugging symbols and H5CPP, we ran:

valgrind --leak-check=full ./packettable

And here's what we expect to see:

==12345== All heap blocks were freed -- no leaks are possible
==12345== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Why this matters: move assignment can easily create subtle dangling references or uninitialized accesses in serialization-heavy code. By testing this path explicitly, we ensure that:

  • all memory is correctly initialized/cleaned up
  • no heap leaks happen during serialization/deserialization
  • HDF5’s internal resources are released cleanly

How H5CPP Helps

Behind the scenes, H5CPP generates type-safe HDF5 compound type definitions and adapters. This ensures:

  • correct alignment and layout matching
  • RAII-wrapped HDF5 resources
  • deterministic destruction of dataset writers/readers

The generated code in generated.h avoids heap allocation altogether unless explicitly requested, so move assignment is just a blit operation with well-defined cleanup.
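One cheap way to verify that assumption is a compile-time assertion; a minimal sketch, assuming struct.h defines packet_t as a POD with fixed-size members:

#include <type_traits>
#include "struct.h"   // assumed to define packet_t with only fixed-size, trivially copyable members

// if these hold, move assignment degenerates to a member-wise copy (a blit) with no ownership to steal
static_assert(std::is_trivially_copyable<packet_t>::value, "packet_t should be trivially copyable");
static_assert(std::is_standard_layout<packet_t>::value,    "packet_t should be standard layout");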

Takeaways

✅ Always test your serialization stack under move semantics
✅ Use Valgrind or ASan regularly to catch lifetime bugs
✅ Let H5CPP do the heavy lifting of alignment, safety, and type deduction

Posted problem

Hello, I’m trying to cross-compile HDF5 using BinaryBuilder, but the setup to use is still not clear to me. I’ve tried calling CMake in the following way:

cmake .. -DCMAKE_INSTALL_PREFIX=${prefix} \
    -DCMAKE_TOOLCHAIN_FILE=${CMAKE_TARGET_TOOLCHAIN} \
    -DHDF5_BUILD_CPP_LIB=OFF \
    -DONLY_SHARED_LIBS=ON \
    -DHDF5_BUILD_HL_LIB=ON \
    -DHDF5_ENABLE_Z_LIB_SUPPORT=ON \
    -DHDF5_ENABLE_SZIP_SUPPORT=OFF \
    -DHDF5_ENABLE_SZIP_ENCODING=OFF \
    -DBUILD_TESTING=OFF

To reproduce platform-dependent variables for CMake

Allen Byrn

According to CMake, in the cross-compile toolchain you pre-set those variables to the results of the try run. Yes you need to know what those values are. I would start with the info from the CMake sites and go from there.

H5_LDOUBLE_TO_LONG_SPECIAL_RUN (advanced)
H5_LDOUBLE_TO_LONG_SPECIAL_RUN__TRYRUN_OUTPUT (advanced)
H5_LONG_TO_LDOUBLE_SPECIAL_RUN (advanced)
H5_LONG_TO_LDOUBLE_SPECIAL_RUN__TRYRUN_OUTPUT (advanced)
H5_LDOUBLE_TO_LLONG_ACCURATE_RUN (advanced)
H5_LDOUBLE_TO_LLONG_ACCURATE_RUN__TRYRUN_OUTPUT (advanced)
H5_LLONG_TO_LDOUBLE_CORRECT_RUN (advanced)
H5_LLONG_TO_LDOUBLE_CORRECT_RUN__TRYRUN_OUTPUT (advanced)
H5_DISABLE_SOME_LDOUBLE_CONV_RUN (advanced)
H5_DISABLE_SOME_LDOUBLE_CONV_RUN__TRYRUN_OUTPUT (advanced)
H5_NO_ALIGNMENT_RESTRICTIONS_RUN (advanced)
H5_NO_ALIGNMENT_RESTRICTIONS_RUN__TRYRUN_OUTPUT (advanced)

hdf5/config/cmake/ConversionTests.c

execute make conversion-dump to create the list of macro definitions

#ifdef H5_LDOUBLE_TO_LONG_SPECIAL_TEST    -> H5_LDOUBLE_TO_LONG_SPECIAL_RUN
#ifdef H5_LONG_TO_LDOUBLE_SPECIAL_TEST    -> H5_LONG_TO_LDOUBLE_SPECIAL_RUN
#ifdef H5_LDOUBLE_TO_LLONG_ACCURATE_TEST  -> H5_LDOUBLE_TO_LLONG_ACCURATE_RUN
#ifdef H5_LLONG_TO_LDOUBLE_CORRECT_TEST   -> H5_LLONG_TO_LDOUBLE_CORRECT_RUN
#ifdef H5_NO_ALIGNMENT_RESTRICTIONS_TEST  -> H5_NO_ALIGNMENT_RESTRICTIONS_RUN
#ifdef FC_DUMMY_MAIN
#ifdef H5_DISABLE_SOME_LDOUBLE_CONV_TEST  -> H5_DISABLE_SOME_LDOUBLE_CONV_RUN

HDFTests.c

execute make try-dump to create the list of macro definitions or execute: grep -Rn "#ifdef" HDFTests.c |cut -d' ' -f2 | sort -f | tr '\n' ' ' from shell.

DEV_T_IS_SCALAR FC_DUMMY_MAIN FC_DUMMY_MAIN FC_DUMMY_MAIN GETTIMEOFDAY_GIVES_TZ __GLIBC_PREREQ HAVE_ATTRIBUTE HAVE_C99_DESIGNATED_INITIALIZER HAVE_C99_FUNC HAVE_DEFAULT_SOURCE HAVE_DIRECT HAVE_FUNCTION HAVE_IOEO HAVE_LONG_LONG HAVE_OFF64_T HAVE_SOCKLEN_T HAVE_STAT64_STRUCT HAVE_STAT_ST_BLOCKS HAVE_STRUCT_TEXT_INFO HAVE_STRUCT_TIMEZONE HAVE_STRUCT_VIDEOCONFIG HAVE_SYS_SOCKET_H HAVE_SYS_TIME_H HAVE_SYS_TIME_H HAVE_SYS_TIME_H HAVE_SYS_TIME_H HAVE_SYS_TIME_H HAVE_SYS_TYPES_H HAVE_SYS_TYPES_H HAVE_TIMEZONE HAVE___TM_GMTOFF HAVE_TM_GMTOFF HAVE_UNISTD_H PRINTF_LL_WIDTH STDC_HEADERS SYSTEM_SCOPE_THREADS TEST_DIRECT_VFD_WORKS TEST_LFS_WORKS TIME_WITH_SYS_TIME VSNPRINTF_WORKS

Makefile to run all test cases:

#  _____________________________________________________________________________
#  Copyright (c) <2020> <copyright Steven Varga, Toronto, On>
#  _____________________________________________________________________________

all: try-prefix conversion-prefix

# clean target will remove it, so create on demand
setup-dir:
    @mkdir -p build
# 
remove_prefix = $(shell echo '$1' | cut -d'-' -f2)

conversion-src = ConversionTests.c
conversion-list = H5_LDOUBLE_TO_LONG_SPECIAL_TEST H5_LONG_TO_LDOUBLE_SPECIAL_TEST H5_LDOUBLE_TO_LLONG_ACCURATE_TEST     H5_LLONG_TO_LDOUBLE_CORRECT_TEST H5_NO_ALIGNMENT_RESTRICTIONS_TEST H5_DISABLE_SOME_LDOUBLE_CONV_TEST
conversion-prefix: $(foreach var,$(conversion-list), conversion-$(var))
conversion-%: $(conversion-src) setup-dir
    $(eval value=$(call remove_prefix,$@))
    @$(CC) -o build/$(value) -D$(value) $(conversion-src)
    @./build/$(value) ; echo $(value) $$?

try-src = HDFTests.c
try-list = DEV_T_IS_SCALAR FC_DUMMY_MAIN FC_DUMMY_MAIN FC_DUMMY_MAIN GETTIMEOFDAY_GIVES_TZ __GLIBC_PREREQ HAVE_ATTRIBUTE HAVE_C99_DESIGNATED_INITIALIZER HAVE_C99_FUNC HAVE_DEFAULT_SOURCE HAVE_DIRECT HAVE_FUNCTION HAVE_IOEO HAVE_LONG_LONG HAVE_OFF64_T HAVE_SOCKLEN_T HAVE_STAT64_STRUCT HAVE_STAT_ST_BLOCKS HAVE_STRUCT_TEXT_INFO HAVE_STRUCT_TIMEZONE HAVE_STRUCT_VIDEOCONFIG HAVE_SYS_SOCKET_H HAVE_SYS_TIME_H HAVE_SYS_TIME_H HAVE_SYS_TIME_H HAVE_SYS_TIME_H HAVE_SYS_TIME_H HAVE_SYS_TYPES_H HAVE_SYS_TYPES_H HAVE_TIMEZONE HAVE___TM_GMTOFF HAVE_TM_GMTOFF HAVE_UNISTD_H PRINTF_LL_WIDTH STDC_HEADERS SYSTEM_SCOPE_THREADS TEST_DIRECT_VFD_WORKS TEST_LFS_WORKS TIME_WITH_SYS_TIME VSNPRINTF_WORKS
try-prefix: $(foreach var,$(try-list), try-$(var))
try-%: $(try-src) setup-dir
    $(eval value=$(call remove_prefix,$@))
    @$(CC) -o build/$(value) -D$(value) $(try-src) > /dev/null 2>&1; echo $(value) $$?

clean: 
    @rm -rf build 

.PHONY: all try-prefix conversion-prefix 

Generated output

The following list should probably also include the platform hash alongside the values.

DEV_T_IS_SCALAR 1
FC_DUMMY_MAIN 1
GETTIMEOFDAY_GIVES_TZ 1
__GLIBC_PREREQ 1
HAVE_ATTRIBUTE 0
HAVE_C99_DESIGNATED_INITIALIZER 0
HAVE_C99_FUNC 0
HAVE_DEFAULT_SOURCE 0
HAVE_DIRECT 0
HAVE_FUNCTION 0
HAVE_IOEO 1
HAVE_LONG_LONG 1
HAVE_OFF64_T 1
HAVE_SOCKLEN_T 1
HAVE_STAT64_STRUCT 1
HAVE_STAT_ST_BLOCKS 0
HAVE_STRUCT_TEXT_INFO 1
HAVE_STRUCT_TIMEZONE 1
HAVE_STRUCT_VIDEOCONFIG 1
HAVE_SYS_SOCKET_H 1
HAVE_SYS_TIME_H 1
HAVE_SYS_TYPES_H 1
HAVE_TIMEZONE 0
HAVE___TM_GMTOFF 1
HAVE_TM_GMTOFF 0
HAVE_UNISTD_H 1
PRINTF_LL_WIDTH 1
STDC_HEADERS 0
SYSTEM_SCOPE_THREADS 0
TEST_DIRECT_VFD_WORKS 1
TEST_LFS_WORKS 0
TIME_WITH_SYS_TIME 0
VSNPRINTF_WORKS 0
H5_LDOUBLE_TO_LONG_SPECIAL_TEST 1
H5_LONG_TO_LDOUBLE_SPECIAL_TEST 1
H5_LDOUBLE_TO_LLONG_ACCURATE_TEST 0
H5_LLONG_TO_LDOUBLE_CORRECT_TEST 0
H5_NO_ALIGNMENT_RESTRICTIONS_TEST 0
H5_DISABLE_SOME_LDOUBLE_CONV_TEST 1

yggdrasil: Mose Giordano

using BinaryBuilder

# Collection of sources required to build HDF5
name = "HDF5"
version = v"1.12.0"

sources = [
    GitSource("https://github.com/steven-varga/hdf5.git",
              "b49b22d6882d97b1ec01d482822955bd8e923203"),
]

# Bash recipe for building across all platforms
script = raw"""
cd ${WORKSPACE}/srcdir/hdf5/
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=${prefix} \
    -DCMAKE_TOOLCHAIN_FILE=${CMAKE_TARGET_TOOLCHAIN} \
    -DHDF5_BUILD_CPP_LIB=OFF \
    -DONLY_SHARED_LIBS=ON \
    -DHDF5_BUILD_HL_LIB=ON \
    -DHDF5_ENABLE_Z_LIB_SUPPORT=ON \
    -DHDF5_ENABLE_SZIP_SUPPORT=OFF \
    -DHDF5_ENABLE_SZIP_ENCODING=OFF \
    -DBUILD_TESTING=OFF
make -j${nproc}
make install
install_license ${WORKSPACE}/srcdir/hdf5/COPYING*
"""

# These are the platforms we will build for by default, unless further
# platforms are passed in on the command line
platforms = supported_platforms()

# The products that we will ensure are always built
products = [
    LibraryProduct("libhdf5", :libhdf5),
    LibraryProduct("libhdf5_hl", :libhdf5_hl),
]

# Dependencies that must be installed before this package can be built
dependencies = [
    Dependency("Zlib_jll"),
]

# Build the tarballs, and possibly a `build.jl` as well.
build_tarballs(ARGS, name, version, sources, script, platforms, products, dependencies)

CSV to HDF5

Public domain CSV example file obtained from this link. The CSV library used is Fast C++ CSV Parser.

C++/C representation

An arbitrary POD struct can be represented in HDF5 format; one easy representation of strings is a fixed-size character array. An alternative, often better performing, representation is to factor the strings out of the numerical data and save them in separate datasets.

#ifndef  CSV2H5_H 
#define  CSV2H5_H

/*define C++ representation as POD struct*/
struct input_t {
    long MasterRecordNumber;
    unsigned int Hour;
    double Latitude;
    double Longitude;
    char ReportedLocation[20]; // character arrays are supported
};
#endif
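For reference, a minimal sketch of the factored-out alternative mentioned above; the struct name numeric_t and the layout are illustrative, not part of the downloadable project:

/* sketch: numerical fields only -- the string column lives in its own dataset */
struct numeric_t {
    long MasterRecordNumber;
    unsigned int Hour;
    double Latitude;
    double Longitude;
};
// the ReportedLocation strings would then be stored in a separate dataset,
// indexed by the same row number as the numeric_t records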

Reading the CSV is rather easy thanks to Fast C++ CSV Parser; a single header file, csv.h, is attached to the project. It is not only fast and simple, but also elegantly lets you select specific columns, with the number of columns given by the template parameter N_COLS:

io::CSVReader<N_COLS> in("input.csv"); // number of columns read may be fewer than the total columns in a row; we read only 5
in.read_header(io::ignore_extra_column, "Master Record Number", "Hour", "Reported_Location","Latitude","Longitude");
[...]
while(in.read_row(row.MasterRecordNumber, row.Hour, ptr, row.Latitude, row.Longitude)){
    [...]

The HDF5 part is matching in simplicity:

    h5::fd_t fd = h5::create("output.h5",H5F_ACC_TRUNC);
    h5::pt_t pt = h5::create<input_t>(fd,  "monroe-county-crash-data2003-to-2015.csv",
                 h5::max_dims{H5S_UNLIMITED}, h5::chunk{1024} | h5::gzip{9} ); // compression, chunked, unlimited size
    [...]
    while(...){
        h5::append(pt, row); // append operator uses internal buffers to cache and convert row insertions to block/chunk operations
    }
    [...]

The translation unit (TU) is scanned with the LLVM-based h5cpp compiler, and the necessary HDF5-specific type descriptors are produced:

#ifndef H5CPP_GUARD_mzMuQ
#define H5CPP_GUARD_mzMuQ

namespace h5{
    //template specialization of input_t to create HDF5 COMPOUND type
    template<> hid_t inline register_struct<input_t>(){
        //hsize_t at_00_[] ={20};            hid_t at_00 = H5Tarray_create(H5T_STRING,20,at_00_);
        hid_t at_00 = H5Tcopy (H5T_C_S1); H5Tset_size(at_00, 20);
        hid_t ct_00 = H5Tcreate(H5T_COMPOUND, sizeof (input_t));
        H5Tinsert(ct_00, "MasterRecordNumber",  HOFFSET(input_t,MasterRecordNumber),H5T_NATIVE_LONG);
        H5Tinsert(ct_00, "Hour",    HOFFSET(input_t,Hour),H5T_NATIVE_UINT);
        H5Tinsert(ct_00, "Latitude",    HOFFSET(input_t,Latitude),H5T_NATIVE_DOUBLE);
        H5Tinsert(ct_00, "Longitude",   HOFFSET(input_t,Longitude),H5T_NATIVE_DOUBLE);
        H5Tinsert(ct_00, "ReportedLocation",    HOFFSET(input_t,ReportedLocation),at_00);

        //closing all hid_t allocations to prevent resource leakage
        H5Tclose(at_00); 

        //if not used with h5cpp framework, but as a standalone code generator then
        //the returned 'hid_t ct_00' must be closed: H5Tclose(ct_00);
        return ct_00;
    };
}
H5CPP_REGISTER_STRUCT(input_t);

#endif

The entire project can be downloaded from this link, but for completeness here is the source file:

/* Copyright (c) 2020 vargaconsulting, Toronto,ON Canada
 * Author: Varga, Steven <steven@vargaconsulting.ca>
 */

#include "csv.h"
// data structure include file: `struct.h` must precede `generated.h`,
// as the latter depends on definitions from the former
#include "struct.h"

#include <h5cpp/core>      // has handle + type descriptors
// sandwiched: as `h5cpp/io` depends on `generated.h`, which needs `h5cpp/core`
    #include "generated.h" // uses type descriptors
#include <h5cpp/io>        // uses generated.h + core 

int main(){

    // create HDF5 container
    h5::fd_t fd = h5::create("output.h5",H5F_ACC_TRUNC);
    // create dataset   
    // chunk size is unrealistically small; usually you would set it to roughly 1 MB, or an ethernet jumbo frame size
    h5::ds_t ds = h5::create<input_t>(fd,  "simple approach/dataset.csv",
                 h5::max_dims{H5S_UNLIMITED}, h5::chunk{10} | h5::gzip{9} );
    // `h5::ds_t` handle is seamlessly cast to `h5::pt_t` packet table handle, this could have been done in single step
    // but we need `h5::ds_t` handle to add attributes
    h5::pt_t pt = ds;
    // attributes may be added to `h5::ds_t` handle
    ds["data set"] = "monroe-county-crash-data2003-to-2015.csv";
    ds["cvs parser"] = "https://github.com/ben-strasser/fast-cpp-csv-parser"; // thank you!

    constexpr unsigned N_COLS = 5;
    io::CSVReader<N_COLS> in("input.csv"); // number of columns read may be fewer than the total columns in a row; we read only 5
    in.read_header(io::ignore_extra_column, "Master Record Number", "Hour", "Reported_Location","Latitude","Longitude");
    input_t row;                           // buffer to read line by line
    char* ptr;      // indirection, as `read_row` doesn't take array directly
    while(in.read_row(row.MasterRecordNumber, row.Hour, ptr, row.Latitude, row.Longitude)){
        strncpy(row.ReportedLocation, ptr, STR_ARRAY_SIZE); // defined in struct.h
        h5::append(pt, row);
        std::cout << std::string(ptr) << "\n";
    }
    // RAII closes all allocated resources
}

The output of h5dump -pH output.h5:

HDF5 "output.h5" {
GROUP "/" {
   GROUP "simple approach" {
      DATASET "dataset.csv" {
         DATATYPE  H5T_COMPOUND {
            H5T_STD_I64LE "MasterRecordNumber";
            H5T_STD_U32LE "Hour";
            H5T_IEEE_F64LE "Latitude";
            H5T_IEEE_F64LE "Longitude";
            H5T_ARRAY { [20] H5T_STD_I8LE } "ReportedLocation";
         }
         DATASPACE  SIMPLE { ( 199 ) / ( H5S_UNLIMITED ) }
         STORAGE_LAYOUT {
            CHUNKED ( 10 )
            SIZE 7347 (1.517:1 COMPRESSION)
         }
         FILTERS {
            COMPRESSION DEFLATE { LEVEL 9 }
         }
         FILLVALUE {
            FILL_TIME H5D_FILL_TIME_IFSET
            VALUE  H5D_FILL_VALUE_DEFAULT
         }
         ALLOCATION_TIME {
            H5D_ALLOC_TIME_INCR
         }
         ATTRIBUTE "cvs parser" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_UTF8;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
         }
         ATTRIBUTE "data set" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_UTF8;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
         }
      }
   }
}
}

H5Pset_shared_mesg_nindexes() Fails for N > 8 in HDF5 1.10.6

problem: H5Pset_shared_mesg_nindexes(fcpl_id, N) fails with N > 8

HDF5 Version: 1.10.6. The documentation doesn't mention that N <= H5O_SHMESG_MAX_NINDEXES; the N argument is limited to a maximum of H5O_SHMESG_MAX_NINDEXES.

#include <hdf5.h>
#include <stdio.h>

int main() {
    char name[1024];
    hid_t fcpl_id, fd_id;
    herr_t err;
    unsigned N=9, nidx;

    fcpl_id =  H5Pcreate(H5P_FILE_CREATE);
    err = H5Pset_shared_mesg_nindexes( fcpl_id, N );
    fd_id = H5Fcreate("test.h5", H5F_ACC_TRUNC, fcpl_id, H5P_DEFAULT );
    H5Pget_shared_mesg_nindexes( fcpl_id, &nidx );
    printf("results: %i - %i = %i \n", N, nidx, H5O_SHMESG_MAX_NINDEXES);

    H5Pclose(fcpl_id);
    H5Fclose(fd_id);
}

output of make test

cc main.c -lhdf5  -lz -ldl -lm -o set_shared_mesg_nindices  
./set_shared_mesg_nindices
HDF5-DIAG: Error detected in HDF5 (1.10.6) thread 0:
  #000: H5Pfcpl.c line 831 in H5Pset_shared_mesg_nindexes(): number of indexes is greater than H5O_SHMESG_MAX_NINDEXES
    major: Invalid arguments to routine
    minor: Out of range
results: 9 - 0 = 8
Documenting that N <= H5O_SHMESG_MAX_NINDEXES would help prevent this error.

linking:

    linux-vdso.so.1 (0x00007fff929f5000)
    libhdf5.so.103 => /usr/local/lib/libhdf5.so.103 (0x00007ff443b55000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff443764000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ff443547000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff443343000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff442fa5000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ff444332000)

H5Pset_istore_k() Fails for Values Above 16384 in HDF5 1.10.6

problem: H5Pset_istore_k(fcpl_id, ik) fails

HDF5 Version: 1.10.6

The documentation suggests ik=65536 as the maximum value, whereas the library accepts only ik < 16384+1 (i.e. ik <= 16384). Tested against HDF5 1.10.6; other versions may be affected.

#include <hdf5.h>
#include <stdio.h>

int main() {
    char name[1024];
    hid_t fcpl_id, fd_id;
    herr_t err;
    unsigned ik;
    //unsigned max = 16384+1;
    unsigned max = 65536;
    for(unsigned i=32; i < max; i*=2) {
        sprintf(name, "name_%i.h5", i);
        fcpl_id =  H5Pcreate(H5P_FILE_CREATE);
        err = H5Pset_istore_k(fcpl_id, i);
        fd_id = H5Fcreate( name, H5F_ACC_TRUNC, fcpl_id, H5P_DEFAULT );
        H5Pget_istore_k(fcpl_id, &ik );
        printf("results: %i - %i = %i ", i, ik, i - ik);

        H5Pclose(fcpl_id);
        H5Fclose(fd_id);
    }
}

output of make test

cc main.c -lhdf5  -lz -ldl -lm -o set_istore_k  
./set_istore_k
results: 32 - 32 = 0 
results: 64 - 64 = 0 
results: 128 - 128 = 0 
results: 256 - 256 = 0 
results: 512 - 512 = 0 
results: 1024 - 1024 = 0 
results: 2048 - 2048 = 0 
results: 4096 - 4096 = 0 
results: 8192 - 8192 = 0 
results: 16384 - 16384 = 0 
HDF5-DIAG: Error detected in HDF5 (1.10.6) thread 0:
  #000: H5Pfcpl.c line 645 in H5Pset_istore_k(): istore IK value exceeds maximum B-tree entries
    major: Invalid arguments to routine
    minor: Bad value
results: 32768 - 32 = 32736

Adjusting the documentation may be a possible fix.

linking:

    linux-vdso.so.1 (0x00007fff929f5000)
    libhdf5.so.103 => /usr/local/lib/libhdf5.so.103 (0x00007ff443b55000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff443764000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ff443547000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff443343000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff442fa5000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ff444332000)

H5Pset_sizes() Crashes for 16-byte Offsets in HDF5 1.10.6

problem: H5Pset_sizes(fcpl_id, offset, length) crashes

HDF5 Version: 1.10.6

When either of the values is 16, the library crashes. Tested against HDF5 1.10.6; other versions may be affected.

#include <hdf5.h>
#include <stdio.h>

int main() {
    char name[1024];
    hid_t fcpl_id, fd_id;
    herr_t err;
    size_t offset[] = {2,4,8,16};
    size_t length[] = {2,4,8,16};

    size_t N = 4;
    for( size_t i=0; i<N; i++) {
        for( size_t j=0; j<N; j++ ) {
            sprintf(name, "name_%ld_%ld.h5", offset[i], length[j]);
            fcpl_id =  H5Pcreate(H5P_FILE_CREATE);
            err = H5Pset_sizes(fcpl_id, offset[i], length[j]);
            fd_id = H5Fcreate( name, H5F_ACC_TRUNC, fcpl_id, H5P_DEFAULT );
            H5Pclose(fcpl_id);
            H5Fclose(fd_id);
        }
    }
}

output of make test

cc main.c -lhdf5  -lz -ldl -lm -o set_sizes 
./set_sizes
free(): invalid pointer
Makefile:12: recipe for target 'test' failed
make: *** [test] Aborted (core dumped)

Change size_t N=4 to size_t N=3 to reduce the test range from {2,4,8,16} X {2,4,8,16} to {2,4,8} X {2,4,8}; notice the missing 16, which is valid input according to the documentation.

linking:

    linux-vdso.so.1 (0x00007fff929f5000)
    libhdf5.so.103 => /usr/local/lib/libhdf5.so.103 (0x00007ff443b55000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff443764000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ff443547000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff443343000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff442fa5000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ff444332000)

Optimizing Subset Reads from Contiguous HDF5 Datasets

The Use Case (OP’s Context)

The question revolved around efficiently reading subarrays from datasets stored using HDF5’s contiguous layout. The default contiguous placement may not be ideal for slicing access patterns, especially for small-scale reads or irregular memory access.

My H5CPP-Driven Recommendation

If you’re working in C++, you might find H5CPP a smoother path. Here’s what it offers:

  • Compact layout by default for small datasets — stored inline, fast startup, minimal overhead.
  • Adaptive chunking for larger datasets — just set the h5::chunk property to control chunk size.
  • Automatic fallback to contiguous storage if you don’t specify chunking — so behavior stays predictable.
  • Zero-copy reads — H5CPP optimizes typed memory I/O, eliminating performance penalty over vanilla C HDF5 calls.

In practice, the Example folder in H5CPP includes code snippets for common use cases, demonstrating how to get clean, efficient subset reads across many patterns.
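To make that concrete, here is a minimal sketch of a chunked dataset and a partial (subset) read. The file name, dataset path, and dimensions are made up, and the property spellings (h5::chunk, h5::offset, h5::count) reflect my reading of the H5CPP examples, so treat this as a sketch rather than verified code:

#include <armadillo>
#include <h5cpp/all>

int main(){
    h5::fd_t fd = h5::create("subset.h5", H5F_ACC_TRUNC);

    // chunked layout sized roughly to the expected access pattern
    arma::mat M = arma::randu<arma::mat>(1000, 1000);
    h5::write(fd, "grid/M", M, h5::chunk{100,100} | h5::gzip{6});

    // partial read: a 100x100 tile starting at row 200, column 300
    auto tile = h5::read<arma::mat>(fd, "grid/M",
            h5::offset{200,300}, h5::count{100,100});
}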

Why It Matters

  • Small datasets (few KB): contiguous layout keeps the data external to the object header; compact layout (H5CPP) stores it in-file for fast access; chunked layout may add overhead
  • Larger datasets (MB+): contiguous is a static layout; compact may overflow its size limits; chunking enables efficient slicing
  • Subset reads (e.g., slices): contiguous performs poorly; compact may work if in-file; chunked is high performance and cache-friendly
  • C++ typed memory access: contiguous requires manual coding; compact offers a zero-copy API; chunked offers zero-copy with chunk control

In short, using a one-size-fits-all layout, like contiguous, is often suboptimal. Think about the platform’s characteristics and data access patterns. H5CPP gives you the tools to adapt layout to the job—without overhead or boilerplate.

TL;DR

  • Small datasets? Get compact-in-file layout by default in H5CPP — no config needed.
  • Large datasets? Enable chunking for fast sliding-window or subarray reads.
  • Want typed access in C++? Use H5CPP’s zero-copy interface with performance parity to HDF5 C.

I/O Performance with H5CPP: 687 MB/s for Small Matrices

🧪 Problem

High-performance computing projects often involve thousands of small matrices—each one carrying results, measurements, or partial data.

If you try to naively write each matrix as an individual dataset with the HDF5 C API, things quickly slow down.

We ran an experiment to compare:

  • time to generate 4000 small matrices
  • time to write them to disk
  • dataset creation overhead

All using H5CPP + Armadillo.

⚙️ Setup

We generated 4000 random arma::mat matrices and wrote each one into a dedicated HDF5 dataset using H5CPP’s type-safe API:

h5::fd_t fd = h5::create("container.h5", H5F_ACC_TRUNC);
for (size_t i = 0; i < 4000; i++) {
    arma::mat X = arma::randu<arma::mat>(20, 20); // ~3.2 KB each, generated per iteration
    h5::ds_t ds = h5::create<arma::mat>(fd, "/dataset/" + std::to_string(i),
        h5::current_dims{20,20}, h5::chunk{20,20} | h5::gzip{6});
    h5::write(ds, X);
}

HDF5 file: container.h5
Hardware: SSD rated at 500 MB/s
Library: HDF5 1.10.x + H5CPP + Armadillo

📈 Results

CREATING 4000 h5::ds_t     cost: 0.215 s  (≈ 18,555 dataset/sec)
GENERATING 4000 matrices   cost: 30.93 s  (≈ 129 matrix/sec)
WRITING    4000 matrices   cost: 1.466 s  (≈ 2728 matrix/sec)

THROUGHPUT: 687.61 MB/s

Total I/O payload: ~1.0 GB
Effective bandwidth: ~687 MB/s

That’s over 90% of theoretical SSD write speed, with zero manual buffer management.

🔍 What Makes This Fast?

  • Chunked layout: We write with h5::chunk{20,20}, which matches the matrix shape
  • Compression: With h5::gzip{6} we get file size savings without hurting speed
  • No reopening datasets: Handles persist across writes
  • RAII-powered batching: We defer closing until scope exits

🧪 Compared to C API?

The HDF5 C API gives you:

  • H5Dcreate2(...)
  • H5Dwrite(...)
  • H5Dclose(...)

...all for each matrix. That introduces latency, boilerplate, and opens the door to error-prone lifecycle handling.
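For comparison, a minimal sketch of what the per-matrix loop looks like against the raw C API; the dataset names, dimensions, and buffer argument are illustrative, and error checking is omitted:

#include <hdf5.h>
#include <string>

void write_all(hid_t fd, const double* buf /* one 20x20 matrix, column-major */) {
    hsize_t dims[2] = {20, 20};
    for (int i = 0; i < 4000; i++) {
        std::string name = "/dataset/" + std::to_string(i);
        hid_t space = H5Screate_simple(2, dims, NULL);
        hid_t ds = H5Dcreate2(fd, name.c_str(), H5T_NATIVE_DOUBLE, space,
                              H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Dwrite(ds, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
        H5Dclose(ds);    // every handle must be closed by hand, per matrix
        H5Sclose(space);
    }
}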

✨ Takeaways

  • Dataset creation: 0.215 s (≈ 18,555 datasets/s)
  • Matrix generation: 30.9 s (≈ 129 matrices/s)
  • Matrix write (I/O): 1.46 s (687 MB/s)
  • 🧠 Most of the time was spent generating matrices, not writing them.
  • 🚀 H5CPP I/O speed rivaled raw fwrite() — but with structured layout and compression.

📌 Conclusion

If you’re writing small matrices to HDF5 at scale, H5CPP gives you raw performance + clean syntax.

No memory leaks, no handle juggling, no wasted cycles.

Zero-Cost C++ Structs to HDF5 Compound Types with H5CPP

🧬 The Setup: From HPC POD Structs to HDF5 COMPOUNDs

You're handed an HDF5 file with compound data like this:

DATASET "g_data" {
  DATATYPE  H5T_COMPOUND {
    H5T_IEEE_F64LE "temp";
    H5T_IEEE_F64LE "density";
    H5T_ARRAY { [3] H5T_IEEE_F64LE } "B";
    H5T_ARRAY { [3] H5T_IEEE_F64LE } "V";
    H5T_ARRAY { [20] H5T_IEEE_F64LE } "dm";
    H5T_ARRAY { [9] H5T_IEEE_F64LE } "jkq";
  }
  DATASPACE SIMPLE { ( 70, 3, 3 ) / ( 70, 3, 3 ) }
}

With H5CPP, you don't need to touch the C API. Just describe this with a plain-old C++ struct:

namespace sn {
    struct record_t {
        double temp;
        double density;
        double B[3];
        double V[3];
        double dm[20];
        double jkq[9];
    };
}

That’s it.

🧰 Write Code as If You Had a Magic HDF5 Interface

No manual H5Tinsert, no dataspace juggling:

#include <vector>
#include "struct.h"
#include <h5cpp/core>
#include "generated.h"  // <-- auto-generated with h5cpp compiler
#include <h5cpp/io>

int main(){
    h5::fd_t fd = h5::create("test.h5", H5F_ACC_TRUNC);
    h5::create<sn::record_t>(fd, "/Module/g_data", h5::max_dims{70, 3, 3});

    auto dataset = h5::read<std::vector<sn::record_t>>(fd, "/Module/g_data");
    for (const auto& rec : dataset)
        std::cerr << rec.temp << " ";
    std::cerr << std::endl;
}

Compile and run:

h5cpp struct.cpp -- -std=c++17 -I/usr/include -Dgenerated.h
g++ -std=c++17 -c struct.cpp -o struct.o
g++ struct.o -lhdf5 -lz -ldl -lm -o struct
./struct

🧠 Under the Hood: Codegen Output

H5CPP automatically generates a minimal type descriptor that registers your struct with the HDF5 type system:

template<> hid_t inline register_struct<sn::record_t>(){
    // array declarations omitted for brevity
    hid_t ct = H5Tcreate(H5T_COMPOUND, sizeof(sn::record_t));
    H5Tinsert(ct, "temp", HOFFSET(sn::record_t, temp), H5T_NATIVE_DOUBLE);
    ...
    return ct;
}

Everything is cleaned up (H5Tclose(...)) to avoid resource leaks.

📚 Can I Extract One Field From a Compound?

"Can I create a virtual dataset from just the 'density' field in g_data?"

Short answer: yes, but not easily.

HDF5 virtual datasets (VDS) work best with entire datasets, not fields of compound types. However, a workaround is to:

  • Use hyperslab selections to read only the density field in software.
  • Or, repack the data using H5CPP into a derived dataset containing only the fields you want.

Future H5CPP support might wrap this more cleanly, but currently, you'll want to do the extraction in-memory.
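A minimal sketch of the in-memory repack approach; the output dataset path is illustrative, and the calls assume H5CPP's std::vector overloads create the derived dataset on the fly:

#include <vector>
#include "struct.h"
#include <h5cpp/core>
#include "generated.h"
#include <h5cpp/io>

int main(){
    h5::fd_t fd = h5::open("test.h5", H5F_ACC_RDWR);

    // read the compound records, then project out the single field in memory
    auto records = h5::read<std::vector<sn::record_t>>(fd, "/Module/g_data");
    std::vector<double> density;
    density.reserve(records.size());
    for (const auto& r : records)
        density.push_back(r.density);

    // write the projected field into its own (derived) dataset
    h5::write(fd, "/Module/density", density);
}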


✅ Summary

  • Describe your data in C++ once.
  • H5CPP generates the boilerplate and builds your type descriptors.
  • You can now read, write, and manipulate rich compound datasets without touching the C API.
  • Clean syntax, high performance, zero leaks.

Ready to ditch boilerplate and focus on real work?

Extracting Insight from HDF5 Compound Datasets Using H5CPP

"What if you could treat HDF5 compound datasets like simple C++ structs and just… write your simulation code? Forget the messy HDF5 C API. Write clean, modern C++. Let the compiler do the rest."

🚀 The Problem

One of our HPC applications writes simulation results to HDF5 files using compound types—essentially structured records with fields like temperature, density, and multidimensional vectors such as velocity and magnetic fields. A simplified dump of the file looks like this:

HDF5 "test.h5" {
  GROUP "/" {
    ATTRIBUTE "GRID_DIMENSIONS" { H5T_STD_I32LE (3) }
    ATTRIBUTE "X_AXIS"          { H5T_IEEE_F64LE (3) }
    ATTRIBUTE "Z_AXIS"          { H5T_IEEE_F64LE (70) }

    GROUP "Module" {
      DATASET "g_data" {
        DATATYPE H5T_COMPOUND {
          H5T_IEEE_F64LE "temp";
          H5T_IEEE_F64LE "density";
          H5T_ARRAY { [3] H5T_IEEE_F64LE } "B";
          H5T_ARRAY { [3] H5T_IEEE_F64LE } "V";
          H5T_ARRAY { [20] H5T_IEEE_F64LE } "dm";
          H5T_ARRAY { [9] H5T_IEEE_F64LE } "jkq";
        }
        DATASPACE SIMPLE { (70, 3, 3) }
      }
    }
  }
}

Now imagine wanting to selectively extract the "density" field into its own dataset—for visualization, analysis, or postprocessing. Can we filter compound dataset fields like this? Better yet, can we work with this structure using modern C++?

Let’s take a practical dive.

🔧 Step 1: Define the POD Struct

Start by expressing your compound layout as a C++ POD type (we recommend using nested namespaces for organization):

#ifndef  H5TEST_STRUCT_01 
#define  H5TEST_STRUCT_01

namespace sn {
    struct record_t {
        double temp;
        double density;
        double B[3];
        double V[3];
        double dm[20];
        double jkq[9];
    };
}
#endif

You’ll use this structure as the basis for both reading and writing your data.

✨ Step 2: Write the Code Like You Mean It

With H5CPP, you don’t need to write any boilerplate for type descriptors or deal with the HDF5 C API. Just:

#include <iostream>
#include <vector>
#include "struct.h"
#include <h5cpp/core>
  #include "generated.h" // generated by the h5cpp toolchain
#include <h5cpp/io>

int main() {
    h5::fd_t fd = h5::create("test.h5", H5F_ACC_TRUNC);

    // Create the dataset
    h5::create<sn::record_t>(fd, "/Module/g_data", h5::max_dims{70,3,3});

    // Read entire dataset into memory
    auto dataset = h5::read<std::vector<sn::record_t>>(fd, "/Module/g_data");

    // Loop over records and print temperature
    for (const auto& rec : dataset)
        std::cerr << rec.temp << " ";
    std::cerr << std::endl;
}

No type registration? It’s handled for you.

🧠 Step 3: Generate the Type Descriptor

Use the H5CPP compiler to generate the type description (generated.h):

h5cpp struct.cpp -- -std=c++17 -I/usr/include

It emits something like:

template<> hid_t inline register_struct<sn::record_t>(){
    // define compound type layout with H5Tinsert() calls
    // ...
    H5Tinsert(ct_00, "density", HOFFSET(sn::record_t, density), H5T_NATIVE_DOUBLE);
    // ...
    return ct_00;
}

This magic glues your record_t to an HDF5 COMPOUND type with nested arrays.
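For the nested array members, the generated descriptor relies on HDF5 array types. A hand-written equivalent for the B[3] field might look roughly like this (a sketch, not the literal generated.h output):

hid_t ct = H5Tcreate(H5T_COMPOUND, sizeof(sn::record_t));
hsize_t dims_b[] = {3};
hid_t arr_b = H5Tarray_create2(H5T_NATIVE_DOUBLE, 1, dims_b);   // [3] doubles, shows up as H5T_ARRAY in h5dump
H5Tinsert(ct, "B", HOFFSET(sn::record_t, B), arr_b);
H5Tclose(arr_b);   // the array type handle can be released once it has been inserted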

🧪 Bonus: Can We Slice Fields From Compound Datasets?

Yes, with HDF5 1.10+ you can use virtual datasets (VDS) to reference subsets of existing datasets, but there are some caveats:

  • You cannot directly extract a field from a compound dataset as a virtual dataset without writing a filter or preprocessor.
  • H5CPP doesn't (yet) offer field projections, but you can write a thin wrapper to convert std::vector<record_t> into std::vector<double> by extracting the field manually.

A future enhancement could be writing VDS definitions that reference memory offsets within compound datasets, but for now, you can batch process like this:

std::vector<double> densities;
for (auto& rec : dataset)
    densities.push_back(rec.density);

Simple, fast, cache-friendly.

📈 Output and Verification

Once compiled, run:

./struct
h5dump -pH test.h5

This shows the full compound structure, chunk layout, and memory offsets exactly as designed.

🧩 Conclusion

Working with compound HDF5 types doesn’t have to mean writing hundreds of lines of C code. With H5CPP:

  • You declare a C++ struct that mirrors your HDF5 layout.
  • The compiler auto-generates the type description.
  • Your code stays clean, idiomatic, and testable.

And when you want to extract a field like "density"? You don’t need virtual datasets—you just write C++.

“If you can write a struct, you can store structured data.”