
June 2022

Cross-language Messaging with ZeroMQ: C++, Python, and Fortran

🧩 Setup

We’ll use ZeroMQ’s PUSH/PULL pattern to build a minimal, language-agnostic messaging microservice.

Each sender (written in Python, C++, or Fortran) pushes 64-bit values to a single receiver.

📦 Dependencies

Linux (Debian/Ubuntu):
sudo apt-get install libzmq3-dev

Python:
python3 -m pip install pyzmq

Fortran ZMQ bindings (install fzmq from source):
git clone https://github.com/richsnyder/fzmq && cd fzmq
mkdir build && cd build
cmake -DBUILD_DOCS=OFF ../
sudo make install

🛰️ Sender: Python

#!/usr/bin/python3
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.PUSH)
sock.connect("tcp://localhost:5555")

for x in range(100):
    sock.send(x.to_bytes(8, 'little'))

Highlights:

  • Sends 8-byte integers in little-endian format
  • Handy for testing receivers written in other languages

🛰️ Sender: C++

#include <zmq.h>
#include <cstdint>
#include <cstring>

int main() {
    void* ctx = zmq_ctx_new();
    void* sock = zmq_socket(ctx, ZMQ_PUSH);
    zmq_connect(sock, "tcp://localhost:5555");

    for(uint64_t x = 0; x < 100; ++x)
        zmq_send(sock, &x, sizeof(x), 0);

    zmq_close(sock);
    zmq_ctx_term(ctx);
}

Highlights:

  • Sends raw uint64_t values (8 bytes)
  • Matches the Python byte format on little-endian hosts (see the note below)
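
One caveat: sending the raw in-memory representation matches the Python sender only on little-endian hosts. If the code may ever run on a big-endian machine, pack the bytes explicitly; a minimal helper (illustrative, not part of the original example) could look like this:

```cpp
#include <zmq.h>
#include <cstdint>

// Serialize a 64-bit value in little-endian byte order before sending,
// so the wire format always matches x.to_bytes(8, 'little') on the Python side.
static void send_u64_le(void* sock, uint64_t x) {
    unsigned char buf[8];
    for (int i = 0; i < 8; ++i)
        buf[i] = static_cast<unsigned char>(x >> (8 * i));
    zmq_send(sock, buf, sizeof buf, 0);
}
```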

🛰️ Sender: Fortran

program send
  use fzmq
  implicit none

  type(zmq_context) :: context
  type(zmq_socket)  :: sock
  integer(kind=8)   :: x
  integer           :: rc

  context = zmq_ctx_new()
  sock = zmq_socket(context, ZMQ_PUSH)
  call zmq_connect(sock, "tcp://localhost:5555")

  do x = 0, 99
    call zmq_send(sock, x, 8, 0)
  end do

  call zmq_close(sock)
  call zmq_ctx_term(context)
end program send

Highlights:

  • Uses fzmq bindings
  • Sends binary 64-bit integers

🎯 Receiver: C++

#include <zmq.h>
#include <cstdint>
#include <cinttypes>
#include <cstdio>

int main() {
    void* ctx = zmq_ctx_new();
    void* sock = zmq_socket(ctx, ZMQ_PULL);
    zmq_bind(sock, "tcp://*:5555");

    for (uint64_t x = 0; true; ) {
        zmq_recv(sock, &x, sizeof(x), 0);
        printf("recv: %" PRIu64 "\n", x);
    }

    zmq_close(sock);
    zmq_ctx_term(ctx);
}

Highlights:

  • Pulls 64-bit integers from any connected sender
  • No language-specific deserialization required

✅ Summary

ZeroMQ’s PUSH/PULL pattern makes multi-language IPC a breeze.

Role      Language   Notes
Sender    Python     Uses pyzmq, clean syntax
Sender    C++        Raw zmq_send of binary integers
Sender    Fortran    Uses fzmq bindings
Receiver  C++        Prints values from all senders

Run the receiver first, then launch any combination of senders. Messages will stream in regardless of language. No serialization frameworks, no boilerplate.

Automated Pretty Printing for STL-like Containers in H5CPP

🛠️ Why not just print?

Debugging C++ is an art and often demands setting up heavyweight tools like valgrind, cachegrind, or gdb. These tools are powerful, but sometimes all you need is a quick look at what's inside your container.

Enter: H5CPP’s feature detection idiom-based pretty-printing system for STL-like containers.

Inspired by Walter Brown’s WG21 N4436 paper, we offer a mechanism that allows you to simply:

```cpp
std::cout << stl_like_object;
```

Where `stl_like_object` can be any type that:

* provides `.begin()` and `.end()` (like vectors, lists, maps…)
* offers stack/queue-like interfaces (`.top()`, `.pop()`, `.size()`)
* or even composite/ragged containers like `vector<vector<T>>`, `tuple<...>`, etc. (a minimal detection sketch follows this list)
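
Under the hood this relies on compile-time detection of those member functions. Below is a minimal, self-contained sketch of the idiom (plain C++17 `std::void_t`, not the actual H5CPP implementation) that enables an `operator<<` only for types exposing `.begin()`/`.end()`:

```cpp
#include <iostream>
#include <type_traits>
#include <vector>

// detect .begin()/.end() at compile time
template <class T, class = void>
struct has_begin_end : std::false_type {};

template <class T>
struct has_begin_end<T, std::void_t<decltype(std::declval<T>().begin()),
                                    decltype(std::declval<T>().end())>>
    : std::true_type {};

// stream insertion enabled only for container-like types
template <class T, std::enable_if_t<has_begin_end<T>::value, int> = 0>
std::ostream& operator<<(std::ostream& os, const T& container) {
    os << '[';
    for (auto it = container.begin(); it != container.end(); ++it)
        os << (it == container.begin() ? "" : ",") << *it;
    return os << ']';
}

int main() {
    std::vector<int> v{1, 2, 3};
    std::cout << v << "\n";   // prints [1,2,3]
}
```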

## 🔍 What it does

The current implementation supports:

* **Recursive pretty-printing**
* **Line width control** (via `H5CPP_CONSOLE_WIDTH`)
* **In-line visualization** of arbitrarily nested structures

This will eventually replace/enhance the H5CPP persistence layer with general I/O capabilities.

---

## 📦 Example Output

Here's what `./pprint-test` prints to `stdout`:

```text
LISTS/VECTORS/SETS:
---------------------------------------------------------------------
   array<string,7>:[xSs,wc,gu,Ssi,Sx,pzb,OY]
            vector:[CDi,PUs,zpf,Hm,teO,XG,bu,QZs]
             deque:[256,233,23,89,128,268,69,278,130]
              list:[95,284,24,124,49,40,200,108,281,251,57, ...]
      forward_list:[147,76,81,193,44]
               set:[V,f,szy,v]
     unordered_set:[2.59,1.86,2.93,1.78,2.43,2.04,1.69]
          multiset:[3,5,12,21,23,28,30,30]
unordered_multiset:[gZ,rb,Dt,Q,Ark,dW,Ez,wmE,GwF]
```

And yes, it continues with:

  • Adaptors: stack, queue, priority_queue
  • Associative Containers: map, multimap, unordered_map, ...
  • Ragged Arrays: like vector<vector<string>>
  • Tuples and Pairs: including nested structures

Here’s a teaser for ragged and tuple structures:

```text
vector<vector<string>>:[[pkwZZ,lBqsR,cmKt,PDjaS,Zj],[Nr,jj,xe,uC,bixzV],[uBAU,pXCa,fZEH,FIAIO]]
pair<int,string>:{4:upgdAdbvIB}
tuple<string,int,float,short>:<NHCmzhVVXJ,8,2.01756,7>
```

🧪 Run it yourself

g++ -I./include -o pprint-test.o -std=c++17 -DFMT_HEADER_ONLY -c pprint-test.cpp
g++ pprint-test.o -lhdf5 -lz -ldl -lm -o pprint-test
./pprint-test

You can set the line width by defining H5CPP_CONSOLE_WIDTH (e.g. -DH5CPP_CONSOLE_WIDTH=10).

📝 Code snippet

#include <h5cpp/H5Uall.hpp>
#include <h5cpp/H5cout.hpp>
#include <utils/types.hpp>
#include <iostream>

int main() {
    using namespace std;
    // ragged vector<vector<string>> from the mock data generator, pretty-printed to stdout
    std::cout << mock::data<vector<vector<string>>>::get(2, 5, 3, 7) << "\n";
}

🔮 What’s Next?

If you’d like to bring this up to the level of Julia’s pretty print system, get in touch! The data generators that support arbitrary C++ type generation are part of the larger h5rnd project — a Prüfer sequence-based HDF5 dataset generator.

HDF5 Group Overhead: Measuring the Cost of 100,000 Groups

📦 The Problem

On the HDF5 forum, a user posed a question many developers eventually run into:

"We want to serialize an object tree into HDF5, where every object becomes a group. But group overhead seems large. Why is my 5 MB dataset now 120 MB?"

That’s a fair question. And we set out to quantify it.

⚙️ Experimental Setup

We generated 100,000 unique group names using h5::utils::get_test_data<std::string>(N), then created an HDF5 group for each name at the root level of an HDF5 file using H5CPP.

for (size_t n = 0; n < N; n++)
    // the temporary h5::gr_t wraps the handle and closes it as soon as it goes out of scope
    h5::gr_t{H5Gcreate(root, names[n].data(), H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT)};

We measured:

  • Total HDF5 file size on disk
  • Total size of the strings used
  • Net overhead from metadata alone

📊 Results

Total file size      :  79,447,104 bytes (~79.4 MB)
Total string payload :   1,749,862 bytes (~1.75 MB)
---------------------------------------------
Net metadata overhead: ~77.7 MB

Average per group    : ~777 bytes

Yep. Each group costs roughly 777 bytes, even when it contains no datasets or attributes.

📈 Visual Summary

Entry Count   File Size   Payload Size   Overhead   Avg/Group
100,000       79.45 MB    1.75 MB        ~77.7 MB   ~777 B

🧠 Why So Expensive?

HDF5 groups are not just simple folders—they are implemented using B-trees and heaps. Each group object has:

  • A header
  • Link messages
  • Heap storage for names
  • Possibly indexed storage for lookup

This structure scales well for access, but incurs overhead for each group created.

🛠 Can Compression Help?

No. Compression applies to datasets, not group metadata. Metadata (including group structures) is stored in an uncompressed format by default.

💡 Recommendations

  • Avoid deep or wide group hierarchies with many small entries
  • If representing an object tree:
      • Consider flat structures with table-like metadata
      • Store object metadata as compound datasets or attributes (see the sketch after this list)
  • If you're tracking time-series or per-sample metadata:
      • Store as datasets with indexing, not groups
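
To make the compound-dataset suggestion concrete, here is a sketch using the plain HDF5 C API (field names and file name are made up): 100,000 records land in a single dataset, so the per-group metadata cost is paid once instead of 100,000 times.

```cpp
#include <hdf5.h>
#include <vector>

struct object_meta_t { double mass; int id; };   // one record per object

int main() {
    hid_t fd = H5Fcreate("objects.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    // a compound type describing the record layout
    hid_t ct = H5Tcreate(H5T_COMPOUND, sizeof(object_meta_t));
    H5Tinsert(ct, "mass", HOFFSET(object_meta_t, mass), H5T_NATIVE_DOUBLE);
    H5Tinsert(ct, "id",   HOFFSET(object_meta_t, id),   H5T_NATIVE_INT);

    // all objects go into one dataset instead of one group each
    std::vector<object_meta_t> objects(100000, object_meta_t{1.0, 0});
    hsize_t dims[1] = { objects.size() };
    hid_t sp = H5Screate_simple(1, dims, nullptr);
    hid_t ds = H5Dcreate(fd, "objects", ct, sp, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(ds, ct, H5S_ALL, H5S_ALL, H5P_DEFAULT, objects.data());

    H5Dclose(ds); H5Sclose(sp); H5Tclose(ct); H5Fclose(fd);
}
```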

🔚 Final Thoughts

HDF5 is flexible—but that flexibility has a price when misapplied. Using groups to represent every atomic item or configuration object results in significant metadata bloat.

Use them judiciously. Use datasets liberally.

Custom Floating-Point and Opaque Types in HDF5

Extended precision floating-point (long double) is a common headache in data persistence. While HDF5 does support H5T_NATIVE_LDOUBLE, the inspection tools (h5dump) often misreport the stored numbers. Fortunately, H5CPP allows you to define custom datatypes—falling back to opaque storage when necessary.

Custom Type Definition

A specialization with H5T_OPAQUE lets you capture the raw 80-bit (or 128-bit) layout without worrying about architecture quirks:

namespace h5::impl::detail {
    template <>
    struct hid_t<opaque::ldouble_t, H5Tclose, true, true, hdf5::type>
        : public dt_p<opaque::ldouble_t> {
        using parent = dt_p<opaque::ldouble_t>;
        using parent::hid_t;
        using hidtype = opaque::ldouble_t;

        hid_t() : parent(H5Tcreate(H5T_OPAQUE, sizeof(opaque::ldouble_t))) {
            // the raw handle is available here for further configuration (e.g. H5Tset_tag)
            hid_t id = static_cast<hid_t>(*this);
        }
    };
}

This ensures your values are faithfully written and retrievable—even if the dumper chokes on them.
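
With the specialization registered, the usual H5CPP calls work unchanged. A hypothetical usage, assuming `opaque::ldouble_t` is the trivially copyable 16-byte wrapper defined elsewhere in the example:

```cpp
#include <h5cpp/all>
#include <vector>

int main() {
    std::vector<opaque::ldouble_t> raw(10);      // filled from long double values elsewhere
    h5::fd_t fd = h5::create("custom.h5", H5F_ACC_TRUNC);
    h5::write(fd, "opaque", raw);                // stored with the H5T_OPAQUE datatype above
}
```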

Example Output

A dataset written as H5T_NATIVE_LDOUBLE might display as garbage in h5dump:

DATASET "custom" {
   DATATYPE  H5T_NATIVE_LDOUBLE
   DATA {
   (0): 4.94066e-324, 4.94066e-324, ...
   }
}

…but the opaque fallback shows the raw byte patterns:

DATASET "opaque" {
   DATATYPE H5T_OPAQUE { OPAQUE_TAG "" }
   DATA {
   (0): 59:16:f5:f3:bb:e2:28:b8:01:40:00:00:00:00:00:00,
   (1): 21:93:2c:c5:cc:f5:5b:90:00:40:00:00:00:00:00:00,
   ...
   }
}

Why Two Views?

  • H5T_NATIVE_LDOUBLE: portable but misprinted by h5dump.
  • H5T_OPAQUE: exact bytes preserved, great for debugging or custom parsers.

On AMD64 systems, long double is stored in 16 bytes but only the first 10 bytes are significant. The last 6 are tail padding with undefined contents. This is why treating the type as opaque makes sense when fidelity is critical.
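
A quick way to inspect that layout yourself (illustrative only; the exact bytes are platform dependent):

```cpp
#include <cstdio>
#include <cstring>
#include <cfloat>

int main() {
    long double v = 0.01L;
    unsigned char raw[sizeof(long double)] = {};   // 16 bytes on AMD64
    std::memcpy(raw, &v, sizeof v);

    // x87 extended precision: 64-bit mantissa, 10 significant bytes, 6 bytes of padding
    std::printf("sizeof(long double)=%zu  LDBL_MANT_DIG=%d\n",
                sizeof(long double), LDBL_MANT_DIG);
    for (std::size_t i = 0; i < sizeof raw; ++i)
        std::printf("%02x%s", raw[i], i + 1 == sizeof raw ? "\n" : ":");
}
```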

Beyond Long Double

You’re not limited to long double. With H5CPP you can adapt the same approach to:

  • half precision (float16)
  • nbit packed integers
  • arbitrary bit-level encodings

See the H5CPP examples for twobit, nbit, and half-float.

Takeaway

  • ✅ Use H5T_NATIVE_LDOUBLE when you want logical portability.
  • ✅ Wrap as OPAQUE when you need raw fidelity and control.
  • ⚠️ Don’t panic when h5dump shows nonsense: the data is safe.

With H5CPP, you get the flexibility to represent any custom precision format—whether for simulation accuracy, bit-packed encodings, or raw experimental data.

HDF5 and long double: Precision Stored, Precision Misread

When working with scientific simulations, precision matters. Many codes rely on long double to squeeze out a few more digits of accuracy. The good news: HDF5 supports H5T_NATIVE_LDOUBLE natively, and with H5CPP you can write and read long double seamlessly.

The bad news: h5dump, the standard HDF5 inspection tool, stumbles. Instead of your carefully written values, you’ll often see tiny denormalized numbers (4.94066e-324) or other junk. This isn’t corruption—it’s just h5dump misinterpreting extended precision types.

Minimal Example

Consider the following snippet:

#include "h5cpp/all"
#include <vector>

int main() {
    std::vector<long double> x{0.0L, 0.01L, 0.02L, 0.03L, 0.04L,
                               0.05L, 0.06L, 0.07L, 0.08L, 0.09L};

    h5::fd_t fd = h5::create("test.h5", H5F_ACC_TRUNC);
    h5::ds_t ds = h5::create<long double>(fd, "homogenious", h5::current_dims{2,5}, h5::chunk{1,5});
    h5::write(ds, x);   // the 10 values fill the 2x5 dataset
}

Running the code and dumping with h5dump:

h5dump -d /homogenious test.h5

DATA {
(0,0): 4.94066e-324, 4.94066e-324, 4.94066e-324,
...
}

Looks broken, right? But if you read back the dataset with HDF5 or H5CPP:

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

…the values are correct. The underlying file is perfectly fine.
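
For completeness, one way to do that read-back with H5CPP (a sketch matching the writer above):

```cpp
#include "h5cpp/all"
#include <iostream>
#include <vector>

int main() {
    h5::fd_t fd = h5::open("test.h5", H5F_ACC_RDONLY);
    auto x = h5::read<std::vector<long double>>(fd, "homogenious");
    for (long double v : x)
        std::cout << static_cast<double>(v) << " ";   // 0 0.01 0.02 ... 0.09
    std::cout << "\n";
}
```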

Why the Mismatch?

h5dump uses its own format string and assumes a particular binary layout for floating-point numbers. On many systems, long double is 80-bit extended precision or 128-bit quad precision, which doesn’t map cleanly to the dumper’s print logic. Hence the nonsense output.

In other words: the storage layer is solid, but the diagnostic tool lags behind.

Compound Types with long double

HDF5 compound types with H5T_NATIVE_LDOUBLE also work, including arrays of extended-precision fields:

DATASET "stream-of-records" {
   DATATYPE  H5T_COMPOUND {
      H5T_NATIVE_LDOUBLE "temp";
      H5T_NATIVE_LDOUBLE "density";
      H5T_ARRAY { [3] H5T_NATIVE_LDOUBLE } "B";
      H5T_ARRAY { [3] H5T_NATIVE_LDOUBLE } "V";
      ...
   }
   DATASPACE SIMPLE { ( 10 ) / ( H5S_UNLIMITED ) }
   STORAGE_LAYOUT { CHUNKED ( 512 ) }
   FILTERS { COMPRESSION DEFLATE { LEVEL 9 } }
}

Here too, h5dump shows garbage, but reading with HDF5 APIs returns the expected values.

Takeaway

  • ✅ Write long double safely with HDF5/H5CPP.
  • ✅ Read long double safely with HDF5/H5CPP.
  • ❌ Don’t trust h5dump for inspecting long double datasets.

Example: Rewriting HDF5 Attributes

While it is not possible to append to or extend attributes in HDF5, attributes typically carry side-band information of relatively small size. In earlier HDF5 versions attribute size was limited to 64 KiB, though Gerd Heber has suggested this limitation has since been lifted.

Having said that, a good strategy is to break an "append" into two steps: read the old attribute values, then write a new attribute containing the combined data. The implementation is straightforward and, when used properly, performant.

#include <vector>
#include <armadillo>
#include <h5cpp/all>

int main(void) {
    h5::fd_t fd = h5::create("h5cpp.h5",H5F_ACC_TRUNC);
    arma::mat data(10,5);

    { // RAII block: ds closes when it goes out of scope
    h5::ds_t ds = h5::write(fd, "some_dataset", data);  // write dataset, and obtain its descriptor
    h5::awrite(ds, "attribute_name", {1,2,3,4,5,6,7});
    }
}

Running the above gives the following layout:

h5dump -a /some_dataset/attribute_name  h5cpp.h5
HDF5 "h5cpp.h5" {
ATTRIBUTE "attribute_name" {
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 7 ) / ( 7 ) }
   DATA {
   (0): 1, 2, 3, 4, 5, 6, 7
   }
}
}
To update the attribute you need to remove it first, since H5CPP doesn't yet do this automatically; in fact there is no h5::adelete either. However, by design you can freely mix HDF5 C API calls with H5CPP templates, so here is the update with H5Adelete and h5::awrite:

H5Adelete(ds,  "attribute_name");
h5::awrite(ds, "attribute_name", values); // 'values' holds the original seven elements plus the new ones

h5dump -a /some_dataset/attribute_name  h5cpp.h5
HDF5 "h5cpp.h5" {
ATTRIBUTE "attribute_name" {
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 14 ) / ( 14 ) }
   DATA {
   (0): 1, 2, 3, 4, 5, 6, 7, 20, 21, 22, 23, 24, 25, 26
   }
}
}
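
Putting it together, here is a self-contained sketch of the read, extend, delete, rewrite cycle; it assumes h5::aread as the attribute-read counterpart to h5::awrite, and reuses the file and names from the example above:

```cpp
#include <h5cpp/all>
#include <vector>

int main() {
    h5::fd_t fd = h5::open("h5cpp.h5", H5F_ACC_RDWR);
    h5::ds_t ds = h5::open(fd, "some_dataset");

    // read the existing attribute and append the new elements in memory
    auto values = h5::aread<std::vector<int>>(ds, "attribute_name");
    for (int v : {20, 21, 22, 23, 24, 25, 26})
        values.push_back(v);

    // attributes cannot be resized in place: delete, then write the larger one
    H5Adelete(ds, "attribute_name");
    h5::awrite(ds, "attribute_name", values);
}
```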

Single-Thread Writer: Simplifying Parallel HDF5 I/O

HDF5 has a global lock (in its thread-safe build): no matter how many threads you spawn, only one can execute HDF5 library calls at a time. If you naïvely let multiple threads hammer the library, you get serialization at best and deadlocks at worst.

The solution? One writer thread. All producers hand off data to it via a queue; it alone touches the HDF5 API.

Design Pattern

  1. Producers (sensor readers, network handlers, simulators) run freely.
  2. They push their data into a lock-free or bounded queue.
  3. A single dedicated writer thread pops from the queue and performs all HDF5 calls (H5Dwrite, H5Dset_extent, etc.).

This way, the library never sees concurrent calls, and your application avoids global-lock contention.

Example Flow

```cpp
// producers
void producer(queue_t& q, int id) {
    for (int i = 0; i < 100; i++) {
        record_t rec{id, i};
        q.push(rec);
    }
}

// consumer/writer
void writer(queue_t& q, h5::pt_t& pt) {
    record_t rec;
    while (q.pop(rec))
        h5::append(pt, rec);   // all HDF5 I/O is serialized here
}
```
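
The snippet above leaves queue_t and record_t undefined. Below is one self-contained way to wire the pattern up with std::thread and a condition-variable-backed queue; the HDF5 calls are stubbed with a comment and every name is illustrative:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct record_t { int id; int value; };

// simple blocking queue: many producers, one consumer
class queue_t {
    std::queue<record_t> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
public:
    void push(record_t r) {
        { std::lock_guard<std::mutex> l(m_); q_.push(r); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> l(m_); done_ = true; }
        cv_.notify_one();
    }
    bool pop(record_t& r) {                    // returns false once closed and drained
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [&] { return !q_.empty() || done_; });
        if (q_.empty()) return false;
        r = q_.front(); q_.pop();
        return true;
    }
};

int main() {
    queue_t q;
    std::thread writer([&] {
        record_t rec;
        while (q.pop(rec)) { /* all HDF5 calls (h5::append, H5Dwrite, ...) happen here */ }
    });

    std::vector<std::thread> producers;
    for (int id = 0; id < 4; ++id)
        producers.emplace_back([&q, id] { for (int i = 0; i < 100; ++i) q.push({id, i}); });

    for (auto& p : producers) p.join();        // producers finish first...
    q.close();                                 // ...then signal termination
    writer.join();                             // writer drains the queue and exits
}
```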

Thread Coordination

  • Producers run independently.
  • The writer drains the queue at its own pace.
  • When producers finish, they signal termination, and the writer flushes any remaining data before closing the file.

Benefits

  • ✅ Correctness: no race conditions inside HDF5.
  • ✅ Performance: eliminates global-lock thrashing.
  • ✅ Simplicity: no need for per-thread file handles or MPI gymnastics.

In practice, a well-implemented queue keeps throughput high enough to saturate disk bandwidth. For bursty workloads, batching writes can further smooth performance.

When to Use

  • Multithreaded producers feeding a single HDF5 container.
  • Applications where correctness and predictability outweigh fine-grained parallel writes.
  • Prototypes that may later evolve into MPI-based distributed writers.

Takeaway

HDF5 isn’t thread-parallel—but your architecture can be. Push all I/O through a single writer thread, and let your other threads do what they do best: generate data without blocking.