
June 2022

Cross-language Messaging with ZeroMQ: C++, Python, and Fortran

🧩 Setup

We’ll use ZeroMQ’s PUSH/PULL pattern to build a minimal, language-agnostic messaging microservice.

Each sender (written in Python, C++, or Fortran) pushes 64-bit values to a single receiver.

📦 Dependencies

Linux (Debian/Ubuntu):
sudo apt-get install libzmq3-dev

Python:
python3 -m pip install pyzmq

Fortran ZMQ bindings (install fzmq from source):
git clone https://github.com/richsnyder/fzmq && cd fzmq
mkdir build && cd build
cmake -DBUILD_DOCS=OFF ../
sudo make install

🛰️ Sender: Python

#!/usr/bin/python3
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.PUSH)
sock.connect("tcp://localhost:5555")

for x in range(100):
    sock.send(x.to_bytes(8, 'little'))

Highlights:

  • Sends 8-byte integers in little-endian format
  • Handy for testing receivers written in other languages

🛰️ Sender: C++

#include <zmq.h>
#include <cstdint>
#include <cstring>

int main() {
    void* ctx = zmq_ctx_new();
    void* sock = zmq_socket(ctx, ZMQ_PUSH);
    zmq_connect(sock, "tcp://localhost:5555");

    for(uint64_t x = 0; x < 100; ++x)
        zmq_send(sock, &x, sizeof(x), 0);

    zmq_close(sock);
    zmq_ctx_term(ctx);
}

Highlights:

  • Sends raw uint64_t values (8 bytes)
  • Matches the Python byte format on little-endian hosts (see the note below)
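
One caveat: sending the raw in-memory representation matches the Python sender only on little-endian hosts. If the code may ever run on a big-endian machine, pack the bytes explicitly; a minimal helper (illustrative, not part of the original example) could look like this:

```cpp
#include <zmq.h>
#include <cstdint>

// Serialize a 64-bit value in little-endian byte order before sending,
// so the wire format always matches x.to_bytes(8, 'little') on the Python side.
static void send_u64_le(void* sock, uint64_t x) {
    unsigned char buf[8];
    for (int i = 0; i < 8; ++i)
        buf[i] = static_cast<unsigned char>(x >> (8 * i));
    zmq_send(sock, buf, sizeof buf, 0);
}
```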

🛰️ Sender: Fortran

program send
  use fzmq
  implicit none

  type(zmq_context) :: context
  type(zmq_socket)  :: sock
  integer(kind=8)   :: x
  integer           :: rc

  context = zmq_ctx_new()
  sock = zmq_socket(context, ZMQ_PUSH)
  call zmq_connect(sock, "tcp://localhost:5555")

  do x = 0, 99
    call zmq_send(sock, x, 8, 0)
  end do

  call zmq_close(sock)
  call zmq_ctx_term(context)
end program send

Highlights:

  • Uses fzmq bindings
  • Sends binary 64-bit integers

🎯 Receiver: C++

#include <zmq.h>
#include <cstdint>
#include <cinttypes>
#include <cstdio>

int main() {
    void* ctx = zmq_ctx_new();
    void* sock = zmq_socket(ctx, ZMQ_PULL);
    zmq_bind(sock, "tcp://*:5555");

    for (uint64_t x = 0; true; ) {
        zmq_recv(sock, &x, sizeof(x), 0);
        printf("recv: %" PRIu64 "\n", x);
    }

    zmq_close(sock);
    zmq_ctx_term(ctx);
}

Highlights:

  • Pulls 64-bit integers from any connected sender
  • No language-specific deserialization required

✅ Summary

ZeroMQ’s PUSH/PULL pattern makes multi-language IPC a breeze.

Role      Language   Notes
Sender    Python     Uses pyzmq, clean syntax
Sender    C++        Raw zmq_send of binary integers
Sender    Fortran    Uses fzmq bindings
Receiver  C++        Prints values from all senders

Run the receiver first, then launch any combination of senders. Messages will stream in regardless of language. No serialization frameworks, no boilerplate.

Automated Pretty Printing for STL-like Containers in H5CPP

🛠️ Why not just print?

Debugging C++ is an art and often demands setting up heavyweight tools like valgrind, cachegrind, or gdb. These tools are powerful, but sometimes all you need is a quick look at what's inside your container.

Enter: H5CPP’s feature detection idiom-based pretty-printing system for STL-like containers.

Inspired by Walter Brown’s WG21 N4436 paper, we offer a mechanism that allows you to simply:

```cpp
std::cout << stl_like_object;
```

Where `stl_like_object` can be any type that:

* provides `.begin()` and `.end()` (like vectors, lists, maps…)
* offers stack/queue-like interfaces (`.top()`, `.pop()`, `.size()`)
* or even composite/ragged containers like `vector<vector<T>>`, `tuple<...>`, etc. (a minimal detection sketch follows this list)
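
Under the hood this relies on compile-time detection of those member functions. Below is a minimal, self-contained sketch of the idiom (plain C++17 `std::void_t`, not the actual H5CPP implementation) that enables an `operator<<` only for types exposing `.begin()`/`.end()`:

```cpp
#include <iostream>
#include <type_traits>
#include <vector>

// detect .begin()/.end() at compile time
template <class T, class = void>
struct has_begin_end : std::false_type {};

template <class T>
struct has_begin_end<T, std::void_t<decltype(std::declval<T>().begin()),
                                    decltype(std::declval<T>().end())>>
    : std::true_type {};

// stream insertion enabled only for container-like types
template <class T, std::enable_if_t<has_begin_end<T>::value, int> = 0>
std::ostream& operator<<(std::ostream& os, const T& container) {
    os << '[';
    for (auto it = container.begin(); it != container.end(); ++it)
        os << (it == container.begin() ? "" : ",") << *it;
    return os << ']';
}

int main() {
    std::vector<int> v{1, 2, 3};
    std::cout << v << "\n";   // prints [1,2,3]
}
```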

## 🔍 What it does

The current implementation supports:

* **Recursive pretty-printing**
* **Line width control** (via `H5CPP_CONSOLE_WIDTH`)
* **In-line visualization** of arbitrarily nested structures

This will eventually replace/enhance the H5CPP persistence layer with general I/O capabilities.

---

## 📦 Example Output

Here's what `./pprint-test` prints to `stdout`:

```text
LISTS/VECTORS/SETS:
---------------------------------------------------------------------
   array<string,7>:[xSs,wc,gu,Ssi,Sx,pzb,OY]
            vector:[CDi,PUs,zpf,Hm,teO,XG,bu,QZs]
             deque:[256,233,23,89,128,268,69,278,130]
              list:[95,284,24,124,49,40,200,108,281,251,57, ...]
      forward_list:[147,76,81,193,44]
               set:[V,f,szy,v]
     unordered_set:[2.59,1.86,2.93,1.78,2.43,2.04,1.69]
          multiset:[3,5,12,21,23,28,30,30]
unordered_multiset:[gZ,rb,Dt,Q,Ark,dW,Ez,wmE,GwF]
```

And yes, it continues with:

  • Adaptors: stack, queue, priority_queue
  • Associative Containers: map, multimap, unordered_map, ...
  • Ragged Arrays: like vector<vector<string>>
  • Tuples and Pairs: including nested structures

Here’s a teaser for ragged and tuple structures:

```text
vector<vector<string>>:[[pkwZZ,lBqsR,cmKt,PDjaS,Zj],[Nr,jj,xe,uC,bixzV],[uBAU,pXCa,fZEH,FIAIO]]
pair<int,string>:{4:upgdAdbvIB}
tuple<string,int,float,short>:<NHCmzhVVXJ,8,2.01756,7>
```

🧪 Run it yourself

g++ -I./include -o pprint-test.o -std=c++17 -DFMT_HEADER_ONLY -c pprint-test.cpp
g++ pprint-test.o -lhdf5 -lz -ldl -lm -o pprint-test
./pprint-test

You can set the line width by defining H5CPP_CONSOLE_WIDTH (e.g. -DH5CPP_CONSOLE_WIDTH=10).

📝 Code snippet

#include <h5cpp/H5Uall.hpp>
#include <h5cpp/H5cout.hpp>
#include <utils/types.hpp>
#include <iostream>

int main() {
    using namespace std;
    // ragged vector<vector<string>> from the mock data generator, pretty-printed to stdout
    std::cout << mock::data<vector<vector<string>>>::get(2, 5, 3, 7) << "\n";
}

🔮 What’s Next?

If you’d like to bring this up to the level of Julia’s pretty print system, get in touch! The data generators that support arbitrary C++ type generation are part of the larger h5rnd project — a Prüfer sequence-based HDF5 dataset generator.

HDF5 Group Overhead: Measuring the Cost of 100,000 Groups

📦 The Problem

On the HDF5 forum, a user posed a question many developers eventually run into:

"We want to serialize an object tree into HDF5, where every object becomes a group. But group overhead seems large. Why is my 5 MB dataset now 120 MB?"

That’s a fair question. And we set out to quantify it.

⚙️ Experimental Setup

We generated 100,000 unique group names using h5::utils::get_test_data<std::string>(N), then created an HDF5 group for each name at the root level of an HDF5 file using H5CPP.

for (size_t n = 0; n < N; n++)
    // the temporary h5::gr_t wraps the handle and closes it as soon as it goes out of scope
    h5::gr_t{H5Gcreate(root, names[n].data(), H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT)};

We measured:

  • Total HDF5 file size on disk
  • Total size of the strings used
  • Net overhead from metadata alone

📊 Results

Total file size      :  79,447,104 bytes (~79.4 MB)
Total string payload :   1,749,862 bytes (~1.75 MB)
---------------------------------------------
Net metadata overhead: ~77.7 MB

Average per group    : ~777 bytes

Yep. Each group costs roughly 777 bytes, even when it contains no datasets or attributes.

📈 Visual Summary

Entry Count   File Size   Payload Size   Overhead   Avg/Group
100,000       79.45 MB    1.75 MB        ~77.7 MB   ~777 B

🧠 Why So Expensive?

HDF5 groups are not just simple folders—they are implemented using B-trees and heaps. Each group object has:

  • A header
  • Link messages
  • Heap storage for names
  • Possibly indexed storage for lookup

This structure scales well for access, but incurs overhead for each group created.

🛠 Can Compression Help?

No. Compression applies to datasets, not group metadata. Metadata (including group structures) is stored in an uncompressed format by default.

💡 Recommendations

  • Avoid deep or wide group hierarchies with many small entries
  • If representing an object tree:
      • Consider flat structures with table-like metadata
      • Store object metadata as compound datasets or attributes (see the sketch after this list)
  • If you're tracking time-series or per-sample metadata:
      • Store as datasets with indexing, not groups
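
To make the compound-dataset suggestion concrete, here is a sketch using the plain HDF5 C API (field names and file name are made up): 100,000 records land in a single dataset, so the per-group metadata cost is paid once instead of 100,000 times.

```cpp
#include <hdf5.h>
#include <vector>

struct object_meta_t { double mass; int id; };   // one record per object

int main() {
    hid_t fd = H5Fcreate("objects.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    // a compound type describing the record layout
    hid_t ct = H5Tcreate(H5T_COMPOUND, sizeof(object_meta_t));
    H5Tinsert(ct, "mass", HOFFSET(object_meta_t, mass), H5T_NATIVE_DOUBLE);
    H5Tinsert(ct, "id",   HOFFSET(object_meta_t, id),   H5T_NATIVE_INT);

    // all objects go into one dataset instead of one group each
    std::vector<object_meta_t> objects(100000, object_meta_t{1.0, 0});
    hsize_t dims[1] = { objects.size() };
    hid_t sp = H5Screate_simple(1, dims, nullptr);
    hid_t ds = H5Dcreate(fd, "objects", ct, sp, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(ds, ct, H5S_ALL, H5S_ALL, H5P_DEFAULT, objects.data());

    H5Dclose(ds); H5Sclose(sp); H5Tclose(ct); H5Fclose(fd);
}
```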

🔚 Final Thoughts

HDF5 is flexible—but that flexibility has a price when misapplied. Using groups to represent every atomic item or configuration object results in significant metadata bloat.

Use them judiciously. Use datasets liberally.

Custom Floating-Point and Opaque Types in HDF5

Extended precision floating-point (long double) is a common headache in data persistence. While HDF5 does support H5T_NATIVE_LDOUBLE, the inspection tools (h5dump) often misreport the stored numbers. Fortunately, H5CPP allows you to define custom datatypes—falling back to opaque storage when necessary.

Custom Type Definition

A specialization with H5T_OPAQUE lets you capture the raw 80-bit (or 128-bit) layout without worrying about architecture quirks:

namespace h5::impl::detail {
    template <>
    struct hid_t<opaque::ldouble_t, H5Tclose, true, true, hdf5::type>
        : public dt_p<opaque::ldouble_t> {
        using parent = dt_p<opaque::ldouble_t>;
        using parent::hid_t;
        using hidtype = opaque::ldouble_t;

        hid_t() : parent(H5Tcreate(H5T_OPAQUE, sizeof(opaque::ldouble_t))) {
            // the raw handle is available here for further configuration (e.g. H5Tset_tag)
            hid_t id = static_cast<hid_t>(*this);
        }
    };
}

This ensures your values are faithfully written and retrievable—even if the dumper chokes on them.
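
With the specialization registered, the usual H5CPP calls work unchanged. A hypothetical usage, assuming `opaque::ldouble_t` is the trivially copyable 16-byte wrapper defined elsewhere in the example:

```cpp
#include <h5cpp/all>
#include <vector>

int main() {
    std::vector<opaque::ldouble_t> raw(10);      // filled from long double values elsewhere
    h5::fd_t fd = h5::create("custom.h5", H5F_ACC_TRUNC);
    h5::write(fd, "opaque", raw);                // stored with the H5T_OPAQUE datatype above
}
```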

Example Output

A dataset written as H5T_NATIVE_LDOUBLE might display as garbage in h5dump:

DATASET "custom" {
   DATATYPE  H5T_NATIVE_LDOUBLE
   DATA {
   (0): 4.94066e-324, 4.94066e-324, ...
   }
}

…but the opaque fallback shows the raw byte patterns:

DATASET "opaque" {
   DATATYPE H5T_OPAQUE { OPAQUE_TAG "" }
   DATA {
   (0): 59:16:f5:f3:bb:e2:28:b8:01:40:00:00:00:00:00:00,
   (1): 21:93:2c:c5:cc:f5:5b:90:00:40:00:00:00:00:00:00,
   ...
   }
}

Why Two Views?

  • H5T_NATIVE_LDOUBLE: portable but misprinted by h5dump.
  • H5T_OPAQUE: exact bytes preserved, great for debugging or custom parsers.

On AMD64 systems, long double is stored in 16 bytes but only the first 10 bytes are significant. The last 6 are tail padding with undefined contents. This is why treating the type as opaque makes sense when fidelity is critical.
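
A quick way to inspect that layout yourself (illustrative only; the exact bytes are platform dependent):

```cpp
#include <cstdio>
#include <cstring>
#include <cfloat>

int main() {
    long double v = 0.01L;
    unsigned char raw[sizeof(long double)] = {};   // 16 bytes on AMD64
    std::memcpy(raw, &v, sizeof v);

    // x87 extended precision: 64-bit mantissa, 10 significant bytes, 6 bytes of padding
    std::printf("sizeof(long double)=%zu  LDBL_MANT_DIG=%d\n",
                sizeof(long double), LDBL_MANT_DIG);
    for (std::size_t i = 0; i < sizeof raw; ++i)
        std::printf("%02x%s", raw[i], i + 1 == sizeof raw ? "\n" : ":");
}
```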

Beyond Long Double

You’re not limited to long double. With H5CPP you can adapt the same approach to:

  • half precision (float16)
  • nbit packed integers
  • arbitrary bit-level encodings

See the H5CPP examples for twobit, nbit, and half-float.

Takeaway

  • ✅ Use H5T_NATIVE_LDOUBLE when you want logical portability.
  • ✅ Wrap as OPAQUE when you need raw fidelity and control.
  • ⚠️ Don’t panic when h5dump shows nonsense: the data is safe.

With H5CPP, you get the flexibility to represent any custom precision format—whether for simulation accuracy, bit-packed encodings, or raw experimental data.

HDF5 and long double: Precision Stored, Precision Misread

When working with scientific simulations, precision matters. Many codes rely on long double to squeeze out a few more digits of accuracy. The good news: HDF5 supports H5T_NATIVE_LDOUBLE natively, and with H5CPP you can write and read long double seamlessly.

The bad news: h5dump, the standard HDF5 inspection tool, stumbles. Instead of your carefully written values, you’ll often see tiny denormalized numbers (4.94066e-324) or other junk. This isn’t corruption—it’s just h5dump misinterpreting extended precision types.

Minimal Example

Consider the following snippet:

#include "h5cpp/all"
#include <vector>

int main() {
    std::vector<long double> x{0.0L, 0.01L, 0.02L, 0.03L, 0.04L,
                               0.05L, 0.06L, 0.07L, 0.08L, 0.09L};

    h5::fd_t fd = h5::create("test.h5", H5F_ACC_TRUNC);
    h5::ds_t ds = h5::create<long double>(fd, "homogenious", h5::current_dims{2,5}, h5::chunk{1,5});
    h5::write(ds, x);   // the 10 values fill the 2x5 dataset
}

Running the code and dumping with h5dump:

h5dump -d /homogenious test.h5

DATA {
(0,0): 4.94066e-324, 4.94066e-324, 4.94066e-324,
...
}

Looks broken, right? But if you read back the dataset with HDF5 or H5CPP:

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

…the values are correct. The underlying file is perfectly fine.
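
For completeness, one way to do that read-back with H5CPP (a sketch matching the writer above):

```cpp
#include "h5cpp/all"
#include <iostream>
#include <vector>

int main() {
    h5::fd_t fd = h5::open("test.h5", H5F_ACC_RDONLY);
    auto x = h5::read<std::vector<long double>>(fd, "homogenious");
    for (long double v : x)
        std::cout << static_cast<double>(v) << " ";   // 0 0.01 0.02 ... 0.09
    std::cout << "\n";
}
```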

Why the Mismatch?

h5dump uses its own format string and assumes a particular binary layout for floating-point numbers. On many systems, long double is 80-bit extended precision or 128-bit quad precision, which doesn’t map cleanly to the dumper’s print logic. Hence the nonsense output.

In other words: the storage layer is solid, but the diagnostic tool lags behind.

Compound Types with long double

HDF5 compound types with H5T_NATIVE_LDOUBLE also work, including arrays of extended-precision fields:

DATASET "stream-of-records" {
   DATATYPE  H5T_COMPOUND {
      H5T_NATIVE_LDOUBLE "temp";
      H5T_NATIVE_LDOUBLE "density";
      H5T_ARRAY { [3] H5T_NATIVE_LDOUBLE } "B";
      H5T_ARRAY { [3] H5T_NATIVE_LDOUBLE } "V";
      ...
   }
   DATASPACE SIMPLE { ( 10 ) / ( H5S_UNLIMITED ) }
   STORAGE_LAYOUT { CHUNKED ( 512 ) }
   FILTERS { COMPRESSION DEFLATE { LEVEL 9 } }
}

Here too, h5dump shows garbage, but reading with HDF5 APIs returns the expected values.

Takeaway

  • ✅ Write long double safely with HDF5/H5CPP.
  • ✅ Read long double safely with HDF5/H5CPP.
  • ❌ Don’t trust h5dump for inspecting long double datasets.

Example: Rewriting HDF5 Attributes

While it is not possible to append to or extend attributes in HDF5, attributes typically carry side-band information of relatively small size. In earlier HDF5 versions attribute size was limited to 64 KiB, though Gerd Heber has suggested this limitation has since been lifted.

Having said that, a good strategy is to break an "append" into two steps: read the old attribute values, then write a new attribute containing the combined data. The implementation is straightforward and, when used properly, performant.

#include <vector>
#include <armadillo>
#include <h5cpp/all>

int main(void) {
    h5::fd_t fd = h5::create("h5cpp.h5",H5F_ACC_TRUNC);
    arma::mat data(10,5);

    { // RAII block: ds closes when it goes out of scope
    h5::ds_t ds = h5::write(fd, "some_dataset", data);  // write dataset, and obtain its descriptor
    h5::awrite(ds, "attribute_name", {1,2,3,4,5,6,7});
    }
}

Running the above gives the following layout:

h5dump -a /some_dataset/attribute_name  h5cpp.h5
HDF5 "h5cpp.h5" {
ATTRIBUTE "attribute_name" {
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 7 ) / ( 7 ) }
   DATA {
   (0): 1, 2, 3, 4, 5, 6, 7
   }
}
}
To update the attribute you need to remove it first, since H5CPP doesn't yet do this automatically; in fact there is no h5::adelete either. However, by design you can freely mix HDF5 C API calls with H5CPP templates, so here is the update with H5Adelete and h5::awrite:

H5Adelete(ds,  "attribute_name");
h5::awrite(ds, "attribute_name", values); // 'values' holds the original seven elements plus the new ones

h5dump -a /some_dataset/attribute_name  h5cpp.h5
HDF5 "h5cpp.h5" {
ATTRIBUTE "attribute_name" {
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 14 ) / ( 14 ) }
   DATA {
   (0): 1, 2, 3, 4, 5, 6, 7, 20, 21, 22, 23, 24, 25, 26
   }
}
}
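
Putting it together, here is a self-contained sketch of the read, extend, delete, rewrite cycle; it assumes h5::aread as the attribute-read counterpart to h5::awrite, and reuses the file and names from the example above:

```cpp
#include <h5cpp/all>
#include <vector>

int main() {
    h5::fd_t fd = h5::open("h5cpp.h5", H5F_ACC_RDWR);
    h5::ds_t ds = h5::open(fd, "some_dataset");

    // read the existing attribute and append the new elements in memory
    auto values = h5::aread<std::vector<int>>(ds, "attribute_name");
    for (int v : {20, 21, 22, 23, 24, 25, 26})
        values.push_back(v);

    // attributes cannot be resized in place: delete, then write the larger one
    H5Adelete(ds, "attribute_name");
    h5::awrite(ds, "attribute_name", values);
}
```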

Single-Thread Writer: Simplifying Parallel HDF5 I/O

HDF5 has a global lock (in its thread-safe build): no matter how many threads you spawn, only one can execute HDF5 library calls at a time. If you naïvely let multiple threads hammer the library, you get serialization at best and deadlocks at worst.

The solution? One writer thread. All producers hand off data to it via a queue; it alone touches the HDF5 API.

Design Pattern

  1. Producers (sensor readers, network handlers, simulators) run freely.
  2. They push their data into a lock-free or bounded queue.
  3. A single dedicated writer thread pops from the queue and performs all HDF5 calls (H5Dwrite, H5Dset_extent, etc.).

This way, the library never sees concurrent calls, and your application avoids global-lock contention.

Example Flow

```cpp
// producers
void producer(queue_t& q, int id) {
    for (int i = 0; i < 100; i++) {
        record_t rec{id, i};
        q.push(rec);
    }
}

// consumer/writer
void writer(queue_t& q, h5::pt_t& pt) {
    record_t rec;
    while (q.pop(rec))
        h5::append(pt, rec);   // all HDF5 I/O is serialized here
}
```
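
The snippet above leaves queue_t and record_t undefined. Below is one self-contained way to wire the pattern up with std::thread and a condition-variable-backed queue; the HDF5 calls are stubbed with a comment and every name is illustrative:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct record_t { int id; int value; };

// simple blocking queue: many producers, one consumer
class queue_t {
    std::queue<record_t> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
public:
    void push(record_t r) {
        { std::lock_guard<std::mutex> l(m_); q_.push(r); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> l(m_); done_ = true; }
        cv_.notify_one();
    }
    bool pop(record_t& r) {                    // returns false once closed and drained
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [&] { return !q_.empty() || done_; });
        if (q_.empty()) return false;
        r = q_.front(); q_.pop();
        return true;
    }
};

int main() {
    queue_t q;
    std::thread writer([&] {
        record_t rec;
        while (q.pop(rec)) { /* all HDF5 calls (h5::append, H5Dwrite, ...) happen here */ }
    });

    std::vector<std::thread> producers;
    for (int id = 0; id < 4; ++id)
        producers.emplace_back([&q, id] { for (int i = 0; i < 100; ++i) q.push({id, i}); });

    for (auto& p : producers) p.join();        // producers finish first...
    q.close();                                 // ...then signal termination
    writer.join();                             // writer drains the queue and exits
}
```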

Thread Coordination

  • Producers run independently.
  • The writer drains the queue at its own pace.
  • When producers finish, they signal termination, and the writer flushes any remaining data before closing the file.

Benefits

  • ✅ Correctness: no race conditions inside HDF5.
  • ✅ Performance: eliminates global-lock thrashing.
  • ✅ Simplicity: no need for per-thread file handles or MPI gymnastics.

In practice, a well-implemented queue keeps throughput high enough to saturate disk bandwidth. For bursty workloads, batching writes can further smooth performance.

When to Use

  • Multithreaded producers feeding a single HDF5 container.
  • Applications where correctness and predictability outweigh fine-grained parallel writes.
  • Prototypes that may later evolve into MPI-based distributed writers.

Takeaway

HDF5 isn’t thread-parallel—but your architecture can be. Push all I/O through a single writer thread, and let your other threads do what they do best: generate data without blocking.