Custom Floating-Point and Opaque Types in HDF5
Extended-precision floating point (long double) is a common headache in data persistence. While HDF5 does support H5T_NATIVE_LDOUBLE, the inspection tools (such as h5dump) often misreport the stored numbers. Fortunately, H5CPP lets you define custom datatypes, falling back to opaque storage when necessary.
Custom Type Definition
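A specialization of H5CPP's internal h5::impl::detail::hid_t template that creates an H5T_OPAQUE type lets you capture the raw 80-bit (or 128-bit) layout verbatim. HDF5 copies the bytes as-is and never tries to interpret or convert them: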
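```cpp
// opaque::ldouble_t: user-side wrapper around long double (assumed declared before this point).
namespace h5::impl::detail {
    // Full specialization of H5CPP's datatype handle for opaque::ldouble_t.
    template <>
    struct hid_t<opaque::ldouble_t, H5Tclose, true, true, hdf5::type>
        : public dt_p<opaque::ldouble_t> {
        // Typedefs H5CPP expects on every datatype specialization.
        using parent = dt_p<opaque::ldouble_t>;
        using parent::hid_t;                 // inherit the parent's constructors
        using hidtype = opaque::ldouble_t;
        // Create an opaque type exactly sizeof(opaque::ldouble_t) bytes wide;
        // HDF5 moves these bytes verbatim, with no numeric conversion.
        hid_t() : parent(H5Tcreate(H5T_OPAQUE, sizeof(opaque::ldouble_t))) {
            // Hook for further customization of the new type (unused here).
            hid_t id = static_cast<hid_t>(*this);
        }
    };
}
```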
This ensures your values are written and read back bit-for-bit on a machine with the same long double layout, even if h5dump cannot display them as numbers.
Example Output
A dataset written as H5T_NATIVE_LDOUBLE might display as garbage in h5dump:
…but the opaque fallback shows the raw byte patterns:
```
DATASET "opaque" {
   DATATYPE H5T_OPAQUE { OPAQUE_TAG "" }
   DATA {
   (0): 59:16:f5:f3:bb:e2:28:b8:01:40:00:00:00:00:00:00,
   (1): 21:93:2c:c5:cc:f5:5b:90:00:40:00:00:00:00:00:00,
   ...
   }
}
```
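To read such a dump by eye, each 16-byte record can be decoded by hand. The sketch below makes several assumptions:

- The bytes use the x87 80-bit extended format of AMD64 with GCC/Clang, stored little-endian.
- Bytes 0 to 7 hold the 64-bit significand with an explicit integer bit.
- Bytes 8 and 9 hold the 15-bit biased exponent and the sign bit.
- The six padding bytes are ignored.

The function name decode_x87_extended is hypothetical. It rounds to double, so it is an inspection aid, not a lossless reader.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: decode one x87 80-bit extended value from a 16-byte
// opaque record (little-endian AMD64 layout assumed). Only bytes 0..9 are
// read; bytes 10..15 are tail padding and are ignored. The result is rounded
// to double, so the lowest 11 significand bits are lost.
double decode_x87_extended(const std::array<std::uint8_t, 16>& rec) {
    // Bytes 0..7: 64-bit significand, explicit integer bit in the top position.
    std::uint64_t mantissa = 0;
    for (int i = 7; i >= 0; --i)
        mantissa = (mantissa << 8) | rec[i];

    // Bytes 8..9: sign bit (bit 15) and 15-bit biased exponent (bias 16383).
    const unsigned se = static_cast<unsigned>(rec[8]) |
                        (static_cast<unsigned>(rec[9]) << 8);
    const bool negative = (se & 0x8000u) != 0;
    const int biased = static_cast<int>(se & 0x7FFFu);

    double value;
    if (biased == 0x7FFF) {
        // All-ones exponent: infinity if the fraction bits are zero, else NaN.
        const bool fraction_zero = (mantissa & 0x7FFFFFFFFFFFFFFFull) == 0;
        value = fraction_zero ? std::numeric_limits<double>::infinity()
                              : std::numeric_limits<double>::quiet_NaN();
    } else {
        // value = significand * 2^(exponent - 16383 - 63); denormals use exponent 1.
        const int exponent = (biased == 0 ? 1 : biased) - 16383 - 63;
        value = std::ldexp(static_cast<double>(mantissa), exponent);
    }
    return negative ? -value : value;
}
```

Applied to the two records in the dump above, it yields roughly 5.755 and 2.256. Changing any of the last six bytes does not alter the result.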
Why Two Views?
- H5T_NATIVE_LDOUBLE: portable, but misprinted by h5dump.
- H5T_OPAQUE: exact bytes preserved; great for debugging or custom parsers.
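On AMD64 systems (GCC/Clang), long double is stored in 16 bytes, but only the first 10 bytes (the x87 80-bit extended format) are significant. The last 6 are tail padding with undefined contents. An opaque type copies all 16 bytes verbatim, so the significant ones come back unchanged no matter how a tool treats the padding. This is why treating the type as opaque makes sense when fidelity is critical. The bytes do stay in the writer's native layout, so a reader on a platform with a different long double format must decode them itself.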
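A possible refinement is to copy the long double's bytes into a sizeof(long double) record and zero the padding before writing. That way identical values always produce identical bytes in dumps and checksums. This is a sketch under stated assumptions:

- The significant-byte count is inferred from std::numeric_limits<long double>::digits (64 means x87 extended). This heuristic assumes the value sits in the lowest-addressed bytes, as it does on x86.
- to_opaque, from_opaque, significant_bytes and opaque_record are hypothetical helpers, not part of H5CPP.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <limits>

// Hypothetical heuristic: how many leading bytes of a long double carry its value.
constexpr std::size_t significant_bytes() {
    return std::numeric_limits<long double>::digits == 64 ? 10   // x87 80-bit extended
         : std::numeric_limits<long double>::digits == 53 ? 8    // long double is binary64
         : sizeof(long double);  // binary128, double-double, unknown: assume no padding
}
static_assert(significant_bytes() <= sizeof(long double), "padding heuristic out of range");

// Raw record with the same size as an opaque datatype of sizeof(long double) bytes.
using opaque_record = std::array<unsigned char, sizeof(long double)>;

// Copy the object representation of v and zero the tail padding, so the same
// value bits always produce the same bytes on disk.
opaque_record to_opaque(long double v) {
    opaque_record rec{};
    std::memcpy(rec.data(), &v, sizeof v);
    for (std::size_t i = significant_bytes(); i < rec.size(); ++i)
        rec[i] = 0;
    return rec;
}

// Inverse of to_opaque on the same platform: reinterpret the bytes as long double.
long double from_opaque(const opaque_record& rec) {
    long double v;
    std::memcpy(&v, rec.data(), sizeof v);
    return v;
}
```

A buffer of such records could serve as the write buffer for an opaque dataset. from_opaque recovers the original value only on a platform with the same long double format.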
Beyond Long Double
You’re not limited to long double. With H5CPP you can adapt the same approach to:
- half precision (float16)
- n-bit packed integers
- arbitrary bit-level encodings
See the H5CPP examples for twobit, nbit, and half-float.
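The sketch below shows what handling a custom precision format involves at the bit level. It decodes an IEEE 754 binary16 (half-precision) bit pattern into a float using only the standard library. The layout is 1 sign bit, 5 exponent bits with bias 15, and 10 fraction bits. This is not H5CPP's half-float example, and half_to_float is a hypothetical name.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: convert an IEEE 754 binary16 bit pattern to float.
// Layout: sign (bit 15) | exponent (bits 10..14, bias 15) | fraction (bits 0..9).
float half_to_float(std::uint16_t h) {
    const bool negative = (h & 0x8000u) != 0;
    const int exponent = (h >> 10) & 0x1F;
    const int fraction = h & 0x3FF;

    float value;
    if (exponent == 0x1F) {
        // All-ones exponent: infinity when the fraction is zero, otherwise NaN.
        value = fraction == 0 ? std::numeric_limits<float>::infinity()
                              : std::numeric_limits<float>::quiet_NaN();
    } else if (exponent == 0) {
        // Zero or subnormal: fraction * 2^-24.
        value = std::ldexp(static_cast<float>(fraction), -24);
    } else {
        // Normal: (1024 + fraction) * 2^(exponent - 15 - 10), implicit leading 1.
        value = std::ldexp(static_cast<float>(fraction + 1024), exponent - 25);
    }
    return negative ? -value : value;
}
```

For example, 0x3C00 decodes to 1.0f, and 0x7BFF decodes to 65504.0f, the largest finite half value.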
Takeaway
- ✅ Use H5T_NATIVE_LDOUBLE when you want logical portability.
- ✅ Wrap as H5T_OPAQUE when you need raw fidelity and control.
- ⚠️ Don’t panic when h5dump shows nonsense; the data is safe.
With H5CPP, you get the flexibility to represent any custom precision format, whether for simulation accuracy, bit-packed encodings, or raw experimental data.