Libdecimal for C++ Quants: What are my options?
What every C++ developer knows: 0.1 + 0.2 != 0.3. No surprise there. The interesting question is which decimal format to use, what it costs you, and where Boost.Decimal stands among the options.
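A minimal sketch of that opening claim, next to the scaled-integer way around it. The function names are illustrative only and do not come from any of the libraries discussed here.

```cpp
#include <cassert>
#include <cstdint>

// Binary doubles cannot represent 0.1, 0.2 or 0.3 exactly, so the sum
// lands on a neighbouring double instead of the one nearest to 0.3.
inline bool binary_sum_is_exact() {
    const double a = 0.1;
    const double b = 0.2;
    return (a + b) == 0.3;
}

// The same values held as integer counts of tenths (value = m * 10^-1)
// add exactly, because only integer arithmetic is involved.
inline bool tenths_sum_is_exact() {
    const std::int64_t a = 1;  // 0.1 expressed in tenths
    const std::int64_t b = 2;  // 0.2 expressed in tenths
    return (a + b) == 3;       // 0.3 expressed in tenths
}
```

Every decimal representation compared in this article is some variation on the second function: an integer significand plus a power of ten that is stored, encoded, or implied by the type.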
In October 2025, Harold Bott got in touch and asked me to review the dependency-free Boost.Decimal implementation. Unfortunately, I was not in a good position to help at the time: The Intel LIBBID-based libdecimal library I maintained was in terrible shape, and I did not yet have a proper benchmarking or comparison framework ready.
I felt a bit ashamed about that. I tried to scramble something together so I could respond with useful content, but their release deadline was only a few days away. It felt like a missed opportunity, and I kept thinking about it from time to time — you know, those moments between doom scrolling and being busy in general. This work is my attempt to make up for it.
Boost.Decimal is excellent work and supports both the Intel BID and IBM DPD formats. This review does not repeat the internal work of the library contributors; instead, it focuses on the corners I know from writing exchange connectivity and trading systems.
1. Tools: Nanobench by Martin Leitner-Ankerl, a header-only benchmarking library for modern C++
The idea is simple: declare the types we care about, push them all through the same meat grinder, then measure the difference. It is not laboratory-perfect benchmarking, but it is close to the question production systems actually ask: given the same messy workload, which implementation behaves better?
Nanobench lived up to its promise: no Google-style complexity, just a small, fast library with cache warmup, useful diagnostics, and even a reminder to pin the process to a CPU to reduce jitter. If only it used snake_case and came with a built-in static_for construct. Nevertheless, I liked the library and happily recommend it!
- How to invoke
- How to pin the CPU for stable numbers
- How to change the measurement baseline
nanobench treats the first `bench.run()` call in each `Bench` instance as the 100 % reference. To rebaseline on a different type, move its `bench.run()` block to the top of the group in the corresponding `bench/*.cpp`, then rerun. For `construct-from-pair.cpp`, which iterates a `static_for<tuple_t>`, the first type in the tuple is the baseline.
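The "first type is the baseline" rule can be sketched without nanobench. Everything below is a hypothetical stand-in: `type_tag`, `static_for`, `time_accumulate_ns`, and `run_group` are names made up for this illustration, and the timing uses `std::chrono` rather than nanobench's measurement loop.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// Tag type so the callback receives a type without constructing a value.
template <class T> struct type_tag { using type = T; };

// Hypothetical stand-in for static_for<tuple_t>: calls fn(type_tag<T>{})
// once per tuple element type, in declaration order.
template <class Tuple, class Fn, std::size_t... I>
void static_for_impl(Fn& fn, std::index_sequence<I...>) {
    (fn(type_tag<std::tuple_element_t<I, Tuple>>{}), ...);
}
template <class Tuple, class Fn>
void static_for(Fn fn) {
    static_for_impl<Tuple>(fn, std::make_index_sequence<std::tuple_size_v<Tuple>>{});
}

template <class T> constexpr const char* name_of() {
    if constexpr (std::is_same_v<T, double>) return "double";
    else if constexpr (std::is_same_v<T, float>) return "float";
    else return "int64";
}

struct row { std::string name; double ns; double relative_pct; };

// Times a trivial accumulate loop: a placeholder workload, not a real benchmark.
template <class T>
double time_accumulate_ns(std::size_t n) {
    std::vector<T> v(n, T(1));
    const auto t0 = std::chrono::steady_clock::now();
    T acc{};
    for (const T& x : v) acc += x;
    const auto t1 = std::chrono::steady_clock::now();
    volatile T sink = acc;  // keep the loop from being optimised away
    (void)sink;
    const double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    return ns > 1.0 ? ns : 1.0;  // clamp so the ratio below is always defined
}

// The first type in the tuple is measured first and becomes the 100 % row;
// values above 100 % mean "faster than the baseline".
template <class Tuple>
std::vector<row> run_group(std::size_t n) {
    std::vector<row> rows;
    static_for<Tuple>([&](auto tag) {
        using T = typename decltype(tag)::type;
        const double ns = time_accumulate_ns<T>(n);
        const double base = rows.empty() ? ns : rows.front().ns;
        rows.push_back({name_of<T>(), ns, 100.0 * (base / ns)});
    });
    return rows;
}
```

Calling `run_group<std::tuple<double, float, std::int64_t>>(n)` makes `double` the reference row; reordering the tuple is all it takes to rebaseline.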
2. Decimal Number Representations: What we actually compare here
- Boost.Decimal
  This is our baseline: a modern C++ implementation of IEEE 754 / ISO/IEC TR 24733 decimal floating-point types. The library is header-only, dependency-free, and requires C++14. Internally, BID-style decimal floating point is compact, but arithmetic and comparison are more involved than plain integer operations. The value of the library is that it hides this complexity behind a clean and standard-looking C++ interface.
- BID: libdecimal / Intel BID
  This is also an IEEE 754 decimal floating-point implementation, based on the reworked Intel decimal library and wrapped with a modern C++ interface. Compared with the original Intel distribution, libdecimal adds cross-platform build work, 128-bit support, custom parsing, import/export helpers, and high-performance conversion routines. The storage model is the same broad family as above: decimal floating point using a BID-style representation.
- BCD: binary-coded decimal
  BCD is the classic representation used in many legacy business systems. The idea is simple: each base-10 digit is stored in four bits, so two decimal digits can be packed into one byte. This is easy to inspect and faithful to decimal notation, but it carries a significant cost on modern binary hardware. The CPU is very good at binary integer arithmetic; BCD makes it put on gloves first.
- Scaled decimal: coefficient/exponent pair
  The scaled representation stores a decimal as two separate fields: a significand and an exponent. In mathematical form: \(x = m \cdot 10^e\). This format appears often in internal trading systems and wire protocols, especially when different programming languages need to exchange decimal values without committing to one fixed scale. It is flexible, easy to serialize, and cheap to inspect, but addition and comparison require exponent alignment.
- Fixed-scale decimal: scaled integer with compile-time exponent
  The fixed representation stores the value as a single integer significand, while the decimal exponent is tracked as a compile-time constant. In mathematical form: \(x = m \cdot 10^e\), but here \(e\) belongs to the type, not to the object. This is usually the fastest and simplest decimal representation when the scale is known in advance. It is well suited for fixed-scale financial values, columnar storage, market data, deterministic arithmetic, compact binary layouts, and schema-driven decimal fields. The downside is reduced generality. Fixed-scale decimal is excellent for addition, subtraction, comparison, and storage, but multiplication and division need explicit rescaling and a rounding policy. BID-style decimal floating point is more robust for mixed-scale arithmetic and general-purpose decimal work.
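The difference between the last two representations is easiest to see in addition. The sketch below uses made-up types, `scaled_sketch` and `fixed_sketch`, which are not the libdecimal or Boost.Decimal API, and it ignores overflow entirely.

```cpp
#include <cassert>
#include <cstdint>

// Scaled: value = m * 10^e, with e stored in every object.
struct scaled_sketch {
    std::int64_t m;  // significand
    std::int32_t e;  // decimal exponent
};

// Multiplies v by 10^k. Overflow is not checked in this sketch.
inline std::int64_t times_pow10(std::int64_t v, std::int32_t k) {
    assert(k >= 0);
    while (k-- > 0) v *= 10;
    return v;
}

// Scaled addition must first bring both operands to the smaller exponent.
inline scaled_sketch add(scaled_sketch a, scaled_sketch b) {
    const std::int32_t e = a.e < b.e ? a.e : b.e;
    return {times_pow10(a.m, a.e - e) + times_pow10(b.m, b.e - e), e};
}

// Fixed: value = m * 10^E, with E part of the type. Equal-scale addition
// is a single integer add, and mixing scales simply does not compile.
template <int E>
struct fixed_sketch {
    std::int64_t m;
};

template <int E>
inline fixed_sketch<E> add(fixed_sketch<E> a, fixed_sketch<E> b) {
    return {a.m + b.m};
}
```

Adding 1.5 and 0.25 as `scaled_sketch{15, -1}` and `scaled_sketch{25, -2}` costs an alignment step before the integer add; the `fixed_sketch<-4>` overload goes straight to the add because the compiler already knows both scales match.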
3. Results: What certain operations cost you relative to Boost.Decimal
Construct from (significand, exponent)
| relative | ns/op | type |
|---|---|---|
| 100.0 % | 3 934 | boost decimal32 — baseline |
| 76.5 % | 5 142 | float naive |
| 509 % | 772 | float optimised |
| 454 % | 866 | intel bid32 |
| 326 % | 1 208 | scaled uint32 |
| 75.2 % | 5 228 | bcd uint32 |
| 480 % | 820 | fixed32 checked |
| 484 % | 814 | fixed32 unchecked |
| relative | ns/op | type |
|---|---|---|
| 100.0 % | 981 | boost decimal64 — baseline |
| 8.9 % | 11 047 | double naive |
| 130 % | 757 | double optimised |
| 106 % | 926 | intel bid64 |
| 89.4 % | 1 096 | scaled uint64 |
| 13.7 % | 7 132 | bcd uint64 |
| 137 % | 718 | fixed64 checked |
| 158 % | 621 | fixed64 unchecked |
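Fixed-exponent types are at or near the top at both widths; only the optimised float path is faster, and only at 32-bit. BCD construction is about 1.3× slower than Boost at 32-bit and about 7× slower at 64-bit.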
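One plausible reason for the BCD construction cost is the per-digit work. The sketch below assumes a simple packed layout with one digit per nibble; the benchmarked bcd types may be implemented differently, and `to_packed_bcd` / `from_packed_bcd` are names invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Packs a value of up to 8 decimal digits into packed BCD, one digit per
// nibble, least significant digit in the lowest nibble.
inline std::uint32_t to_packed_bcd(std::uint32_t value) {
    assert(value <= 99999999u);  // eight nibbles hold eight digits
    std::uint32_t bcd = 0;
    for (int shift = 0; value != 0; shift += 4) {
        bcd |= (value % 10u) << shift;  // one divide/modulo per digit
        value /= 10u;
    }
    return bcd;
}

// Inverse conversion: one multiply-add per nibble, most significant first.
inline std::uint32_t from_packed_bcd(std::uint32_t bcd) {
    std::uint32_t value = 0;
    for (int shift = 28; shift >= 0; shift -= 4)
        value = value * 10u + ((bcd >> shift) & 0xFu);
    return value;
}
```

A binary significand is stored with a single move; the packed form needs a loop whose length grows with the number of digits, which is consistent with the gap widening from 32-bit to 64-bit.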
Decompose to (significand, exponent)
| relative | ns/op | type |
|---|---|---|
| 100.0 % | 61 368 | boost decimal32 frexp10 — baseline |
| 671 % | 9 152 | float utils::decompose |
| 855 % | 7 181 | intel bid32 bid::decompose |
| 1781 % | 3 446 | scaled uint32 as_pair |
| relative | ns/op | type |
|---|---|---|
| 100.0 % | 24 250 | boost decimal64 frexp10 — baseline |
| 265 % | 9 147 | double utils::decompose |
| 361 % | 6 727 | intel bid64 bid::decompose |
| 610 % | 3 974 | scaled int64 as_pair |
boost::decimal::frexp10 is the most expensive decompose path. scaled::as_pair() is a direct field read — 18× faster at 32-bit, 6× faster at 64-bit. Use libdecimal types for any path that must inspect the representation (serialization, wire encoding, logging).
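Once the (significand, exponent) pair is in hand, wire encoding is mostly byte shuffling. The layout below is hypothetical and not taken from any real protocol: eight little-endian significand bytes followed by one signed exponent byte. `decimal_pair`, `encode`, and `decode` are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical wire record: value = significand * 10^exponent.
struct decimal_pair {
    std::int64_t significand;
    std::int8_t exponent;
};

// Writes the significand byte by byte so the result does not depend on
// the host's endianness, then appends the exponent.
inline std::array<std::uint8_t, 9> encode(decimal_pair p) {
    std::array<std::uint8_t, 9> out{};
    const auto u = static_cast<std::uint64_t>(p.significand);
    for (std::size_t i = 0; i < 8; ++i)
        out[i] = static_cast<std::uint8_t>(u >> (8 * i));
    out[8] = static_cast<std::uint8_t>(p.exponent);
    return out;
}

// Reassembles the pair; relies on two's-complement conversion, which is
// guaranteed from C++20 and universal in practice before that.
inline decimal_pair decode(const std::array<std::uint8_t, 9>& in) {
    std::uint64_t u = 0;
    for (std::size_t i = 0; i < 8; ++i)
        u |= static_cast<std::uint64_t>(in[i]) << (8 * i);
    return {static_cast<std::int64_t>(u), static_cast<std::int8_t>(in[8])};
}
```

With a representation that already stores the pair, the decompose step in front of `encode` is just two field reads, which is where the gap in the tables above comes from.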
Compare
| relative | ns/op | type |
|---|---|---|
| 100.0 % | 60 017 | boost decimal64 — baseline |
| 385 % | 15 611 | double |
| 119 % | 50 563 | intel bid64 |
| 276 % | 21 771 | fixed64 -4 |
| 294 % | 20 431 | scaled int64 |
| relative | ns/op | type |
|---|---|---|
| 100.0 % | 56 215 | boost decimal64 — baseline |
| 1240 % | 4 535 | double |
| 111 % | 50 457 | intel bid64 |
| 454 % | 12 370 | fixed64 -4 |
| 237 % | 23 721 | scaled int64 |
Boost comparison is the slowest decimal option, with Intel BID64 only 11–19 % ahead. fixed64 -4 is 2.8× faster in the first table and sorts 4.5× faster in the second.
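Why the fixed-scale type sorts so cheaply can be sketched in a few lines. The types and helpers here (`scaled_sketch`, `scaled_less`, `sort_fixed`, `sort_scaled`) are illustrative assumptions, not the benchmarked implementations, and overflow during alignment is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct scaled_sketch {
    std::int64_t m;  // significand
    std::int32_t e;  // decimal exponent
};

// Multiplies v by 10^k. Overflow is not checked in this sketch.
inline std::int64_t times_pow10(std::int64_t v, std::int32_t k) {
    assert(k >= 0);
    while (k-- > 0) v *= 10;
    return v;
}

// Scaled comparison: align both operands to the smaller exponent first,
// then compare the significands.
inline bool scaled_less(scaled_sketch a, scaled_sketch b) {
    const std::int32_t e = std::min(a.e, b.e);
    return times_pow10(a.m, a.e - e) < times_pow10(b.m, b.e - e);
}

inline void sort_scaled(std::vector<scaled_sketch>& v) {
    std::sort(v.begin(), v.end(), scaled_less);
}

// Fixed-scale comparison: every element shares one exponent, so the raw
// significands order exactly like the values and a plain integer sort works.
inline void sort_fixed(std::vector<std::int64_t>& significands) {
    std::sort(significands.begin(), significands.end());
}
```

The fixed-scale path never looks at an exponent, so the comparator is a single integer compare; the scaled path pays for alignment on every comparison the sort performs.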
Arithmetic
| relative | ns/op | type |
|---|---|---|
| 100.0 % | 190 245 | boost decimal64 — baseline |
| 2378 % | 8 002 | double |
| 373 % | 51 030 | intel bid64 |
| 14.9 % | 1 278 906 | bcd64 |
| 1886 % | 10 090 | fixed64 -4 |
| 632 % | 30 082 | scaled int64 |
| relative | ns/op | type |
|---|---|---|
| 100.0 % | 5 457 | boost decimal64 — baseline |
| 3439 % | 159 | double |
| 264 % | 2 070 | intel bid64 |
| 4.6 % | 118 188 | bcd64 |
| 654 % | 834 | fixed64 -4 |
| 530 % | 1 030 | scaled int64 |
Setting BCD aside, Boost arithmetic is the slowest decimal path. fixed64 is 19× faster for accumulation. BCD should not be used for arithmetic: it is 7–20× slower than Boost.
String parse and format
| relative | ns/op | type |
|---|---|---|
| 100.0 % | 44 106 | boost decimal64 — baseline |
| 315 % | 14 008 | intel bid64 |
| 21.5 % | 205 445 | bcd64 (digit-string + exp) |
| 94.7 % | 46 559 | stod (double reference) |
| relative | ns/op | type |
|---|---|---|
| 100.0 % | 131 586 | boost decimal64 — baseline |
| 293 % | 44 877 | intel bid64 (std::format) |
| 238 % | 55 215 | bcd64 (.str()) |
| 827 % | 15 919 | scaled int64 (cast to string) |
| 66.3 % | 198 355 | double (to_string) |
Intel BID64 parses 3× faster than boost. For formatting, scaled's cast-to-string is 8× faster than boost's ostringstream <<.
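For orientation, this is roughly what parsing into and formatting from a (significand, exponent) pair involves. It is a deliberately small sketch, not the Intel, libdecimal, or Boost.Decimal code path: it handles plain decimal text only, with no exponent notation, whitespace, or overflow checks, and `parse_decimal` / `to_text` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

struct scaled_sketch {
    std::int64_t m;  // significand
    std::int32_t e;  // decimal exponent
};

// Parses text such as "-123.45" into {-12345, -2}; returns nullopt on
// anything that is not sign, digits, and at most one decimal point.
inline std::optional<scaled_sketch> parse_decimal(std::string_view s) {
    bool negative = false;
    if (!s.empty() && (s.front() == '-' || s.front() == '+')) {
        negative = s.front() == '-';
        s.remove_prefix(1);
    }
    std::int64_t m = 0;
    std::int32_t e = 0;
    bool seen_dot = false, seen_digit = false;
    for (char c : s) {
        if (c == '.' && !seen_dot) { seen_dot = true; continue; }
        if (c < '0' || c > '9') return std::nullopt;
        m = m * 10 + (c - '0');
        if (seen_dot) --e;  // every fractional digit lowers the exponent
        seen_digit = true;
    }
    if (!seen_digit) return std::nullopt;
    return scaled_sketch{negative ? -m : m, e};
}

// Formats {m, e} as plain decimal text. Assumes m is not INT64_MIN.
inline std::string to_text(scaled_sketch v) {
    std::string digits = std::to_string(v.m < 0 ? -v.m : v.m);
    if (v.e >= 0) {
        digits.append(static_cast<std::size_t>(v.e), '0');
    } else {
        const auto frac = static_cast<std::size_t>(-v.e);
        if (digits.size() <= frac)  // pad so one integer digit remains
            digits = std::string(frac - digits.size() + 1, '0') + digits;
        digits.insert(digits.size() - frac, 1, '.');
    }
    return (v.m < 0 ? "-" : "") + digits;
}
```

Formatting from the pair is one integer-to-string conversion plus placing the decimal point, with no stream state or locale involved, which is the kind of shortcut a stream-based path cannot take.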
Fee calculation — notional × rate
| relative | ns/op | type |
|---|---|---|
| 100.0 % | 4 607 | boost decimal64 — baseline |
| 569 % | 810 | double |
| 124 % | 3 730 | intel bid64 |
| 176 % | 2 617 | fixed64 -4 |
| 273 % | 1 688 | scaled int64 |
Multiply is boost's least-bad category; intel is only 1.24× faster. Scaled is 2.7× faster.
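The rescaling that fixed-scale multiplication forces, and that a scaled pair avoids, looks roughly like this. The scales chosen (cents for the notional, units of 10^-4 for the rate) and the half-away-from-zero rounding are assumptions for the example, as are the names `fee_cents`, `scaled_sketch`, and `multiply`; overflow of the 64-bit product is not checked.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-scale fee: notional in cents (10^-2) times rate in units of 10^-4.
// The raw product has scale 10^-6 and must be rounded back to cents.
inline std::int64_t fee_cents(std::int64_t notional_cents, std::int64_t rate_e4) {
    const std::int64_t product = notional_cents * rate_e4;  // scale 10^-6
    const std::int64_t divisor = 10000;                     // 10^-6 -> 10^-2
    const std::int64_t q = product / divisor;               // truncates toward zero
    const std::int64_t r = product % divisor;               // same sign as product
    const std::int64_t abs_r = r < 0 ? -r : r;
    if (2 * abs_r >= divisor)                               // round half away from zero
        return product < 0 ? q - 1 : q + 1;
    return q;
}

// Scaled multiply: significands multiply, exponents add, nothing is rounded.
struct scaled_sketch {
    std::int64_t m;
    std::int32_t e;
};

inline scaled_sketch multiply(scaled_sketch a, scaled_sketch b) {
    return {a.m * b.m, a.e + b.e};
}
```

For 123.45 × 0.0025 the fixed path divides and rounds to 0.31, while the scaled path simply returns 308625 × 10^-6 and leaves rounding to whoever consumes the result.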
Risk limit — accumulate + clamp (10 000 values)
| relative | ns/op | type |
|---|---|---|
| 100.0 % | 219 475 | boost decimal64 — baseline |
| 1370 % | 16 017 | double |
| 279 % | 78 644 | intel bid64 |
| 612 % | 35 873 | fixed64 -4 |
| 227 % | 96 909 | scaled int64 |
Under realistic mixed-op pressure (add + branch + assign per step), boost is still the slowest decimal type.
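A guess at the shape of that workload on a fixed-scale type, assuming a symmetric limit and values that all share one scale (for example 10^-4). `accumulate_clamped` is a name chosen for this sketch; the actual benchmark loop may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Running exposure with a symmetric limit. Because every value shares one
// fixed scale, each step is an integer add, a clamp, and a store.
inline std::int64_t accumulate_clamped(const std::vector<std::int64_t>& deltas,
                                       std::int64_t limit) {
    assert(limit >= 0);
    std::int64_t exposure = 0;
    for (std::int64_t d : deltas)
        exposure = std::clamp(exposure + d, -limit, limit);
    return exposure;
}
```

With a decimal floating-point type, the same three operations each involve decoding and possibly re-normalising the operands, which is one way to read the gap in the table.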
"Environment"
- CPU: 11th Gen Intel Core i7-11700K @ 3.60 GHz · OS: Linux 5.15 x86_64
- Build: `-O3 -march=native`, C++23
- Baseline: `boost::decimal` = 100 %; numbers above 100 % are faster than Boost.
- CPU frequency scaling was active (powersave governor + turbo); figures are indicative, not lab-grade.
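Since frequency scaling was left on, pinning is the cheapest remaining way to cut jitter. One way to do it from inside the benchmark binary on Linux is `sched_setaffinity`; the wrapper name `pin_to_cpu` is an assumption for this sketch, and the code is Linux/glibc-specific.

```cpp
#include <cassert>
#include <sched.h>  // Linux/glibc: sched_setaffinity, sched_getcpu, CPU_* macros

// Pins the calling thread to a single CPU. Returns true on success.
// g++ and clang++ define _GNU_SOURCE for C++ by default, which these
// declarations require.
inline bool pin_to_cpu(int cpu) {
    assert(cpu >= 0);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;  // pid 0 = calling thread
}
```

Calling `pin_to_cpu(sched_getcpu())` at the top of `main` keeps the run on whichever core it started on; picking an isolated core explicitly is better when the machine is set up for it.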
Speedup relative to boost::decimal (>1× = faster). Intel Core i7-11700K, -O3 -march=native, C++23.
| type | construct | decompose | compare | accumulate | multiply | parse | format |
|---|---|---|---|---|---|---|---|
| double | 1.3× | 2.7× | 3.9× | 23.8× | 5.7× | 0.95× | 0.66× |
| intel bid64 | 1.1× | 3.6× | 1.2× | 3.7× | 1.2× | 3.2× | 2.9× |
| fixed64 -4 | 1.6× | n/a | 2.8× | 18.9× | 1.8× | n/a | n/a |
| scaled int64 | 0.9× | 6.1× | 2.9× | 6.3× | 2.7× | n/a | 8.3× |
| bcd64 | 0.1× | n/a | n/a | 0.15× | 0.05× | 0.2× | 2.4× |
From the numbers, Boost.Decimal is on the money, but there is still room for improvement. To be fair, libdecimal by VargaLABS is not a copy-paste distribution of the original Intel library with a thin C++ layer glued on top. Besides the boring gymnastics required to make the Intel code work cleanly across platforms, libdecimal adds a custom parser, import/export support, and display functions, together with the slim and pleasant C++ syntax you may recognize from H5CPP.
Returning to Boost.Decimal: it is a good, modern, header-only implementation. If you already use Boost, it will not add significant friction to the pipeline, and it can definitely lower development cost.
| use case | recommended type | reason |
|---|---|---|
| Arithmetic: PnL, accumulation | `fixed64<-4>` | 7–19× faster; compile-time exponent eliminates runtime normalization |
| Multiply hot path: fees, notional | `scaled<int64_t>` | 2.7× faster; avoids forced fixed-scale rescaling on multiply |
| Compare / sort / rank | `fixed64<-4>` | 2.8–4.5× faster |
| Wire encode | `scaled<int64_t>` | field read, near-free |
| Wire decode | `fixed64<-4>` | 3× faster; avoids per-value exponent normalization |
| String parse | intel bid64 | 3.2× faster than Boost.Decimal |
| String format | `scaled<int64_t>` | 8.3× faster when coefficient/exponent formatting is sufficient |
| Inspect internals / serialize | `scaled<int64_t>` | 6–18× faster via direct `as_pair()` access |
| General purpose / mixed scale | intel bid64 | competitive across most operations, without a fixed-exponent constraint |
| Avoid for arithmetic hot paths | bcd64 | 7–20× slower than Boost.Decimal across arithmetic operations |
The takeaway: decimal floating point is powerful, but it is not something to spread over the whole system like Marmite on toast. Use it where mixed scale, interchange, or standards compatibility matter. For hot arithmetic paths, especially in trading-style workloads, fixed-scale and scaled-integer representations may perform better.