Radix64 Encoding: Order-Preserving Strings in 64 Bits
Problem: Compact, Order-Preserving Identifiers for Short Symbols
In domains like high-frequency trading, structured storage, or financial indexing, identifiers such as stock symbols (e.g., "AAPL"
, "GOOG"
, "TSLA"
) are:
- Short — typically ≤ 8 characters
- Drawn from a small alphabet — ~32 to 40 printable characters
- Compared lexicographically — e.g.,
"ABC" < "XYZ"
must hold
These structural constraints can be exploited to encode each symbol into a compact, fixed-width binary key while preserving their natural sort order.
Specifically, such systems often need:
- Lexicographic ordering — to maintain sorted maps, symbol trees, or search indices
- Fixed-size keys — for CPU- and cache-friendly performance
- Support for variable-length symbols — without breaking ordering
- Ability to pack extra metadata — like 16-bit contract IDs or timestamps
But typical representations fall short:
std::string
/char[]
are variable-length and inefficient as keysstrcmp
is slow, branching, and non-SIMD friendly- Standard Base64 isn't order-preserving
- Hashes destroy ordering and increase collision risk
➤ Radix64 encoding solves this by leveraging the symbol structure — short, constrained, ordered strings — to pack lex-order-preserving representations into just 64 bits.
Solution: Order-Preserving Radix64 Encoding
Radix64 solves this by:
- Encoding up to 8 characters as a 48-bit big-endian base-64 integer
- Padding shorter strings with the lowest code to preserve prefix order
- Optionally combining a 16-bit integer (e.g., contract ID) in the LSBs
- Making unsigned integer comparison exactly match lexicographic order
So you can now represent things like:
Used as keys in:
- Fast symbol lookup tables
- Order books and trading systems
- Encoded identifiers for datasets or HDF5 groups
- Key-value stores, B-trees, or radix trees