The radix64.hpp header implements a highly compact encoding scheme for symbol names, designed to pack an 8-character IEX-style symbol and a 16-bit internal index into a single uint64_t. This representation enables high-performance lookups, low-memory overhead, and cache-friendly access patterns — making it ideal for real-time trading systems, contract mapping tables, or HDF5 serialization. By combining base-64 encoding with bit-packing, the system achieves fast, reversible, allocation-free symbol compression, without sacrificing flexibility or clarity.
Terminology. We use “radix-64” to mean a positional numeral system with base 64 over a chosen alphabet \(\Sigma\) (6-bit digits, big-endian). This is not RFC 4648 Base64 encoding of bytes.
Overview
Each IEX symbol is exactly 8 characters (right-padded to length with the minimum-code sentinel, e.g., space with code 0), drawn from a radix-64 alphabet (A–Z, 0–9, selected punctuation), and stored in a compact 64-bit key with the symbol packed big-endian into the upper 48 bits (8×6) and a 16-bit contract index in the lower 16 bits; this order-preserving, reversible layout is space-efficient (symbol + index in 8 bytes), enables fast lookup as a single uint64_t key in sorted vectors or flat hash maps, and supports one-pass decode without heap allocation—note that at most 8 radix-64 characters fit in 48 bits.
Radix 64 Symbol Encoding for IEX2H5
The IEX specification uses 8-byte padded ASCII strings (left-aligned, space-padded) to represent symbols (e.g. "AAPL "). Within the IEX protoicol (internally in the pcap format) they’re stored in uint64_t. To give you an example the memory Layout of "APPLE " (Little Endian)
is "APPLE␣␣␣" — 8 characters, padded with spaces. The resulting uint64_t value (in hex):0x202020454C505041
Bit Budget Analysis (48-bit) and radix64 Alphabet
We allocate 48 bits for symbol/string encoding, leaving 16 bits for indexing them. Let \(b\) be the number of bits per character. To store 8 characters in 48 bits we have \(b = \frac{48}{8} = 6.00\) available bits per character, similarly if we wanted to store 9 characters we have to squeeze each character into: \(b = \frac{48}{9} \approx 5.33\) bits. Since fractional bits would make our day misarable we have to settle with \( \left\lfloor 5.33 \right\rfloor = 5\). So it is a tossup between 5 or 6 bits.
Approaching it from the other side we empirically measured there are 34 different characters making up the IEX symbol alphabet. Since \( log_2(34) \approx 5.0874 \) and we strongly dislike fractional bits we need at least \( \left\lceil log_2(34) \right\rceil = 6\) bits to represent a single character. $$ \text{encoded} = \sum_{i=0}^{n-1} \text{alphabet_index}[i] \cdot 64^{n - 1 - i} $$ Conclusion: A base-64 alphabet lets us encode exactly 8 characters in 48 bits — with 30 extra characters to make it applicable for encoding tickers of other exchanges.
Compact radix64 Encoding of Ticker + Index Ordering Properties

When encoding ticker symbols using 6-bit characters (i.e. a base-64 alphabet), we concatenate the character codes to form a single base-64 number. Because the encoding alphabet is lexicographically ordered, this process preserves the relative order of the original strings. As a result, comparisons between encoded tickers can be performed using fast integer arithmetic instead of slower string-based comparisons — a key performance benefit for high-frequency symbol lookups.
The table on the right compares the original input order, decoded symbol, the index from original[i], and the lexicographically sorted symbol/index from sorted[i].
Correct lexicographic behavior is preserved when using the fixed radix64 encoding logic.
Binary Search in radix64-Encoded Flat Maps
The encode(symbol) function preserves lexical order, making it ideal for binary search over sorted containers. For example:
if (auto it = std::ranges::lower_bound(sorted, utils::radix64::encode("ocean ", 0)); it != sorted.end()) {
auto [symbol, index] = utils::radix64::decode(*it);
assert(symbol == "ocean ");
} else {
TRACE << "not found..." << std::endl;
}
This is a perfect match for fixed-width symbols, such as IEX 8-character tickers.
For trimmed symbols (e.g., "ocean" instead of "ocean "), a post-check is needed to backtrack if necessary:
if (auto it = std::ranges::lower_bound(sorted, utils::radix64::encode("ocean")); it != sorted.end()) {
// `it` may point to the next symbol (e.g., "planet") — backtrack and verify
auto [symbol, index] = utils::radix64::decode(*(--it));
if (symbol.starts_with("ocean"))
; // Found "ocean", possibly padded
else
TRACE << "not found..." << std::endl;
} else {
TRACE << "not found..." << std::endl;
}
Why “Radix-64” ?
Short version: I called it radix-64 because we are using a positional numeral system in base 64 (6-bit digits) over a custom, ordered alphabet—not the RFC 4648 Base64 byte codec.
- Definition fit. Each character \(c\in A\) is mapped via a monotone \(\sigma:A\to\{0,\dots,63\}\) and encoded
i.e., a base-64 radix expansion (big-endian 6-bit digits). That’s exactly “radix-64”.
* Avoids ambiguity. “Base64” typically means RFC 4648 (bytes→ASCII using A–Z a–z 0–9 + / with = padding). Radix64 scheme: custom alphabet, order-preserving, integer-comparables. Different thing → different name.
* Signals the key property. Saying “radix-64” primes the reader for order-preserving integer compares under the usual conditions: monotone \(\sigma\), fixed length or min-code padding, big-endian digits.
- Requirements: 6-bit chars, lex-order == unsigned-int order, fits in 48 bits.
- Model: treat chars as digits in base 64 with a monotone \(\sigma\).
- Constraints: fixed length or pad with smallest code; encode MSB-first.
- Verification: first differing index \(k\) ⇒ tail \(<64^{N-1-k}\) ⇒ \(x<_{\text{lex}}y \iff E(x)<E(y)\).
- Naming: picking radix64 describes (2) precisely what the encoding does, and isn’t overloaded by (RFC) connotations → radix-64 positional encoding.