
postmortem

A Week of Market History Vanished — Here’s What Really Happened

It started with a glitch. Buried deep in a mountain of tick data, a whole week suddenly collapsed into the twilight zone of December 31, 1969, 23:59:59, right before the Unix epoch. The ghost date. You get it when a 64-bit timestamp is filled with all ones (0xFFFFFFFFFFFFFFFF). Read as a signed count of nanoseconds, that value is -1, one nanosecond before midnight on January 1, 1970. It looks legitimate to the computer, but it really means “no timestamp at all.” At first I chalked it up to a curious terminal artifact, the kind of oddity you see once and forget. Then the implications sank in. This phantom week explained why my list of the 30 Most Consistently Traded Stocks on IEX mysteriously came up short: barely 800 trading days of history where there should have been many more.
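To see why all ones lands on that particular date, here is a small C++ sketch. It assumes, as the iex2h5 output later in this post suggests, that the field is read as a signed 64-bit count of nanoseconds since the Unix epoch. format_send_time is an illustrative helper, not part of iex2h5, and civil_from_days is Howard Hinnant's well-known days-to-calendar-date algorithm.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <tuple>

// Days since 1970-01-01 -> (year, month, day), proleptic Gregorian calendar.
// This is Howard Hinnant's civil_from_days algorithm.
constexpr std::tuple<long long, unsigned, unsigned> civil_from_days(long long z) {
    z += 719468;
    const long long era = (z >= 0 ? z : z - 146096) / 146097;
    const unsigned doe = static_cast<unsigned>(z - era * 146097);               // day of era
    const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // year of era
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);                // day of March-based year
    const unsigned mp = (5 * doy + 2) / 153;
    const unsigned d = doy - (153 * mp + 2) / 5 + 1;
    const unsigned m = mp < 10 ? mp + 3 : mp - 9;
    return {static_cast<long long>(yoe) + era * 400 + (m <= 2 ? 1 : 0), m, d};
}

// Render a raw 64-bit send-time field as "YYYY-MM-DD hh:mm:ss.nnnnnnnnn" UTC,
// assuming it holds signed nanoseconds since the Unix epoch.
std::string format_send_time(std::uint64_t raw) {
    // All ones reinterpreted as a two's-complement int64 is -1 ns.
    const auto ns = static_cast<std::int64_t>(raw);
    constexpr std::int64_t ns_per_day = 86'400LL * 1'000'000'000LL;
    long long days = ns / ns_per_day;
    long long rem = ns % ns_per_day;
    if (rem < 0) {  // floor instead of truncate, so pre-1970 values borrow a day
        rem += ns_per_day;
        --days;
    }
    const auto ymd = civil_from_days(days);
    const long long secs = rem / 1'000'000'000, frac = rem % 1'000'000'000;
    char buf[48];
    std::snprintf(buf, sizeof buf, "%04lld-%02u-%02u %02lld:%02lld:%02lld.%09lld",
                  std::get<0>(ymd), std::get<1>(ymd), std::get<2>(ymd),
                  secs / 3600, secs / 60 % 60, secs % 60, frac);
    return buf;
}
```

Feeding it ~0ULL yields 1969-12-31 23:59:59.999999999; truncated to whole seconds, that is the 23:59:59 ghost date. The floor step matters: plain truncating division would round -1 ns toward zero and leave a negative remainder on January 1, 1970.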

To rewind a bit: Sander, George, and I go back over a decade, long before “AI” was splashed across every headline. Back then, stock data was either prohibitively expensive or painfully DIY, with traders like our friend Mike cobbling together custom recorders in C# just to peek at the order book. Fast forward to today: IEX generously streams its ticks to the world, and tools like IEX2H5 can compress terabytes into tidy HDF5 stores. So why bring up old friends? Because not long ago Sander showed up with a 4 TB hard drive in hand, asking me to copy over a slice of the 6 TB top-of-book PCAP dataset. I told him I could do better by repacking the PCAP frames into HDF5 chunks, which shrank it neatly to 2 TB — but this story isn’t about compression. It’s about spotting a streak of market events stamped in 1969, and how a late-night debugging session showed the culprit wasn’t a bug in my code at all, but an undocumented feature of the IEX feed itself. A subtle quirk, yes — but also a reminder that in high-frequency systems, nothing is ever as simple as it looks.


So when I started converting PCAP datasets into HDF5 chunks, I expected nothing more dramatic than a progress bar slowly marching forward. Instead, buried among terabytes of perfectly normal ticks, I stumbled on something strange — a whole week of market activity that seemed to vanish into thin air. That discovery set the stage for what turned into a late-night debugging session worthy of a detective novel. At first, I thought it was my code. I tore apart the PCAP and PCAP NG parsers, double-checked my math, and even second-guessed whether I’d miscompiled with the wrong flags. But then came the smoking gun: Wireshark showed me that those packets weren’t malformed at all. They carried a timestamp field set to 0xffffffffffffffff.

That was the lightbulb moment. This wasn’t a bug in my PCAP parser—it was an undocumented feature in the IEX feed itself: a sentinel value marking “invalid” packets. The kind of thing no spec sheet, no glossy API doc, and no vendor tutorial will ever tell you.

One line of code fixed it:

iex.hpp
template <typename consumer_t> struct transport_t {
    [...]
    void transport_handler( const iex::transport::header* segment ){
        if(segment->time == (~0ULL)) return; // 0xffffffffffffffffULL denotes invalid packet see issue #93
        if( !count ) today = date::floor<date::days>( time_point(duration( segment->time) ) );
        auto now = time_point(duration( segment->time) );
        // trigger opening market event
        if( now > today + this->start && !is_market_opened )
            [...]
        if( is_market_opened && !is_market_closed) [...]
        count++;
    }
    void end() {
        if (this->is_market_opened && !this->is_market_closed) [...]
    }
    [...]
    long count=0;                  /*!< Number of transport segments processed */
};

Simple enough, once you know what you’re looking for. But let’s be honest: it’s the kind of thing that’s easy to miss. Many engineers would have written it off as a glitch in the HDF5 backend or a flaky ETL job, or simply blamed the exchange. Meanwhile, those “clean” datasets would be quietly dropping days, nudging strategies off balance, and planting landmines that only go off weeks later in backtests or production. These are the ghosts you only learn to spot after years in the trenches, once you’ve seen enough undocumented quirks, corner cases, and phantom packets to know they’re always lurking just out of sight.

Forensic Debugging: When 0xFFFFFFFFFFFFFFFF Strikes
Step 1: The Phantom Week Appears

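This is the point where the 1969 timestamps first appear in the output. Across the full dataset of more than 2,200 files the anomaly is easy to overlook, but in this nine-file sample it stands out, and the directory listing (files marked <<) shows the week that went missing: November 11 to 15.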

steven@saturn:~$ iex2h5 -c irts -o ~/output.h5 scratch/research/*.pcap
[iex2h5] Converting 9 files using backend: hdf5  using 1 thread  © Varga Consulting, 2017–2025
[iex2h5] Visit https://vargaconsulting.github.io/iex2h5/  Star it, Share it, Support Open Tools ⭐️
 2024-11-07 14:30:00 21:00:00  2024-11-08 14:30:00 21:00:00  1969-12-31 23:59:59  1969-12-31 23:59:59  1969-12-31 23:59:59  1969-12-31 23:59:59  1969-12-31 23:59:59  1969-12-31 23:59:59  2024-11-18 14:30:00 21:00:00  2024-11-19 14:30:00 21:00:00 benchmark: 1357269997 events in 434891ms  3.1Mticks/s, 0.320000µs/tick latency, 286.28 GiB input converted into 9.37 GiB output
[iex2h5] Conversion complete  all files processed successfully 
[iex2h5] Market data © IEX  Investors Exchange. Attribution required. See https://iextrading.com
steven@saturn:~$ ls -lh /lake/iex/tops/TOPS-2024-11-??.pcap.gz
-rw-rw-r-- 1 steven steven 7.5G Jan 19  2025 TOPS-2024-11-06.pcap.gz
-rw-rw-r-- 1 steven steven 6.1G Jan 19  2025 TOPS-2024-11-07.pcap.gz
-rw-rw-r-- 1 steven steven 5.6G Jan 19  2025 TOPS-2024-11-08.pcap.gz
-rw-rw-r-- 1 steven steven 6.8G Sep  8 23:59 TOPS-2024-11-11.pcap.gz << 
-rw-rw-r-- 1 steven steven 7.3G Sep  8 23:57 TOPS-2024-11-12.pcap.gz <<
-rw-rw-r-- 1 steven steven 8.1G Sep  8 23:54 TOPS-2024-11-13.pcap.gz << 
-rw-rw-r-- 1 steven steven 7.5G Sep  8 23:51 TOPS-2024-11-14.pcap.gz << 
-rw-rw-r-- 1 steven steven 7.9G Sep  8 23:49 TOPS-2024-11-15.pcap.gz <<
-rw-rw-r-- 1 steven steven 6.4G Jan 19  2025 TOPS-2024-11-18.pcap.gz
-rw-rw-r-- 1 steven steven 6.4G Jan 19  2025 TOPS-2024-11-19.pcap.gz
-rw-rw-r-- 1 steven steven 6.8G Jan 19  2025 TOPS-2024-11-20.pcap.gz
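One cheap guard against a silently vanished week is to cross-check the dates implied by the input file names against the session dates the converter actually reported. The sketch below is hypothetical (neither function exists in iex2h5) and assumes file names follow the TOPS-YYYY-MM-DD.pcap.gz pattern shown in the listing above.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Pull the ISO date out of a name like "TOPS-2024-11-11.pcap.gz";
// returns an empty string when no YYYY-MM-DD pattern is present.
std::string date_from_filename(const std::string& name) {
    static const std::regex re(R"((\d{4}-\d{2}-\d{2}))");
    std::smatch m;
    return std::regex_search(name, m, re) ? m.str(1) : std::string{};
}

// Every expected session date that did not show up in the converter's output,
// so a missing week is reported instead of hidden behind 1969 placeholders.
std::vector<std::string> missing_sessions(const std::vector<std::string>& expected,
                                          const std::vector<std::string>& produced) {
    const std::set<std::string> seen(produced.begin(), produced.end());
    std::vector<std::string> missing;
    for (const auto& day : expected)
        if (seen.count(day) == 0) missing.push_back(day);
    return missing;
}
```

With the file dates from November 7 to 19 as expected, and the dates printed by the first run above as produced, it reports 2024-11-11 through 2024-11-15: the phantom week.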

Step 2: A Trip Back to 1970

To make sure the parser wasn’t getting creative with timestamps, I resorted to the pinnacle of debugging science: printing values exactly where they matter.

steven@saturn:~/projects/iex2h5/build$ ./iex2h5 -o ~/tmp.h5 -c none ~/scratch/research/sample_00000_20241111113746.pcap | slowcat --line 100
[iex2h5] Converting 1 file using backend: hdf5  using 1 thread  © Varga Consulting, 2017–2025
[iex2h5] Visit https://vargaconsulting.github.io/iex2h5/  Star it, Share it, Support Open Tools ⭐️
Packet timestamp: 1970-01-01 01:38:21 UTC
Packet timestamp: 1970-01-01 01:38:26 UTC
Packet timestamp: 1970-01-01 01:38:31 UTC
Packet timestamp: 1970-01-01 01:38:36 UTC
Packet timestamp: 1970-01-01 01:38:42 UTC
Packet timestamp: 1970-01-01 01:38:47 UTC
Packet timestamp: 1970-01-01 01:39:17 UTC
Packet timestamp: 1970-01-01 01:39:48 UTC
Packet timestamp: 1970-01-01 01:58:53 UTC
Packet timestamp: 1970-01-01 01:58:54 UTC
Packet timestamp: 1970-01-01 01:58:55 UTC
Packet timestamp: 1970-01-01 01:58:56 UTC
Packet timestamp: 1970-01-01 01:58:58 UTC

Step 3: Wireshark Cross-Examination

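And lo and behold: tshark confirmed it wasn’t my imagination. In the failing files, the first frames end in ffffffffffffffff, the last eight bytes of the IEX-TP header where the send time (segment->time in iex2h5) lives, even though Wireshark’s own capture timestamp for those same frames is perfectly ordinary. A pattern nowhere to be found in the IEX spec, but clear enough to scream ‘here be undocumented features.’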

tshark dump
steven@saturn:~$ tshark -r scratch/research/TOPS-2024-11-08.pcap -T fields -e frame.time_epoch -e data -c 20 | slowcat --line 200 --truncate=101
1731069116.206792000    01000380010000000000434e0000000000000000000000000100000000000000ab1d1e8930fe0518
1731069117.188933000    01000380010000000000434e9705380000000000000000000100000000000000ace6d2bd30fe0518
1731069117.191213000    01000380010000000000434e8b05370097050000000000003900000000000000fb52d3bd30fe0518
1731069117.193291000    01000380010000000000434e8b053700220b0000000000007000000000000000e0add3bd30fe0518
1731069117.194838000    01000380010000000000434e0a053200ad10000000000000a7000000000000003f02d4bd30fe0518
1731069117.196675000    01000380010000000000434e8b053700b715000000000000d9000000000000003e4fd4bd30fe0518
1731069117.198506000    01000380010000000000434e8b053700421b000000000000100100000000000004b7d4bd30fe0518
1731069117.199851000    01000380010000000000434e8b053700cd2000000000000047010000000000007f0fd5bd30fe0518
1731069117.201351000    01000380010000000000434e8b05370058260000000000007e01000000000000366dd5bd30fe0518
1731069117.202738000    01000380010000000000434e8b053700e32b000000000000b5010000000000004ad1d5bd30fe0518
1731069117.204130000    01000380010000000000434e8b0537006e31000000000000ec010000000000004f38d6bd30fe0518
1731069117.205377000    01000380010000000000434e8b053700f93600000000000023020000000000009785d6bd30fe0518
1731069117.206583000    01000380010000000000434e8b053700843c0000000000005a020000000000009dedd6bd30fe0518
1731069117.207852000    01000380010000000000434e8b0537000f420000000000009102000000000000ad55d7bd30fe0518
1731069117.209046000    01000380010000000000434e8b0537009a47000000000000c802000000000000f7bbd7bd30fe0518
1731069117.210103000    01000380010000000000434e8b053700254d000000000000ff02000000000000050ed8bd30fe0518
1731069117.211103000    01000380010000000000434e8b053700b0520000000000003603000000000000eb6dd8bd30fe0518
1731069117.211960000    01000380010000000000434e8b0537003b580000000000006d0300000000000016cfd8bd30fe0518
1731069117.212890000    01000380010000000000434e8b053700c65d000000000000a4030000000000006024d9bd30fe0518
1731069117.214016000    01000380010000000000434e8b0537005163000000000000db03000000000000bda1d9bd30fe0518

steven@saturn:~$ tshark -r scratch/research/TOPS-2024-11-11.pcap -T fields -e frame.time_epoch -e data -c 20 | slowcat --line 200 --truncate=101
1731325066.340402000    0100038001000000000000000000000000000000000000000100000000000000ffffffffffffffff
1731325071.584101000    0100038001000000000000000000000000000000000000000100000000000000ffffffffffffffff
1731325076.717536000    0100038001000000000000000000000000000000000000000100000000000000ffffffffffffffff
1731325081.950439000    0100038001000000000000000000000000000000000000000100000000000000ffffffffffffffff
1731325087.283595000    0100038001000000000000000000000000000000000000000100000000000000ffffffffffffffff
1731325092.516808000    0100038001000000000000000000000000000000000000000100000000000000ffffffffffffffff
1731325122.849754000    0100038001000000000000000000000000000000000000000100000000000000ffffffffffffffff
1731325153.083934000    0100038001000000000000000000000000000000000000000100000000000000ffffffffffffffff
1731326298.667396000    01000380010000000000464e00000000000000000000000001000000000000006bdb33df0fe80618
1731326299.681534000    01000380010000000000464e00000000000000000000000001000000000000007d0c6f1d10e80618
1731326300.887641000    01000380010000000000464e00000000000000000000000001000000000000002ded526510e80618
1731326301.931621000    01000380010000000000464e00000000000000000000000001000000000000006c9c8ca310e80618
1731326303.137630000    01000380010000000000464e0000000000000000000000000100000000000000f9366feb10e80618
1731326304.181700000    01000380010000000000464e0000000000000000000000000100000000000000cb63aa2911e80618
1731326305.387631000    01000380010000000000464e000000000000000000000000010000000000000096808b7111e80618
1731326306.431776000    01000380010000000000464e000000000000000000000000010000000000000056e9c7af11e80618
1731326307.431820000    01000380010000000000464e0000000000000000000000000100000000000000cc2a63eb11e80618
1731326308.431842000    01000380010000000000464e00000000000000000000000001000000000000003f92fe2612e80618
1731326309.431877000    01000380010000000000464e000000000000000000000000010000000000000044cf996212e80618
1731326310.431900000    01000380010000000000464e00000000000000000000000001000000000000006dff349e12e80618
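Decoding those 40 bytes by hand makes the pattern explicit. The sketch below assumes the IEX-TP segment header layout as I read it from the IEX Transport Protocol specification (version, reserved byte, message protocol ID, channel ID, session ID, payload length, message count, stream offset, first message sequence number, send time), all little-endian. tp_header, from_hex, le, and parse_tp_header are illustrative names, not iex2h5 code.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed IEX-TP segment header (40 bytes, little-endian on the wire).
struct tp_header {
    std::uint8_t  version;
    std::uint16_t protocol_id;
    std::uint32_t channel_id, session_id;
    std::uint16_t payload_length, message_count;
    std::int64_t  stream_offset, first_seq, send_time;
};

// Turn a tshark "data" hex string into raw bytes.
std::vector<std::uint8_t> from_hex(const std::string& hex) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
        out.push_back(static_cast<std::uint8_t>(std::stoul(hex.substr(i, 2), nullptr, 16)));
    return out;
}

// Assemble an n-byte little-endian unsigned integer starting at p.
std::uint64_t le(const std::uint8_t* p, int n) {
    std::uint64_t v = 0;
    for (int i = n - 1; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

tp_header parse_tp_header(const std::vector<std::uint8_t>& b) {
    if (b.size() < 40) throw std::runtime_error("short IEX-TP header");
    tp_header h{};
    h.version        = b[0];  // b[1] is reserved
    h.protocol_id    = static_cast<std::uint16_t>(le(&b[2], 2));
    h.channel_id     = static_cast<std::uint32_t>(le(&b[4], 4));
    h.session_id     = static_cast<std::uint32_t>(le(&b[8], 4));
    h.payload_length = static_cast<std::uint16_t>(le(&b[12], 2));
    h.message_count  = static_cast<std::uint16_t>(le(&b[14], 2));
    h.stream_offset  = static_cast<std::int64_t>(le(&b[16], 8));
    h.first_seq      = static_cast<std::int64_t>(le(&b[24], 8));
    h.send_time      = static_cast<std::int64_t>(le(&b[32], 8));  // all ones -> -1
    return h;
}
```

Fed the first frame of TOPS-2024-11-11.pcap, it returns a send time of -1, along with a zero session ID and an empty payload. The first valid heartbeat decodes to 0x1806e80fdf33db6b: tshark shows the bytes in wire order (6bdb33df0fe80618), while iex2h5 prints the decoded integer, which is why the hex in the next step looks reversed.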

Step 4: IEX2H5 Confirms the Haunting

So I dove back into the IEX2H5 code and revisited the timestamp parsing with the most advanced debugging technique ever invented: printf. And what do you know — the mysterious ffffffffffffffffs were right there, just as the packets promised. Sometimes the old ways are still the best.

steven@saturn:~/projects/iex2h5/build$ ./iex2h5 -c none -o tmp.h5 ~/scratch/research/TOPS-2024-11-11.pcap | slowcat --line 100
[iex2h5] Converting 1 file using backend: hdf5  using 1 thread  © Varga Consulting, 2017–2025
[iex2h5] Visit https://vargaconsulting.github.io/iex2h5/  Star it, Share it, Support Open Tools ⭐️
1969-12-31 23:59:59.999999999 ffffffffffffffff
1969-12-31 23:59:59.999999999 ffffffffffffffff
1969-12-31 23:59:59.999999999 ffffffffffffffff
1969-12-31 23:59:59.999999999 ffffffffffffffff
1969-12-31 23:59:59.999999999 ffffffffffffffff
1969-12-31 23:59:59.999999999 ffffffffffffffff
1969-12-31 23:59:59.999999999 ffffffffffffffff
1969-12-31 23:59:59.999999999 ffffffffffffffff
2024-11-11 11:57:41.637405547 1806e80fdf33db6b
2024-11-11 11:57:42.681472125 1806e8101d6f0c7d
2024-11-11 11:57:43.887588653 1806e8106552ed2d
2024-11-11 11:57:44.931556460 1806e810a38c9c6c
2024-11-11 11:57:46.137589497 1806e810eb6f36f9
2024-11-11 11:57:47.181654987 1806e81129aa63cb
2024-11-11 11:57:48.387590294 1806e811718b8096
2024-11-11 11:57:49.431736662 1806e811afc7e956

Step 5: Exorcising the Sentinel

And the fix? A $1000 one-liner: skip ~0ULL as an undocumented sentinel. The code change itself is trivial; the real work was chasing the phantom through terabytes of ticks, hex dumps, and Wireshark traces.

steven@saturn:~/projects/iex2h5/build$ git diff --staged
diff --git a/include/iex.hpp b/include/iex.hpp
index 3c8fba2..a138bde 100644
--- a/include/iex.hpp
+++ b/include/iex.hpp
@@ -392,7 +392,7 @@ namespace iex {
                void transport_handler( const iex::transport::header* segment ){
                        using namespace std;
                        using namespace date;
-               
+                       if(segment->time ==  (~0ULL)) return; // 0xffffffffffffffffULL denotes invalid packet see issue #93
                        if( !count ) today = date::floor<date::days>( time_point(duration( segment->time) ) );
                        auto now = time_point(duration( segment->time) );
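Why does segment->time == (~0ULL) catch the sentinel even if the field is declared as a signed 64-bit integer? The usual arithmetic conversions turn the int64 into a uint64 before comparing, and -1 becomes all ones. The hypothetical sentinel_filter below applies the same test but also counts what it drops, a design choice (not what the one-liner does) that keeps the anomaly visible in a summary instead of letting it disappear quietly.

```cpp
#include <cstdint>

// The undocumented sentinel: every bit of the 64-bit send-time field set.
constexpr std::uint64_t kInvalidSendTime = ~0ULL;

// int64 -1 converts to uint64 all ones, which is why comparing a signed
// field against ~0ULL matches the sentinel.
static_assert(static_cast<std::uint64_t>(std::int64_t{-1}) == kInvalidSendTime,
              "-1 converts to all ones");

// Hypothetical filter: drops sentinel segments like the one-line fix,
// but counts them so a summary can report how many were skipped.
struct sentinel_filter {
    long skipped = 0;

    bool accept(std::int64_t send_time) {
        if (static_cast<std::uint64_t>(send_time) == kInvalidSendTime) {
            ++skipped;
            return false;
        }
        return true;
    }
};
```

A converter built this way could print skipped per file next to its benchmark line, so a run like the first one would announce the dropped sentinel segments rather than a string of 1969 dates.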

Step 6: Back to the Future — Data Restored

And here we go: the final run. The phantom 1969 timestamps are gone, and every trading day from November 7 through November 19 lines up cleanly across files.

steven@saturn:~/projects/iex2h5/build$ ./iex2h5 -c none -o ~/tmp.h5 ~/scratch/research/*.pcap
[iex2h5] Converting 11 files using backend: hdf5  using 1 thread  © Varga Consulting, 2017–2025
[iex2h5] Visit https://vargaconsulting.github.io/iex2h5/  Star it, Share it, Support Open Tools ⭐️
 2024-11-07 14:30:00 21:00:00  2024-11-08 14:30:00 21:00:00  2024-11-11 14:30:00 21:00:00  2024-11-12 14:30:00 21:00:00  2024-11-13 14:30:00 21:00:00  2024-11-14 14:30:00 21:00:00  2024-11-15 14:30:00 21:00:00  2024-11-18 14:30:00 21:00:00  2024-11-19 14:30:00 21:00:00 benchmark: 3303175025 events in 256885ms  12.9Mticks/s, 0.077000µs/tick latency, 286.32 GiB input converted into 2.44 MiB output
[iex2h5] Conversion complete  all files processed successfully 
[iex2h5] Market data © IEX  Investors Exchange. Attribution required. See https://iextrading.com

So here’s the moral of the story: If you’re serious about market data—if your trading desk, quant research, or risk analytics depends on every tick being correct—you don’t want to leave it to chance. You don’t want to find out six months from now that your backtests were running on phantom trades. Because in high-frequency trading, it’s never “just one bug.” It’s always the one you don’t see coming—the one that makes your million-dollar strategy look like it’s trading in 1969.

PCAP Parser

producer.hpp
namespace iex::pcap {
    struct global_header_t {
        uint32_t magic_number;     /*!< Magic number used to detect byte order and timestamp resolution */
        uint16_t version_major;    /*!< Major version number (typically 2) */
        uint16_t version_minor;    /*!< Minor version number (typically 4) */
        int32_t thiszone;          /*!< GMT to local time correction (usually zero) */
        uint32_t sigfigs;          /*!< Accuracy of timestamps (not used) */
        uint32_t snaplen;          /*!< Max length of captured packets, in octets */
        uint32_t network;          /*!< Data link type (1 = Ethernet) */
    } __attribute__((packed));

    struct packet_header_t {
        uint32_t ts;        /**< Timestamp: seconds since Unix epoch */
        uint32_t ns;        /**< Timestamp: sub-second precision (micro or nanoseconds) */
        uint32_t captured;  /**< Number of bytes actually captured (≤ snaplen) */
        uint32_t original;  /**< Original length of the packet on the wire */
    } __attribute__((packed));

    template <class stream, class consumer>
    struct producer_t : public base::producer_t<stream, consumer> {
        using parent = base::producer_t<stream, consumer>;
        using duration = typename consumer::duration;
        using parent::needs_byte_swap, parent::read_exact, parent::buffer, parent::is_little_endian, parent::packet_count,
            parent::link_type, parent::version_major, parent::version_minor, parent::snap_length, parent::check_compatibility;

        explicit producer_t(FILE* fd, duration hb) : parent(fd, hb) {
            read_exact(reinterpret_cast<uint8_t*>(&global_header), sizeof(global_header));
            if (!utils::pcap::is_valid_magic(global_header.magic_number))
                THROW_RUNTIME_ERROR("Invalid PCAP magic number: " + std::to_string(global_header.magic_number));

            this->needs_byte_swap = utils::pcap::needs_byteswap(global_header.magic_number);
            if (this->needs_byte_swap) {
                global_header.version_major = std::byteswap(global_header.version_major);
                global_header.version_minor = std::byteswap(global_header.version_minor);
                global_header.thiszone      = std::byteswap(global_header.thiszone);
                global_header.sigfigs       = std::byteswap(global_header.sigfigs);
                global_header.snaplen       = std::byteswap(global_header.snaplen);
                global_header.network       = std::byteswap(global_header.network);
            }

            link_type = static_cast<utils::pcap::link_type>(global_header.network);
            version_major = global_header.version_major, version_minor = global_header.version_minor, 
            snap_length = global_header.snaplen, is_little_endian = utils::pcap::is_little_endian(global_header.magic_number);
            check_compatibility("pcap");
        }

        void run_impl() {
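            while (read_exact(reinterpret_cast<uint8_t*>(&packet_header), sizeof(packet_header))) {
                // record headers use the same byte order as the global header
                if (needs_byte_swap) {
                    packet_header.ts       = std::byteswap(packet_header.ts);
                    packet_header.ns       = std::byteswap(packet_header.ns);
                    packet_header.captured = std::byteswap(packet_header.captured);
                    packet_header.original = std::byteswap(packet_header.original);
                }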
                if (packet_header.captured > buffer.size())
                    THROW_RUNTIME_ERROR(
                        "packet too large: " + std::to_string(packet_header.captured) +
                        " buffer: " + std::to_string(buffer.size()));

                if (!read_exact(buffer.data(), packet_header.captured))
                    break;  // EOF

                const iex::transport::header* segment = reinterpret_cast<const iex::transport::header*>(
                    buffer.data() + sizeof(iex::base::packet));
                this->transport_handler(segment);
                packet_count++;
            }
            this->end();
        }

        global_header_t global_header{};     /*!< parsed PCAP global header */
        packet_header_t packet_header{};     /*!< current PCAP packet header */
    };
}  // namespace iex::pcap
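The producer delegates magic-number handling to utils::pcap::is_valid_magic, needs_byteswap, and is_little_endian, whose bodies are not shown here. As a hedged sketch of what such helpers typically check, here are the four classic libpcap magic numbers plus the record-timestamp arithmetic; classify_magic, bswap32, and record_time_ns are hypothetical names. Note that this record time is the capture time Wireshark reports as frame.time_epoch, which stayed perfectly sane in the failing files; the sentinel lives in the IEX-TP send time inside the payload.

```cpp
#include <cstdint>

// Classic libpcap magic numbers as read into a native uint32_t.
enum : std::uint32_t {
    PCAP_MAGIC_US         = 0xa1b2c3d4,  // microsecond timestamps, native byte order
    PCAP_MAGIC_NS         = 0xa1b23c4d,  // nanosecond timestamps, native byte order
    PCAP_MAGIC_US_SWAPPED = 0xd4c3b2a1,  // written on a host of opposite endianness
    PCAP_MAGIC_NS_SWAPPED = 0x4d3cb2a1,
};

struct magic_info { bool valid, swapped, nanosecond; };

// Hypothetical stand-in for the utils::pcap helpers the parser calls.
constexpr magic_info classify_magic(std::uint32_t m) {
    switch (m) {
        case PCAP_MAGIC_US:         return {true, false, false};
        case PCAP_MAGIC_NS:         return {true, false, true};
        case PCAP_MAGIC_US_SWAPPED: return {true, true, false};
        case PCAP_MAGIC_NS_SWAPPED: return {true, true, true};
        default:                    return {false, false, false};
    }
}

// Byte-reverse a 32-bit value (std::byteswap needs C++23; this works earlier).
constexpr std::uint32_t bswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) | ((v << 8) & 0x00ff0000u) | (v << 24);
}

// Combine a record's seconds and fraction fields into nanoseconds since the epoch.
constexpr std::int64_t record_time_ns(std::uint32_t ts, std::uint32_t frac, bool nanosecond) {
    return static_cast<std::int64_t>(ts) * 1'000'000'000 +
           static_cast<std::int64_t>(frac) * (nanosecond ? 1 : 1'000);
}
```

A pcapng file starts with 0x0a0d0d0a instead, so classify_magic rejects it; that is one reason the two formats get separate producers.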

PCAP NG Parser

producer.hpp
namespace iex::pcapng {
    enum class block_type : uint32_t {
        SECTION_HEADER        = 0x0A0D0D0A, //!< Section Header Block (SHB)
        INTERFACE_DESCRIPTION = 0x00000001, //!< Interface Description Block (IDB)
        PACKET                = 0x00000002, //!< Obsolete Packet Block (PB); the Simple Packet Block (SPB) is 0x00000003
        NAME_RESOLUTION       = 0x00000004, //!< Name Resolution Block (NRB)
        INTERFACE_STATS       = 0x00000005, //!< Interface Statistics Block (ISB)
        ENHANCED_PACKET       = 0x00000006, //!< Enhanced Packet Block (EPB)
        UNKNOWN               = 0xFFFFFFFF  //!< Fallback or invalid block type
    };
    struct block_header_t {
        uint32_t block_type;
        uint32_t block_total_length;
    } __attribute__((packed));

    struct shb_t {
        uint32_t byte_order_magic;
        uint16_t version_major;
        uint16_t version_minor;
        int64_t  section_length;
    } __attribute__((packed));

    struct idb_t {
        uint16_t link_type;
        uint16_t reserved;
        uint32_t snaplen;
    } __attribute__((packed));

    struct epb_t {
        uint32_t interface_id;
        uint32_t ts_high;
        uint32_t ts_low;
        uint32_t captured_len;
        uint32_t original_len;
    } __attribute__((packed));

    template <class stream, class consumer>
    struct producer_t : public base::producer_t<stream, consumer> {
        using parent = base::producer_t<stream, consumer>;
        using duration = typename consumer::duration;
        using parent::needs_byte_swap, parent::read_exact, parent::buffer, parent::is_little_endian, parent::packet_count,
            parent::link_type, parent::version_major, parent::version_minor, parent::snap_length, parent::check_compatibility;

        explicit producer_t(FILE* fd, duration hb) : parent(fd, hb) {
        }

        void run_impl() {
            while (true) {
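                if (!this->read_exact(reinterpret_cast<uint8_t*>(&hdr), sizeof(hdr))) break;
                // reject lengths that would underflow or overrun the read buffer
                if (hdr.block_total_length < sizeof(hdr) || hdr.block_total_length - sizeof(hdr) > buffer.size())
                    THROW_RUNTIME_ERROR("invalid block length: " + std::to_string(hdr.block_total_length));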
                if (!this->read_exact(buffer.data(), hdr.block_total_length - sizeof(hdr)))
                    THROW_RUNTIME_ERROR("Failed to read complete block body");

                switch(static_cast<block_type>(hdr.block_type)) {
                    case block_type::SECTION_HEADER:  // already verifies `magic`
                        shb = reinterpret_cast<shb_t*>(buffer.data());
                        needs_byte_swap = utils::pcapng::needs_byteswap(shb->byte_order_magic);
                        version_major = shb->version_major, version_minor = shb->version_minor;
                        is_little_endian = utils::pcapng::is_little_endian(shb->byte_order_magic);
                    break;
                    case block_type::INTERFACE_DESCRIPTION:
                        idb = reinterpret_cast<idb_t*>(buffer.data()), snap_length = idb->snaplen,
                        link_type = static_cast<utils::pcap::link_type>(idb->link_type);
                        check_compatibility("pcap-ng");
                    break;
                    case block_type::ENHANCED_PACKET: {
                        const epb_t* epb = reinterpret_cast<const epb_t*>(buffer.data());
                        if(epb->captured_len != epb->original_len)
                            TRACE << epb->captured_len << " " << epb->original_len << std::endl;
                        const iex::transport::header* segment = reinterpret_cast<const iex::transport::header*>(
                            buffer.data() + sizeof(epb_t) + sizeof(iex::base::packet));
                        this->transport_handler(segment);
                        break;
                    }
                    default: ;
                }
            }
            this->end();
        }

        uint32_t trailing_length = 0, trailer = 0;
        block_header_t hdr{};
        shb_t* shb = nullptr;
        idb_t* idb = nullptr;
    };
} // namespace iex::pcapng
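The pcapng producer ignores the Enhanced Packet Block's own timestamp and relies on the IEX-TP send time instead. For reference, here is a minimal sketch of reading ts_high and ts_low, assuming the interface either has no if_tsresol option (the pcapng default is then microseconds) or a power-of-ten one; epb_ticks and ticks_to_ns are hypothetical helpers. Like the classic PCAP record time, this capture timestamp is independent of the sentinel in the payload.

```cpp
#include <cstdint>

// Enhanced Packet Block timestamp: one 64-bit tick count split into two halves.
constexpr std::uint64_t epb_ticks(std::uint32_t ts_high, std::uint32_t ts_low) {
    return (static_cast<std::uint64_t>(ts_high) << 32) | ts_low;
}

// Convert ticks to nanoseconds for a resolution of 10^-exp seconds (0 <= exp <= 9).
// exp = 6 (microseconds) is the default when if_tsresol is absent.
constexpr std::uint64_t ticks_to_ns(std::uint64_t ticks, unsigned exp = 6) {
    std::uint64_t scale = 1;
    for (unsigned i = exp; i < 9; ++i) scale *= 10;
    return ticks * scale;
}
```

Base-2 resolutions (if_tsresol with the high bit set) are deliberately left out of this sketch.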

Explore the docs, or download the MIT-licensed project from GitHub.