A Week of Market History Vanished — Here’s What Really Happened
It started with a glitch. Buried deep in a mountain of tick data, a whole week suddenly collapsed into the twilight zone of December 31, 1969, 23:59:59, a hair before the Unix epoch (one nanosecond, to be exact). The ghost date. You get it when a 64-bit timestamp is filled with all ones (0xFFFFFFFFFFFFFFFF): read as a signed nanosecond count it is -1, a value that looks legitimate to the computer but really means “no timestamp at all.” At first I chalked it up as a curious terminal artifact, the kind of oddity you see once and forget. But then the implications sank in. This phantom week explained why my list of the 30 Most Consistently Traded Stocks on IEX mysteriously came up short: barely 800 trading days of history where there should have been many more.
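To see why an all-ones field lands just before the epoch, here is a minimal sketch. It assumes the field is a 64-bit count of nanoseconds since the Unix epoch that ends up being read as a signed value; the helper names are illustrative and not part of IEX2H5.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the raw 64-bit wire value as a signed
// nanosecond count, bit for bit.
inline std::int64_t as_signed_ns(std::uint64_t raw) {
    std::int64_t ns;
    std::memcpy(&ns, &raw, sizeof ns);
    return ns;
}

// Floor division for a positive divisor, so instants before the epoch
// round toward the past instead of toward zero.
inline std::int64_t floor_div(std::int64_t a, std::int64_t b) {
    assert(b > 0);
    std::int64_t q = a / b;
    if (a % b < 0) --q;
    return q;
}

// Whole seconds since the epoch for a raw timestamp.
inline std::int64_t epoch_seconds(std::uint64_t raw) {
    return floor_div(as_signed_ns(raw), 1000000000LL);
}
```

Under these assumptions `as_signed_ns(~0ULL)` is -1 and `epoch_seconds(~0ULL)` is also -1, which is the last second of 1969-12-31.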
To rewind a bit: Sander, George, and I go back over a decade, long before “AI” was splashed across every headline. Back then, stock data was either prohibitively expensive or painfully DIY, with traders like our friend Mike cobbling together custom recorders in C# just to peek at the order book. Fast forward to today: IEX generously streams its ticks to the world, and tools like IEX2H5 can compress terabytes into tidy HDF5 stores. So why bring up old friends? Because not long ago Sander showed up with a 4 TB hard drive in hand, asking me to copy over a slice of the 6 TB top-of-book PCAP dataset. I told him I could do better by repacking the PCAP frames into HDF5 chunks, which shrank it neatly to 2 TB — but this story isn’t about compression. It’s about spotting a streak of market events stamped in 1969, and how a late-night debugging session showed the culprit wasn’t a bug in my code at all, but an undocumented feature of the IEX feed itself. A subtle quirk, yes — but also a reminder that in high-frequency systems, nothing is ever as simple as it looks.
So when I started converting PCAP datasets into HDF5 chunks, I expected nothing more dramatic than a progress bar slowly marching forward. Instead, buried among terabytes of perfectly normal ticks, I stumbled on something strange — a whole week of market activity that seemed to vanish into thin air. That discovery set the stage for what turned into a late-night debugging session worthy of a detective novel.
At first, I thought it was my code. I tore apart the PCAP and PCAP NG parsers, double-checked my math, and even second-guessed whether I’d miscompiled with the wrong flags. But then came the smoking gun: Wireshark showed me that those packets weren’t malformed at all. They carried a timestamp field set to 0xffffffffffffffff.
That was the lightbulb moment. This wasn’t a bug in my PCAP parser—it was an undocumented feature in the IEX feed itself: a sentinel value marking “invalid” packets. The kind of thing no spec sheet, no glossy API doc, and no vendor tutorial will ever tell you.
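One line of code fixed it: return early from the transport handler whenever the segment timestamp equals ~0ULL (the full diff is in Step 5 below).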
Simple enough—once you know what you’re looking for. But let’s be honest: it’s the kind of thing that’s easy to miss. Many would have written it off as a glitch in the HDF5 backend, a flaky ETL job, or even blamed the exchange. Meanwhile, those “clean” datasets would be quietly dropping days, nudging strategies off balance, and planting landmines that only go off weeks later in backtests or production. These are the kinds of ghosts you only learn to spot after years in the trenches—when you’ve seen enough undocumented quirks, corner cases, and phantom packets to know they’re always lurking just out of sight.
Forensic Debugging: When 0xFFFFFFFFFFFFFFFF Strikes
Step 1: The Phantom Week Appears
This is the point where the 1969 timestamps first appear in the output. With over 2,200 files in play, it’s easy to overlook the anomaly.
steven@saturn:~$ iex2h5 -c irts -o ~/output.h5 scratch/research/*.pcap
[iex2h5] Converting 9 files using backend: hdf5 — using 1 thread — © Varga Consulting, 2017–2025
[iex2h5] Visit https://vargaconsulting.github.io/iex2h5/ — Star it, Share it, Support Open Tools ⭐️
▫ 2024-11-07 14:30:00 21:00:00 ✓
▫ 2024-11-08 14:30:00 21:00:00 ✓
▫ 1969-12-31 23:59:59 ✓
▫ 1969-12-31 23:59:59 ✓
▪ 1969-12-31 23:59:59 ✓
▪ 1969-12-31 23:59:59 ✓
▪ 1969-12-31 23:59:59 ✓
▪ 1969-12-31 23:59:59 ✓
▫ 2024-11-18 14:30:00 21:00:00 ✓
▫ 2024-11-19 14:30:00 21:00:00 ✓
benchmark: 1357269997 events in 434891ms 3.1 Mticks/s, 0.320000 µs/tick latency, 286.28 GiB input converted into 9.37 GiB output
[iex2h5] Conversion complete — all files processed successfully
[iex2h5] Market data © IEX — Investors Exchange. Attribution required. See https://iextrading.com
steven@saturn:~$ ls -lh /lake/iex/tops/TOPS-2024-11-??.pcap.gz
-rw-rw-r-- 1 steven steven 7.5G Jan 19 2025 TOPS-2024-11-06.pcap.gz
-rw-rw-r-- 1 steven steven 6.1G Jan 19 2025 TOPS-2024-11-07.pcap.gz
-rw-rw-r-- 1 steven steven 5.6G Jan 19 2025 TOPS-2024-11-08.pcap.gz
-rw-rw-r-- 1 steven steven 6.8G Sep 8 23:59 TOPS-2024-11-11.pcap.gz <<
-rw-rw-r-- 1 steven steven 7.3G Sep 8 23:57 TOPS-2024-11-12.pcap.gz <<
-rw-rw-r-- 1 steven steven 8.1G Sep 8 23:54 TOPS-2024-11-13.pcap.gz <<
-rw-rw-r-- 1 steven steven 7.5G Sep 8 23:51 TOPS-2024-11-14.pcap.gz <<
-rw-rw-r-- 1 steven steven 7.9G Sep 8 23:49 TOPS-2024-11-15.pcap.gz <<
-rw-rw-r-- 1 steven steven 6.4G Jan 19 2025 TOPS-2024-11-18.pcap.gz
-rw-rw-r-- 1 steven steven 6.4G Jan 19 2025 TOPS-2024-11-19.pcap.gz
-rw-rw-r-- 1 steven steven 6.8G Jan 19 2025 TOPS-2024-11-20.pcap.gz
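With thousands of files in play, a cheap guard after conversion makes this kind of anomaly hard to overlook. The sketch below is not part of IEX2H5: it assumes you have collected the first timestamp of each converted file as signed nanoseconds since the epoch, and that anything earlier than a lower bound you choose yourself is suspect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical post-conversion check: return the indices of files whose first
// timestamp (signed nanoseconds since the Unix epoch) falls before `earliest_ns`.
inline std::vector<std::size_t> suspect_files(
        const std::vector<std::int64_t>& first_ts_ns, std::int64_t earliest_ns) {
    assert(earliest_ns > 0);  // the bound should be a real date after the epoch
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < first_ts_ns.size(); ++i)
        if (first_ts_ns[i] < earliest_ns) bad.push_back(i);
    return bad;
}
```

Any index it returns points at a file worth re-checking before the store is used for research.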
Step 2: A Trip Back to 1970
To make sure the parser wasn’t getting creative with timestamps, I resorted to the pinnacle of debugging science: printing values exactly where they matter.
steven@saturn:~/projects/iex2h5/build$ ./iex2h5 -o ~/tmp.h5 -c none ~/scratch/research/sample_00000_20241111113746.pcap | slowcat --line 100
[iex2h5] Converting 1 file using backend: hdf5 — using 1 thread — © Varga Consulting, 2017–2025
[iex2h5] Visit https://vargaconsulting.github.io/iex2h5/ — Star it, Share it, Support Open Tools ⭐️
Packet timestamp: 1970-01-01 01:38:21 UTC
Packet timestamp: 1970-01-01 01:38:26 UTC
Packet timestamp: 1970-01-01 01:38:31 UTC
Packet timestamp: 1970-01-01 01:38:36 UTC
Packet timestamp: 1970-01-01 01:38:42 UTC
Packet timestamp: 1970-01-01 01:38:47 UTC
Packet timestamp: 1970-01-01 01:39:17 UTC
Packet timestamp: 1970-01-01 01:39:48 UTC
Packet timestamp: 1970-01-01 01:58:53 UTC
Packet timestamp: 1970-01-01 01:58:54 UTC
Packet timestamp: 1970-01-01 01:58:55 UTC
Packet timestamp: 1970-01-01 01:58:56 UTC
Packet timestamp: 1970-01-01 01:58:58 UTC
Step 3: Wireshark Cross-Examination
And lo and behold: tshark confirmed it wasn’t my imagination. The raw frames in the failing files kicked off with a string of ffffffffffffffff, a pattern nowhere to be found in the IEX spec, but clear enough to scream ‘here be undocumented features.’
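A cross-check of this kind can also be scripted. The sketch below scans a raw byte buffer for the first run of eight 0xFF bytes; it makes no assumption about where the timestamp sits inside the frame, and the function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Return the offset of the first run of eight consecutive 0xFF bytes in
// `data`, or `len` if there is none. Illustrative helper only.
inline std::size_t find_all_ones_u64(const std::uint8_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::size_t run = 0;  // length of the current streak of 0xFF bytes
    for (std::size_t i = 0; i < len; ++i) {
        run = (data[i] == 0xFF) ? run + 1 : 0;
        if (run == 8) return i + 1 - 8;  // start of the streak
    }
    return len;
}
```

A hit is only a hint, since eight 0xFF bytes can occur elsewhere in a payload; it still needs to be matched against the header layout by hand.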
Step 4: IEX2H5 Confirms the Haunting
So I dove back into the IEX2H5 code and revisited the timestamp parsing with the most advanced debugging technique ever invented: printf. And what do you know — the mysterious ffffffffffffffffs were right there, just as the packets promised. Sometimes the old ways are still the best.
steven@saturn:~/projects/iex2h5/build$ ./iex2h5 -c none -o tmp.h5 ~/scratch/research/TOPS-2024-11-11.pcap | slowcat --line 100
[iex2h5] Converting 1 file using backend: hdf5 — using 1 thread — © Varga Consulting, 2017–2025
[iex2h5] Visit https://vargaconsulting.github.io/iex2h5/ — Star it, Share it, Support Open Tools ⭐️
1969-12-31 23:59:59.999999999 ffffffffffffffff
1969-12-31 23:59:59.999999999 ffffffffffffffff
1969-12-31 23:59:59.999999999 ffffffffffffffff
1969-12-31 23:59:59.999999999 ffffffffffffffff
1969-12-31 23:59:59.999999999 ffffffffffffffff
1969-12-31 23:59:59.999999999 ffffffffffffffff
1969-12-31 23:59:59.999999999 ffffffffffffffff
1969-12-31 23:59:59.999999999 ffffffffffffffff
2024-11-11 11:57:41.637405547 1806e80fdf33db6b
2024-11-11 11:57:42.681472125 1806e8101d6f0c7d
2024-11-11 11:57:43.887588653 1806e8106552ed2d
2024-11-11 11:57:44.931556460 1806e810a38c9c6c
2024-11-11 11:57:46.137589497 1806e810eb6f36f9
2024-11-11 11:57:47.181654987 1806e81129aa63cb
2024-11-11 11:57:48.387590294 1806e811718b8096
2024-11-11 11:57:49.431736662 1806e811afc7e956
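The hex column is the raw 64-bit field, and the valid rows decode cleanly as nanoseconds since the Unix epoch. The sketch below reproduces the left-hand column from the right-hand one without any dependencies. It uses the well-known days-to-civil-date arithmetic rather than the date library that appears in the IEX2H5 diff, and `format_utc` is a name made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Format a raw 64-bit timestamp, read as signed nanoseconds since the Unix
// epoch, as "YYYY-MM-DD hh:mm:ss.nnnnnnnnn" in UTC. Illustrative helper only.
inline std::string format_utc(std::uint64_t raw) {
    std::int64_t ns;
    std::memcpy(&ns, &raw, sizeof ns);            // all ones becomes -1

    const std::int64_t NS_PER_DAY = 86400LL * 1000000000LL;
    std::int64_t days = ns / NS_PER_DAY;          // truncates toward zero
    std::int64_t rem  = ns % NS_PER_DAY;
    if (rem < 0) { rem += NS_PER_DAY; --days; }   // floor for pre-epoch instants
    assert(rem >= 0 && rem < NS_PER_DAY);

    // Days since 1970-01-01 to a civil date (proleptic Gregorian calendar).
    std::int64_t z   = days + 719468;
    std::int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    std::int64_t doe = z - era * 146097;          // day of 400-year era
    std::int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    std::int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    std::int64_t mp  = (5 * doy + 2) / 153;       // month index, March = 0
    std::int64_t d   = doy - (153 * mp + 2) / 5 + 1;
    std::int64_t m   = mp < 10 ? mp + 3 : mp - 9;
    std::int64_t y   = yoe + era * 400 + (m <= 2 ? 1 : 0);

    std::int64_t secs = rem / 1000000000LL;       // seconds into the day
    char buf[40];
    std::snprintf(buf, sizeof buf, "%04lld-%02lld-%02lld %02lld:%02lld:%02lld.%09lld",
                  (long long)y, (long long)m, (long long)d,
                  (long long)(secs / 3600), (long long)(secs / 60 % 60),
                  (long long)(secs % 60), (long long)(rem % 1000000000LL));
    return buf;
}
```

Fed the values from the listing above, `format_utc(0x1806e80fdf33db6bULL)` gives `2024-11-11 11:57:41.637405547`, and the all-ones value gives `1969-12-31 23:59:59.999999999`.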
Step 5: Exorcising the Sentinel
And the fix? A $1000 one-liner: just skip ~0ULL, treating it as an undocumented sentinel. Of course, the code change is trivial; the real work was chasing the phantom through terabytes of ticks, hex dumps, and Wireshark traces.
steven@saturn:~/projects/iex2h5/build$ git diff --staged
diff --git a/include/iex.hpp b/include/iex.hpp
index 3c8fba2..a138bde 100644
--- a/include/iex.hpp
+++ b/include/iex.hpp
@@ -392,7 +392,7 @@ namespace iex {
void transport_handler( const iex::transport::header* segment ){
using namespace std;
using namespace date;
-
+ if(segment->time == (~0ULL)) return; // 0xffffffffffffffffULL denotes invalid packet see issue #93
if( !count ) today = date::floor<date::days>( time_point(duration( segment->time) ) );
auto now = time_point(duration( segment->time) );
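The same guard can be pulled out into a small predicate and paired with a counter, so skipped segments stay visible instead of silently disappearing. This is a sketch under assumptions: `segment_header` below is a stand-in with only a `time` field, not the real `iex::transport::header`, and `sentinel_filter` is a made-up name.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the transport header; only the field used by the guard is modelled.
struct segment_header {
    std::uint64_t time;  // assumed: nanoseconds since the Unix epoch
};

// All bits set, the same value the diff spells as ~0ULL.
constexpr std::uint64_t INVALID_TIME = ~std::uint64_t{0};

struct sentinel_filter {
    std::uint64_t accepted = 0;  // segments passed on for processing
    std::uint64_t skipped  = 0;  // segments dropped because of the sentinel

    // Returns true when the segment should be processed further.
    bool accept(const segment_header* segment) {
        assert(segment != nullptr);
        if (segment->time == INVALID_TIME) { ++skipped; return false; }
        ++accepted;
        return true;
    }
};
```

Reporting `skipped` at the end of a run is one way to keep an undocumented quirk like this on the radar if its frequency ever changes.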
Step 6: Back to the Future — Data Restored
And here we go: the final run. The phantom 1969 timestamps are gone, and everything lines up cleanly across files.
steven@saturn:~/projects/iex2h5/build$ ./iex2h5 -c none -o ~/tmp.h5 ~/scratch/research/*.pcap
[iex2h5] Converting 11 files using backend: hdf5 — using 1 thread — © Varga Consulting, 2017–2025
[iex2h5] Visit https://vargaconsulting.github.io/iex2h5/ — Star it, Share it, Support Open Tools ⭐️
▫ 2024-11-07 14:30:00 21:00:00 ✓
▫ 2024-11-08 14:30:00 21:00:00 ✓
▫ 2024-11-11 14:30:00 21:00:00 ✓
▫ 2024-11-12 14:30:00 21:00:00 ✓
▫ 2024-11-13 14:30:00 21:00:00 ✓
▫ 2024-11-14 14:30:00 21:00:00 ✓
▫ 2024-11-15 14:30:00 21:00:00 ✓
▫ 2024-11-18 14:30:00 21:00:00 ✓
▫ 2024-11-19 14:30:00 21:00:00 ✓
benchmark: 3303175025 events in 256885ms 12.9 Mticks/s, 0.077000 µs/tick latency, 286.32 GiB input converted into 2.44 MiB output
[iex2h5] Conversion complete — all files processed successfully
[iex2h5] Market data © IEX — Investors Exchange. Attribution required. See https://iextrading.com
So here’s the moral of the story: If you’re serious about market data—if your trading desk, quant research, or risk analytics depends on every tick being correct—you don’t want to leave it to chance. You don’t want to find out six months from now that your backtests were running on phantom trades. Because in high-frequency trading, it’s never “just one bug.” It’s always the one you don’t see coming—the one that makes your million-dollar strategy look like it’s trading in 1969.
Explore the docs, or download the MIT-licensed project from GitHub.