IEX2H5: From Raw Packets to Research-Ready Prices
Project Goal
IEX2H5 transforms raw IEX exchange data, captured as PCAP files, into clean, compressed, and query-efficient HDF5 arrays. It bridges the gap between archival tick data and quantitative strategy development, making large-scale time-series analysis as simple as invoking a single command.
The Problem
Tick-level market data is voluminous, noisy, and stored in low-level binary formats. Traditional ingestion pipelines are:
- Slow: parsing millions of messages per second is non-trivial.
- Heavy: inefficient storage formats result in bloated datasets.
- Fragmented: transforming PCAPs into research-grade datasets typically requires a pipeline of tools with fragile glue code.
The Solution
IEX2H5 is a minimal-dependency, high-performance C++ application that:
- Understands IEX: parses native IEX DEEP/TOPS protocols at the packet level.
- Stores efficiently: compresses billions of events into structured, schema-aware HDF5 datasets.
- Resamples on the fly: extracts OHLCV bars, real-time snapshots, and trade stats—directly during import.
- Performs: demonstrated ingest speed of 65M events/sec, reducing 40 GiB of raw data to under 600 MiB.
- Supports reproducibility: deterministic, portable .h5 output files with nanosecond-precision timestamps.
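To make the resampling step concrete, here is a minimal Python sketch of folding time-ordered trades into OHLCV bars. It is an illustration only, not IEX2H5's C++ implementation: the `resample_ohlcv` function, the `Bar` type, the one-minute default interval, and the sample trades are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass

NS_PER_SECOND = 1_000_000_000


@dataclass
class Bar:
    start_ns: int  # bar start time, integer nanoseconds since epoch
    open: float
    high: float
    low: float
    close: float
    volume: int


def resample_ohlcv(trades, interval_ns=60 * NS_PER_SECOND):
    """Fold time-ordered (timestamp_ns, price, size) trades into OHLCV bars.

    Hypothetical helper for illustration; assumes trades are sorted by time.
    """
    bars = []
    for ts_ns, price, size in trades:
        start = ts_ns - ts_ns % interval_ns  # floor to the bar boundary
        if bars and bars[-1].start_ns == start:
            # Trade falls into the bar currently being built: update it.
            bar = bars[-1]
            bar.high = max(bar.high, price)
            bar.low = min(bar.low, price)
            bar.close = price
            bar.volume += size
        else:
            # First trade of a new interval opens a new bar.
            bars.append(Bar(start, price, price, price, price, size))
    return bars


# Hypothetical trades: (nanoseconds since epoch, price, shares)
trades = [
    (0 * NS_PER_SECOND, 100.0, 10),
    (20 * NS_PER_SECOND, 101.5, 5),
    (59 * NS_PER_SECOND, 99.5, 20),
    (61 * NS_PER_SECOND, 100.5, 7),
]
bars = resample_ohlcv(trades)
```

Timestamps are kept as integers because a 64-bit float cannot represent present-day epoch times exactly at nanosecond resolution. This sketch emits a bar only for intervals that contain at least one trade; filling empty intervals is left out for brevity.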
Why It Matters
IEX2H5 enables researchers, quants, and infrastructure teams to store and process tick data once, and analyze forever. It’s built to reduce friction between data acquisition and insight.
Whether you want to build a custom factor model, simulate an HFT strategy, or just explore market microstructure—the data’s ready.
Explore the Project
GitHub: github.com/vargaconsulting/iex2h5
Docs: vargaconsulting.github.io/iex2h5