Iterate the order book from a tape

flox_py.orderbook reconstructs the bid / ask ladder from a .floxlog tape's book event stream. Two surfaces sit on top of the same replay path:

  • OrderBookIterator yields a BookSnapshot per bucket window, carrying the latest ladder state observed inside the window.
  • book_at(tape, ts_ns, levels) is a point query that walks the tape up to ts_ns and returns the latest ladder state at or before it.

Both apply the standard floxlog book semantics: a snapshot event replaces the ladder, a delta event adds / changes / removes (qty=0) levels.
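The snapshot / delta rule above can be sketched in a few lines of plain Python. This is an illustration only: `apply_levels` is a hypothetical helper, not part of flox_py, and it models one side of the book as a dict from raw integer price to raw integer quantity.

```python
from typing import Dict, Iterable, Tuple


def apply_levels(side: Dict[int, int],
                 levels: Iterable[Tuple[int, int]],
                 is_snapshot: bool) -> Dict[int, int]:
    """Return one side of the ladder after applying a book event.

    Hypothetical helper for illustration; not the flox_py implementation.
    """
    # A snapshot replaces the ladder; a delta starts from the current state.
    updated = {} if is_snapshot else dict(side)
    for price_raw, qty_raw in levels:
        if qty_raw == 0:
            updated.pop(price_raw, None)   # qty=0 removes the level
        else:
            updated[price_raw] = qty_raw   # add a new level or change its size
    return updated


bids = apply_levels({}, [(100, 10), (99, 20)], is_snapshot=True)
bids = apply_levels(bids, [(100, 5)], is_snapshot=False)   # change
bids = apply_levels(bids, [(99, 0)], is_snapshot=False)    # remove
bids = apply_levels(bids, [(98, 7)], is_snapshot=False)    # add
print(sorted(bids.items(), reverse=True))  # [(100, 5), (98, 7)]
```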

When to reach for this

  • Bucket-bar book-aware backtests: at the close of each bar, read the current ladder to compute imbalance, spread, depth-weighted mid, top-of-book microstructure features.
  • Vacuum detection: scan ladders for thin levels on either side as a leading signal.
  • Execution-side what-if: take a candidate trade timestamp, fetch the book at that instant, walk a VWAP / market-impact slice through the actual depth.
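As a rough sketch of the execution-side what-if, the function below walks a market order through one side of a ladder and returns the volume-weighted fill price. It assumes the ladder is a best-first list of `(price, qty)` pairs; `walk_vwap` is a hypothetical name, and the actual element layout of `BookSnapshot.bids` / `asks` may differ.

```python
from typing import List, Optional, Tuple


def walk_vwap(levels: List[Tuple[float, float]], qty: float) -> Optional[float]:
    """Average fill price for taking `qty` from `levels` (best level first).

    Returns None when the visible depth cannot fill the whole quantity.
    Hypothetical helper; the (price, qty) layout is an assumption.
    """
    if qty <= 0:
        raise ValueError("qty must be positive")
    remaining = qty
    notional = 0.0
    for price, available in levels:
        take = min(available, remaining)
        notional += take * price
        remaining -= take
        if remaining <= 0:
            return notional / qty
    return None


asks = [(101.0, 1.5), (102.0, 2.5)]
print(walk_vwap(asks, 2.0))   # 1.5 @ 101 + 0.5 @ 102 -> 101.25
print(walk_vwap(asks, 5.0))   # only 4.0 visible -> None
```

Returning None rather than a partial fill keeps the caller from silently treating a too-thin book as fillable.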

OrderBookIterator and book_at are pure-Python wrappers over DataReader.read_book_updates. For a hot loop (millions of book events at sub-millisecond resolution per bar), write a Strategy with on_book_snapshot / on_book_delta callbacks and run it through the engine. The wrappers here are for offline reconstruction outside the engine, where readability beats per-event throughput.

Example

The script below builds a tiny synthetic tape and iterates it at a 60-second bucket cadence, then point-queries book_at at a chosen instant:

"""OrderBookIterator round-trip — write a synthetic tape with a few
book events, iterate it at a 60s bucket cadence, then point-query
`book_at` at a chosen offset to confirm the reconstructed ladder.

CI-runnable companion to
[Iterate the order book from a tape](../how-to/iterate-orderbook.md).

Usage:
    cd /path/to/flox
    PYTHONPATH=build/python python3 docs/examples/python_orderbook_iterator.py
"""
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path

import numpy as np

import flox_py
from flox_py import orderbook as ob

_LEVEL_DTYPE = np.dtype([
    ("price_raw", np.int64), ("qty_raw", np.int64), ("side", np.uint8),
])


def _arr(items):
    return np.array(
        [(int(round(p * 1e8)), int(round(q * 1e8)), 0) for p, q in items],
        dtype=_LEVEL_DTYPE,
    )


def main() -> None:
    tmp = Path(tempfile.mkdtemp(prefix="flox-ob-iter-"))
    try:
        tape = tmp / "tape"
        tape.mkdir()
        w = flox_py.DataWriter(str(tape), max_segment_mb=4,
                               exchange_id=0, compression="none")
        base = 1_700_000_000_000_000_000
        events = [
            # ts_offset_s, is_snapshot, bids, asks
            (0, True, [(100.0, 1.0), (99.0, 2.0)], [(101.0, 1.5), (102.0, 2.5)]),
            (30, False, [(100.0, 0.5)], []),
            (90, False, [(99.0, 0.0)], [(101.0, 0.0)]),
            (150, False, [(99.5, 1.2)], [(101.5, 0.4)]),
        ]
        for i, (off_s, is_snap, bids, asks) in enumerate(events):
            ts = base + off_s * 1_000_000_000
            w.write_book(exchange_ts_ns=ts, recv_ts_ns=ts,
                         seq=i, symbol_id=1, is_snapshot=is_snap,
                         bids=_arr(bids), asks=_arr(asks))
        w.close()

        # Iterate at 60s buckets, top-5 per side.
        print("OrderBookIterator(bucket=60s, levels=5):")
        for snap in ob.OrderBookIterator(tape, bucket_ns=60_000_000_000,
                                         levels=5):
            print(f"  ts={snap.ts_ns} bids[0]={snap.bids[0] if snap.bids else None} "
                  f"asks[0]={snap.asks[0] if snap.asks else None}")

        # Point query at a chosen instant.
        target = base + 75 * 1_000_000_000
        at = ob.book_at(tape, ts_ns=target, levels=5)
        assert at is not None
        print(f"book_at(ts=base+75s): bids[0]={at.bids[0]} "
              f"asks[0]={at.asks[0]}")
    finally:
        shutil.rmtree(tmp, ignore_errors=True)


if __name__ == "__main__":
    main()

Iterator semantics

OrderBookIterator(tape_path, bucket_ns, levels, t_from=None, t_to=None, symbol_id=None):

  • Snapshots are keyed on the floor of each event timestamp onto the bucket_ns grid. Consecutive snapshots therefore advance by a whole multiple of bucket_ns: exactly bucket_ns when adjacent buckets both carry events, more when empty buckets in between are skipped.
  • The snapshot emitted for bucket B captures the ladder state right before the first event at or after B + bucket_ns. The state excludes every event in later buckets; the snapshot for the final bucket reflects every event seen.
  • Buckets with no book events are skipped.
  • When symbol_id is not set and the tape carries multiple symbols, the iterator yields one snapshot per (bucket, symbol).
  • BookSnapshot.crossed is True when the best bid price meets or exceeds the best ask. This is typically a momentary artifact of out-of-order book events on captures without the sorted flag; the caller can choose to drop the snapshot or proceed.
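The bucketing rules above can be sketched without flox_py. The generator below is an assumption-laden model, not the library's code: `iter_bucket_states` is a hypothetical name, events are time-sorted `(ts_ns, is_snapshot, levels)` tuples for a single symbol and a single side, and the crossed check is not modeled.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

BookEvent = Tuple[int, bool, List[Tuple[int, int]]]  # (ts_ns, is_snapshot, levels)


def iter_bucket_states(events: Iterable[BookEvent],
                       bucket_ns: int) -> Iterator[Tuple[int, Dict[int, int]]]:
    """Yield (bucket_start_ns, ladder) once per bucket that saw an event."""
    ladder: Dict[int, int] = {}
    current: Optional[int] = None
    for ts_ns, is_snapshot, levels in events:
        bucket = ts_ns - ts_ns % bucket_ns      # floor onto the bucket grid
        if current is not None and bucket != current:
            # Emit the previous bucket before applying this event, so its
            # state excludes everything in later buckets.
            yield current, dict(ladder)
        current = bucket
        if is_snapshot:
            ladder = {}
        for price, qty in levels:
            if qty == 0:
                ladder.pop(price, None)
            else:
                ladder[price] = qty
    if current is not None:
        yield current, dict(ladder)             # final bucket sees every event


SEC = 1_000_000_000
events = [
    (0 * SEC, True, [(100, 10), (99, 20)]),
    (30 * SEC, False, [(100, 5)]),
    (90 * SEC, False, [(99, 0)]),
    (210 * SEC, False, [(98, 12)]),             # the 120s bucket stays empty
]
states = list(iter_bucket_states(events, 60 * SEC))
for bucket_start, ladder in states:
    print(bucket_start // SEC, sorted(ladder.items(), reverse=True))
```

In this toy input the bucket starting at 120s has no events, so the emitted keys jump from 60s to 180s.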

Point query semantics

book_at(tape_path, ts_ns, levels, symbol_id=None, t_from=None):

  • Walks events with exchange_ts_ns <= ts_ns and returns the most recent state.
  • Returns None when the tape has no book events for the requested symbol up to ts_ns.
  • Without a symbol_id filter on a multi-symbol tape, returns the snapshot for whichever symbol carried the most recent book event.
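A minimal model of the point query, including the multi-symbol fallback, might look like the following. `ladder_at` is a hypothetical stand-in for illustration: it assumes time-sorted `(ts_ns, symbol_id, is_snapshot, levels)` tuples and a single side per symbol, and it returns a `(symbol_id, ladder)` pair rather than a BookSnapshot.

```python
from typing import Dict, Iterable, List, Optional, Tuple

SymEvent = Tuple[int, int, bool, List[Tuple[int, int]]]


def ladder_at(events: Iterable[SymEvent], ts_ns: int,
              symbol_id: Optional[int] = None
              ) -> Optional[Tuple[int, Dict[int, int]]]:
    """State at or before ts_ns, or None when no matching event exists."""
    ladders: Dict[int, Dict[int, int]] = {}
    last_symbol: Optional[int] = None
    for ev_ts, sym, is_snapshot, levels in events:
        if ev_ts > ts_ns:
            break                                # assumes time-sorted input
        if symbol_id is not None and sym != symbol_id:
            continue
        side = {} if is_snapshot else ladders.get(sym, {})
        for price, qty in levels:
            if qty == 0:
                side.pop(price, None)
            else:
                side[price] = qty
        ladders[sym] = side
        last_symbol = sym                        # most recent symbol wins
    if last_symbol is None:
        return None
    return last_symbol, ladders[last_symbol]


events = [
    (0, 1, True, [(100, 10)]),
    (10, 2, True, [(200, 3)]),
    (20, 1, False, [(100, 0), (99, 4)]),
    (40, 2, False, [(200, 1)]),
]
print(ladder_at(events, 25))                # (1, {99: 4})
print(ladder_at(events, 25, symbol_id=2))   # (2, {200: 3})
print(ladder_at(events, -1))                # None
```

Passing symbol_id explicitly avoids depending on which symbol happened to update last.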

Performance

The 38-day BTC tape benchmark from the tracker takes roughly two minutes end-to-end at a 60-second bucket cadence using OrderBookIterator, well under the 5-minute budget. The bottleneck is the Python loop over book events; the underlying DataReader.read_book_updates already runs in C++.

If the tape carries a lot of trades but few book updates, iteration is even cheaper. For a tape-side filter that emits only buckets meeting a microstructure condition (top-K thin levels, depth imbalance over a threshold, queued size per side), wrap the iterator in a generator expression — the per-bucket cost is dominated by the ladder mutation, not the surfacing of the snapshot.
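The generator-expression filter can be sketched as below. To stay runnable without a tape, it uses a stand-in `Snap` dataclass in place of BookSnapshot and assumes each side is a list of `(price, qty)` pairs; `depth_imbalance` and the 0.4 threshold are illustrative choices, not library defaults.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Snap:
    """Stand-in for BookSnapshot; the field layout is an assumption."""
    ts_ns: int
    bids: List[Tuple[float, float]] = field(default_factory=list)
    asks: List[Tuple[float, float]] = field(default_factory=list)


def depth_imbalance(snap: Snap) -> float:
    """(bid_qty - ask_qty) / (bid_qty + ask_qty) over the visible levels."""
    bid_qty = sum(q for _, q in snap.bids)
    ask_qty = sum(q for _, q in snap.asks)
    total = bid_qty + ask_qty
    return 0.0 if total == 0 else (bid_qty - ask_qty) / total


snaps = [
    Snap(0, bids=[(100.0, 1.0)], asks=[(101.0, 1.0)]),   # balanced
    Snap(60, bids=[(100.0, 3.0)], asks=[(101.0, 1.0)]),  # bid-heavy
    Snap(120),                                           # empty ladder
]

# In real use `snaps` would be an OrderBookIterator; the filter stays lazy.
flagged = (s for s in snaps if abs(depth_imbalance(s)) > 0.4)
flagged_ts = [s.ts_ns for s in flagged]
print(flagged_ts)  # [60]
```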

See also