Import Binance public book archives into a floxlog tape

The aggTrades importer covered in Import Binance public archives handles the trade stream. The book side of the archive ships in two separate products on data.binance.vision:

bookTicker (alias t1): best bid / ask snapshot per book update. Lightweight; available for spot, um-futures, and cm-futures.
bookDepth (alias depth20): top 20 levels per side per book update. Much heavier (roughly 10x the size of aggTrades); currently published only for um-futures.

Both converters live alongside aggtrades_to_floxlog in flox_py.archives.binance and emit book events into the same .floxlog tape directory, so a single tape can carry both trades and book updates, which the engine and MergedTapeReader read back interleaved.

Functions

binance.t1_to_floxlog(csv_path, out_tape, *, symbol_id=1, symbol_name="",
                      market="um-futures", append=True, ...)
                      -> BookConvertStats

binance.depth20_to_floxlog(csv_path, out_tape, *, levels=20, ...)
                      -> BookConvertStats

binance.range_book_to_floxlog(
    symbol, market, book_type, date_from, date_to, out_tape, *,
    mirror=None, parallel=2, levels=20, ...,
) -> BookConvertStats

book_type is "t1" or "depth20". The range form downloads any missing zips from data.binance.vision, caches them under mirror when one is set, and then runs the converter over the local files serially so the writer stays append-safe.
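A minimal sketch of the download-then-convert split described above. It assumes (none of this is confirmed by the text) that parallel bounds concurrent downloads, that the date range is inclusive on both ends, and that mirror is a flat directory of zips named like the example files. daily_zip_names, fetch_then_convert, and the download / convert callbacks are hypothetical stand-ins, not part of flox_py.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from pathlib import Path
from typing import Callable, List

# Alias -> product name, matching the archive file names used in this guide.
PRODUCTS = {"t1": "bookTicker", "depth20": "bookDepth"}


def daily_zip_names(symbol: str, book_type: str,
                    date_from: date, date_to: date) -> List[str]:
    """One zip name per day; this sketch treats the range as inclusive."""
    product = PRODUCTS[book_type]
    days = (date_to - date_from).days
    return [f"{symbol}-{product}-{(date_from + timedelta(days=i)).isoformat()}.zip"
            for i in range(days + 1)]


def fetch_then_convert(names: List[str], mirror: Path,
                       download: Callable[[str, Path], None],
                       convert: Callable[[Path], None],
                       parallel: int = 2) -> None:
    """Download missing zips concurrently, then convert them one at a time."""

    def ensure_local(name: str) -> Path:
        path = mirror / name
        if not path.exists():          # zips already in the mirror are reused
            download(name, path)
        return path

    with ThreadPoolExecutor(max_workers=parallel) as pool:
        # pool.map yields results in input order, i.e. in date order.
        paths = list(pool.map(ensure_local, names))

    # A single loop means a single writer appending to the tape.
    for path in paths:
        convert(path)
```

Keeping the network-bound part concurrent while conversion stays sequential means only one writer ever appends to the tape, which is the property the range form relies on.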

Example

The script below is the same one CI runs on every push. It builds synthetic bookTicker and bookDepth zips, converts each to a separate .floxlog, and reads the book events back through DataReader.read_book_updates:

"""Binance public book archive round-trip — build small synthetic
bookTicker and bookDepth CSVs in the exact layout published on
``data.binance.vision``, push them through the converters, and read
back via ``DataReader.read_book_updates`` to confirm the book stream
round-trips.

CI-runnable companion to
[Import Binance book archives](../how-to/import-binance-book-archive.md).
No network — both fixtures are built in memory.

Usage:
    cd /path/to/flox
    PYTHONPATH=build/python python3 docs/examples/python_binance_book_archive.py
"""
from __future__ import annotations

import io
import shutil
import tempfile
import zipfile
from pathlib import Path

import flox_py
from flox_py.archives import binance


_T1_ROWS = [
    (100, 42_000.0, 1.0, 42_001.0, 1.0, 1_700_000_000_000, 1_700_000_000_000),
    (101, 42_000.0, 1.0, 42_001.0, 1.0, 1_700_000_001_000, 1_700_000_001_000),  # unchanged
    (102, 41_999.0, 2.0, 42_002.0, 1.5, 1_700_000_002_000, 1_700_000_002_000),
]


_D20_ROWS = [
    ("BTCUSDT", 1_700_000_000_000, 1, 10, "b", "snap", 42_000.0, 1.0),
    ("BTCUSDT", 1_700_000_000_000, 1, 10, "b", "snap", 41_999.0, 1.5),
    ("BTCUSDT", 1_700_000_000_000, 1, 10, "a", "snap", 42_001.0, 1.0),
    ("BTCUSDT", 1_700_000_000_000, 1, 10, "a", "snap", 42_002.0, 1.5),
    ("BTCUSDT", 1_700_000_001_000, 11, 11, "b", "set", 42_000.0, 0.0),
    ("BTCUSDT", 1_700_000_001_000, 11, 11, "b", "set", 42_000.5, 0.8),
    ("BTCUSDT", 1_700_000_001_000, 11, 11, "a", "set", 42_001.0, 0.5),
]


def _build_zip(dest: Path, rows) -> Path:
    buf = io.StringIO()
    for r in rows:
        buf.write(",".join(str(x) for x in r) + "\n")
    with zipfile.ZipFile(dest, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(dest.with_suffix(".csv").name, buf.getvalue())
    return dest


def main() -> None:
    workdir = Path(tempfile.mkdtemp(prefix="flox-binance-book-"))
    try:
        # bookTicker: 3 rows → 1 snapshot + 1 delta (one row unchanged).
        bt_zip = workdir / "BTCUSDT-bookTicker-2024-01-15.zip"
        _build_zip(bt_zip, _T1_ROWS)
        bt_tape = workdir / "tape-bt"
        bt_stats = binance.t1_to_floxlog(
            bt_zip, bt_tape, symbol_id=1, symbol_name="BTCUSDT",
            market="um-futures",
        )
        print(f"t1: snapshots={bt_stats.snapshots_written} "
              f"deltas={bt_stats.deltas_written} "
              f"skipped={bt_stats.rows_skipped}")

        # bookDepth: long-format snap → 1 snapshot, second group → 1 delta.
        d20_zip = workdir / "BTCUSDT-bookDepth-2024-01-15.zip"
        _build_zip(d20_zip, _D20_ROWS)
        d20_tape = workdir / "tape-d20"
        d20_stats = binance.depth20_to_floxlog(
            d20_zip, d20_tape, levels=20,
            symbol_id=1, symbol_name="BTCUSDT",
        )
        print(f"depth20: snapshots={d20_stats.snapshots_written} "
              f"deltas={d20_stats.deltas_written}")

        for label, tape in (("t1", bt_tape), ("depth20", d20_tape)):
            r = flox_py.DataReader(str(tape))
            headers, _ = r.read_book_updates()
            print(f"  {label}: {int(headers.size)} book events")
    finally:
        shutil.rmtree(workdir, ignore_errors=True)


if __name__ == "__main__":
    main()

Delta encoding

bookTicker rows are best-bid / best-ask snapshots, one per update. The first imported row becomes a book snapshot; every subsequent row emits a delta with the changed top level (qty=0 marks a removed price). Unchanged ticks are skipped so the tape does not accumulate dead data.
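The bookTicker rule can be sketched in plain Python as below. This is an illustration of the described behaviour, not the converter's code: encode_t1, the ("snapshot" | "delta", bids, asks) event tuples, and the choice to zero the old best price when the top level moves are assumptions of this sketch.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Level = Tuple[float, float]                      # (price, qty)
Event = Tuple[str, List[Level], List[Level]]     # (kind, bid levels, ask levels)


def _side_delta(px: float, qty: float,
                prev_px: float, prev_qty: float) -> List[Level]:
    """Changed levels on one side of a one-level (top-of-book) ladder."""
    if (px, qty) == (prev_px, prev_qty):
        return []
    levels: List[Level] = []
    if px != prev_px:
        levels.append((prev_px, 0.0))            # qty=0: old best price removed
    levels.append((px, qty))
    return levels


def encode_t1(rows: Iterable[tuple]) -> Iterator[Event]:
    """rows: (update_id, bid_px, bid_qty, ask_px, ask_qty, transaction_time, event_time)."""
    prev: Optional[Tuple[float, float, float, float]] = None
    for row in rows:
        bid_px, bid_qty, ask_px, ask_qty = row[1:5]
        if prev is None:
            # First row: full (one-level) snapshot of both sides.
            yield ("snapshot", [(bid_px, bid_qty)], [(ask_px, ask_qty)])
        else:
            bids = _side_delta(bid_px, bid_qty, prev[0], prev[1])
            asks = _side_delta(ask_px, ask_qty, prev[2], prev[3])
            if bids or asks:                     # unchanged ticks emit nothing
                yield ("delta", bids, asks)
        prev = (bid_px, bid_qty, ask_px, ask_qty)
```

Fed the three _T1_ROWS from the example script, this yields one snapshot and one delta, the same counts the script's comment expects.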

bookDepth is published in long format: one row per (event, side, price) tuple, grouped by last_update_id. The first group with update_type=snap becomes a snapshot containing every (price, qty) level on both sides; each later group emits a delta with only the levels that changed against the running ladder (the snapshot plus every delta applied since), again with qty=0 marking removals.
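A minimal sketch of the bookDepth grouping and diffing, assuming the row layout of _D20_ROWS in the example (symbol, ts, first_update_id, last_update_id, side, update_type, price, qty). encode_depth and its event tuples are hypothetical, and as a simplification every group after the first snapshot is treated as a set of level updates.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Tuple

Level = Tuple[float, float]                      # (price, qty)
Event = Tuple[str, List[Level], List[Level]]     # (kind, bid levels, ask levels)


def encode_depth(rows: Iterable[tuple]) -> Iterator[Event]:
    """rows: (symbol, ts, first_update_id, last_update_id, side, update_type, price, qty)."""
    book: Dict[str, Dict[float, float]] = {"b": {}, "a": {}}   # running ladders
    have_snapshot = False
    # Consecutive rows sharing last_update_id form one book event.
    for _, grp in groupby(rows, key=lambda r: r[3]):
        group = list(grp)
        if not have_snapshot:
            if group[0][5] != "snap":
                continue                         # assumption: skip until the first snap
            for r in group:
                if r[7] > 0.0:                   # zero-qty rows carry no level
                    book[r[4]][r[6]] = r[7]
            have_snapshot = True
            yield ("snapshot",
                   sorted(book["b"].items(), reverse=True),   # bids: highest first
                   sorted(book["a"].items()))                 # asks: lowest first
            continue
        changed: Dict[str, List[Level]] = {"b": [], "a": []}
        for r in group:
            side, px, qty = r[4], r[6], r[7]
            ladder = book[side]
            if ladder.get(px, 0.0) == qty:
                continue                         # same as running state: not a change
            changed[side].append((px, qty))
            if qty == 0.0:
                del ladder[px]                   # qty=0 marks a removed level
            else:
                ladder[px] = qty
        if changed["b"] or changed["a"]:
            yield ("delta", changed["b"], changed["a"])
```

Diffing against the running ladder rather than the original snapshot is what keeps each delta minimal while still replaying to the correct book.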

Both converters keep the running ladder state in memory while writing, so the engine's existing book replay (snapshot replaces book, delta applies on top) gives the same (bids, asks) sequence the original CSV described.
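The replay semantics (snapshot replaces the book, delta applies on top, qty=0 removes) fit in a few lines. This sketch uses hypothetical ("snapshot" | "delta", bids, asks) event tuples rather than the engine's real book event types, and returns the (bids, asks) state after every event so it can be compared against the source CSV.

```python
from typing import Dict, Iterable, List, Tuple

Level = Tuple[float, float]              # (price, qty)
Book = Tuple[List[Level], List[Level]]   # (bids best-first, asks best-first)


def replay(events: Iterable[Tuple[str, List[Level], List[Level]]]) -> List[Book]:
    bids: Dict[float, float] = {}
    asks: Dict[float, float] = {}
    states: List[Book] = []
    for kind, bid_levels, ask_levels in events:
        if kind == "snapshot":
            # A snapshot replaces the whole book.
            bids = {px: q for px, q in bid_levels if q > 0.0}
            asks = {px: q for px, q in ask_levels if q > 0.0}
        else:
            # A delta applies on top of the current book; qty=0 removes the level.
            for ladder, levels in ((bids, bid_levels), (asks, ask_levels)):
                for px, qty in levels:
                    if qty == 0.0:
                        ladder.pop(px, None)
                    else:
                        ladder[px] = qty
        states.append((sorted(bids.items(), reverse=True), sorted(asks.items())))
    return states
```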

CLI

flox archive binance accepts the same --book flag in both its single-day (--csv) and multi-day range (--from / --to) forms:

# Single day from a local CSV
flox archive binance --book t1 \
  --csv ./BTCUSDT-bookTicker-2024-01-15.zip \
  --out ./tapes/binance-um-BTCUSDT \
  --symbol BTCUSDT --market um-futures

# Multi-day range with download
flox archive binance --book depth20 \
  --symbol BTCUSDT --market um-futures \
  --from 2024-01-01 --to 2024-01-07 \
  --out ./tapes/binance-um-BTCUSDT \
  --mirror ./.cache/binance \
  --parallel 2

Use --book-levels to cap the ladder width if the archive publishes more than the default 20 levels per side.
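Capping a ladder means keeping the best prices on each side: the highest bids and the lowest asks. A minimal sketch, assuming --book-levels maps onto the levels= parameter of the Python API; cap_levels is hypothetical and only shows truncation of a full ladder. How the converter handles levels that move in or out of the capped window between deltas is not described here.

```python
from typing import List, Tuple

Level = Tuple[float, float]   # (price, qty)


def cap_levels(bids: List[Level], asks: List[Level],
               levels: int = 20) -> Tuple[List[Level], List[Level]]:
    """Keep the `levels` best prices on each side."""
    best_bids = sorted(bids, key=lambda lv: lv[0], reverse=True)[:levels]  # highest first
    best_asks = sorted(asks, key=lambda lv: lv[0])[:levels]                # lowest first
    return best_bids, best_asks
```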

Append-safe and idempotent

The converter deduplicates against the update_id (bookTicker) or last_update_id (bookDepth) values it has already written to the tape. Re-running the same day, or running an overlapping range, is a no-op for the overlap: every previously imported row is skipped, and the writer adds no new events for it. The reported rows_skipped counter shows how many rows were elided.
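One simple way to get this behaviour is a high-water mark: remember the largest ID already on the tape and skip every row at or below it. The sketch below assumes IDs only ever increase within a symbol's stream; the text does not say whether the converter uses a watermark or an explicit set of IDs, and skip_already_written is a hypothetical name.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def skip_already_written(rows: Iterable[tuple],
                         last_written_id: Optional[int],
                         key: Callable[[tuple], int] = lambda r: r[0],
                         ) -> Tuple[List[tuple], int]:
    """Return (rows to write, rows skipped), assuming IDs only ever increase.

    key picks the dedup ID: r[0] is update_id for bookTicker rows;
    pass key=lambda r: r[3] for last_update_id in bookDepth rows.
    """
    fresh: List[tuple] = []
    skipped = 0
    for row in rows:
        if last_written_id is not None and key(row) <= last_written_id:
            skipped += 1                 # already on the tape
        else:
            fresh.append(row)
    return fresh, skipped
```

Because a bookDepth group shares one last_update_id, a watermark skips or keeps whole groups together, so a re-run never writes half an event.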

Co-existence with aggTrades

Trades and book events sit in one .floxlog directory side by side. The engine and MergedTapeReader interleave them on exchange_ts_ns at read time, so the same tape feeds both trade aggregators and book-level analyses (depth-aware execution, vacuum / queue analysis, top-of-book microstructure) in a single pass.
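Interleaving several time-ordered streams on a timestamp is a k-way merge. The sketch below uses the standard library's heapq.merge over a hypothetical TapeEvent record; it is not MergedTapeReader's implementation, and the engine's tie-break for equal exchange_ts_ns values is not specified in this guide.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class TapeEvent(NamedTuple):
    exchange_ts_ns: int
    kind: str          # "trade" or "book" in this sketch
    payload: object


def interleave(*streams: Iterable[TapeEvent]) -> Iterator[TapeEvent]:
    """Lazily merge streams that are each already sorted by exchange_ts_ns."""
    # heapq.merge reads inputs lazily and behaves like a stable sort of their
    # concatenation, so equal timestamps keep the order the streams were passed.
    return heapq.merge(*streams, key=lambda e: e.exchange_ts_ns)
```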

metadata.json carries the union counters: total_trades, total_book_updates, has_trades, has_book_snapshots, has_book_deltas, and the maximum observed book_depth. The MergedTapeReader keys by (metadata.exchange, name) exactly as for the trade-only case.
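As an illustration of what "union counters" implies when a trade import and a book import land in the same tape, the sketch below combines two metadata dicts: totals are summed, has_* flags are OR-ed, and book_depth takes the maximum. It assumes a flat JSON object with the field names listed above; union_metadata is hypothetical, and the real writer may maintain these counters incrementally instead.

```python
from typing import Any, Dict

SUMMED = ("total_trades", "total_book_updates")
FLAGS = ("has_trades", "has_book_snapshots", "has_book_deltas")


def union_metadata(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Any]:
    """Combine two metadata dicts: sum totals, OR flags, max book_depth."""
    merged = {**a, **b}                          # other keys (exchange, name): b wins
    for k in SUMMED:
        merged[k] = a.get(k, 0) + b.get(k, 0)
    for k in FLAGS:
        merged[k] = bool(a.get(k)) or bool(b.get(k))
    merged["book_depth"] = max(a.get("book_depth", 0), b.get("book_depth", 0))
    return merged
```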