Record and replay market data with flox tape
flox tape captures live market data into a .floxlog directory and replays it deterministically. The on-disk format is the same one the engine writes during backtests, so a session you record today drives a backtest tomorrow without conversion.
Install
The [ccxt] extra pulls in ccxt.pro for the live feed. If you have your own data source, attach a BinaryLogRecorderHook (or your own MarketDataRecorderHook subclass) to the Runner directly and skip the extra.
Record a session
Arguments:
| Argument | Purpose |
|---|---|
| `bybit` | Exchange id from ccxt (bybit, bitget, binance, etc.). |
| `BTCUSDT` | Symbol. Either flat (`BTCUSDT`) or slash form (`BTC/USDT`); both are accepted. |
| `--duration 1h` | How long to record. Suffixes: `s`, `m`, `h`, `d`. Omit to record until Ctrl+C. |
| `--output PATH` | Destination directory. Created if it does not exist. |
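Wrapper scripts often need to validate these arguments before calling the CLI. The sketch below shows one way to handle the duration suffixes and the two symbol spellings. `parse_duration` and `normalize_symbol` are hypothetical helpers, not part of flox, and normalizing to the flat form is an assumption.

```python
from __future__ import annotations

import re

# Hypothetical helpers: flox's own argument parser is not shown on this page.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str | None) -> int | None:
    """Convert '30s', '15m', '1h' or '2d' to seconds; None means 'until Ctrl+C'."""
    if text is None:
        return None
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def normalize_symbol(symbol: str) -> str:
    """Map the slash form (BTC/USDT) onto the flat form (BTCUSDT)."""
    return symbol.replace("/", "").upper()


print(parse_duration("1h"))          # 3600
print(normalize_symbol("BTC/USDT"))  # BTCUSDT
```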
The CLI writes a .floxlog segment directory at the output path and rotates segments at --max-segment-mb (default 256 MB).
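You can think of segment rotation as splitting the write stream into size-capped chunks. This sketch rests on two assumptions that this page does not document: a new segment opens when the next record would push the current one past the cap, and "MB" means 1024 * 1024 bytes. `plan_segments` is a hypothetical helper.

```python
from __future__ import annotations


def plan_segments(record_sizes: list[int], max_segment_mb: int = 256) -> list[int]:
    """Return the number of bytes that land in each segment, in write order."""
    cap = max_segment_mb * 1024 * 1024
    segments: list[int] = []
    current = 0
    for size in record_sizes:
        # Rotate before a write that would overflow a non-empty segment.
        if current and current + size > cap:
            segments.append(current)
            current = 0
        current += size
    if current:
        segments.append(current)
    return segments


print(plan_segments([400_000] * 5, max_segment_mb=1))  # [800000, 800000, 400000]
```

Under this rule, a single record larger than the cap still gets a segment of its own.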
When recording stops, the CLI prints a summary.
Both trades and book updates are captured. The recorder writes them via BinaryLogWriter inside C++, so there's no per-event Python callback.
Inspect a tape
Inspecting a tape prints the trade count, the first and last exchange timestamps, and the symbols seen. It is useful as a smoke check after a recording session.
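The inspect summary needs only one pass over the trades. The sketch below works on in-memory `(exchange_ts_ns, symbol)` pairs and borrows its field names from the `inspect_tape` result in the runnable example at the end of this page. `summarize` is hypothetical and does not read `.floxlog` files.

```python
from __future__ import annotations

from typing import Iterable


def summarize(trades: Iterable[tuple[int, str]]) -> dict:
    """Count trades, track min/max exchange timestamps, and collect symbols."""
    count = 0
    first_ts = last_ts = None
    symbols: set[str] = set()
    for ts_ns, symbol in trades:
        count += 1
        first_ts = ts_ns if first_ts is None else min(first_ts, ts_ns)
        last_ts = ts_ns if last_ts is None else max(last_ts, ts_ns)
        symbols.add(symbol)
    return {
        "trade_count": count,
        "first_ts_ns": first_ts,
        "last_ts_ns": last_ts,
        "symbols": sorted(symbols),
    }


print(summarize([(2_000, "BTCUSDT"), (1_000, "ETHUSDT"), (3_000, "BTCUSDT")]))
```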
Diff two tapes
Compares two .floxlog directories trade-by-trade on (exchange_ts_ns, symbol_id, price_raw, qty_raw, side). Exits 0 when equal, 1 on divergence, with the first divergent index plus a sample of mismatched rows printed to stderr.
- `--ts-tolerance-ns N` lets timestamps differ by up to N nanoseconds before a row is flagged. This is useful when comparing two live captures that share content but arrived through different receive paths.
- `--max-mismatches K` caps how many divergent rows are recorded; the rest are summarized by count.
- `--json` emits the full diff structure for CI scripting.
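A pure-Python sketch makes these diff rules concrete. It illustrates the rules and is not the engine's C++ walk. It assumes that trades are compared position by position, that the tolerance relaxes only the timestamp field, and that only the first `max_mismatches` divergent rows are kept while the rest are counted. The field names mirror the Python binding's `equal`, `first_divergence_index`, and `mismatch_count`. `diff_rows` and `TapeDiff` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

# One trade as the diff compares it: (exchange_ts_ns, symbol_id, price_raw, qty_raw, side)
Row = Tuple[int, int, int, int, int]


@dataclass
class TapeDiff:
    equal: bool = True
    first_divergence_index: Optional[int] = None
    mismatch_count: int = 0
    # (index, left_row, right_row); a side is None when that tape is shorter
    mismatches: List[Tuple[int, Optional[Row], Optional[Row]]] = field(default_factory=list)


def diff_rows(left: Sequence[Row], right: Sequence[Row],
              ts_tolerance_ns: int = 0, max_mismatches: int = 16) -> TapeDiff:
    result = TapeDiff()
    for i in range(max(len(left), len(right))):
        a = left[i] if i < len(left) else None
        b = right[i] if i < len(right) else None
        if (a is not None and b is not None
                and abs(a[0] - b[0]) <= ts_tolerance_ns
                and a[1:] == b[1:]):
            continue  # same trade, timestamps within tolerance
        result.mismatch_count += 1
        if result.first_divergence_index is None:
            result.first_divergence_index = i
        if len(result.mismatches) < max_mismatches:
            result.mismatches.append((i, a, b))
    result.equal = result.mismatch_count == 0
    return result


def exit_code(result: TapeDiff) -> int:
    """Mirror the CLI contract: 0 when equal, 1 on divergence."""
    return 0 if result.equal else 1
```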
The most common reason to run this: the replay-equivalence gate failed and you want to know exactly which trades shifted between the captured reference and what the engine produces today.
The walk runs in the C++ engine, so every binding exposes the same surface and returns the same result shape:
Node.js:

```javascript
const flox = require('@flox-foundation/flox');

const diff = flox.tapeDiff('./tapes/run-a', './tapes/run-b',
                           { maxMismatches: 16, fieldToleranceNs: 0 });
if (!diff.equal) {
  console.log(`first divergence at index ${diff.firstDivergenceIndex}`);
  for (const m of diff.mismatches.slice(0, 3)) console.log(m);
}
```
Python:

```python
from flox.tape_diff import diff_tapes

summary = diff_tapes("./tapes/run-a", "./tapes/run-b")
if not summary.equal:
    print("first divergence at index", summary.first_divergence_index)
    print("total mismatches recorded:", summary.mismatch_count)
```
Codon exposes the headline summary. For per-record mismatch payloads, use the Python or Node binding.
Replay a tape
Replay walks the trades in exchange-timestamp order and prints one line per event. --max-events caps the output; drop it to print everything. The replay reads the same fixed-point types the engine writes, so prices and quantities round-trip exactly.
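The exact round trip depends on fixed-point storage. The sketch below assumes a scale of 10^8 (eight decimal places). The tape's real scale is not stated on this page, so treat `SCALE`, `to_raw`, and `from_raw` as hypothetical.

```python
# Hypothetical scale: the tape's real fixed-point scale is not documented here.
SCALE = 100_000_000  # 8 decimal places


def to_raw(value: float) -> int:
    """Round a decimal price or quantity onto the integer grid."""
    return round(value * SCALE)


def from_raw(raw: int) -> float:
    """Convert a stored integer back into a float for display or strategy code."""
    return raw / SCALE


prices = [101.50, 101.55, 101.60]
print([from_raw(to_raw(p)) for p in prices] == prices)       # True

# Integer arithmetic on raw values stays exact where float sums drift.
print(sum([0.1] * 10))                                       # 0.9999999999999999
print(sum(to_raw(0.1) for _ in range(10)) == to_raw(1.0))    # True
```

Dividing the stored integer by the scale gives back the same double that the original decimal literal produced. Sums over raw integers do not accumulate rounding error the way float sums do.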
For programmatic replay inside a strategy or notebook:
```python
from flox_py.tape import replay_tape


def on_trade(ts_ns, sym_id, price, qty, side):
    print(ts_ns, price, qty, side)


n = replay_tape("./tapes/bybit-btc-2026-05-07", on_trade=on_trade)
print(f"replayed {n} trades")
```
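`replay_tape` passes each trade to `on_trade` positionally, so any callable with that five-argument signature can act as a sink. This assumes the binding accepts arbitrary callables and not only plain functions. `TradeStats` below is a hypothetical accumulator for VWAP and per-side volume. Its side encoding (0 = buy, 1 = sell) comes from the round-trip example at the end of this page.

```python
from __future__ import annotations


class TradeStats:
    """Callable matching on_trade(ts_ns, sym_id, price, qty, side)."""

    def __init__(self) -> None:
        self.count = 0
        self.notional = 0.0
        self.buy_qty = 0.0
        self.sell_qty = 0.0

    def __call__(self, ts_ns: int, sym_id: int, price: float, qty: float, side: int) -> None:
        self.count += 1
        self.notional += price * qty
        # side: 0 = buy, 1 = sell (per the round-trip example)
        if side == 0:
            self.buy_qty += qty
        else:
            self.sell_qty += qty

    @property
    def vwap(self) -> float | None:
        volume = self.buy_qty + self.sell_qty
        return self.notional / volume if volume else None
```

Under the same assumption, usage would be `stats = TradeStats()`, `replay_tape("./tapes/bybit-btc-2026-05-07", on_trade=stats)`, then `print(stats.vwap)`.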
Use it from a strategy
The recorder is a BinaryLogRecorderHook. Attach it to any Runner, not just the CLI:
```python
import flox_py as flox
from flox_py.tape import make_recorder_hook

registry = flox.SymbolRegistry()
sym = registry.add_symbol("bybit", "BTCUSDT", tick_size=0.01)

recorder = make_recorder_hook("./tapes/run-1", max_segment_mb=64)
runner = flox.Runner(registry, on_signal=lambda sig: None)
runner.set_market_data_recorder(recorder)
runner.start()

# ... feed events through runner.on_trade(...) ...

# Stop the runner before closing the recorder, as in the runnable example.
runner.stop()
recorder.close()
print(recorder.stats())
```
recorder.stats() returns a dict with trades_written, book_updates_written, bytes_written, segments_created, and errors (writer-side rejections).
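Scripts and CI jobs benefit from turning the stats dict into a hard failure. Below is a minimal sketch that uses the keys listed above. `check_recorder_stats` is a hypothetical helper, and the values in any example dict you feed it are made up.

```python
from __future__ import annotations


def check_recorder_stats(stats: dict, expected_trades: int | None = None) -> None:
    """Raise if the writer rejected events or the trade count is off."""
    if stats["errors"]:
        raise RuntimeError(f"recorder reported {stats['errors']} writer-side rejections")
    if expected_trades is not None and stats["trades_written"] != expected_trades:
        raise RuntimeError(
            f"expected {expected_trades} trades, recorder wrote {stats['trades_written']}"
        )


# After recorder.close():
#     check_recorder_stats(recorder.stats(), expected_trades=3)
```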
Historical backfill
flox tape record is the live capture path. For historical data — anything you need before the recorder was running — there is no built-in backfill CLI. Two reasons: each exchange exposes a different historical surface (some return trades back N days, some only klines, some neither), and the redistribution rules vary. The framework leaves the choice to the strategy author.
The canonical pattern when you want a tape covering past data:
- Pull the raw history with ccxt's `fetch_ohlcv` (or `fetch_trades` if the exchange supports it). This is plain ccxt, with no flox layer.
- Feed the rows into a `Runner` whose `BinaryLogRecorderHook` writes them to a `.floxlog`. The recorder does not care whether the trades are live or replayed from history; the wire format is the same.
```python
import time

import ccxt
import flox_py as flox
from flox_py.tape import make_recorder_hook

ex = ccxt.bybit({"enableRateLimit": True})
since_ms = ex.parse8601("2026-04-01T00:00:00Z")
end_ms = ex.parse8601("2026-04-08T00:00:00Z")

registry = flox.SymbolRegistry()
sym = registry.add_symbol("bybit", "BTCUSDT", tick_size=0.01)

recorder = make_recorder_hook("./tapes/bybit-btc-apr-week-1", max_segment_mb=64)
runner = flox.Runner(registry, on_signal=lambda s: None)
runner.set_market_data_recorder(recorder)
runner.start()

while since_ms < end_ms:
    bars = ex.fetch_ohlcv("BTC/USDT", "1m", since=since_ms, limit=1000)
    if not bars:
        break
    for ts_ms, o, h, l, c, v in bars:
        if ts_ms >= end_ms:
            break  # the last page can run past the requested window
        # One synthetic trade per bar at the close price; replace with
        # fetch_trades + per-trade emission for full fidelity if the
        # exchange exposes it.
        runner.on_trade(sym, price=c, qty=v, is_buy=True,
                        ts_ns=ts_ms * 1_000_000)
    since_ms = bars[-1][0] + 60_000  # next 1m bar after the last one returned
    time.sleep(ex.rateLimit / 1000)

runner.stop()
recorder.close()
print(recorder.stats())
```
Use fetch_trades when fidelity matters (every print, with native side and id) and fetch_ohlcv when you need a longer window than the exchange exposes for trades. Either way the resulting .floxlog plugs into backtest the same way as a live recording.
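When the loop switches to `fetch_trades`, each row is a ccxt unified trade dict with `id`, `timestamp` (milliseconds), `price`, `amount`, and `side` (`"buy"` or `"sell"`). If you page with `since` set to the last timestamp, the boundary trade can come back again, so the sketch below drops ids it has already seen. `trades_to_events` is a hypothetical helper. Its tuples match the positional `runner.on_trade(sym, price, qty, is_buy, ts_ns)` call in the runnable example.

```python
from __future__ import annotations


def trades_to_events(trades: list[dict], seen_ids: set) -> list[tuple]:
    """Convert ccxt unified trades to (price, qty, is_buy, ts_ns), skipping repeated ids."""
    events = []
    for t in sorted(trades, key=lambda t: t["timestamp"]):
        trade_id = t.get("id")
        if trade_id is not None:
            if trade_id in seen_ids:
                continue  # overlap from the previous page
            seen_ids.add(trade_id)
        events.append((t["price"], t["amount"], t["side"] == "buy",
                       t["timestamp"] * 1_000_000))  # ms -> ns
    return events
```

Each page would then be fed with `for price, qty, is_buy, ts_ns in trades_to_events(page, seen_ids): runner.on_trade(sym, price, qty, is_buy, ts_ns)`.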
Limitations
- Trade IDs are written as `0`. Replay deduplicates by `(symbol_id, exchange_ts_ns, price_raw, qty_raw)` instead.
- The format version is `v1`. A reader from a newer flox release reads v1 tapes; a v1 reader does not read newer tapes. Migration paths come with the public format spec.
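Deduplicating on that four-field key works like the sketch below. `side` is not part of the key as listed, so in this sketch two distinct prints that share symbol, timestamp, price, and quantity collapse into one regardless of side. `dedupe` is a hypothetical illustration, not the replay reader's code.

```python
from __future__ import annotations

from typing import Iterable, Iterator, Tuple

# (symbol_id, exchange_ts_ns, price_raw, qty_raw, side)
Trade = Tuple[int, int, int, int, int]


def dedupe(trades: Iterable[Trade]) -> Iterator[Trade]:
    """Yield trades whose (symbol_id, exchange_ts_ns, price_raw, qty_raw) key is new."""
    seen = set()
    for trade in trades:
        key = trade[:4]  # side is deliberately not part of the key
        if key in seen:
            continue
        seen.add(key)
        yield trade
```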
Runnable example
A complete record-then-replay round-trip with synthetic trades, no ccxt needed:
```python
"""
Tape recording round-trip — record synthetic trades via the recorder
hook, then replay them back through ``replay_tape`` and confirm the
events round-trip exactly.

This example is the CI-runnable companion to
[Record and replay market data](../how-to/tape-record.md). It does not
need ccxt or a live exchange — synthetic trades are pumped directly
through ``runner.on_trade`` so the recorder writes without a network
dependency.

Usage:
    cd /path/to/flox
    PYTHONPATH=build/python python3 docs/examples/python_tape_roundtrip.py
"""

from __future__ import annotations

import shutil
import tempfile
from pathlib import Path

import flox_py as flox
from flox_py.tape import inspect_tape, make_recorder_hook, replay_tape


def main() -> None:
    out = Path(tempfile.mkdtemp(prefix="flox-tape-example-"))
    try:
        registry = flox.SymbolRegistry()
        sym = registry.add_symbol("example", "BTCUSDT", tick_size=0.01)

        recorder = make_recorder_hook(out, max_segment_mb=64)
        runner = flox.Runner(registry, on_signal=lambda _sig: None)
        runner.set_market_data_recorder(recorder)
        runner.start()

        synthetic = [
            (101.50, 0.5, True, 1_000_000_000),
            (101.55, 0.3, False, 1_500_000_000),
            (101.60, 1.2, True, 2_000_000_000),
        ]
        for price, qty, is_buy, ts_ns in synthetic:
            runner.on_trade(sym, price, qty, is_buy, ts_ns)

        runner.stop()
        recorder.close()

        stats = inspect_tape(out)
        print(
            f"recorded: trades={stats.trade_count} "
            f"first_ts={stats.first_ts_ns} last_ts={stats.last_ts_ns}"
        )
        assert stats.trade_count == 3
        assert stats.first_ts_ns == 1_000_000_000
        assert stats.last_ts_ns == 2_000_000_000

        seen: list[tuple[int, int, float, float, int]] = []
        n = replay_tape(
            out,
            on_trade=lambda ts, s, p, q, side: seen.append((ts, s, p, q, side)),
        )
        print(f"replayed {n} trades")
        assert n == 3
        assert [row[2] for row in seen] == [101.50, 101.55, 101.60]
        assert [row[3] for row in seen] == [0.5, 0.3, 1.2]
        # side: 0 = buy, 1 = sell — matches our (True, False, True) input.
        assert [row[4] for row in seen] == [0, 1, 0]
        print("round-trip OK")
    finally:
        shutil.rmtree(out, ignore_errors=True)


if __name__ == "__main__":
    main()
```
The example runs in CI and serves as the regression test for the recorder hook and the replay reader.
See also
- Backtest with realistic fills. Drive a backtest off a recorded tape.
- CCXT adapter. The live-feed source that `flox tape record` wraps.