Build a cross-sectional (T × S) panel from N floxlog tapes¶
Cross-symbol research (XS momentum, pair stat-arb, rank-based long-short, dispersion) reduces to the same input: an aligned 2D array indexed by (time bucket, symbol). The hand-rolled version is shared bug surface: intersection vs union join logic, NaN handling, symbol-ordering drift between calls, off-by-one in the alignment.
flox_py.panel collapses the pattern into one call. The input is a list of symbol names plus either a tape_root whose immediate subdirectories are the per-symbol tapes, or an explicit tape_paths mapping.
Helpers¶
| Function | Output |
|---|---|
build_close_panel(symbols, ..., align=...) |
Panel(ts, values=(T,S), symbols) |
build_ohlc_panel(symbols, ..., align=...) |
OHLCPanel(ts, open, high, low, close, symbols) |
build_returns_panel(symbols, ..., lookback_n=N, align=...) |
ReturnsPanel(ts, values=(T,S), symbols) |
Alignment modes¶
| Mode | Index | NaN policy |
|---|---|---|
intersection |
timestamps present in every tape | none (intersection guarantees a value per cell) |
union_nan |
union of every tape's timestamps | NaN for missing cells |
union_ffill |
union of every tape's timestamps | forward-fill from the last known bar for that symbol |
For rank-based XS strategies the right default is intersection. A missing bar in any symbol kills the rank for that bar. For long-only momentum or single-symbol baselines, union_ffill is usually what you want.
Example¶
"""Cross-sectional panel round-trip — write three per-symbol synthetic
tapes, run `build_close_panel` against them in each of the three
alignment modes, and print the (T, S) shape + NaN pattern so the
output matches what the how-to claims.
CI-runnable companion to
[Cross-sectional panel builder](../how-to/cross-sectional-panel.md).
Usage:
cd /path/to/flox
PYTHONPATH=build/python python3 docs/examples/python_xs_panel.py
"""
from __future__ import annotations
import shutil
import tempfile
from pathlib import Path
import numpy as np
import flox_py
from flox_py import panel as panel_mod
_BUCKET_NS = 60_000_000_000 # 1 minute
def _write_tape(root: Path, symbol: str, *,
bucket_starts_ms: list[int],
base_price: float) -> None:
tape_dir = root / symbol
tape_dir.mkdir(parents=True, exist_ok=True)
w = flox_py.DataWriter(str(tape_dir), max_segment_mb=64,
exchange_id=0, compression="none")
try:
for i, ts_ms in enumerate(bucket_starts_ms):
ts_ns = int(ts_ms) * 1_000_000 + 1_000_000
w.write_trade(
exchange_ts_ns=ts_ns, recv_ts_ns=ts_ns,
price=float(base_price + i * 0.5),
qty=1.0,
trade_id=10_000 + i,
symbol_id=1,
side=0,
)
finally:
w.close()
def main() -> None:
workdir = Path(tempfile.mkdtemp(prefix="flox-xs-panel-"))
try:
base_ms = 1_700_000_000_000
full = [base_ms + i * 60_000 for i in range(5)]
sparse = [base_ms + i * 60_000 for i in (0, 1, 3, 4)]
_write_tape(workdir, "BTCUSDT", bucket_starts_ms=full, base_price=40_000.0)
_write_tape(workdir, "ETHUSDT", bucket_starts_ms=full, base_price=2_500.0)
_write_tape(workdir, "SOLUSDT", bucket_starts_ms=sparse, base_price=150.0)
symbols = ["BTCUSDT", "ETHUSDT", "SOLUSDT"]
for mode in ("intersection", "union_nan", "union_ffill"):
p = panel_mod.build_close_panel(
symbols, bucket_ns=_BUCKET_NS,
tape_root=workdir, align=mode,
)
nans = int(np.isnan(p.values).sum())
print(f"mode={mode:13s} shape={p.values.shape} nans={nans}")
# Returns panel: 2-bucket lookback.
rp = panel_mod.build_returns_panel(
symbols, bucket_ns=_BUCKET_NS,
tape_root=workdir, lookback_n=2, align="intersection",
)
print(f"returns lookback_n={rp.lookback_n} shape={rp.values.shape}")
finally:
shutil.rmtree(workdir, ignore_errors=True)
if __name__ == "__main__":
main()
The output:
mode=intersection shape=(4, 3) nans=0
mode=union_nan shape=(5, 3) nans=1
mode=union_ffill shape=(5, 3) nans=0
returns lookback_n=2 shape=(4, 3)
The third symbol (SOLUSDT) deliberately skips one bucket. Intersection drops that bucket from every column. union_nan keeps it and surfaces one NaN. union_ffill keeps it and copies the previous bar's close into the gap.
Column ordering is your contract¶
The column order in values and the entries in symbols mirror the input list verbatim. The helpers never sort. Reverse the input and the columns reverse. This matters because callers typically index columns by position (values[:, j]), not by name lookup.
Time bounds¶
t_from and t_to (nanoseconds, half-open [t_from, t_to)) clip every per-symbol series before alignment. The clip happens inside the aggregator output so empty rows never reach the alignment join.
When this is not the right primitive¶
If the strategy consumes per-trade data (microstructure, queue position), the panel is the wrong abstraction; keep the per-tape DataReader + aggregator chain. Panels are for bar-level cross-sectional work, which is where the join boilerplate had actually been a pain point.