Build a cross-sectional (T × S) panel from N floxlog tapes

Cross-symbol research (XS momentum, pair stat-arb, rank-based long-short, dispersion) reduces to the same input: an aligned 2D array indexed by (time bucket, symbol). Every hand-rolled version shares the same bug surface: intersection vs. union join logic, NaN handling, symbol-ordering drift between calls, and off-by-one errors in the alignment.

flox_py.panel collapses the pattern into one call. The input is a list of symbol names plus either a tape_root whose immediate subdirectories are the per-symbol tapes, or an explicit tape_paths mapping.

Helpers

| Function | Output |
| --- | --- |
| `build_close_panel(symbols, ..., align=...)` | `Panel(ts, values=(T,S), symbols)` |
| `build_ohlc_panel(symbols, ..., align=...)` | `OHLCPanel(ts, open, high, low, close, symbols)` |
| `build_returns_panel(symbols, ..., lookback_n=N, align=...)` | `ReturnsPanel(ts, values=(T,S), symbols)` |
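The source does not spell out how `build_returns_panel` turns closes into returns. As a rough mental model only, the sketch below assumes a simple return over `lookback_n` buckets, `close[t] / close[t - n] - 1`, with NaN in the first `n` rows where no history exists. Both the formula and the NaN warm-up are assumptions, and `lookback_returns` is a hypothetical name, not part of `flox_py`. Plain lists stand in for the (T, S) array so the sketch has no dependencies.

```python
import math


def lookback_returns(close, n):
    """Simple n-bucket return per column: close[t] / close[t - n] - 1.

    `close` is a list of T rows, each a list of S closes.
    Rows with fewer than n buckets of history are filled with NaN
    (an assumption of this sketch, not a documented flox_py behaviour).
    """
    out = []
    for t, row in enumerate(close):
        if t < n:
            out.append([math.nan] * len(row))
        else:
            prev = close[t - n]
            out.append([c / p - 1.0 for c, p in zip(row, prev)])
    return out


# Toy (T=4, S=2) close panel.
close = [
    [100.0, 50.0],
    [110.0, 50.0],
    [121.0, 55.0],
    [133.1, 60.0],
]
rets = lookback_returns(close, 2)
print(len(rets), len(rets[0]))  # same (T, S) shape as the input
```

Under these assumptions the output keeps the input's (T, S) shape, which is at least consistent with the `shape=(4, 3)` that the example further down prints for the returns panel.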

Alignment modes

| Mode | Index | NaN policy |
| --- | --- | --- |
| `intersection` | timestamps present in every tape | none (intersection guarantees a value per cell) |
| `union_nan` | union of every tape's timestamps | NaN for missing cells |
| `union_ffill` | union of every tape's timestamps | forward-fill from the last known bar for that symbol |

For rank-based XS strategies the right default is intersection. A missing bar in any symbol kills the rank for that bar. For long-only momentum or single-symbol baselines, union_ffill is usually what you want.
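To see why a single missing bar matters for ranking, here is a minimal sketch of a per-bucket cross-sectional rank that refuses to rank a row containing NaN. `xs_rank` is a hypothetical helper written for this illustration; it is not a `flox_py` function, and ties are broken by column order purely for simplicity.

```python
import math


def xs_rank(row):
    """Rank one time bucket across symbols: 0 = lowest, S-1 = highest.

    Returns None when any cell is NaN, because a rank over a partial
    cross-section is not comparable with ranks from complete rows.
    """
    if any(math.isnan(x) for x in row):
        return None
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0] * len(row)
    for r, j in enumerate(order):
        ranks[j] = r
    return ranks


complete = [0.02, -0.01, 0.05]
gappy = [0.02, math.nan, 0.05]

print(xs_rank(complete))  # [1, 0, 2]
print(xs_rank(gappy))     # None
```

With `intersection` alignment every row is complete, so a check like this never fires. With `union_nan` the caller has to decide what to do with each gappy row.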

Example

"""Cross-sectional panel round-trip — write three per-symbol synthetic
tapes, run `build_close_panel` against them in each of the three
alignment modes, and print the (T, S) shape + NaN pattern so the
output matches what the how-to claims.

CI-runnable companion to
[Cross-sectional panel builder](../how-to/cross-sectional-panel.md).

Usage:
    cd /path/to/flox
    PYTHONPATH=build/python python3 docs/examples/python_xs_panel.py
"""
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path

import numpy as np

import flox_py
from flox_py import panel as panel_mod


_BUCKET_NS = 60_000_000_000   # 1 minute


def _write_tape(root: Path, symbol: str, *,
                bucket_starts_ms: list[int],
                base_price: float) -> None:
    tape_dir = root / symbol
    tape_dir.mkdir(parents=True, exist_ok=True)
    w = flox_py.DataWriter(str(tape_dir), max_segment_mb=64,
                           exchange_id=0, compression="none")
    try:
        for i, ts_ms in enumerate(bucket_starts_ms):
            ts_ns = int(ts_ms) * 1_000_000 + 1_000_000
            w.write_trade(
                exchange_ts_ns=ts_ns, recv_ts_ns=ts_ns,
                price=float(base_price + i * 0.5),
                qty=1.0,
                trade_id=10_000 + i,
                symbol_id=1,
                side=0,
            )
    finally:
        w.close()


def main() -> None:
    workdir = Path(tempfile.mkdtemp(prefix="flox-xs-panel-"))
    try:
        base_ms = 1_700_000_000_000
        full = [base_ms + i * 60_000 for i in range(5)]
        sparse = [base_ms + i * 60_000 for i in (0, 1, 3, 4)]
        _write_tape(workdir, "BTCUSDT", bucket_starts_ms=full,   base_price=40_000.0)
        _write_tape(workdir, "ETHUSDT", bucket_starts_ms=full,   base_price=2_500.0)
        _write_tape(workdir, "SOLUSDT", bucket_starts_ms=sparse, base_price=150.0)

        symbols = ["BTCUSDT", "ETHUSDT", "SOLUSDT"]
        for mode in ("intersection", "union_nan", "union_ffill"):
            p = panel_mod.build_close_panel(
                symbols, bucket_ns=_BUCKET_NS,
                tape_root=workdir, align=mode,
            )
            nans = int(np.isnan(p.values).sum())
            print(f"mode={mode:13s} shape={p.values.shape} nans={nans}")

        # Returns panel: 2-bucket lookback.
        rp = panel_mod.build_returns_panel(
            symbols, bucket_ns=_BUCKET_NS,
            tape_root=workdir, lookback_n=2, align="intersection",
        )
        print(f"returns lookback_n={rp.lookback_n} shape={rp.values.shape}")
    finally:
        shutil.rmtree(workdir, ignore_errors=True)


if __name__ == "__main__":
    main()

The output:

mode=intersection  shape=(4, 3) nans=0
mode=union_nan     shape=(5, 3) nans=1
mode=union_ffill   shape=(5, 3) nans=0
returns lookback_n=2 shape=(4, 3)

The third symbol (SOLUSDT) deliberately skips one bucket. Intersection drops that bucket from every column. union_nan keeps it and surfaces one NaN. union_ffill keeps it and copies the previous bar's close into the gap.
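The same shapes and NaN counts can be reproduced without any tapes. The sketch below is a toy re-implementation of the three modes over plain dictionaries. It is meant only to make the join semantics concrete: `align` and the `{symbol: {bucket: close}}` layout are hypothetical and say nothing about how `flox_py.panel` is implemented internally.

```python
import math


def align(series, symbols, mode):
    """Join per-symbol {bucket_ts: close} dicts into (ts, values).

    values[t][s] follows the order of `symbols`. A leading gap under
    "union_ffill" stays NaN because there is no earlier bar to copy.
    """
    keys = [set(series[s]) for s in symbols]
    if mode == "intersection":
        ts = sorted(set.intersection(*keys))
    else:  # "union_nan" or "union_ffill"
        ts = sorted(set.union(*keys))

    last = {s: math.nan for s in symbols}
    values = []
    for t in ts:
        row = []
        for s in symbols:
            v = series[s].get(t, math.nan)
            if math.isnan(v) and mode == "union_ffill":
                v = last[s]          # copy the previous bar's close
            else:
                last[s] = v
            row.append(v)
        values.append(row)
    return ts, values


def count_nans(values):
    return sum(math.isnan(v) for row in values for v in row)


symbols = ["BTCUSDT", "ETHUSDT", "SOLUSDT"]
series = {
    "BTCUSDT": {i: 40_000.0 + 0.5 * i for i in range(5)},
    "ETHUSDT": {i: 2_500.0 + 0.5 * i for i in range(5)},
    "SOLUSDT": {i: 150.0 + 0.5 * i for i in (0, 1, 3, 4)},  # skips bucket 2
}

for mode in ("intersection", "union_nan", "union_ffill"):
    ts, values = align(series, symbols, mode)
    print(mode, (len(values), len(symbols)), count_nans(values))
# intersection (4, 3) 0
# union_nan (5, 3) 1
# union_ffill (5, 3) 0
```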

Column ordering is your contract

The column order in values and the entries in symbols mirror the input list verbatim. The helpers never sort. Reverse the input and the columns reverse. This matters because callers typically index columns by position (values[:, j]), not by name lookup.
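Because the order is a contract rather than a convenience, one defensive habit is to derive the column position from the symbol list instead of hard-coding it. The sketch below uses plain lists in place of a real panel, and `column` is a hypothetical helper, not part of `flox_py`.

```python
def column(values, symbols, name):
    """Return one symbol's column, resolving its position by name."""
    j = symbols.index(name)  # raises ValueError for an unknown symbol
    return [row[j] for row in values]


symbols = ["BTCUSDT", "ETHUSDT", "SOLUSDT"]
values = [
    [40_000.0, 2_500.0, 150.0],
    [40_000.5, 2_500.5, 150.5],
]

# The same data with the input order reversed: the columns reverse too.
rev_symbols = symbols[::-1]
rev_values = [row[::-1] for row in values]

# A hard-coded position silently changes meaning after the reversal...
print(values[0][0], rev_values[0][0])  # 40000.0 150.0

# ...while a lookup by name returns the same series either way.
print(column(values, symbols, "ETHUSDT"))
print(column(rev_values, rev_symbols, "ETHUSDT"))
```

On a real panel the equivalent would presumably be a lookup against `p.symbols` followed by `p.values[:, j]`, assuming `symbols` is exposed as an ordered sequence.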

Time bounds

t_from and t_to (nanoseconds, half-open [t_from, t_to)) clip every per-symbol series before alignment. The clip is applied to each symbol's aggregator output, so empty rows never reach the alignment join.
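The half-open convention means a bucket stamped exactly `t_from` is kept and one stamped exactly `t_to` is dropped, so adjacent windows never share a row. A minimal sketch of that rule follows, assuming the bounds are compared against each bucket's timestamp and that `None` means unbounded. Both points are assumptions of this sketch, and `clip_half_open` is a hypothetical name.

```python
def clip_half_open(ts, rows, t_from=None, t_to=None):
    """Keep rows whose timestamp t satisfies t_from <= t < t_to."""
    keep = [
        i for i, t in enumerate(ts)
        if (t_from is None or t >= t_from) and (t_to is None or t < t_to)
    ]
    return [ts[i] for i in keep], [rows[i] for i in keep]


BUCKET_NS = 60_000_000_000          # 1 minute, as in the example above
base_ns = 1_700_000_000_000 * 1_000_000
ts = [base_ns + i * BUCKET_NS for i in range(5)]
rows = [[float(i)] for i in range(5)]

# Two adjacent windows: together they cover buckets 0..3 exactly once.
first_ts, first_rows = clip_half_open(ts, rows, t_from=ts[0], t_to=ts[2])
second_ts, second_rows = clip_half_open(ts, rows, t_from=ts[2], t_to=ts[4])

print(first_rows)   # [[0.0], [1.0]]
print(second_rows)  # [[2.0], [3.0]]
```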

When this is not the right primitive

If the strategy consumes per-trade data (microstructure, queue position), the panel is the wrong abstraction; keep the per-tape DataReader + aggregator chain. Panels are for bar-level cross-sectional work, which is where the join boilerplate is the actual pain point.