Walk-forward backtesting

Walk-forward backtesting splits the time series into successive train / test windows, runs the strategy on each pair, and returns per-fold stats. It is useful as a sanity check on overfitting: if the strategy looks strong on the train windows but consistently degrades on the held-out test windows across folds, the backtest is probably overfit.

flox_py.WalkForwardRunner ships in Python. The same primitive is available as flox.WalkForwardRunner for Node and through the C ABI for Codon (flox_walk_forward_run_csv).

Modes

anchored: the train window starts at bar 0 and grows. Each fold trains on [0, t) and tests on [t, t + test_size), then t advances by step. The end bound is exclusive, so the test window starts exactly where the train window ends. Set min_train_size to skip the first folds where the train window is too small.

sliding: the train window has a fixed size and slides forward. Each fold trains on [t, t + train_size) and tests on [t + train_size, t + train_size + test_size), then t advances by step.
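
The two modes can be sketched in plain Python to see which bars land in which window. This is only an illustration, not the runner's implementation: the helper `fold_windows` and the `Fold` tuple are hypothetical names, ranges are treated as half-open (consistent with the sample output, where train_end_bar equals test_start_bar), anchored mode is assumed to start with t = step, and a partial final test window is assumed to be dropped.

```python
from typing import Iterator, NamedTuple, Optional


class Fold(NamedTuple):
    fold_index: int
    train_start: int
    train_end: int   # exclusive
    test_start: int
    test_end: int    # exclusive


def fold_windows(n_bars: int, mode: str, test_size: int, step: int,
                 train_size: Optional[int] = None,
                 min_train_size: int = 0) -> Iterator[Fold]:
    """Yield half-open [start, end) bar ranges for each fold (hypothetical helper)."""
    if step <= 0 or test_size <= 0:
        raise ValueError("step and test_size must be positive")
    if mode == "anchored":
        t = step  # assumption: the first candidate split point is one step in
    elif mode == "sliding":
        if train_size is None:
            raise ValueError("sliding mode needs train_size")
        t = 0
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    index = 0
    while True:
        if mode == "anchored":
            train_start, train_end = 0, t               # grows from bar 0
        else:
            train_start, train_end = t, t + train_size  # fixed size, slides
        test_end = train_end + test_size
        if test_end > n_bars:
            break  # assumption: a partial final test window is dropped
        if train_end - train_start >= min_train_size:
            yield Fold(index, train_start, train_end, train_end, test_end)
            index += 1
        t += step


anchored = list(fold_windows(400, "anchored", test_size=100, step=100,
                             min_train_size=100))
sliding = list(fold_windows(400, "sliding", test_size=100, step=100,
                            train_size=200))
for fold in anchored + sliding:
    print(fold)
```

On 400 bars the anchored call yields three folds whose train windows grow from 100 to 300 bars, while the sliding call yields two folds with a constant 200-bar train window.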

Python

import flox_py as flox

reg = flox.SymbolRegistry()
btc = reg.add_symbol("exchange", "BTCUSDT", 0.01)


class SmaCross(flox.Strategy):
    def __init__(self, syms):
        super().__init__(syms)
        self.fast = flox.SMA(10)
        self.slow = flox.SMA(30)

    def on_trade(self, ctx, t):
        f = self.fast.update(t.price)
        s = self.slow.update(t.price)
        if f is None or s is None or not self.slow.ready:
            return
        if f > s and ctx.is_flat():
            self.market_buy(0.01)
        elif f < s and ctx.is_flat():
            self.market_sell(0.01)


wfr = flox.WalkForwardRunner(
    reg, fee_rate=0.0004, initial_capital=10_000,
    mode="anchored", test_size=100, step=100, min_train_size=100,
)

# Factory called twice per fold (train, then test). Build a fresh
# strategy every time — state from a prior fold must not leak.
wfr.set_strategy_factory(lambda fold_index: SmaCross([btc]))

folds = wfr.run_csv("data/btcusdt_sample.csv", "BTCUSDT")
for f in folds:
    print(f"fold {f['fold_index']}: "
          f"train return={f['train_stats']['return_pct']:+.4f}% "
          f"sharpe={f['train_stats']['sharpe']:+.4f} | "
          f"test return={f['test_stats']['return_pct']:+.4f}% "
          f"sharpe={f['test_stats']['sharpe']:+.4f}")

The factory pattern is non-negotiable: the engine calls the factory once per window (train and test separately) and offers no shortcut for reusing a strategy instance. A leaked indicator buffer or position counter from a prior fold would silently corrupt the next fold's stats.
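
To make the leak concrete, here is a small standalone sketch. `RollingMean` is a toy stand-in written for this example, not flox.SMA, and the prices are made up. It compares an indicator reused across two windows with one built fresh for the second window.

```python
from collections import deque


class RollingMean:
    """Toy moving average used only for this illustration (not flox.SMA)."""

    def __init__(self, period: int):
        self.buf = deque(maxlen=period)

    def update(self, price: float) -> float:
        self.buf.append(price)
        return sum(self.buf) / len(self.buf)


def first_value(indicator: RollingMean, prices: list) -> float:
    """Feed a whole window and return the value seen after its first price."""
    values = [indicator.update(p) for p in prices]
    return values[0]


train_prices = [100.0, 100.0, 100.0]  # made-up prices
test_prices = [200.0, 200.0, 200.0]

# Reused instance: the test window starts with train prices still buffered.
shared = RollingMean(3)
first_value(shared, train_prices)
leaked = first_value(shared, test_prices)

# Fresh instance per window, which is what a factory provides.
fresh = first_value(RollingMean(3), test_prices)

print(f"reused: {leaked:.2f}  fresh: {fresh:.2f}")
```

The reused indicator reports a value dragged down by the previous window's prices, so any signal derived from it in the first bars of the test window depends on data the test window should never have seen.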

Node

const flox = require('flox-node');

const reg = new flox.SymbolRegistry();
const btc = reg.addSymbol('exchange', 'BTCUSDT', 0.01);

const wfr = new flox.WalkForwardRunner(reg, 0.0004, 10000, {
  mode: 'anchored', testSize: 100, step: 100, minTrainSize: 100,
});

wfr.setStrategyFactory((foldIndex) => {
  const fast = new flox.SMA(10);
  const slow = new flox.SMA(30);
  return {
    symbols: [Number(btc)],
    onTrade(ctx, t, emit) {
      const f = fast.update(t.price);
      const s = slow.update(t.price);
      if (f === null || s === null || !slow.ready) return;
      if (f > s && ctx.position === 0) emit.marketBuy(0.01);
      else if (f < s && ctx.position === 0) emit.marketSell(0.01);
    },
  };
});

const folds = wfr.runCsv('data/btcusdt_sample.csv', 'BTCUSDT');
folds.forEach(f => console.log(f.foldIndex, f.testStats.returnPct));

What you get back per fold

{
  "fold_index": 0,
  "train_start_bar": 0, "train_end_bar": 100,
  "test_start_bar": 100, "test_end_bar": 200,
  "train_start_ns": ..., "train_end_ns": ...,
  "test_start_ns": ..., "test_end_ns": ...,
  "train_stats": { ... full BacktestStats ... },
  "test_stats":  { ... full BacktestStats ... },
}

The two *_stats blocks have the same shape as the stats returned by BacktestRunner.run_csv. Compute aggregate statistics (mean / median / variance over folds) on the client side. The runner deliberately does not aggregate for you, since the useful aggregate depends on what you are looking for (robustness vs. average performance vs. worst case).
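
A minimal sketch of client-side aggregation using only the standard library. The helper name `summarize` is hypothetical, the fold values below are made-up numbers that only exercise the function, and the only keys assumed are the ones used in the Python example above (`train_stats`, `test_stats`, `return_pct`).

```python
import statistics


def summarize(folds, stat="return_pct"):
    """Aggregate one stat across folds (hypothetical helper, not part of flox)."""
    train = [f["train_stats"][stat] for f in folds]
    test = [f["test_stats"][stat] for f in folds]
    gaps = [a - b for a, b in zip(train, test)]
    return {
        "test_mean": statistics.mean(test),           # average performance
        "test_median": statistics.median(test),       # less sensitive to one outlier fold
        "test_variance": statistics.pvariance(test),  # stability across folds
        "test_worst": min(test),                      # worst case
        "mean_train_minus_test": statistics.mean(gaps),  # overfitting gap
    }


# Made-up fold results, shaped like the runner's output but not real data.
example_folds = [
    {"fold_index": 0, "train_stats": {"return_pct": 4.0}, "test_stats": {"return_pct": 1.0}},
    {"fold_index": 1, "train_stats": {"return_pct": 6.0}, "test_stats": {"return_pct": -2.0}},
    {"fold_index": 2, "train_stats": {"return_pct": 5.0}, "test_stats": {"return_pct": 4.0}},
]

summary = summarize(example_folds)
for name, value in summary.items():
    print(f"{name}: {value:+.4f}")
```

A large positive mean_train_minus_test combined with a poor test_worst is the pattern the overfitting check is meant to surface. Pass stat="sharpe" to aggregate Sharpe ratios instead.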

Full OHLCV path for on_bar strategies

run_csv replays close prices as synthetic trade events — on_trade fires, but on_bar does not, and intrabar high / low / volume are not preserved. For strategies whose decisions depend on bar internals (TP/SL ladders on high/low, ATR-style indicators, breakout filters), use run_bars with numpy arrays:

import flox_py as flox
import numpy as np

reg = flox.SymbolRegistry()
btc = reg.add_symbol("exchange", "BTCUSDT", 0.01)


class IntrabarBreakout(flox.Strategy):
    def __init__(self, syms):
        super().__init__(syms)
        self.in_pos = False

    def on_bar(self, ctx, bar):
        if not self.in_pos and bar.high >= bar.open * 1.005:
            self.market_buy(0.01)
            self.in_pos = True
        elif self.in_pos and bar.low <= bar.open * 0.99:
            self.market_sell(0.01)
            self.in_pos = False


wfr = flox.WalkForwardRunner(
    reg, fee_rate=0.0, initial_capital=10_000,
    mode="sliding", train_size=4380, test_size=2190, step=2190,
)
wfr.set_strategy_factory(lambda _i: IntrabarBreakout([btc]))

# OHLCV arrays — all must have the same length, sorted by end_time_ns.
start_ns = ...  # int64 ns, bar open
end_ns = ...    # int64 ns, bar close
open_, high, low, close, volume = ...  # float64

folds = wfr.run_bars(
    start_ns, end_ns, open_, high, low, close, volume,
    symbol="BTCUSDT")

Each fold dispatches BarEvents with full OHLCV preserved. on_bar fires; on_trade does not — same convention as BacktestRunner.run_bars. bar_type (default 0 = Time) and bar_type_param are forwarded for non-time bar aggregations.
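
Because the arrays must share one length and be sorted by bar close time, it can help to validate them before calling run_bars. The sketch below uses only NumPy. `prepare_bars` is a hypothetical helper, not part of flox, and returning contiguous int64 / float64 arrays is an assumption about what the binding prefers rather than a documented requirement.

```python
import numpy as np


def prepare_bars(start_ns, end_ns, open_, high, low, close, volume):
    """Coerce dtypes, check lengths, and sort all arrays by bar close time."""
    times = [np.asarray(a, dtype=np.int64) for a in (start_ns, end_ns)]
    values = [np.asarray(a, dtype=np.float64)
              for a in (open_, high, low, close, volume)]
    arrays = times + values

    lengths = {a.shape[0] for a in arrays}
    if len(lengths) != 1:
        raise ValueError(f"array lengths differ: {sorted(lengths)}")

    # Stable sort on the close timestamp; apply the same order to every array.
    order = np.argsort(arrays[1], kind="stable")
    return tuple(np.ascontiguousarray(a[order]) for a in arrays)


# Made-up, deliberately out-of-order bars to exercise the helper.
bars = prepare_bars(
    start_ns=[60, 0, 120],
    end_ns=[120, 60, 180],
    open_=[2.0, 1.0, 3.0],
    high=[2.5, 1.5, 3.5],
    low=[1.9, 0.9, 2.9],
    close=[2.2, 1.2, 3.2],
    volume=[10.0, 20.0, 30.0],
)
print([a.tolist() for a in bars])
```

The returned tuple is in the same positional order as the run_bars call shown above, so it can be unpacked straight into it.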

What walk-forward does not do

Walk-forward does not optimise hyperparameters per fold. If you need that, run a grid search on each fold's train slice yourself, pick the best parameters, then evaluate them on the test slice. That pattern is the standard "walk-forward optimisation", but it is opinionated enough that the runner stays out of it; compose it from the primitives.
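
The grid-search-per-fold pattern can be sketched independently of the engine. Everything here is hypothetical: `walk_forward_optimise` is not a flox function, `score` stands for whatever you use to backtest one parameter set on one bar range, and `toy_score` is a made-up function that exists only so the example runs.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

Params = Dict[str, int]
Window = Tuple[int, int]  # half-open [start_bar, end_bar)


def walk_forward_optimise(
    folds: Iterable[Tuple[Window, Window]],
    grid: Dict[str, List[int]],
    score: Callable[[Params, Window], float],
) -> List[dict]:
    """For each fold: pick the best params on train, then score them on test."""
    names = list(grid)
    candidates = [dict(zip(names, combo)) for combo in product(*grid.values())]
    results = []
    for train, test in folds:
        # Selection sees only the train window; test is touched once, afterwards.
        best = max(candidates, key=lambda p: score(p, train))
        results.append({
            "train": train,
            "test": test,
            "params": best,
            "train_score": score(best, train),
            "test_score": score(best, test),
        })
    return results


def toy_score(params: Params, window: Window) -> float:
    """Made-up objective: the 'ideal' fast period drifts from 10 to 20 over time."""
    start, _end = window
    target = 10 if start < 200 else 20
    return -abs(params["fast"] - target)


results = walk_forward_optimise(
    folds=[((0, 100), (100, 200)), ((0, 200), (200, 300))],
    grid={"fast": [5, 10, 20], "slow": [30, 50]},
    score=toy_score,
)
for r in results:
    print(r["params"], r["train_score"], r["test_score"])
```

In the second fold the parameters chosen on the train window score worse on the test window, which is the kind of degradation the per-fold comparison is meant to expose. In practice `score` would run a backtest restricted to the given window and return the metric you optimise for.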