

Walk-forward backtesting

Walk-forward splits the time series into successive train / test windows, runs the strategy over each pair, and returns per-fold stats. It is a useful sanity check for overfitting: if the strategy looks strong on the train window but consistently degrades on the held-out test window across folds, the backtest is likely overfit.
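
One way to put a number on "degrades across folds" is to count the folds where out-of-sample performance falls well below in-sample performance. The sketch below is a plain-Python heuristic, not part of flox: overfit_report and its tolerance threshold (test metric below half the train metric by default) are hypothetical choices for illustration, and folds with a non-positive train metric are not counted as degraded.

```python
def overfit_report(pairs, tolerance=0.5):
    """Flag folds whose test metric falls well below the train metric.

    pairs: one (train_metric, test_metric) tuple per fold, e.g. Sharpe ratios.
    tolerance: hypothetical threshold; a fold counts as degraded when
    test < tolerance * train. Folds with a non-positive train metric are
    skipped, since "degradation" from a losing train window says little.
    """
    if not pairs:
        raise ValueError("need at least one fold")
    degraded = [
        i for i, (train, test) in enumerate(pairs)
        if train > 0 and test < tolerance * train
    ]
    return {
        "degraded_folds": degraded,
        # Share is taken over all folds, including the skipped ones.
        "share_degraded": len(degraded) / len(pairs),
    }
```

A high share_degraded that persists across folds is the overfitting pattern described above; a single weak fold on its own proves little.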

In Python the runner is flox_py.WalkForwardRunner. The same primitive is available as flox.WalkForwardRunner in Node and through the C ABI for Codon (flox_walk_forward_run_csv).

Modes

anchored: the train window starts at bar 0 and grows. Each fold trains on bars [0, t) and tests on [t, t + test_size), then t advances by step (ranges are half-open: the end bar is excluded). Set min_train_size to skip the early folds whose train window would be too small.

sliding: the train window has a fixed size and slides forward. Each fold trains on [t, t + train_size) and tests on [t + train_size, t + train_size + test_size), then t advances by step.
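
To see which bar ranges each mode produces, the two schedules can be sketched in plain Python. This is an illustration, not flox's implementation: it assumes half-open bar ranges (consistent with the per-fold sample output further down, where train_end_bar equals test_start_bar), that the first anchored fold splits at t = min_train_size, and that a fold whose test window would run past the last bar is dropped. The engine may treat these edge cases differently; Fold, anchored_folds and sliding_folds are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fold:
    fold_index: int
    train_start_bar: int
    train_end_bar: int   # exclusive
    test_start_bar: int
    test_end_bar: int    # exclusive


def anchored_folds(n_bars, test_size, step, min_train_size):
    """Train window always starts at bar 0 and grows by `step` each fold."""
    if step <= 0 or test_size <= 0 or min_train_size <= 0:
        raise ValueError("step, test_size and min_train_size must be positive")
    folds = []
    t = min_train_size  # assumption: first split point is min_train_size
    while t + test_size <= n_bars:
        folds.append(Fold(len(folds), 0, t, t, t + test_size))
        t += step
    return folds


def sliding_folds(n_bars, train_size, test_size, step):
    """Fixed-size train window that slides forward by `step` each fold."""
    if step <= 0 or test_size <= 0 or train_size <= 0:
        raise ValueError("step, test_size and train_size must be positive")
    folds = []
    t = 0
    while t + train_size + test_size <= n_bars:
        split = t + train_size
        folds.append(Fold(len(folds), t, split, split, split + test_size))
        t += step
    return folds
```

With 400 bars and the anchored settings used in the Python example (test_size=100, step=100, min_train_size=100), anchored_folds returns three folds; the first trains on [0, 100) and tests on [100, 200), and the last trains on [0, 300) and tests on [300, 400).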

Python

import flox_py as flox

reg = flox.SymbolRegistry()
btc = reg.add_symbol("exchange", "BTCUSDT", 0.01)


class SmaCross(flox.Strategy):
    def __init__(self, syms):
        super().__init__(syms)
        self.fast = flox.SMA(10)
        self.slow = flox.SMA(30)

    def on_trade(self, ctx, t):
        f = self.fast.update(t.price)
        s = self.slow.update(t.price)
        if f is None or s is None or not self.slow.ready:
            return
        # Enter only from a flat position; this toy strategy never
        # closes the position itself.
        if f > s and ctx.is_flat():
            self.market_buy(0.01)
        elif f < s and ctx.is_flat():
            self.market_sell(0.01)


wfr = flox.WalkForwardRunner(
    reg, fee_rate=0.0004, initial_capital=10_000,
    mode="anchored", test_size=100, step=100, min_train_size=100,
)

# Factory called twice per fold (train, then test). Build a fresh
# strategy every time — state from a prior fold must not leak.
wfr.set_strategy_factory(lambda fold_index: SmaCross([btc]))

folds = wfr.run_csv("data/btcusdt_sample.csv", "BTCUSDT")
for f in folds:
    print(f"fold {f['fold_index']}: "
          f"train return={f['train_stats']['return_pct']:+.4f}% "
          f"sharpe={f['train_stats']['sharpe']:+.4f} | "
          f"test return={f['test_stats']['return_pct']:+.4f}% "
          f"sharpe={f['test_stats']['sharpe']:+.4f}")

The factory is mandatory: the engine calls it once per window, train and test alike, and offers no way to reuse an existing strategy instance. A leaked indicator buffer or position counter from a prior fold would silently corrupt the next fold's stats.
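
The plain-Python sketch below shows the leak concretely. RollingMean is a hypothetical stand-in for flox.SMA (it is not the flox indicator), and the price series are synthetic. An instance reused from the train window is already warmed up when the test window starts, so it emits a value on the very first test bar, averaged mostly over train prices; a fresh instance from the factory has to warm up on test data alone.

```python
from collections import deque


class RollingMean:
    """Hypothetical SMA stand-in: returns None until `period` prices are seen."""

    def __init__(self, period):
        self.period = period
        self.window = deque(maxlen=period)

    def update(self, price):
        self.window.append(price)
        if len(self.window) < self.period:
            return None
        return sum(self.window) / self.period


def first_output_bar(indicator, prices):
    """Index of the first bar on which the indicator produces a value."""
    for i, price in enumerate(prices):
        if indicator.update(price) is not None:
            return i
    return None


def factory(fold_index):
    # Same idea as the strategy factory: a brand-new object on every call.
    return RollingMean(30)


train_prices = [100.0 + i for i in range(50)]  # synthetic data
test_prices = [200.0 + i for i in range(50)]   # synthetic data

reused = factory(0)
first_output_bar(reused, train_prices)
leaked = first_output_bar(reused, test_prices)     # warm-up carried over
fresh = first_output_bar(factory(0), test_prices)  # warms up from scratch
print(leaked, fresh)  # 0 29
```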

Node

const flox = require('flox-node');

const reg = new flox.SymbolRegistry();
const btc = reg.addSymbol('exchange', 'BTCUSDT', 0.01);

const wfr = new flox.WalkForwardRunner(reg, 0.0004, 10000, {
  mode: 'anchored', testSize: 100, step: 100, minTrainSize: 100,
});

wfr.setStrategyFactory((foldIndex) => {
  const fast = new flox.SMA(10);
  const slow = new flox.SMA(30);
  return {
    symbols: [Number(btc)],
    onTrade(ctx, t, emit) {
      const f = fast.update(t.price);
      const s = slow.update(t.price);
      if (f === null || s === null || !slow.ready) return;
      if (f > s && ctx.position === 0) emit.marketBuy(0.01);
      else if (f < s && ctx.position === 0) emit.marketSell(0.01);
    },
  };
});

const folds = wfr.runCsv('data/btcusdt_sample.csv', 'BTCUSDT');
folds.forEach(f => console.log(f.foldIndex, f.testStats.returnPct));

What you get back per fold

{
  "fold_index": 0,
  "train_start_bar": 0, "train_end_bar": 100,
  "test_start_bar": 100, "test_end_bar": 200,
  "train_start_ns": ..., "train_end_ns": ...,
  "test_start_ns": ..., "test_end_ns": ...,
  "train_stats": { ... full BacktestStats ... },
  "test_stats":  { ... full BacktestStats ... },
}

The two *_stats blocks have the same shape as the stats BacktestRunner.run_csv returns. Compute aggregate statistics (mean, median, or variance across folds) on the client side. The runner deliberately does not aggregate for you, because the useful aggregate depends on what you are looking for: robustness, average performance, or worst case.
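
A client-side aggregation might look like the sketch below. summarize_folds is a hypothetical helper that relies only on the fields shown above: a test_stats dict per fold containing the chosen metric, such as return_pct or sharpe as read in the Python example. Which aggregates to keep is your call; this one reports one of each kind.

```python
import statistics


def summarize_folds(folds, metric="return_pct"):
    """Aggregate one out-of-sample metric across walk-forward folds.

    folds: list of per-fold dicts shaped like the runner's output, each with
    a "test_stats" dict containing `metric` (e.g. "return_pct" or "sharpe").
    """
    values = [f["test_stats"][metric] for f in folds]
    if not values:
        raise ValueError("no folds to summarize")
    return {
        "mean": statistics.fmean(values),     # average performance
        "median": statistics.median(values),  # robust to one outlier fold
        # Sample variance needs at least two folds.
        "variance": statistics.variance(values) if len(values) > 1 else 0.0,
        "worst": min(values),                 # worst-case fold
    }
```

In the Python example, summarize_folds(folds) could be called on the list returned by wfr.run_csv, and summarize_folds(folds, "sharpe") on the Sharpe figures.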

What walk-forward does not do

It does not optimise hyperparameters per fold. If you need that, run a grid search on each fold's train slice yourself, pick the best parameters, then evaluate them on that fold's test slice. This is the standard "walk-forward optimisation" pattern, but it is opinionated enough that the runner stays out of it: compose the primitives instead.
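
Composed by hand, the loop is short. The sketch below is plain Python and independent of flox: evaluate is a hypothetical callable you supply (for example, one that backtests a strategy built with the given parameters over the given bar range and returns a score), and the fold dicts only need the bar-range fields from the per-fold output above. param_grid and walk_forward_optimise are illustrative names.

```python
import itertools


def param_grid(space):
    """Yield every combination from {"name": [values, ...]} as a dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def walk_forward_optimise(folds, space, evaluate):
    """Grid-search each train slice, then score the winner on its test slice.

    evaluate(params, start_bar, end_bar) -> float is supplied by the caller
    (hypothetical); higher is better. Ties keep the first combination seen.
    """
    results = []
    for fold in folds:
        train = (fold["train_start_bar"], fold["train_end_bar"])
        test = (fold["test_start_bar"], fold["test_end_bar"])
        scored = [(evaluate(p, *train), p) for p in param_grid(space)]
        best_train, best_params = max(scored, key=lambda sp: sp[0])
        results.append({
            "fold_index": fold["fold_index"],
            "best_params": best_params,
            "train_score": best_train,
            # Test data is touched once per fold, after selection.
            "test_score": evaluate(best_params, *test),
        })
    return results
```

Keeping the test evaluation outside the search is the point of the pattern: the parameters are chosen without ever seeing the test slice, so the test score stays an honest out-of-sample estimate.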