Skip to content

The Disruptor Pattern

Internals — for context only

This page describes how the C++ engine delivers events between threads. If you write strategies in Python, Node.js, or Codon, you don't see any of this directly — you just receive events in your callbacks. Read this if you want to understand why FLOX scales the way it does, not because you need it to write code.

Why FLOX uses ring buffers for event delivery.

The Problem

Traditional publish-subscribe systems have bottlenecks:

flowchart LR
    P[Producer] --> Q[Queue]
    Q --> C[Consumer]
    Q -.->|Contention| X[Locks<br/>Allocations<br/>Cache misses]

For each event:

  1. Lock acquisition — Wait for mutex
  2. Memory allocation — Create queue node
  3. Cache invalidation — Producer and consumer fight over cache lines

At millions of events per second, these costs add up.

The Disruptor Solution

The Disruptor pattern (from LMAX Exchange) eliminates these costs:

              Ring Buffer (pre-allocated)

  [0] [1] [2] [3] [4] [5] [6] [7]
       ↑               ↑
    Consumer       Producer
    Sequence       Sequence

Key insights:

  1. Pre-allocated array — No allocation during publishing
  2. Sequence numbers — Atomic counters replace locks
  3. Cache-line padding — False sharing eliminated
  4. Batching — Consumers can process multiple events

FLOX Implementation

FLOX's EventBus implements a Disruptor-style ring buffer:

template <typename Event,
          size_t CapacityPow2 = config::DEFAULT_EVENTBUS_CAPACITY,
          size_t MaxConsumers = config::DEFAULT_EVENTBUS_MAX_CONSUMERS>
class EventBus : public ISubsystem;

Publishing

// Producer claims slot via atomic increment
int64_t seq = _next.fetch_add(1);

// Wait for consumers to free the slot
while (seq - CapacityPow2 > minConsumerSequence()) {
  backoff.pause();
}

// Write event directly into ring buffer
_storage[seq & Mask] = event;
_published[seq & Mask].store(seq);  // Signal consumers

Consuming

Each consumer runs in a dedicated thread:

while (running) {
  // Wait for next sequence
  while (_published[seq & Mask] != seq) {
    backoff.pause();
  }

  // Process event
  listener->onTrade(_storage[seq & Mask]);

  // Advance sequence
  _consumers[i].seq.store(seq);
}

Sequence Gating

Producers wait for the slowest consumer before overwriting slots:

sequenceDiagram
    participant P as Producer
    participant RB as Ring Buffer
    participant CA as Consumer A
    participant CB as Consumer B

    Note over RB: Capacity = 8

    P->>RB: publish(seq=107)
    RB-->>P: Wait! Slot 99 (107-8) not consumed

    Note over CA: seq = 105
    Note over CB: seq = 102 (slowest)

    CB->>RB: consume(seq=103)
    Note over CB: seq = 103

    RB-->>P: Slot 99 free, continue
    P->>RB: write event at slot 107

Gating Logic:

Producer Sequence: 107
Consumer A Sequence: 105
Consumer B Sequence: 102  ← Slowest (gating sequence)
Ring Buffer Size: 8

Producer can advance to: 102 + 8 = 110

This provides backpressure — fast producers can't overwhelm slow consumers.

Cache-Line Optimization

The Disruptor uses padding to prevent false sharing:

// Without padding: False sharing
struct Bad {
  std::atomic<int64_t> producer_seq;
  std::atomic<int64_t> consumer_seq;  // Same cache line!
};

// With padding: No false sharing
alignas(64) std::atomic<int64_t> producer_seq;
alignas(64) std::atomic<int64_t> consumer_seq;

FLOX uses alignas(64) throughout EventBus:

alignas(64) std::atomic<bool> _running{false};
alignas(64) std::atomic<int64_t> _next{-1};
alignas(64) std::atomic<int64_t> _cachedMin{-1};
// ...
alignas(64) std::array<ConsumerSlot, MaxConsumers> _consumers{};
alignas(64) std::array<std::atomic<int64_t>, MaxConsumers> _gating{};

Busy-Spin vs. Blocking

Consumers use configurable backoff with three modes:

enum class BackoffMode {
  AGGRESSIVE,  // Dedicated colo: busy-spin with CPU pause, minimal yields
  RELAXED,     // Shared VPS/cloud: early sleep, minimal CPU burn
  ADAPTIVE     // Auto-adjust: starts aggressive, backs off under contention
};

BusyBackoff backoff(BackoffMode::ADAPTIVE);  // Default

AGGRESSIVE — for dedicated hardware with isolated cores:

  • 2048 spins with CPU pause
  • Then yield, reset at 4096

RELAXED — for shared VPS/cloud environments:

  • 8 spins, then yield
  • Sleep 100μs after 16 spins
  • Sleep 500μs for sustained idle

ADAPTIVE (default) — auto-adjusts based on contention:

  • 128 spins with CPU pause (low-latency burst handling)
  • 512 spins with yield (medium contention)
  • Sleep 10μs up to 2048 spins
  • Sleep 100μs and reset to medium level

This balances latency (busy-spin) with CPU usage (sleep) based on deployment environment.

Multiple Consumers

FLOX supports multiple consumers per bus:

flowchart LR
    subgraph Producer
        P[Connector Thread]
    end

    subgraph RB[Ring Buffer]
        direction LR
        S0[E0] --- S1[E1] --- S2[E2] --- S3[E3]
    end

    subgraph Consumers[Consumer Threads]
        C1[Strategy A<br/>seq=5]
        C2[Strategy B<br/>seq=3]
        C3[Logger<br/>seq=2]
    end

    P -->|publish| RB
    RB -->|deliver| C1
    RB -->|deliver| C2
    RB -->|deliver| C3

    style C3 fill:#fdd
tradeBus->subscribe(strategyA.get());
tradeBus->subscribe(strategyB.get());
tradeBus->subscribe(logger.get());

Each consumer:

  • Gets a dedicated thread
  • Maintains its own sequence
  • Processes events independently

The producer waits for the slowest consumer before overwriting.

Required vs. Optional Consumers

// Required (default): affects backpressure
tradeBus->subscribe(strategyA.get(), /*required=*/true);

// Optional: doesn't gate the producer
tradeBus->subscribe(logger.get(), /*required=*/false);

Optional consumers:

  • Won't slow down the system if they fall behind
  • May miss events if too slow
  • Useful for monitoring, logging, metrics

Performance Characteristics

Metric Notes
Publish latency Lock-free atomic operations only
Consume latency Depends on backoff strategy and load
Throughput Limited by slowest consumer
Memory overhead Fixed: sizeof(Event) × Capacity

Actual numbers depend heavily on: - CPU architecture and cache hierarchy - Whether cores are isolated - Event size and consumer callback complexity - System load

Run benchmarks on your target hardware to establish baseline.

When Disruptor Shines

Good fit:

  • High-throughput, low-latency requirements
  • Predictable memory usage
  • Single producer, multiple consumers
  • Events are processed in order

Not ideal for:

  • Multiple producers (requires coordination)
  • Unbounded queues
  • Very uneven consumer speeds

Configuration

// Custom capacity and consumer limit
using MyBus = EventBus<TradeEvent,
                       /*CapacityPow2=*/16384,
                       /*MaxConsumers=*/32>;

Capacity must be a power of 2 (for fast modulo via bitmask).

Further Reading