The Disruptor Pattern¶

Internals — for context only

This page describes how the C++ engine delivers events between threads. If you write strategies in Python, Node.js, or Codon, you don't see any of this directly — you just receive events in your callbacks. Read this if you want to understand why FLOX scales the way it does, not because you need it to write code.

Why FLOX uses ring buffers for event delivery.

The Problem¶

Traditional publish-subscribe systems have bottlenecks:

flowchart LR
    P[Producer] --> Q[Queue]
    Q --> C[Consumer]
    Q -.->|Contention| X[Locks<br/>Allocations<br/>Cache misses]

For each event:

Lock acquisition — Wait for mutex
Memory allocation — Create queue node
Cache invalidation — Producer and consumer fight over cache lines

At millions of events per second, these costs add up.

The Disruptor Solution¶

The Disruptor pattern (from LMAX Exchange) eliminates these costs:

              Ring Buffer (pre-allocated)

  [0] [1] [2] [3] [4] [5] [6] [7]
       ↑               ↑
    Consumer       Producer
    Sequence       Sequence

Key insights:

Pre-allocated array — No allocation during publishing
Sequence numbers — Atomic counters replace locks
Cache-line padding — False sharing eliminated
Batching — Consumers can process multiple events

FLOX Implementation¶

FLOX's EventBus implements a Disruptor-style ring buffer:

template <typename Event,
          size_t CapacityPow2 = config::DEFAULT_EVENTBUS_CAPACITY,
          size_t MaxConsumers = config::DEFAULT_EVENTBUS_MAX_CONSUMERS>
class EventBus : public ISubsystem;

Publishing¶

// Producer claims slot via atomic increment
int64_t seq = _next.fetch_add(1);

// Wait for consumers to free the slot
while (seq - CapacityPow2 > minConsumerSequence()) {
  backoff.pause();
}

// Write event directly into ring buffer
_storage[seq & Mask] = event;
_published[seq & Mask].store(seq);  // Signal consumers

Consuming¶

Each consumer runs in a dedicated thread:

while (running) {
  // Wait for next sequence
  while (_published[seq & Mask] != seq) {
    backoff.pause();
  }

  // Process event
  listener->onTrade(_storage[seq & Mask]);

  // Advance sequence
  _consumers[i].seq.store(seq);
}

Sequence Gating¶

Producers wait for the slowest consumer before overwriting slots:

sequenceDiagram
    participant P as Producer
    participant RB as Ring Buffer
    participant CA as Consumer A
    participant CB as Consumer B

    Note over RB: Capacity = 8

    P->>RB: publish(seq=107)
    RB-->>P: Wait! Slot 99 (107-8) not consumed

    Note over CA: seq = 105
    Note over CB: seq = 102 (slowest)

    CB->>RB: consume(seq=103)
    Note over CB: seq = 103

    RB-->>P: Slot 99 free, continue
    P->>RB: write event at slot 107

Gating Logic:

Producer Sequence: 107
Consumer A Sequence: 105
Consumer B Sequence: 102  ← Slowest (gating sequence)
Ring Buffer Size: 8

Producer can advance to: 102 + 8 = 110

This provides backpressure — fast producers can't overwhelm slow consumers.

Cache-Line Optimization¶

The Disruptor uses padding to prevent false sharing:

// Without padding: False sharing
struct Bad {
  std::atomic<int64_t> producer_seq;
  std::atomic<int64_t> consumer_seq;  // Same cache line!
};

// With padding: No false sharing
alignas(64) std::atomic<int64_t> producer_seq;
alignas(64) std::atomic<int64_t> consumer_seq;

FLOX uses alignas(64) throughout EventBus:

alignas(64) std::atomic<bool> _running{false};
alignas(64) std::atomic<int64_t> _next{-1};
alignas(64) std::atomic<int64_t> _cachedMin{-1};
// ...
alignas(64) std::array<ConsumerSlot, MaxConsumers> _consumers{};
alignas(64) std::array<std::atomic<int64_t>, MaxConsumers> _gating{};

Busy-Spin vs. Blocking¶

Consumers use configurable backoff with three modes:

enum class BackoffMode {
  AGGRESSIVE,  // Dedicated colo: busy-spin with CPU pause, minimal yields
  RELAXED,     // Shared VPS/cloud: early sleep, minimal CPU burn
  ADAPTIVE     // Auto-adjust: starts aggressive, backs off under contention
};

BusyBackoff backoff(BackoffMode::ADAPTIVE);  // Default

AGGRESSIVE — for dedicated hardware with isolated cores:

2048 spins with CPU pause
Then yield, reset at 4096

RELAXED — for shared VPS/cloud environments:

8 spins, then yield
Sleep 100μs after 16 spins
Sleep 500μs for sustained idle

ADAPTIVE (default) — auto-adjusts based on contention:

128 spins with CPU pause (low-latency burst handling)
512 spins with yield (medium contention)
Sleep 10μs up to 2048 spins
Sleep 100μs and reset to medium level

This balances latency (busy-spin) with CPU usage (sleep) based on deployment environment.

Multiple Consumers¶

FLOX supports multiple consumers per bus:

flowchart LR
    subgraph Producer
        P[Connector Thread]
    end

    subgraph RB[Ring Buffer]
        direction LR
        S0[E0] --- S1[E1] --- S2[E2] --- S3[E3]
    end

    subgraph Consumers[Consumer Threads]
        C1[Strategy A<br/>seq=5]
        C2[Strategy B<br/>seq=3]
        C3[Logger<br/>seq=2]
    end

    P -->|publish| RB
    RB -->|deliver| C1
    RB -->|deliver| C2
    RB -->|deliver| C3

    style C3 fill:#fdd

tradeBus->subscribe(strategyA.get());
tradeBus->subscribe(strategyB.get());
tradeBus->subscribe(logger.get());

Each consumer:

Gets a dedicated thread
Maintains its own sequence
Processes events independently

The producer waits for the slowest consumer before overwriting.

Required vs. Optional Consumers¶

// Required (default): affects backpressure
tradeBus->subscribe(strategyA.get(), /*required=*/true);

// Optional: doesn't gate the producer
tradeBus->subscribe(logger.get(), /*required=*/false);

Optional consumers:

Won't slow down the system if they fall behind
May miss events if too slow
Useful for monitoring, logging, metrics

Performance Characteristics¶

Metric	Notes
Publish latency	Lock-free atomic operations only
Consume latency	Depends on backoff strategy and load
Throughput	Limited by slowest consumer
Memory overhead	Fixed: `sizeof(Event) × Capacity`

Actual numbers depend heavily on: - CPU architecture and cache hierarchy - Whether cores are isolated - Event size and consumer callback complexity - System load

Run benchmarks on your target hardware to establish baseline.

When Disruptor Shines¶

Good fit:

High-throughput, low-latency requirements
Predictable memory usage
Single producer, multiple consumers
Events are processed in order

Not ideal for:

Multiple producers (requires coordination)
Unbounded queues
Very uneven consumer speeds

Configuration¶

// Custom capacity and consumer limit
using MyBus = EventBus<TradeEvent,
                       /*CapacityPow2=*/16384,
                       /*MaxConsumers=*/32>;

Capacity must be a power of 2 (for fast modulo via bitmask).