
The Disruptor Pattern

Why FLOX uses ring buffers for event delivery.

The Problem

Traditional publish-subscribe systems have bottlenecks:

flowchart LR
    P[Producer] --> Q[Queue]
    Q --> C[Consumer]
    Q -.->|Contention| X[Locks<br/>Allocations<br/>Cache misses]

For each event:

  1. Lock acquisition — Wait for mutex
  2. Memory allocation — Create queue node
  3. Cache invalidation — Producer and consumer fight over cache lines

At millions of events per second, these costs add up.
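
For contrast, here is the conventional shape these costs come from: a mutex-guarded queue with a condition variable. This is an illustrative sketch, not FLOX code:

#include <condition_variable>
#include <mutex>
#include <queue>

// Every push/pop takes the lock, the backing deque allocates as the
// queue grows, and both sides contend on the same cache lines.
template <typename Event>
class LockedQueue {
 public:
  void push(Event e) {
    {
      std::lock_guard<std::mutex> lock(_mutex);  // (1) lock acquisition
      _queue.push(std::move(e));                 // (2) may allocate
    }
    _cv.notify_one();
  }

  Event pop() {
    std::unique_lock<std::mutex> lock(_mutex);   // (1) lock acquisition
    _cv.wait(lock, [&] { return !_queue.empty(); });
    Event e = std::move(_queue.front());
    _queue.pop();
    return e;                                    // (3) cache lines ping-pong
  }

 private:
  std::mutex _mutex;
  std::condition_variable _cv;
  std::queue<Event> _queue;
};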

The Disruptor Solution

The Disruptor pattern (from LMAX Exchange) eliminates these costs:

              Ring Buffer (pre-allocated)

  [0] [1] [2] [3] [4] [5] [6] [7]
       ↑               ↑
    Consumer       Producer
    Sequence       Sequence

Key insights:

  1. Pre-allocated array — No allocation during publishing
  2. Sequence numbers — Atomic counters replace locks
  3. Cache-line padding — False sharing eliminated
  4. Batching — Consumers can process multiple events

FLOX Implementation

FLOX's EventBus implements a Disruptor-style ring buffer:

template <typename Event,
          size_t CapacityPow2 = config::DEFAULT_EVENTBUS_CAPACITY,
          size_t MaxConsumers = config::DEFAULT_EVENTBUS_MAX_CONSUMERS>
class EventBus : public ISubsystem { /* ... */ };

Publishing

// Producer claims slot via atomic increment
// (fetch_add returns the previous value; _next starts at -1,
// so the first claimed sequence is 0)
int64_t seq = _next.fetch_add(1) + 1;

// Wait for consumers to free the slot this sequence wraps onto
while (seq - static_cast<int64_t>(CapacityPow2) > minConsumerSequence()) {
  backoff.pause();
}

// Write event directly into ring buffer
_storage[seq & Mask] = event;
_published[seq & Mask].store(seq);  // Signal consumers
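
Storing the sequence number itself in _published, rather than a boolean ready flag, is what makes slot reuse safe: each slot is reused every CapacityPow2 events, and the stored value lets a consumer distinguish this lap's event from the previous lap's. The consumer-side check (see the next section) is exact:

// A slot is ready for `seq` only when its marker equals `seq` exactly;
// if it still holds `seq - CapacityPow2`, the slot has last lap's event
bool ready = (_published[seq & Mask].load(std::memory_order_acquire) == seq);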

Consuming

Each consumer runs in a dedicated thread:

int64_t seq = 0;  // next sequence this consumer expects
while (running) {
  // Wait for the producer to publish this sequence
  while (_published[seq & Mask].load(std::memory_order_acquire) != seq) {
    backoff.pause();
  }

  // Process event
  listener->onTrade(_storage[seq & Mask]);

  // Record progress so the producer can reclaim the slot, then advance
  _consumers[i].seq.store(seq);
  ++seq;
}

Sequence Gating

Producers wait for the slowest consumer before overwriting slots:

sequenceDiagram
    participant P as Producer
    participant RB as Ring Buffer
    participant CA as Consumer A
    participant CB as Consumer B

    Note over RB: Capacity = 8

    P->>RB: publish(seq=111)
    RB-->>P: Wait! Slot for seq 103 (111-8) not consumed

    Note over CA: seq = 105
    Note over CB: seq = 102 (slowest)

    CB->>RB: consume(seq=103)
    Note over CB: seq = 103

    RB-->>P: Slot for seq 103 free, continue
    P->>RB: write event (seq=111)

Gating Logic:

Producer Sequence: 111 (attempting to publish)
Consumer A Sequence: 105
Consumer B Sequence: 102  ← Slowest (gating sequence)
Ring Buffer Size: 8

Producer can advance to: 102 + 8 = 110

Publishing seq 111 therefore blocks until Consumer B reaches 103. This provides backpressure — fast producers can't overwhelm slow consumers.
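
The gating sequence is just the minimum over the required consumers' progress. A minimal sketch of the scan (_consumerCount is an assumed name; FLOX presumably caches the result in _cachedMin, shown in the field list below, to avoid rescanning on every publish):

int64_t minConsumerSequence() {
  int64_t min = INT64_MAX;
  for (size_t i = 0; i < _consumerCount; ++i) {
    const int64_t s = _gating[i].load(std::memory_order_acquire);
    if (s < min) min = s;  // track the slowest required consumer
  }
  _cachedMin.store(min, std::memory_order_relaxed);  // cache for reuse
  return min;
}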

Cache-Line Optimization

The Disruptor uses padding to prevent false sharing:

// Without padding: False sharing
struct Bad {
  std::atomic<int64_t> producer_seq;
  std::atomic<int64_t> consumer_seq;  // Same cache line!
};

// With padding: No false sharing
alignas(64) std::atomic<int64_t> producer_seq;
alignas(64) std::atomic<int64_t> consumer_seq;

FLOX uses alignas(64) throughout EventBus:

alignas(64) std::atomic<bool> _running{false};
alignas(64) std::atomic<int64_t> _next{-1};
alignas(64) std::atomic<int64_t> _cachedMin{-1};
// ...
alignas(64) std::array<ConsumerSlot, MaxConsumers> _consumers{};
alignas(64) std::array<std::atomic<int64_t>, MaxConsumers> _gating{};
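
The cost of that padding is easy to measure: with alignas(64), each counter occupies its own 64-byte cache line, so the pair grows from 16 bytes to 128. A standalone check (not FLOX code):

#include <atomic>
#include <cstdint>
#include <cstdio>

struct Bad {   // both counters share one cache line
  std::atomic<int64_t> producer_seq;
  std::atomic<int64_t> consumer_seq;
};

struct Good {  // each counter gets its own 64-byte line
  alignas(64) std::atomic<int64_t> producer_seq;
  alignas(64) std::atomic<int64_t> consumer_seq;
};

int main() {
  std::printf("Bad: %zu bytes, Good: %zu bytes\n", sizeof(Bad), sizeof(Good));
  // Typically prints: Bad: 16 bytes, Good: 128 bytes
}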

Busy-Spin vs. Blocking

Consumers use configurable backoff with three modes:

enum class BackoffMode {
  AGGRESSIVE,  // Dedicated colo: busy-spin with CPU pause, minimal yields
  RELAXED,     // Shared VPS/cloud: early sleep, minimal CPU burn
  ADAPTIVE     // Auto-adjust: starts aggressive, backs off under contention
};

BusyBackoff backoff(BackoffMode::ADAPTIVE);  // Default

AGGRESSIVE — for dedicated hardware with isolated cores:

  • 2048 spins with CPU pause
  • Then yield, reset at 4096

RELAXED — for shared VPS/cloud environments:

  • 8 spins, then yield
  • Sleep 100μs after 16 spins
  • Sleep 500μs for sustained idle

ADAPTIVE (default) — auto-adjusts based on contention:

  • 128 spins with CPU pause (low-latency burst handling)
  • 512 spins with yield (medium contention)
  • Sleep 10μs up to 2048 spins
  • Sleep 100μs and reset to medium level

This balances latency (busy-spin) with CPU usage (sleep) based on the deployment environment.
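
One way to picture the ADAPTIVE ladder is a spin counter walking through the tiers listed above. A sketch mirroring those numbers (BusyBackoff's actual implementation may differ):

#include <chrono>
#include <cstdint>
#include <thread>
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>  // _mm_pause
#endif

struct AdaptiveBackoffSketch {
  uint32_t spins = 0;

  void pause() {
    ++spins;
    if (spins <= 128) {
#if defined(__x86_64__) || defined(_M_X64)
      _mm_pause();                 // low-latency burst handling
#endif
    } else if (spins <= 512) {
      std::this_thread::yield();   // medium contention
    } else if (spins <= 2048) {
      std::this_thread::sleep_for(std::chrono::microseconds(10));
    } else {
      std::this_thread::sleep_for(std::chrono::microseconds(100));
      spins = 512;                 // reset to the medium tier, not to zero
    }
  }

  void reset() { spins = 0; }      // call after successfully consuming an event
};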

Multiple Consumers

FLOX supports multiple consumers per bus:

flowchart LR
    subgraph Producer
        P[Connector Thread]
    end

    subgraph RB[Ring Buffer]
        direction LR
        S0[E0] --- S1[E1] --- S2[E2] --- S3[E3]
    end

    subgraph Consumers[Consumer Threads]
        C1[Strategy A<br/>seq=5]
        C2[Strategy B<br/>seq=3]
        C3[Logger<br/>seq=2]
    end

    P -->|publish| RB
    RB -->|deliver| C1
    RB -->|deliver| C2
    RB -->|deliver| C3

    style C3 fill:#fdd

Subscribing the three consumers:

tradeBus->subscribe(strategyA.get());
tradeBus->subscribe(strategyB.get());
tradeBus->subscribe(logger.get());

Each consumer:

  • Gets a dedicated thread
  • Maintains its own sequence
  • Processes events independently

The producer waits for the slowest consumer before overwriting.
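
Under the hood, a subscription presumably records the listener, seeds its sequence, and launches the consuming loop on its own thread. A hypothetical sketch (IListener, the ConsumerSlot fields, _threads, and runConsumerLoop are illustrative names, not FLOX's actual internals):

size_t subscribe(IListener* listener) {
  const size_t i = _consumerCount++;  // claim a consumer slot
  _consumers[i].listener = listener;
  _consumers[i].seq.store(-1);        // nothing consumed yet
  _threads.emplace_back([this, i] {
    runConsumerLoop(i);               // the loop from the Consuming section
  });
  return i;
}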

Required vs. Optional Consumers

// Required (default): affects backpressure
tradeBus->subscribe(strategyA.get(), /*required=*/true);

// Optional: doesn't gate the producer
tradeBus->subscribe(logger.get(), /*required=*/false);

Optional consumers:

  • Won't slow down the system if they fall behind
  • May miss events if too slow
  • Useful for monitoring, logging, metrics
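
"May miss events if too slow" can be made concrete: when the producer laps an optional consumer, the consumer skips forward instead of gating anyone. A sketch, not FLOX's exact recovery logic:

// Inside an optional consumer's loop: detect being lapped and jump
// ahead to the oldest sequence still held by the ring buffer
const int64_t produced = _next.load(std::memory_order_acquire);
if (produced - seq >= static_cast<int64_t>(CapacityPow2)) {
  seq = produced - static_cast<int64_t>(CapacityPow2) + 1;  // events in between are lost
}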

Performance Characteristics

Metric            Notes
Publish latency   Lock-free atomic operations only
Consume latency   Depends on backoff strategy and load
Throughput        Limited by slowest consumer
Memory overhead   Fixed: sizeof(Event) × Capacity

Actual numbers depend heavily on:

  • CPU architecture and cache hierarchy
  • Whether cores are isolated
  • Event size and consumer callback complexity
  • System load

Run benchmarks on your target hardware to establish a baseline.

When Disruptor Shines

Good fit:

  • High-throughput, low-latency requirements
  • Predictable memory usage
  • Single producer, multiple consumers
  • In-order event processing

Not ideal for:

  • Multiple producers (requires coordination)
  • Unbounded queues
  • Very uneven consumer speeds

Configuration

// Custom capacity and consumer limit
using MyBus = EventBus<TradeEvent,
                       /*CapacityPow2=*/16384,
                       /*MaxConsumers=*/32>;

Capacity must be a power of 2 (for fast modulo via bitmask).
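
The Mask used throughout the snippets above is presumably just this capacity minus one, which turns the modulo into a single AND:

static constexpr size_t Mask = CapacityPow2 - 1;  // e.g. 16384 -> 0x3FFF

// With a power-of-two capacity these are equivalent,
// but the AND needs no division:
size_t slot = seq & Mask;  // == seq % CapacityPow2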

Further Reading