The Disruptor Pattern
Internals — for context only
This page describes how the C++ engine delivers events between threads. If you write strategies in Python, Node.js, or Codon, you don't see any of this directly — you just receive events in your callbacks. Read this if you want to understand why FLOX scales the way it does, not because you need it to write code.
This page explains why FLOX uses ring buffers for event delivery.
The Problem
Traditional publish-subscribe systems have bottlenecks:
flowchart LR
P[Producer] --> Q[Queue]
Q --> C[Consumer]
Q -.->|Contention| X[Locks<br/>Allocations<br/>Cache misses]
For each event:
- Lock acquisition — Wait for mutex
- Memory allocation — Create queue node
- Cache invalidation — Producer and consumer fight over cache lines
At millions of events per second, these costs add up.
The Disruptor Solution
The Disruptor pattern (from LMAX Exchange) is designed to keep these costs off the hot path.
Key insights:
- Pre-allocated array — No allocation during publishing
- Sequence numbers — Atomic counters replace locks
- Cache-line padding — False sharing eliminated
- Batching — Consumers can process multiple events
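As a rough illustration of the first three points, here is a minimal single-producer, single-consumer sketch. `SketchRing`, `tryPublish`, and `tryConsume` are hypothetical names chosen for this example, not FLOX's API, and the ring returns `false` instead of waiting so the example stays short.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer/single-consumer ring; not FLOX's EventBus.
template <typename T, std::size_t CapacityPow2>
class SketchRing {
  static_assert(CapacityPow2 > 0 && (CapacityPow2 & (CapacityPow2 - 1)) == 0,
                "capacity must be a power of two");
  static constexpr std::int64_t Cap = static_cast<std::int64_t>(CapacityPow2);
  static constexpr std::int64_t Mask = Cap - 1;

 public:
  // Producer side: returns false instead of waiting when the ring is full.
  bool tryPublish(const T& value) {
    const std::int64_t next = _published.load(std::memory_order_relaxed) + 1;
    if (next - Cap > _consumed.load(std::memory_order_acquire)) {
      return false;  // would overwrite a slot the consumer has not read yet
    }
    _storage[static_cast<std::size_t>(next & Mask)] = value;  // no allocation
    _published.store(next, std::memory_order_release);        // make it visible
    return true;
  }

  // Consumer side: returns false when nothing new has been published.
  bool tryConsume(T& out) {
    const std::int64_t next = _consumed.load(std::memory_order_relaxed) + 1;
    if (next > _published.load(std::memory_order_acquire)) {
      return false;
    }
    out = _storage[static_cast<std::size_t>(next & Mask)];
    _consumed.store(next, std::memory_order_release);  // release the slot
    return true;
  }

 private:
  std::array<T, CapacityPow2> _storage{};                 // allocated once, up front
  alignas(64) std::atomic<std::int64_t> _published{-1};   // last written sequence
  alignas(64) std::atomic<std::int64_t> _consumed{-1};    // last read sequence
};
```

With a capacity of 4, four calls to `tryPublish` succeed and the fifth returns `false` until `tryConsume` frees a slot. Each counter has exactly one writer, which is why no lock is needed in this simplified case.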
FLOX Implementation
FLOX's EventBus implements a Disruptor-style ring buffer:
template <typename Event,
          size_t CapacityPow2 = config::DEFAULT_EVENTBUS_CAPACITY,
          size_t MaxConsumers = config::DEFAULT_EVENTBUS_MAX_CONSUMERS>
class EventBus : public ISubsystem { /* ... */ };
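Publishing
// Producer claims the next sequence. fetch_add returns the previous value,
// so add 1 to get the claimed sequence (_next starts at -1).
int64_t seq = _next.fetch_add(1) + 1;
// Wait until the gating consumers have released the slot this sequence reuses
while (seq - static_cast<int64_t>(CapacityPow2) > minConsumerSequence()) {
  backoff.pause();
}
// Write the event directly into the ring buffer
_storage[seq & Mask] = event;
_published[seq & Mask].store(seq); // Signal consumers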
Consuming
Each consumer runs in a dedicated thread:
while (running) {
  // Wait until the producer has published this sequence
  while (_published[seq & Mask].load() != seq) {
    backoff.pause();
  }
  // Process the event
  listener->onTrade(_storage[seq & Mask]);
  // Record progress so the producer can reuse the slot, then move on
  _consumers[i].seq.store(seq);
  ++seq;
}
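The check `_published[seq & Mask] != seq` works because each slot's marker stores the full sequence number, not just a flag, so a consumer can tell a fresh event from one written on an earlier lap of the ring. The sketch below shows that idea in a single thread with plain integers. `PublishedMarkers` and the capacity of 8 are assumptions made for illustration; a real bus would use atomics as shown above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCapacity = 8;  // illustrative capacity
constexpr std::int64_t kMask = static_cast<std::int64_t>(kCapacity) - 1;

// Hypothetical, single-threaded stand-in for the per-slot published markers.
struct PublishedMarkers {
  std::array<std::int64_t, kCapacity> marker;

  // Every slot starts at -1, meaning "nothing published here yet".
  PublishedMarkers() { marker.fill(-1); }

  // Record that `seq` has been written into its slot.
  void publish(std::int64_t seq) {
    marker[static_cast<std::size_t>(seq & kMask)] = seq;
  }

  // A slot is readable for `seq` only if its marker holds exactly `seq`.
  bool isReady(std::int64_t seq) const {
    return marker[static_cast<std::size_t>(seq & kMask)] == seq;
  }
};
```

Sequences 3 and 11 share slot 3 when the capacity is 8. After `publish(3)`, `isReady(3)` is true and `isReady(11)` is false; after `publish(11)` the reverse holds.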
Sequence Gating
Producers wait for the slowest consumer before overwriting slots:
sequenceDiagram
participant P as Producer
participant RB as Ring Buffer
participant CA as Consumer A
participant CB as Consumer B
Note over RB: Capacity = 8
P->>RB: publish(seq=111)
RB-->>P: Wait! Sequence 103 (111-8) not consumed
Note over CA: seq = 105
Note over CB: seq = 102 (slowest)
CB->>RB: consume(seq=103)
Note over CB: seq = 103
RB-->>P: Sequence 103 consumed, slot free, continue
P->>RB: write event for seq 111 (slot 7)
Gating Logic:
Producer Sequence: 107
Consumer A Sequence: 105
Consumer B Sequence: 102 ← Slowest (gating sequence)
Ring Buffer Size: 8
Producer can advance to: 102 + 8 = 110
This provides backpressure — fast producers can't overwhelm slow consumers.
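The arithmetic in the Gating Logic listing can be checked with a few small functions. This is a sketch under the same wait condition used in the publishing code (`seq - capacity > gating`); the function names are hypothetical and the consumer sequences are passed in as plain integers instead of being read from atomics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Lowest sequence among the consumers. Assumes at least one consumer.
std::int64_t gatingSequence(const std::vector<std::int64_t>& consumerSeqs) {
  assert(!consumerSeqs.empty());
  return *std::min_element(consumerSeqs.begin(), consumerSeqs.end());
}

// Highest sequence the producer may write without overwriting unread data.
std::int64_t highestPublishable(std::int64_t gating, std::int64_t capacity) {
  return gating + capacity;
}

// True when publishing `seq` would overwrite a slot the slowest consumer
// has not finished with yet.
bool mustWait(std::int64_t seq, std::int64_t gating, std::int64_t capacity) {
  return seq - capacity > gating;
}
```

With consumers at 105 and 102 and a capacity of 8, the gating sequence is 102 and the producer may write up to 110. Sequence 111 has to wait until the slowest consumer reaches 103.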
Cache-Line Optimization
The Disruptor uses padding to prevent false sharing:
// Without padding: false sharing
struct Bad {
  std::atomic<int64_t> producer_seq;
  std::atomic<int64_t> consumer_seq; // Same cache line!
};
// With alignment: each counter gets its own 64-byte cache line
struct Good {
  alignas(64) std::atomic<int64_t> producer_seq;
  alignas(64) std::atomic<int64_t> consumer_seq;
};
FLOX uses alignas(64) throughout EventBus:
alignas(64) std::atomic<bool> _running{false};
alignas(64) std::atomic<int64_t> _next{-1};
alignas(64) std::atomic<int64_t> _cachedMin{-1};
// ...
alignas(64) std::array<ConsumerSlot, MaxConsumers> _consumers{};
alignas(64) std::array<std::atomic<int64_t>, MaxConsumers> _gating{};
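The effect of `alignas(64)` on layout can be verified directly. The sketch below assumes a 64-byte cache line, which is common on x86-64 but not universal (C++17 offers `std::hardware_destructive_interference_size` in `<new>` as a portable hint where the standard library provides it). `Unpadded`, `Padded`, and `byteDistance` are names invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Unpadded {
  std::atomic<std::int64_t> producer_seq{-1};
  std::atomic<std::int64_t> consumer_seq{-1};  // adjacent to producer_seq
};

struct Padded {
  alignas(64) std::atomic<std::int64_t> producer_seq{-1};
  alignas(64) std::atomic<std::int64_t> consumer_seq{-1};  // starts a new 64-byte block
};

// The compiler inserts padding so each member begins on a 64-byte boundary.
static_assert(alignof(Padded) == 64, "struct inherits the member alignment");
static_assert(sizeof(Padded) == 128, "two members, one 64-byte block each");

// Distance in bytes between the two counters of a given struct instance.
template <typename S>
std::ptrdiff_t byteDistance(const S& s) {
  return reinterpret_cast<const char*>(&s.consumer_seq) -
         reinterpret_cast<const char*>(&s.producer_seq);
}
```

For `Unpadded` the two counters sit closer together than one cache line, so a write to one can invalidate the line holding the other. For `Padded` the distance is exactly 64 bytes.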
Busy-Spin vs. Blocking
Waiting threads use a configurable backoff with three modes:
enum class BackoffMode {
AGGRESSIVE, // Dedicated colo: busy-spin with CPU pause, minimal yields
RELAXED, // Shared VPS/cloud: early sleep, minimal CPU burn
ADAPTIVE // Auto-adjust: starts aggressive, backs off under contention
};
BusyBackoff backoff(BackoffMode::ADAPTIVE); // Default
AGGRESSIVE — for dedicated hardware with isolated cores:
- 2048 spins with CPU pause
- Then yield, reset at 4096
RELAXED — for shared VPS/cloud environments:
- 8 spins, then yield
- Sleep 100μs after 16 spins
- Sleep 500μs for sustained idle
ADAPTIVE (default) — auto-adjusts based on contention:
- 128 spins with CPU pause (low-latency burst handling)
- 512 spins with yield (medium contention)
- Sleep 10μs up to 2048 spins
- Sleep 100μs and reset to medium level
This balances latency (busy-spin) with CPU usage (sleep) based on deployment environment.
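To make the escalation concrete, here is a sketch of the ADAPTIVE ladder. It is one reading of the thresholds listed above (128, 512, and 2048 treated as cumulative spin counts) and is not FLOX's `BusyBackoff`. `Step`, `adaptiveStep`, and `AdaptiveBackoffSketch` are hypothetical names, and the CPU pause hint is left as a comment because the intrinsic is platform-specific.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

enum class Step { CpuPause, Yield, Sleep10us, Sleep100usThenReset };

// Maps a spin count to the action taken at that point (assumed thresholds).
inline Step adaptiveStep(int spins) {
  if (spins < 128) return Step::CpuPause;
  if (spins < 512) return Step::Yield;
  if (spins < 2048) return Step::Sleep10us;
  return Step::Sleep100usThenReset;
}

class AdaptiveBackoffSketch {
 public:
  // Called once per failed attempt; escalates as the wait gets longer.
  void pause() {
    switch (adaptiveStep(_spins)) {
      case Step::CpuPause:
        // A real implementation would issue a CPU pause/yield hint here.
        ++_spins;
        break;
      case Step::Yield:
        std::this_thread::yield();
        ++_spins;
        break;
      case Step::Sleep10us:
        std::this_thread::sleep_for(std::chrono::microseconds(10));
        ++_spins;
        break;
      case Step::Sleep100usThenReset:
        std::this_thread::sleep_for(std::chrono::microseconds(100));
        _spins = 128;  // drop back to the yield phase
        break;
    }
  }

  // Called after a successful attempt so the next wait starts hot.
  void reset() { _spins = 0; }

  int spins() const { return _spins; }

 private:
  int _spins = 0;
};
```

A waiting loop would call `pause()` on each failed check and `reset()` once it makes progress, so short waits stay in the spin phase and only sustained idleness reaches the sleeps.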
Multiple Consumers
FLOX supports multiple consumers per bus:
flowchart LR
subgraph Producer
P[Connector Thread]
end
subgraph RB[Ring Buffer]
direction LR
S0[E0] --- S1[E1] --- S2[E2] --- S3[E3]
end
subgraph Consumers[Consumer Threads]
C1[Strategy A<br/>seq=5]
C2[Strategy B<br/>seq=3]
C3[Logger<br/>seq=2]
end
P -->|publish| RB
RB -->|deliver| C1
RB -->|deliver| C2
RB -->|deliver| C3
style C3 fill:#fdd
tradeBus->subscribe(strategyA.get());
tradeBus->subscribe(strategyB.get());
tradeBus->subscribe(logger.get());
Each consumer:
- Gets a dedicated thread
- Maintains its own sequence
- Processes events independently
The producer waits for the slowest required consumer before overwriting a slot.
Required vs. Optional Consumers
// Required (default): affects backpressure
tradeBus->subscribe(strategyA.get(), /*required=*/true);
// Optional: doesn't gate the producer
tradeBus->subscribe(logger.get(), /*required=*/false);
Optional consumers:
- Won't slow down the system if they fall behind
- May miss events if too slow
- Useful for monitoring, logging, metrics
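One way to picture the difference is to compute the gating sequence over required consumers only. The sketch below is an assumption about how such a filter could look, not FLOX's actual `minConsumerSequence()`. `ConsumerState` and `minRequiredSequence` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical snapshot of one consumer's progress.
struct ConsumerState {
  std::int64_t seq;  // last sequence this consumer finished processing
  bool required;     // true if the producer must wait for it
};

// Lowest sequence among required consumers. Optional consumers are skipped,
// so they never hold the producer back. With no required consumers the result
// is the largest int64 value, which means the producer never has to wait.
std::int64_t minRequiredSequence(const std::vector<ConsumerState>& consumers) {
  std::int64_t lowest = std::numeric_limits<std::int64_t>::max();
  for (const ConsumerState& c : consumers) {
    if (c.required && c.seq < lowest) {
      lowest = c.seq;
    }
  }
  return lowest;
}
```

Using the sequences from the diagram above (Strategy A at 5, Strategy B at 3, Logger at 2), the gating sequence is 3 when the logger is optional and 2 when it is required. The cost of the optional setting is that the logger's unread slots can be overwritten.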
Performance Characteristics
| Metric | Notes |
|---|---|
| Publish latency | Lock-free atomic operations while the ring has free slots; the producer backs off when it is full |
| Consume latency | Depends on backoff strategy and load |
| Throughput | Limited by the slowest required consumer |
| Memory overhead | Fixed: sizeof(Event) × Capacity for event storage, plus one sequence marker per slot |
Actual numbers depend heavily on:
- CPU architecture and cache hierarchy
- Whether cores are isolated
- Event size and consumer callback complexity
- System load
Run benchmarks on your target hardware to establish a baseline.
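As a starting point for such a baseline, a timing helper along these lines can wrap whatever call you want to measure. `averageNanosPerCall` is a hypothetical name, and this is only a rough sketch: a serious benchmark would also add warm-up iterations, pin threads to cores, and report percentiles instead of a single average.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Runs `fn` `iterations` times and returns the mean wall-clock cost per call
// in nanoseconds, measured with a monotonic clock.
template <typename Fn>
double averageNanosPerCall(Fn&& fn, std::int64_t iterations) {
  const auto start = std::chrono::steady_clock::now();
  for (std::int64_t i = 0; i < iterations; ++i) {
    fn();
  }
  const auto elapsed = std::chrono::steady_clock::now() - start;
  const double totalNs =
      std::chrono::duration<double, std::nano>(elapsed).count();
  return totalNs / static_cast<double>(iterations);
}
```

Passing a lambda that publishes one event gives a first approximation of publish cost on the machine at hand. Compare runs only on the same hardware and under the same load.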
When Disruptor Shines
Good fit:
- High-throughput, low-latency requirements
- Predictable memory usage
- Single producer, multiple consumers
- Events are processed in order
Not ideal for:
- Multiple producers (requires coordination)
- Unbounded queues
- Very uneven consumer speeds
Configuration
// Custom capacity and consumer limit
using MyBus = EventBus<TradeEvent,
/*CapacityPow2=*/16384,
/*MaxConsumers=*/32>;
Capacity must be a power of 2 (for fast modulo via bitmask).
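The power-of-two requirement can be enforced at compile time. The following sketch shows one way to do that and why the bitmask then matches a modulo. `isPowerOfTwo`, `CapacityCheck`, and `slotByMask` are illustrative names and not part of FLOX.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True for 1, 2, 4, 8, ...; a power of two has exactly one bit set.
constexpr bool isPowerOfTwo(std::size_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}

// Hypothetical compile-time guard: instantiating it with a capacity that is
// not a power of two fails to compile.
template <std::size_t CapacityPow2>
struct CapacityCheck {
  static_assert(isPowerOfTwo(CapacityPow2), "capacity must be a power of two");
  static constexpr std::size_t Mask = CapacityPow2 - 1;
};

// For power-of-two capacities, masking selects the same slot as seq % capacity.
template <std::size_t CapacityPow2>
constexpr std::size_t slotByMask(std::uint64_t seq) {
  return static_cast<std::size_t>(seq & CapacityCheck<CapacityPow2>::Mask);
}
```

With a capacity of 16384, sequence 16385 maps to slot 1 under both the mask and the modulo. A capacity such as 10000 would not work: 9999 is not a run of contiguous one bits, so masking with it would skip some slots and alias others.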
Further Reading
- LMAX Disruptor Paper
- Memory Model — How FLOX handles event ownership
- Architecture Overview — Full system design