Memory Model¶
Zero-allocation event delivery in FLOX.
The Problem¶
Dynamic memory allocation is slow and unpredictable:
// This allocates memory — bad in hot path
auto event = std::make_shared<TradeEvent>();
tradeBus.publish(event);
Each allocation:
- May trigger system calls
- Causes unpredictable latency
- Fragments the heap
- Increases cache misses
FLOX's Solution¶
FLOX eliminates allocations in the hot path through:
- Pre-allocated ring buffers — events stored directly in the EventBus
- Object pools — reusable event objects for large events
- Reference counting — automatic lifetime management without smart pointer overhead
Small Events: Direct Storage¶
Small events such as TradeEvent are stored directly in the Disruptor's ring buffer:
// EventBus internal storage
std::array<Storage, CapacityPow2> _storage{};
// Publishing: placement new directly into ring buffer
::new (slot_ptr(idx)) Event(std::forward<Ev>(ev));
No allocation happens — the event is copied into pre-allocated storage.
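As a sketch of the caller's side (the field names and exact publish signature are illustrative, not FLOX's verbatim API), a small event is built on the stack and passed by value:
// Hypothetical small event usage: the object lives on the stack and the bus
// copies it into a pre-allocated slot, so nothing touches the heap
TradeEvent ev{};
ev.symbol = symbolId;   // illustrative field names
ev.price = 100.25;
ev.quantity = 0.5;
tradeBus.publish(ev);   // copied into the ring buffer, not wrapped in a shared_ptr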
Large Events: Object Pools¶
For large events like BookUpdateEvent (which contains vectors of bid/ask levels), FLOX uses object pools:
// Pool pre-allocates N objects at startup
pool::Pool<BookUpdateEvent, 128> bookPool;
// Acquire returns a Handle (reference-counted smart pointer)
auto handle = bookPool.acquire();
if (handle) {
  (*handle)->update.symbol = symbolId;
  (*handle)->update.bids = {...};
  // Publish the handle, not the event
  bookBus.publish(std::move(handle));
}
Pool Design¶
┌──────────────────────────────────────────────────────────────┐
│ Pool<T, N> │
├──────────────────────────────────────────────────────────────┤
│ Pre-allocated slots: [T] [T] [T] [T] [T] [T] ... [T] │
│ │
│ Free queue: → [ptr] → [ptr] → [ptr] → │
│ │
│ acquire(): pop from free queue, return Handle<T> │
│ release(): push back to free queue │
└──────────────────────────────────────────────────────────────┘
Key properties:
- All objects allocated at pool construction
- acquire() is O(1) — pops from lock-free queue
- release() is O(1) — pushes back
- No heap allocation during operation
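The sketch below shows the idea in isolation; it is not FLOX's implementation (FLOX's free queue is lock-free, while this version recycles slot indices behind a mutex for brevity):
#include <array>
#include <cstddef>
#include <mutex>
#include <optional>
#include <vector>
// Minimal sketch: every T lives in a pre-allocated array, and
// acquire()/release() only recycle slot indices
template <typename T, std::size_t N>
class SimplePool {
public:
  SimplePool() {
    _free.reserve(N);
    for (std::size_t i = 0; i < N; ++i) _free.push_back(i);  // all slots start free
  }
  // O(1): pop a free slot, or report exhaustion
  std::optional<std::size_t> acquire() {
    std::lock_guard<std::mutex> lock(_mutex);
    if (_free.empty()) return std::nullopt;
    std::size_t idx = _free.back();
    _free.pop_back();
    return idx;
  }
  // O(1): hand the slot back for reuse; nothing is destroyed or deallocated
  void release(std::size_t idx) {
    std::lock_guard<std::mutex> lock(_mutex);
    _free.push_back(idx);
  }
  T& at(std::size_t idx) { return _slots[idx]; }
private:
  std::array<T, N> _slots{};       // storage allocated exactly once, up front
  std::vector<std::size_t> _free;  // indices of currently unused slots
  std::mutex _mutex;
};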
The Handle Class¶
pool::Handle<T> is a reference-counted smart pointer:
template <typename T>
class Handle {
  T* _ptr;
  // Copy: increment ref count
  Handle(const Handle& other) : _ptr(other._ptr) {
    retain(_ptr);
  }
  // Move: transfer ownership
  Handle(Handle&& other) : _ptr(other._ptr) {
    other._ptr = nullptr;
  }
  // Destructor: decrement ref count, maybe return to pool
  ~Handle() {
    if (_ptr && _ptr->release()) {
      _ptr->releaseToPool(); // Returns to pool when refcount hits 0
    }
  }
};
Benefits:
- Automatic lifetime management
- Can be safely copied to multiple consumers
- Returns to pool when last reference dies
- No shared_ptr overhead (no control block allocation)
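For example (a sketch, assuming acquire() returns an optional Handle as in the snippets above), copying a handle only bumps the count, and the object is recycled once the last copy dies:
{
  auto handle = bookPool.acquire();  // refcount = 1
  auto forLogger  = *handle;         // copy calls retain(): refcount = 2
  auto forMetrics = *handle;         // copy calls retain(): refcount = 3
}                                    // all three handles are destroyed here; the
                                     // release() that drops the count to zero
                                     // returns the object to the pool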
PoolableBase¶
Events that use pools inherit from PoolableBase:
struct BookUpdateEvent : public pool::PoolableBase<BookUpdateEvent> {
  BookUpdate update;
  void clear() {
    update.bids.clear();
    update.asks.clear();
    // ... reset other fields
  }
};
The clear() method resets the object for reuse.
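A sketch of where clear() fits in the recycle path; names such as recycle and _freeQueue are assumptions for illustration, not FLOX's actual identifiers:
// Hypothetical release path inside the pool: wipe the object's contents and
// reset its refcount before the slot goes back on the free queue
void recycle(BookUpdateEvent* ev) {
  ev->clear();          // drop bids/asks so stale data never leaks into reuse
  ev->resetRefCount();  // the next acquire() starts at refcount = 1 again
  _freeQueue.push(ev);  // slot is available again; still no heap traffic
}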
Event Flow with Pools¶
sequenceDiagram
participant Pool
participant Conn as Connector
participant Bus as EventBus
participant Cons as Consumer
Note over Pool: Pre-allocated objects
Conn->>Pool: acquire()
Pool-->>Conn: Handle (refcount=1)
Note over Conn: Populate event data
Conn->>Bus: publish(move(handle))
Note over Bus: refcount=1
Bus->>Cons: dispatch(handle)
Note over Cons: refcount=2 (copied)
Cons-->>Bus: callback returns
Note over Bus: refcount=1
Bus->>Bus: slot reclaimed
Note over Bus: refcount=0
Bus->>Pool: releaseToPool()
Note over Pool: Object ready for reuse
Text representation:
1. The connector acquires an event from the pool (refcount = 1) and populates it.
2. The connector publishes the handle into the EventBus by move, so the count stays at 1.
3. The bus dispatches the handle to the consumer, whose copy raises the count to 2.
4. When the callback returns, the consumer's copy is destroyed and the count drops back to 1.
5. When the bus reclaims the slot, the last reference dies (refcount = 0) and releaseToPool() returns the object to the pool for reuse.
PMR (Polymorphic Memory Resources)¶
For events with variable-size data (like vectors), FLOX uses PMR:
// In Pool constructor
std::array<std::byte, 128 * 1024> _buffer; // Stack buffer
std::pmr::monotonic_buffer_resource _arena; // Fast bump allocator
std::pmr::unsynchronized_pool_resource _pool; // Pool allocator
When BookUpdateEvent allocates vectors for bids/asks:
1. Memory comes from the pre-allocated buffer
2. monotonic_buffer_resource provides fast bump-pointer allocation
3. No system calls, no heap fragmentation
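A minimal self-contained sketch of this arrangement (the buffer size and the Level type are illustrative): a monotonic arena over a pre-allocated buffer feeds a pool resource, and the vectors draw from it instead of the global heap.
#include <array>
#include <cstddef>
#include <memory_resource>
#include <vector>
// All vector storage below is carved out of `buffer` (as long as it is large
// enough); nothing reaches malloc or the kernel
std::array<std::byte, 128 * 1024> buffer;
std::pmr::monotonic_buffer_resource arena{buffer.data(), buffer.size()};
std::pmr::unsynchronized_pool_resource blockPool{&arena};
struct Level { double price; double quantity; };
std::pmr::vector<Level> bids{&blockPool};  // grows inside the arena
std::pmr::vector<Level> asks{&blockPool};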
RefCountable Interface¶
class RefCountable {
  std::atomic<uint32_t> _refCount{1};
public:
  void retain() {
    _refCount.fetch_add(1, std::memory_order_relaxed);
  }
  bool release() {
    return _refCount.fetch_sub(1, std::memory_order_acq_rel) == 1;
  }
  void resetRefCount() {
    _refCount.store(1, std::memory_order_relaxed);
  }
};
Atomic operations ensure thread safety across bus consumers.
Best Practices¶
Do¶
// Acquire from pool, move into bus, and handle pool exhaustion explicitly
if (auto handle = bookPool.acquire()) {
  (*handle)->update = buildUpdate();
  bookBus.publish(std::move(handle));
} else {
  // Pool exhausted: log a warning, skip the event, or use a fallback
}
Don't¶
// Don't hold handles longer than necessary
pool::Handle<BookUpdateEvent> cachedHandle; // Bad: blocks pool slot
// Don't allocate in callbacks
void onTrade(const TradeEvent& ev) {
  auto data = std::make_unique<BigData>(); // Bad: allocation in hot path
}
// Don't store event pointers
void onBookUpdate(const BookUpdateEvent& ev) {
  _cachedEvent = &ev; // Bad: event will be recycled
}
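If a consumer needs book data after the callback returns, copy the values out of the event instead; the field names below are illustrative:
// Copy what you need; never keep the event's address past the callback
void onBookUpdate(const BookUpdateEvent& ev) {
  if (!ev.update.bids.empty()) {
    _lastBestBid = ev.update.bids.front().price;  // a plain value is safe to keep
  }
}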
Sizing Pools¶
Pool size = max concurrent uses + headroom:
// If 3 consumers each take ~10ms to process, and events arrive at 1000/sec:
// In-flight events ≈ 3 consumers × 10ms × 1000/sec = 30
// Add headroom: 64 or 128
pool::Pool<BookUpdateEvent, 128> bookPool;
Signs your pool is too small:
- acquire() returns nullopt
- High latency spikes
- Dropped events
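The same arithmetic can live next to the pool declaration as compile-time constants; the numbers below are the assumptions from the comment above:
#include <cstddef>
// In-flight events ≈ consumers × per-event latency × arrival rate
constexpr std::size_t kConsumers    = 3;
constexpr std::size_t kLatencyMs    = 10;    // ~10 ms per event, per consumer
constexpr std::size_t kEventsPerSec = 1000;
constexpr std::size_t kInFlight = kConsumers * kLatencyMs * kEventsPerSec / 1000;  // = 30
static_assert(kInFlight == 30, "revisit pool sizing if these inputs change");
pool::Pool<BookUpdateEvent, 128> bookPool;  // round up well past the estimate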
Memory Layout¶
FLOX optimizes for cache efficiency:
// Cache-line aligned atomics
alignas(64) std::atomic<int64_t> _next{-1};
// Contiguous event storage
std::array<Storage, CapacityPow2> _storage{};
This ensures:
- No false sharing between atomics
- Sequential access patterns for events
- Minimal cache misses
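As a sketch (the Cursor type here is illustrative, not a FLOX class), the alignment property can even be checked at compile time:
#include <atomic>
#include <cstdint>
// Each cursor owns a full 64-byte cache line, so writers on different cores
// do not invalidate each other's lines (no false sharing)
struct alignas(64) Cursor {
  std::atomic<int64_t> value{-1};
};
static_assert(alignof(Cursor) == 64, "cursor must start on a cache-line boundary");
static_assert(sizeof(Cursor) == 64, "cursor must occupy a full cache line");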
Summary¶
| Component | Strategy | Allocation |
|---|---|---|
| TradeBus | Direct storage | Zero (ring buffer) |
| BookUpdateBus | Pool + Handle | Zero (pre-allocated) |
| Small events | Copy into bus | Zero |
| Large events | Pool acquire/release | Zero (at runtime) |
| Event vectors | PMR | Zero (from buffer) |
Next Steps¶
- The Disruptor Pattern — How the ring buffer works
- Optimize Performance — Tune for latency