The Problem
When you send a message through Poly Chat, it needs to reach the right AI model — fast. Our gateway routes requests across 8+ providers (OpenAI, Anthropic, Google, and more), and every millisecond of routing overhead is a millisecond of latency your users feel.
Traditional approaches use mutexes or read-write locks to protect shared routing state. Under high concurrency, every request contends for the same locks, and they become the bottleneck. We needed something better.
Lock-Free Architecture
We replaced all hot-path locks with lock-free data structures:
- DashMap for concurrent provider registries — sharded internally, no global lock
- ArcSwap for atomic reference updates to routing tables — readers never block
- Crossbeam SegQueue for lock-free request queues between pipeline stages
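The core idea behind the ArcSwap piece can be sketched with nothing but std atomics: routing tables are immutable once published, a writer swings a single atomic pointer to a freshly built table, and readers do one atomic load with no lock and no blocking. This is a minimal illustration, not our production code — it deliberately leaks superseded tables, whereas the real arc-swap crate handles memory reclamation safely.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Illustrative routing table: immutable once published.
struct RoutingTable {
    providers: Vec<&'static str>,
}

// Global pointer to the current table. Readers never block:
// reading the table is a single atomic load.
static CURRENT: AtomicPtr<RoutingTable> = AtomicPtr::new(std::ptr::null_mut());

fn publish(table: RoutingTable) {
    // Leak the new table so any reader still holding an old pointer
    // stays valid. Real code uses arc-swap (or epoch-based reclamation)
    // to free retired tables instead of leaking them.
    let ptr = Box::into_raw(Box::new(table));
    CURRENT.store(ptr, Ordering::Release);
}

fn current() -> Option<&'static RoutingTable> {
    let ptr = CURRENT.load(Ordering::Acquire);
    // Sound here only because published tables are never freed.
    unsafe { ptr.as_ref() }
}

fn main() {
    publish(RoutingTable { providers: vec!["openai", "anthropic"] });
    println!("providers: {}", current().unwrap().providers.len());

    // A later config update swaps the pointer; in-flight readers
    // keep seeing the old snapshot until their next load.
    publish(RoutingTable { providers: vec!["openai", "anthropic", "google"] });
    println!("providers: {}", current().unwrap().providers.len());
}
```

The same publish-immutable-then-swap discipline is why readers never observe a half-updated routing table: there is no intermediate state to see.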
The result: roughly a 10x improvement under contention compared to our previous RwLock-based design.
SIMD-Accelerated Scoring
Route selection scores each provider on cost, latency, and capability match. With 8+ providers and multiple scoring dimensions, this is a perfect fit for SIMD vectorization.
Using wide::f32x8, we evaluate all providers simultaneously in a single vector operation. On AVX2-capable hardware, this gives us a 5.4x speedup over scalar scoring.
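The shape of the computation is worth showing. Providers are laid out struct-of-arrays, so each scoring dimension is one contiguous 8-lane row — exactly what an `f32x8` register holds. Below is a hedged, scalar sketch of that layout and the weighted scoring math (the weights, field names, and sample values are illustrative, not our actual tuning); with `wide::f32x8`, the loop body collapses into a few vector multiply-adds across all eight lanes at once.

```rust
// Struct-of-arrays layout: one row per scoring dimension,
// eight lanes per row -- the shape an f32x8 register holds.
const LANES: usize = 8;

struct ProviderScores {
    cost: [f32; LANES],       // normalized, higher = cheaper
    latency: [f32; LANES],    // normalized, higher = faster
    capability: [f32; LANES], // 1.0 if the provider can serve the request
}

// Illustrative weights, not production values.
const W_COST: f32 = 0.3;
const W_LATENCY: f32 = 0.5;
const W_CAPABILITY: f32 = 0.2;

// Scalar reference version. With wide::f32x8, each array maps onto one
// vector register and the three weighted terms become vector ops
// evaluated for all eight providers simultaneously.
fn best_provider(s: &ProviderScores) -> usize {
    let mut best = 0;
    let mut best_score = f32::MIN;
    for i in 0..LANES {
        let score = W_COST * s.cost[i]
            + W_LATENCY * s.latency[i]
            + W_CAPABILITY * s.capability[i];
        if score > best_score {
            best_score = score;
            best = i;
        }
    }
    best
}

fn main() {
    let s = ProviderScores {
        cost:       [0.9, 0.4, 0.7, 0.5, 0.8, 0.6, 0.3, 0.2],
        latency:    [0.5, 0.9, 0.6, 0.8, 0.4, 0.7, 0.9, 0.5],
        capability: [1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0],
    };
    println!("best provider index: {}", best_provider(&s));
}
```

Zeroing the capability lane (provider 3 above) cleanly knocks a provider out of contention without any branching in the scoring itself — a property that survives the move to SIMD.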
Results
- Routing latency: <5ms p99 (down from ~25ms)
- Throughput: 10,000+ req/s on a single node
- Thermal state detection: 0.31ns (320x better than our 100ns target)
The gateway now handles peak traffic without breaking a sweat — and without a single lock on the hot path.