1 min read · Engineering

Lock-Free Routing in the Poly Gateway

How we achieved sub-5ms routing latency with lock-free concurrency and SIMD-accelerated scoring

Part of Building the AGI Stack · Part 1

The Problem

When you send a message through Poly Chat, it needs to reach the right AI model — fast. Our gateway routes requests across 8+ providers (OpenAI, Anthropic, Google, and more), and every millisecond of routing overhead is a millisecond of latency your users feel.

Traditional approaches use mutexes or read-write locks to protect shared routing state. Under high concurrency, these locks become bottlenecks. We needed something better.

Lock-Free Architecture

We replaced all hot-path locks with lock-free data structures:

  • DashMap for concurrent provider registries — sharded internally, no global lock
  • ArcSwap for atomic reference updates to routing tables — readers never block
  • Crossbeam SegQueue for lock-free request queues between pipeline stages

The result: 10x improvement under contention compared to our previous RwLock-based design.
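
To make the "sharded internally, no global lock" idea concrete, here is a minimal sketch that uses only the standard library. It is not DashMap's API and not the gateway's actual code: `ShardedRegistry`, `Provider`, and the `endpoint` field are hypothetical names chosen for illustration. Each key hashes to one shard, and each shard carries its own lock, so operations on providers in different shards do not wait on each other.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

/// Hypothetical provider record; the single field is illustrative only.
#[derive(Clone, Debug, PartialEq)]
pub struct Provider {
    pub endpoint: String,
}

/// A registry split into independently locked shards (illustrative,
/// std-only stand-in for a sharded concurrent map).
pub struct ShardedRegistry {
    shards: Vec<RwLock<HashMap<String, Provider>>>,
}

impl ShardedRegistry {
    /// Create a registry with `shard_count` independent shards.
    pub fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards: Vec<RwLock<HashMap<String, Provider>>> = (0..shard_count)
            .map(|_| RwLock::new(HashMap::new()))
            .collect();
        Self { shards }
    }

    /// Hash the key to pick the one shard that owns it.
    fn shard_for(&self, name: &str) -> &RwLock<HashMap<String, Provider>> {
        let mut hasher = DefaultHasher::new();
        name.hash(&mut hasher);
        let index = (hasher.finish() as usize) % self.shards.len();
        &self.shards[index]
    }

    /// Insert or replace a provider; only the owning shard is write-locked.
    pub fn insert(&self, name: &str, provider: Provider) {
        self.shard_for(name)
            .write()
            .unwrap()
            .insert(name.to_string(), provider);
    }

    /// Look up a provider; only the owning shard is read-locked.
    pub fn get(&self, name: &str) -> Option<Provider> {
        self.shard_for(name).read().unwrap().get(name).cloned()
    }

    /// Total number of providers, summed shard by shard.
    pub fn len(&self) -> usize {
        self.shards.iter().map(|s| s.read().unwrap().len()).sum()
    }
}
```

With a single `RwLock<HashMap<..>>`, every writer blocks every reader. Here a write to one provider only contends with requests that hash to the same shard, which is the property the registry relies on.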

SIMD-Accelerated Scoring

Route selection scores each provider on cost, latency, and capability match. With 8+ providers and multiple scoring dimensions, this is a perfect fit for SIMD vectorization.

Using wide::f32x8, which packs eight f32 lanes into one vector, we score eight providers per vector operation rather than one at a time. On AVX2-capable hardware, this gives us a 5.4x speedup over scalar scoring.
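
The data layout behind lane-wise scoring can be sketched without any SIMD crate. The example below is an assumption-laden illustration, not the gateway's scoring code: the `Weights` struct, the linear formula (capability rewarded, cost and latency penalized), and the function names are all hypothetical. It uses plain `[f32; 8]` arrays so it compiles with the standard library alone. Inputs are stored as one array per scoring dimension with one lane per provider, so each arithmetic step applies to all eight lanes, which is the shape that would map onto element-wise operations on an eight-lane vector type.

```rust
/// Number of lanes, matching the width of an eight-lane f32 vector.
pub const LANES: usize = 8;

/// Hypothetical weights for combining the three scoring dimensions.
pub struct Weights {
    pub cost: f32,
    pub latency: f32,
    pub capability: f32,
}

/// Score eight providers at once. Each input array holds one dimension,
/// with lane `i` belonging to provider `i` (structure-of-arrays layout).
/// The formula is an illustrative assumption: higher capability raises
/// the score, higher cost and latency lower it.
pub fn score_lanes(
    cost: &[f32; LANES],
    latency: &[f32; LANES],
    capability: &[f32; LANES],
    w: &Weights,
) -> [f32; LANES] {
    let mut out = [0.0f32; LANES];
    // A fixed-length loop with no branches: every lane runs the same math.
    for i in 0..LANES {
        out[i] = w.capability * capability[i] - w.cost * cost[i] - w.latency * latency[i];
    }
    out
}

/// Index of the highest-scoring lane; the lowest index wins ties.
pub fn best_lane(scores: &[f32; LANES]) -> usize {
    let mut best = 0;
    for i in 1..LANES {
        if scores[i] > scores[best] {
            best = i;
        }
    }
    best
}
```

A registry with more than eight providers would be processed in chunks of eight, padding the final chunk with lanes that can never win (for example, a very low capability value).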

Results

  • Routing latency: <5ms p99 (down from ~25ms)
  • Throughput: 10,000+ req/s on a single node
  • Thermal state detection: 0.31ns (320x better than our 100ns target)

The gateway now handles peak traffic without breaking a sweat — and without a single lock on the hot path.

Building the AGI Stack

  1. Lock-Free Routing in the Poly Gateway
  2. Building the Research Lab: Architecture of a Multi-Modal Research Platform
