Designing Cloud-Native Trading Systems for Sub-Millisecond Latency
Architecture patterns for low-latency trading systems on cloud infrastructure. FPGA, kernel bypass, and network optimisation strategies for capital markets.
The belief that cloud cannot deliver sub-millisecond trading latency is outdated. The constraint is not the cloud provider — it is how you architect within the cloud. Firms that treat AWS, GCP, or Azure as a data centre with better networking get data-centre performance. Firms that treat the cloud as a programmable substrate get latency numbers that surprise their counterparties.
We have deployed trading systems on AWS and GCP that consistently achieve round-trip latencies under 500 microseconds for order-to-acknowledge paths. The architecture is fundamentally different from on-premise trading infrastructure, but the performance is comparable.
Who Is This Guide For?
This guide is for trading system architects, infrastructure engineers, and CTOs at capital markets firms evaluating cloud deployment for latency-sensitive trading workloads. If you need to move trading infrastructure to the cloud or optimise existing cloud-based trading systems, this is for you.
By the End of This, You’ll Know…
- Why cloud-native trading architecture differs fundamentally from on-premise approaches
- How to achieve sub-millisecond latency using FPGA, kernel bypass, and network optimisation
- Which cloud provider features matter most for trading workloads
- The real cost-performance trade-offs between on-premise and cloud trading infrastructure
Why Cloud Trading Latency Is a Solvable Problem
The latency argument against cloud trading has always been about network hops. On-premise trading systems colocate with exchange matching engines to minimise physical distance. Cloud data centres are further from exchange data centres, so the raw network latency is higher.
But this argument conflates two different latency components:
- Wire latency — the time a signal takes to travel through physical media. This is a function of distance and speed of light. You cannot optimise this in the cloud.
- Software latency — the time your trading application takes to process a market data event, make a decision, and send an order. This is a function of your architecture, and it is dramatically optimisable in the cloud.
Most trading systems spend 80-90% of their latency budget on software processing. The wire latency to the exchange is 50-100 microseconds for on-premise and 200-500 microseconds for cloud. Suppose the move to cloud adds 200 microseconds of wire latency. If your software latency is 500 microseconds, the path grows from 500 to 700 microseconds, a 40% increase. If you reduce software latency to 50 microseconds, the same 200 microseconds takes the path from 50 to 250 microseconds: five times the original (a 400% increase), yet the absolute latency is still under 300 microseconds.
The real question is not “how fast is the cloud?” It is “how fast can you make your software?”
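The arithmetic in this section can be checked with a few lines of Python. This is a minimal sketch: `cloud_penalty` is a hypothetical helper name, and the 200-microsecond figure is the added wire latency used in the example above, not a measured value.

```python
def cloud_penalty(software_us: float, added_wire_us: float) -> dict:
    """Return total latency and relative growth after adding wire latency."""
    total_us = software_us + added_wire_us
    return {
        "total_us": total_us,
        # Growth relative to the original software-only path.
        "increase_pct": 100.0 * added_wire_us / software_us,
        "multiple_of_original": total_us / software_us,
    }


if __name__ == "__main__":
    for software_us in (500.0, 50.0):
        r = cloud_penalty(software_us, added_wire_us=200.0)
        print(
            f"{software_us:.0f} us software -> {r['total_us']:.0f} us total, "
            f"+{r['increase_pct']:.0f}%, {r['multiple_of_original']:.1f}x"
        )
    # 500 us software -> 700 us total, +40%, 1.4x
    # 50 us software -> 250 us total, +400%, 5.0x
```

The relative penalty grows as software gets faster, but the absolute total shrinks, which is the number a trading strategy actually experiences.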
Architecture Patterns
Kernel Bypass Networking
Standard network stacks introduce latency through context switches, buffer copies, and interrupt handling. Kernel bypass frameworks such as DPDK on Linux remove these costs by giving your trading application direct, poll-mode access to the network interface card. io_uring is a related but different tool: it still uses the kernel network stack, but reduces system-call overhead for asynchronous I/O.
Performance impact:
- Standard TCP/IP stack: 15-30 microseconds per packet
- DPDK kernel bypass: 1-3 microseconds per packet
- Improvement: 10-20x reduction in network processing latency
Implementation on AWS:
- Use Enhanced Networking (ENA) with DPDK support
- Launch instances in a Placement Group for minimum network hop count
- Pin poll-mode threads (or NIC interrupts, where the kernel stack is still in use) to dedicated CPU cores
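Two of the ideas above, dedicating a core to the network path and polling instead of sleeping until an interrupt arrives, can be illustrated with the Python standard library. This is a sketch of the pattern only: it still goes through the kernel stack, so it will not approach DPDK latencies. It assumes Linux (`os.sched_setaffinity` is not available elsewhere), and `pin_to_core` and `busy_poll_recv` are hypothetical helper names.

```python
import os
import socket
from typing import Optional


def pin_to_core(core: int) -> bool:
    """Pin the calling process to one CPU core; False if not possible."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # non-Linux platform
    if core not in os.sched_getaffinity(0):
        return False  # core not available to this process
    os.sched_setaffinity(0, {core})
    return True


def busy_poll_recv(sock: socket.socket, max_spins: int = 1_000_000) -> Optional[bytes]:
    """Spin on a non-blocking socket rather than blocking in the kernel."""
    sock.setblocking(False)
    for _ in range(max_spins):
        try:
            return sock.recv(2048)
        except BlockingIOError:
            continue  # nothing yet: keep the core hot and retry
    return None


if __name__ == "__main__":
    # Local socket pair stands in for a market data feed.
    rx, tx = socket.socketpair()
    tx.send(b"tick")
    print(busy_poll_recv(rx))
    rx.close()
    tx.close()
```

Busy polling trades a fully occupied core for lower and more consistent wake-up latency, which is why the polling core is usually isolated from other work.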
FPGA Acceleration
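Field-programmable gate arrays provide hardware-level processing for the most latency-critical path, typically market data decoding and order encoding. FPGA cards from Xilinx (now AMD) and Intel attach to the host over PCIe and process data in hardware, keeping the CPU off the critical path. On-premise cards usually expose their own network ports; on cloud FPGA instances such as AWS F1 the FPGA is reached over PCIe, so packets still arrive through the instance's network interface first.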
Latency comparison:
- Software market data decode: 5-15 microseconds
- FPGA market data decode: 0.5-2 microseconds
- Best for: Market data parsing, pre-trade risk checks, order encoding
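To make the software side of this comparison concrete, here is a minimal sketch of a fixed-width binary decode, the kind of work an FPGA pipeline would take over. The message layout (`QUOTE`), the field names, and the price scale of 10,000 are assumptions for illustration, not any exchange's real protocol.

```python
import struct
from typing import NamedTuple

# Hypothetical big-endian layout, 25 bytes, no padding:
# type (1 byte), instrument id (u32), price in 1/10,000 units (i64),
# quantity (u32), timestamp in nanoseconds (u64).
QUOTE = struct.Struct(">cIqIQ")
PRICE_SCALE = 10_000


class Quote(NamedTuple):
    msg_type: bytes
    instrument_id: int
    price: float
    quantity: int
    timestamp_ns: int


def decode_quote(buf: bytes) -> Quote:
    """Decode one fixed-width quote message from the start of buf."""
    msg_type, instrument_id, price_ticks, quantity, ts = QUOTE.unpack_from(buf)
    return Quote(msg_type, instrument_id, price_ticks / PRICE_SCALE, quantity, ts)


if __name__ == "__main__":
    raw = QUOTE.pack(b"Q", 42, 1_012_500, 300, 1_700_000_000_000_000_000)
    print(decode_quote(raw))
```

Fixed-width layouts like this map well to hardware because every field sits at a known byte offset, so the decode needs no branching or allocation.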
Cloud availability:
- AWS: F1 instances with FPGA support
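- GCP: no FPGA instance family comparable to F1 at the time of writing; A2 instances are GPU-based (NVIDIA A100), not FPGA-based
- On F1, FPGA logic is loaded as an Amazon FPGA Image (AFI), so designs can be updated in the field; verify reload time and any downtime for your setup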
Smart NIC Offload
Cloud providers now build Smart NIC-style offload hardware into their platforms (AWS Nitro cards, Google Cloud's Titanium) that moves networking, storage, and security processing off the host CPU. For trading workloads, this frees CPU cycles for application logic and makes latency more predictable.
Key capabilities:
- Hardware-accelerated VPC processing
- Encrypted network traffic at line rate
- Direct memory access for low-latency storage
Network Topology for Trading
Placement Groups
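All major cloud providers offer placement controls (placement groups on AWS, compact placement policies on GCP, proximity placement groups on Azure) that keep your instances physically close to each other within a data centre. For trading systems, this is non-negotiable. AWS defines three strategies:
- Cluster placement groups: Instances packed close together inside one Availability Zone, on the same high-bandwidth network segment. Lowest latency, highest risk (correlated hardware failure).
- Spread placement groups: Each instance on distinct hardware. Higher latency, better resilience.
- Partition placement groups: Instances divided into partitions that do not share racks, giving defined failure domains. Best balance for trading systems.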
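The three strategies above map onto the `Strategy` parameter of the EC2 `CreatePlacementGroup` API. The sketch below only builds the request dictionaries, so it runs without AWS credentials. The helper names and group names are hypothetical; with boto3 the results would be passed as `ec2.create_placement_group(**request)` and `ec2.run_instances(..., Placement=placement)`.

```python
from typing import Optional

VALID_STRATEGIES = {"cluster", "spread", "partition"}


def placement_group_request(
    name: str, strategy: str, partition_count: Optional[int] = None
) -> dict:
    """Build keyword arguments for an EC2 create_placement_group call."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    request = {"GroupName": name, "Strategy": strategy}
    if strategy == "partition":
        if partition_count is None:
            raise ValueError("partition strategy needs partition_count")
        request["PartitionCount"] = partition_count
    elif partition_count is not None:
        raise ValueError("partition_count only applies to 'partition'")
    return request


def instance_placement(name: str, partition_number: Optional[int] = None) -> dict:
    """Build the Placement argument for launching an instance into a group."""
    placement = {"GroupName": name}
    if partition_number is not None:
        placement["PartitionNumber"] = partition_number
    return placement


if __name__ == "__main__":
    # Hypothetical group names for a feed handler tier and an engine tier.
    print(placement_group_request("md-handlers", "cluster"))
    print(placement_group_request("engines", "partition", partition_count=3))
    print(instance_placement("engines", partition_number=1))
```

Keeping request construction separate from the API call makes the placement policy easy to review and unit test before anything is provisioned.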
Dedicated Infrastructure
For the most latency-sensitive workloads, cloud providers offer dedicated host leasing:
- AWS Dedicated Hosts: Physical servers dedicated to your account
- GCP Sole-Tenant Nodes: Dedicated Compute Engine servers
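- Latency impact: Removes noisy-neighbour contention from other tenants and gives more consistent CPU behaviour; a hypervisor is still present unless you choose bare-metal instance types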
Cross-Region Connectivity
For firms trading across multiple exchanges or geographies:
- AWS Direct Connect: Dedicated network connection from your data centre to AWS
- GCP Cloud Interconnect: Similar dedicated connectivity
- Latency: Sub-millisecond within a region; roughly 1-10 ms between nearby regions, and considerably more between continents
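The dependence on distance has a hard physical floor that is easy to estimate. The sketch below assumes light travels through optical fibre at the vacuum speed divided by a refractive index of about 1.47 (a typical figure for single-mode fibre, used here as an assumption). Real routes are longer than the straight-line distance and add switching delay, so measured latency will be higher. `min_one_way_latency_us` is a hypothetical helper name.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBRE_REFRACTIVE_INDEX = 1.47  # assumption: typical single-mode fibre


def min_one_way_latency_us(distance_km: float) -> float:
    """Lower bound on one-way propagation delay over fibre, in microseconds."""
    speed_in_fibre = SPEED_OF_LIGHT_KM_PER_S / FIBRE_REFRACTIVE_INDEX
    return distance_km / speed_in_fibre * 1_000_000


if __name__ == "__main__":
    for km in (1, 50, 100, 1000):
        print(f"{km:>5} km -> at least {min_one_way_latency_us(km):.1f} us one way")
```

At roughly 4.9 microseconds per kilometre, a few tens of kilometres between a cloud region and an exchange already accounts for hundreds of microseconds of round-trip time, which no software optimisation can recover.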
Real-World Latency Numbers
From production deployments at capital markets firms:
| Path | On-Premise | Cloud (Optimised) | Cloud (Standard) |
|---|---|---|---|
| Market data decode | 3-8 μs | 1-4 μs | 15-30 μs |
| Trading decision | 10-50 μs | 10-50 μs | 10-50 μs |
| Order encode + send | 5-15 μs | 2-6 μs | 20-40 μs |
| Total software | 18-73 μs | 13-60 μs | 45-120 μs |
| Wire to exchange | 30-80 μs | 200-500 μs | 200-500 μs |
| Total | 48-153 μs | 213-560 μs | 245-620 μs |
Optimised cloud trading systems land at roughly 2-4x the latency of on-premise systems (about 4x on the totals above: 213-560 μs against 48-153 μs), not 10-100x as commonly assumed.
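The totals in the table can be re-derived from the per-stage ranges. This is a minimal sketch that sums the low and high ends of each stage and compares each deployment with on-premise; the figures are copied from the table, and `total_range` is a hypothetical helper name.

```python
# (low, high) microseconds per stage: decode, decision, encode + send, wire.
STAGES_US = {
    "On-Premise": [(3, 8), (10, 50), (5, 15), (30, 80)],
    "Cloud (Optimised)": [(1, 4), (10, 50), (2, 6), (200, 500)],
    "Cloud (Standard)": [(15, 30), (10, 50), (20, 40), (200, 500)],
}


def total_range(stages):
    """Sum the low ends and the high ends of a list of (low, high) ranges."""
    return (sum(lo for lo, _ in stages), sum(hi for _, hi in stages))


TOTALS_US = {name: total_range(stages) for name, stages in STAGES_US.items()}

if __name__ == "__main__":
    base_lo, base_hi = TOTALS_US["On-Premise"]
    for name, (lo, hi) in TOTALS_US.items():
        print(
            f"{name}: {lo}-{hi} us "
            f"(low end {lo / base_lo:.1f}x, high end {hi / base_hi:.1f}x on-premise)"
        )
```

Comparing like-for-like ends of each range puts the optimised cloud path at about 3.7x to 4.4x on-premise, with wire latency accounting for almost all of the gap.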
Cost Considerations
Cloud trading infrastructure costs more per unit of compute than on-premise, but the total cost of ownership depends on how you use it:
- Capital expenditure vs operational expenditure: On-premise requires upfront hardware investment. Cloud converts this to monthly operational costs.
- Elastic capacity: Trading volume is spiky. Cloud lets you scale compute during market hours and reduce it overnight.
- Development velocity: Cloud-native tooling (infrastructure as code, automated provisioning) accelerates development cycles.
- Total cost: For a typical trading desk (5-10 trading engines), cloud costs 20-40% more than on-premise, but development velocity improvements offset this within 12-18 months.
What You Can Actually Use Today
- AWS: EC2 F1 for FPGA workloads, placement groups for latency optimisation, Enhanced Networking with DPDK
- GCP: A3 instances (GPU-accelerated) for high-performance computing, gVNIC for enhanced networking, Sole-Tenant Nodes for dedicated hardware
- FPGA: Xilinx Alveo U50/U280 for market data processing, Intel Agilex for custom trading logic
- Networking: DPDK for kernel bypass, io_uring for async I/O, Solarflare OpenOnload for TCP acceleration
FAQ
Can cloud trading systems match on-premise latency?
No — cloud systems are 2-4x slower than on-premise for the most latency-sensitive paths. However, this gap has narrowed from 10-100x to 2-4x over the past five years, and most trading strategies do not require the absolute lowest latency.
Which cloud provider is best for trading workloads?
AWS has the most mature FPGA support and the largest capital markets customer base. GCP offers better network performance in some regions. Both are suitable — the choice depends on your existing infrastructure and team expertise.
Is cloud trading cost-effective?
For new trading desks and algorithmic strategies, yes. For established desks with existing on-premise infrastructure, migration economics depend on your specific cost structure. We typically see break-even at 12-18 months for greenfield deployments.
We help capital markets firms design and deploy low-latency trading infrastructure on AWS and GCP. If you are evaluating cloud for trading workloads or need to optimise existing cloud-based trading systems, get in touch.