Kubernetes GPU Scheduling for Quantitative Research Workloads
Quantitative research teams consume GPU compute differently from standard ML teams. A single backtest of a reinforcement learning strategy may require 8 H100 GPUs for 72 hours, and then nothing for a week. A risk model training run may consume 4 A100s for 6 hours, but the researcher needs interactive access to the dashboard throughout. Peak demand is unpredictable and hit-driven. We have built GPU infrastructure on Kubernetes for quant hedge funds and bank research desks. Here is what we have learned about scheduling, sharing, and cost management for financial ML workloads. ...