Founding ML Infrastructure Engineer

Work from home | Full-time role | Hiring

The problem we saw

Most AI infrastructure is built for batch: send a query, wait, get a response, reset. Powerful, but transactional. AI is becoming interactive (sessions that hold state, models that stay alive between turns, generation that responds as it runs), and the infrastructure to deliver that at scale doesn't really exist yet. The bottleneck isn't the models anymore. It's the infrastructure underneath them.

What we're building to fix it

uRun is the inference cloud for interactive AI: the compute layer that makes real-time, stateful inference possible at scale. We came out of stealth in April 2026, are backed by top-tier investors, and are founded by Keegan McCallum, who scaled inference infrastructure for some of the most demanding generative AI workloads in production. We're an infrastructure company. We build the layer that model labs, builders, and research teams ship on top of.

Where you come in

We are building the next generation of AI inference infrastructure. As our ML Infrastructure and Platform Engineer, you will own the architecture and scaling of our GPU compute platform from the ground up. This is a founding technical hire with end-to-end ownership across the full infrastructure stack, from bare metal to model serving. You will work directly with the founding team and define how we build.

What you'll actually be doing day-to-day

- Design and scale our GPU compute platform to support 1,000+ GPU clusters, ensuring high availability and low-latency inference across the fleet
- Build and maintain the infrastructure layer for our compute marketplace, including multi-tenant scheduling, isolation, and billing-aware resource allocation
- Own production reliability for ML systems end-to-end: observability, incident response, and SLA achievement across model serving and infrastructure
- Architect feature stores and model registry systems that support rapid iteration and reproducibility at scale
- Design experiment-tracking infrastructure capable of handling thousands of concurrent runs with full auditability
- Build resource orchestration and scheduling systems that optimise for throughput, cost, and latency across heterogeneous hardware
- Set engineering standards for infrastructure reliability, capacity planning, and operational excellence as an early technical leader

What skills you need for the journey

- Proven experience designing and operating large-scale distributed infrastructure at 1,000+ nodes or equivalent complexity, in any domain
- Deep expertise in distributed systems, cluster orchestration (Kubernetes, Slurm, or custom schedulers), and large-scale resource scheduling
- Strong production reliability instincts: observability, incident response, capacity planning, and SLA ownership across complex systems
- Experience building infrastructure that other engineers build on top of, not just operating it
- Ability to operate as a technical lead: set direction, make tradeoffs under uncertainty, and raise the bar for the team around you
- Startup orientation: you are energised by ambiguity, move fast, and build for scale from day one

Things that will give you an edge

- Exposure to ML infrastructure concepts: GPU networking (NCCL, InfiniBand, RoCE), model serving frameworks (vLLM, SGLang, TensorRT-LLM), or hardware-aware performance tuning (CuTe, Triton, TileLang)
- Experience with multi-cloud GPU procurement and capacity management across AWS, GCP, Azure, and bare metal providers
- Familiarity with inference marketplace architectures, dynamic routing, or spot/preemptible workload management
- Prior experience at a Series A or earlier stage company scaling from early infrastructure to production

What you'll get in return

- Competitive salary and meaningful equity in an early-stage AI infrastructure company. The band above is our target; for an exceptional candidate we'll go higher. Equity is real: you're early, and the grant reflects that.
- Health, dental, and vision: full coverage
- 401(k): company-supported retirement savings
- FSA/HSA: flexible spending accounts for healthcare costs
- Paid time off: we trust you to manage your time
- Top-tier tooling: access to the best AI tools available, including Claude, Codex, Kimi, and whatever else helps you move faster
- MacBook Pro and AirPods: the hardware you need, on us

How we work (and what that feels like day-to-day)

We build the stage, not the show. We're an infrastructure company, a developer-tools company, and a production partner for model labs, and focus is a deliberate choice we've made and hold to.

Day-to-day, that means a small team, a high bar, and real ownership. You won't wait for permission or inherit a backlog of someone else's decisions. In a founding infrastructure role, the function is what you make it. It also means ambiguity: priorities shift, not everything is documented, and you'll often be the person who decides what "good enough for now" means. That suits some people and not others, and we'd rather you know that before you apply.
