Our client is a fast-growing AI research and technology company building reasoning-first, agentic AI systems, with a footprint spanning the US and Asia. The team is behind several widely adopted open-source research agents that have posted top-tier results on industry benchmarks, and is led by scientific leadership with backgrounds spanning top US universities and frontier AI labs. Backed by a serial entrepreneur with a track record of building category-defining tech companies, the company is now scaling its compute infrastructure to support next-generation training and inference workloads at massive scale.
The Role
Build and evolve the core infrastructure layer for large-scale AI training and inference on 10,000+ GPU clusters: Kubernetes scheduling, storage, networking, and reliability engineering that makes massive shared compute efficient, reliable, and easy to operate for research and engineering teams.
What You'll Do

Take the next step and apply for this exciting opportunity.