SearchEuropeanJobs.com

Software Engineer (GPU Infrastructure, High Performance Computing)

Company

Cohere

Location

toronto, Canada

Type

Full-time

RequirementsDeep expertise in ML/HPC infrastructure: Experience with GPU/TPU clusters, distributed training frameworks (JAX, PyTorch, TensorFlow), and high-performance computing (HPC) environmentsKubernetes at scale: Proven ability to deploy, manage, and troubleshoot cloud-native Kubernetes clusters for AI workloadsStrong programming skills: Proficiency in Python (for ML tooling) and Go (for systems engineering), with a preference for open-source contributions over reinventing solutionsLow-level systems knowledge: Familiarity with Linux internals, RDMA networking, and performance optimization for ML workloadsResearch collaboration experience: A track record of working closely with AI researchers or ML engineers to solve infrastructure challengesSelf-directed problem‑solving: The ability to identify bottlenecks, propose solutions, and drive impact in a fast‑paced environmentIf some of the above doesn’t line up perfectly with your experience, we still encourage you to apply!What the job ...

★ Ready to Start Your European Career?

Take the next step and apply for this exciting opportunity

Apply Now