SearchEuropeanJobs.com

Lead Inference Platform Support Engineer - AI I

Company

PowerToFly

Location

Toronto, Canada

Type

Full-time

About the Role

As a Lead Inference Platform Engineer, you will:

  • Optimize LLMs and ML models for high-performance inference using techniques such as quantization, pruning, distillation, and hardware-specific tuning
  • Deploy and scale inference workloads on GPUs across AWS, Azure, GCP, and internal Kubernetes clusters, ensuring predictable performance during peak traffic, especially during business hours
  • Implement routing and failover strategies for OpenAI/Anthropic/Vertex AI traffic
  • Integrate models into production-grade APIs supporting TR products and enterprise workflows
  • Develop highly optimized environments and eliminate performance bottlenecks to reduce latency
  • Collaborate with Platform Engineering teams (Landing Zones, Network, Storage, Compute, AI) to ensure inference workloads align with TR’s cloud-native patterns (AWS, Azure, GCP, OCI)
  • Build and optimize containerized inference pipelines...
