Lead Inference Platform Support Engineer - AI I

Company

PowerToFly

Location

Toronto, Canada

Type

Full-time

About the Role

As a Lead Inference Platform Engineer, you will:
Optimize LLMs and ML models for high-performance inference using techniques such as quantization, pruning, distillation, and hardware-specific tuning
Deploy and scale inference workloads on GPUs across AWS, Azure, GCP, and internal Kubernetes clusters, ensuring predictable performance during peak traffic, especially in business hours
Implement routing and failover strategies for OpenAI/Anthropic/Vertex AI traffic
Integrate models into production-grade APIs supporting TR products and enterprise workflows
Develop a highly optimized environment and eliminate performance bottlenecks to reduce latency
Collaborate with Platform Engineering teams (Landing Zones, Network, Storage, Compute, AI) to ensure inference workloads align with TR’s cloud-native patterns (AWS, Azure, GCP, OCI)
Build and optimize containerized inference pipelines...
        
