About The Project
We're looking for an experienced ML Infrastructure Engineer with a track record of delivering large-scale ML infrastructure optimization projects. The primary focus is migrating computer vision models from NVIDIA GPU-based infrastructure to AWS Inferentia/Trainium and optimizing them there, with the goal of improving performance and reducing cost.
Current Infrastructure:
- ML Models: RetinaFace, OpenPose, CLIP, and other CV models
- Hardware: A10/T4 GPUs on EKS
- Serving: Triton Inference Server
- Orchestration: Mix of Kubernetes and Ray
Duration: 2 months (preliminary)
Capacity: part-time (20h/week)
Areas of Responsibility
- Technical Leadership:
  - Lead the architecture design for ML infrastructure modernization
  - Define compilation and optimization strategies for model migration
  - Establish performance benchmarking framework
  - Set up monitoring and alerting for the new infrastructure
- Performance Optimization:
  - Implement efficient model compilation pipelines for Inferentia2
  - Optimize batch processing and memory layouts
  - Fine-tune model serving configurations
  - Ensure latency requirements are met across all services
- Cost Optimization:
  - Analyze and optimize infrastructure costs
  - Implement efficient resource allocation strategies
  - Set up cost monitoring and reporting
  - Achieve target cost reduction while maintaining performance
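As an illustration of what the benchmarking and cost-reporting responsibilities above might involve, here is a minimal sketch in plain Python. It is not the team's framework: the names `benchmark`, `cost_per_million`, and `infer` are hypothetical, the instance price is an input the caller supplies, and the throughput figure assumes sequential single-request calls on one fully utilized instance.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(infer: Callable[[], None], warmup: int = 5, runs: int = 50) -> Dict[str, float]:
    """Time repeated calls to `infer` and summarize latency in milliseconds."""
    for _ in range(warmup):
        infer()  # warm-up calls are executed but not timed

    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)

    # 99 cut points; index 98 is the 99th percentile (no extrapolation past the max).
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": cuts[98],
        # Sequential calls only: one request in flight at a time.
        "throughput_per_s": 1000.0 / statistics.mean(samples),
    }


def cost_per_million(hourly_price_usd: float, throughput_per_s: float) -> float:
    """Cost in USD of one million inferences on one fully utilized instance."""
    inferences_per_hour = throughput_per_s * 3600.0
    return hourly_price_usd / inferences_per_hour * 1_000_000
```

Running the same `infer` callable against the GPU deployment and the Inferentia deployment, then feeding each measured throughput and the corresponding hourly instance price into `cost_per_million`, would give a like-for-like comparison of latency and cost per inference.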
Requirements
- Proven track record of ML infrastructure optimization projects
- Hands-on experience with AWS Neuron SDK and Inferentia/Trainium deployment
- Deep expertise in PyTorch model optimization and compilation
- Experience with high-throughput computer vision model serving
- Production experience with both Kubernetes and Ray for ML workloads
- Model Optimization Expertise:
  - Deep understanding of ML model architecture optimization
  - Experience with model compilation techniques for specialized hardware (Inferentia/Trainium)
  - Proficiency in optimizing computer vision models (CNN architectures)
  - Knowledge of model serving optimization patterns
- Performance Optimization:
  - Advanced understanding of ML model inference optimization
  - Expertise in batch processing strategies
  - Memory layout optimization for vision models
  - Experience with pipeline parallelism implementation
  - Proficiency in latency/throughput optimization techniques
- Hardware Acceleration:
  - Deep knowledge of ML accelerator architectures
  - Understanding of hardware-specific optimizations
  - Experience with model compilation for specialized chips
  - Proficiency in memory access pattern optimization
- Performance Analysis:
  - Proficiency in ML model profiling tools
  - Experience with performance bottleneck identification
  - Knowledge of performance monitoring techniques
  - Ability to analyze and optimize inference patterns
- Experience with Ray architecture for ML serving
- Knowledge of distributed ML systems
- Understanding of ML pipeline optimization
- Experience with model quantization techniques
- Model Optimization (4+ years):
  - Proven track record of optimizing large-scale ML inference systems
  - Successfully implemented hardware-specific model optimizations
  - Demonstrated experience with computer vision model optimization
  - Led projects achieving significant performance improvements
- Proven Results (Examples):
  - Successfully optimized computer vision models similar to RetinaFace/CLIP
  - Achieved significant cost reduction while maintaining performance
  - Implemented efficient batch processing strategies
  - Developed performance monitoring and optimization frameworks
Key Skills
Ranked by relevance
computer vision
kubernetes
aws
pytorch
- Posted: Jan 23, 2025
- Type: Full-time
- Level: Mid-Senior
- Location: Finland
- Company: Neurons Lab
- Industries: IT Services, IT Consulting
- Categories: Engineering, Information Technology