We are seeking a Machine Learning Engineer to design, deploy, and maintain robust training pipelines and model serving workflows, ensuring high reliability and performance.
The ideal candidate will have expertise in machine learning development and deployment, with a focus on versioning, monitoring, and rollback mechanisms.
Responsibilities
- Make training runs reproducible by pinning environments and random seeds, and rebuild feature and label pipelines faithfully
- Register models and deploy SageMaker batch endpoints with autoscaling
- Define rollback procedures and implement shadow and canary testing
- Set up and maintain Experiments, Model Registry, and Feature Store integrations to support CI/CD workflows
- Configure Model Monitor and track metrics such as drift and model performance
- Author clear documentation and runbooks to support operational workflows
Requirements
- 2+ years of experience with Python 3.x and machine learning frameworks such as PyTorch or TensorFlow
- Hands-on experience with SageMaker, including Pipelines, Model Registry, Endpoints, and Model Monitor
- Proficiency with MLflow or SageMaker Experiments for tracking models and pipelines
- Experience with Docker and with feature stores such as SageMaker Feature Store or Feast
- Strong written and verbal English communication skills (B2+)
Nice to have
- Experience with FastAPI or BentoML for model deployment
- Familiarity with ONNX and with Ray Serve or Triton for efficient model serving
- Demonstrated experience with distributed training using Horovod or DeepSpeed
- Understanding of Hugging Face Transformers for advanced NLP applications
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
Join EPAM Systems and take your career to the next level!