We are seeking a Lead Generative AI Operations Engineer to architect and maintain robust ML infrastructure that supports smooth, reliable AI deployment.
In this role, you will work cross-functionally to develop scalable MLOps pipelines and infrastructure, enabling data scientists and engineers to move ML projects from prototype to production. Join us to make a significant impact on AI services within the IT Chief Data Office.
Responsibilities
- Design scalable AI and machine learning workloads that align with company objectives
- Develop and maintain reproducible machine learning pipelines
- Deploy AI models to production using model-serving infrastructure
- Implement monitoring and logging frameworks for AI service observability
- Define infrastructure needs for MLOps pipelines and related components
- Collaborate with infrastructure engineers to facilitate infrastructure deployment
- Guide and mentor team members to encourage best practices and ongoing improvement
- Coordinate efforts with cross-functional teams including data scientists and engineers
- Optimize machine learning workloads for enhanced performance and scalability
- Ensure adherence to security protocols and data privacy regulations
- Assess new tools and technologies to improve AI service delivery
- Document system designs and workflows to share knowledge across teams
- Diagnose and resolve production issues affecting AI services
Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related discipline
- Over 5 years of experience in AI, machine learning, data engineering, software development, or cloud infrastructure
- Strong expertise in Python and proficiency with AI/ML frameworks such as PyTorch, TensorFlow, Hugging Face, or scikit-learn
- Experience with model inference runtimes such as vLLM, MLServer, or TorchServe
- Proficiency in containerization and orchestration technologies such as Docker and Kubernetes
- Experience specifying and implementing infrastructure requirements for ML pipelines
- Strong analytical and problem-solving capabilities with experience in agile cross-disciplinary teams
- Effective communication and mentoring abilities to support team growth
- English language proficiency at B2 level or higher
Nice to have
- Familiarity with cloud platforms like Azure, AWS, or Google Cloud
- Understanding of Infrastructure as Code (IaC) methodologies
- Experience with experiment tracking systems and pipeline orchestration tools
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library of 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
Join EPAM Systems and take your career to the next level!