Lead Generative AI Operations Engineer

EPAM Systems | Portugal | 10 days ago

Full-time | Engineering, Information Technology +1

We are seeking a Lead Generative AI Operations Engineer to architect and sustain a robust ML infrastructure that supports seamless AI deployment.

In this role, you will work cross-functionally to develop scalable MLOps pipelines and infrastructure, enabling data scientists and engineers to transition ML projects from prototype stages to production environments. Join us to make a significant impact on AI services within the IT Chief Data Office.

Responsibilities

Design scalable AI and machine learning workloads that align with company objectives
Develop and uphold reproducible machine learning pipelines
Deploy AI models into production using model serving infrastructures
Implement monitoring and logging frameworks for AI service observability
Define infrastructure needs for MLOps pipelines and related components
Collaborate with infrastructure engineers to facilitate infrastructure deployment
Guide and mentor team members to encourage best practices and ongoing improvement
Coordinate efforts with cross-functional teams including data scientists and engineers
Optimize machine learning workloads for enhanced performance and scalability
Ensure adherence to security protocols and data privacy regulations
Assess new tools and technologies to improve AI service delivery
Document system designs and workflows for knowledge dissemination
Diagnose and resolve production issues affecting AI services

Requirements

Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related discipline
Over 5 years of experience in AI, machine learning, data engineering, software development, or cloud infrastructure
Strong expertise in Python and proficiency with AI/ML frameworks such as PyTorch, TensorFlow, Hugging Face, or scikit-learn
Experience with model inference runtimes including vLLM, MLServe, or TorchServe
Proficiency in containerization and orchestration technologies such as Docker and Kubernetes
Experience specifying and implementing infrastructure requirements for ML pipelines
Strong analytical and problem-solving capabilities with experience in agile cross-disciplinary teams
Effective communication and mentoring abilities to support team growth
English language proficiency at B2 level or higher

Nice to have

Familiarity with cloud platforms like Azure, AWS, or Google Cloud
Understanding of Infrastructure as Code (IaC) methodologies
Experience with experiment tracking systems and pipeline orchestration tools

We offer

International projects with top brands
Work with global teams of highly skilled, diverse peers
Healthcare benefits
Employee financial programs
Paid time off and sick leave
Upskilling, reskilling and certification courses
Unlimited access to the LinkedIn Learning library and 22,000+ courses
Global career opportunities
Volunteer and community involvement opportunities
EPAM Employee Groups
Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
