EPAM Systems

Lead Generative AI Operations Engineer

Portugal · Full-time · Mid-Senior

We are seeking a Lead Generative AI Operations Engineer to architect and sustain a robust ML infrastructure that supports seamless AI deployment.

In this role, you will work cross-functionally to develop scalable MLOps pipelines and infrastructure, enabling data scientists and engineers to transition ML projects from prototype stages to production environments. Join us to make a significant impact on AI services within the IT Chief Data Office.

Responsibilities

  • Design scalable AI and machine learning workloads that align with company objectives
  • Develop and uphold reproducible machine learning pipelines
  • Deploy AI models into production using model serving infrastructures
  • Implement monitoring and logging frameworks for AI service observability
  • Define infrastructure needs for MLOps pipelines and related components
  • Collaborate with infrastructure engineers to facilitate infrastructure deployment
  • Guide and mentor team members to encourage best practices and ongoing improvement
  • Coordinate efforts with cross-functional teams including data scientists and engineers
  • Optimize machine learning workloads for enhanced performance and scalability
  • Ensure adherence to security protocols and data privacy regulations
  • Assess new tools and technologies to improve AI service delivery
  • Document system designs and workflows for knowledge dissemination
  • Diagnose and resolve production issues affecting AI services

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related discipline
  • Over 5 years of experience in AI, machine learning, data engineering, software development, or cloud infrastructure
  • Strong expertise in Python and proficiency with AI/ML frameworks such as PyTorch, TensorFlow, Hugging Face, or scikit-learn
  • Experience with model inference runtimes including vLLM, MLServer, or TorchServe
  • Proficiency in containerization and orchestration technologies such as Docker and Kubernetes
  • Experience specifying and implementing infrastructure requirements for ML pipelines
  • Strong analytical and problem-solving capabilities with experience in agile cross-disciplinary teams
  • Effective communication and mentoring abilities to support team growth
  • English language proficiency at B2 level or higher

Nice to have

  • Familiarity with cloud platforms like Azure, AWS, or Google Cloud
  • Understanding of Infrastructure as Code (IaC) methodologies
  • Experience with experiment tracking systems and pipeline orchestration tools

We offer

  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn

Key Skills

Ranked by relevance

AI, machine learning, cloud, MLOps, infrastructure as code, containerization, TensorFlow, PyTorch, Python, Docker, AWS

Posted: Dec 04, 2025
Type: Full-time
Level: Mid-Senior
Location: Portugal

Industries

Software Development, IT Services, IT Consulting, Technology, Information, Internet

Categories

Engineering, Information Technology, Research
