EPAM Systems

DevOps Engineer

Brazil · Full-time · Associate

We operate Kubernetes and Linux platforms that deliver dependable GPU compute, with a focus on Volcano scheduling and automated infrastructure operations. As a mid-level DevOps Engineer, you will administer Kubernetes, run GPU clusters on Kubernetes and standalone Linux nodes, and build automation with Python and UNIX shell scripting for a client-facing delivery team. Apply to help deliver stable, efficient AI compute environments at scale.

 

Responsibilities

  • Provision, configure, and operate GPU-enabled Kubernetes clusters and standalone Linux compute environments to keep scheduling and performance optimized
  • Set up and administer Volcano job scheduling, including queue setup, Pod execution, GPU allocation, and namespace quota enforcement
  • Own Kubernetes administration across namespaces, RBAC, resource quotas, and workload isolation approaches
  • Automate job submission, resource provisioning, and system reporting by creating and maintaining Python and shell scripts
  • Coordinate with orchestration, optimization, and observability teams to raise scheduling efficiency, improve capacity utilization, and streamline researcher workflows
  • Monitor infrastructure health and resource utilization, supplying data and feedback for optimization and reporting
  • Improve infrastructure, tooling, and automation workflows to increase performance, scalability, and usability
  • Maintain operational processes that provide a smooth and efficient experience for researchers running diverse AI and computational workloads

Requirements

  • 2+ years of hands-on experience in DevOps or infrastructure engineering within complex, large-scale environments
  • Expertise in Kubernetes administration and orchestration, including namespaces, Pod scheduling and distribution, PersistentVolumeClaims (PVCs), NFS, and resource quota management
  • Practical experience with the Volcano scheduler for GPU job execution, queue configuration, and workload prioritization integrated with Kubernetes
  • Proven ability to operate GPU cluster environments in Kubernetes as well as on standalone Linux compute nodes
  • Advanced Python scripting skills for infrastructure automation, plus proficiency in UNIX shell scripting (for example, Bash)
  • Strong Linux system administration skills, including troubleshooting, performance tuning, and configuration management
  • Solid understanding of infrastructure automation and orchestration concepts and related tooling
  • Fluent English communication skills (spoken and written) for direct client interaction

Nice to have

  • Knowledge of Helm package management for Kubernetes applications
  • Familiarity with monitoring and observability solutions, particularly Prometheus, Grafana, and Loki
  • Skills in Infrastructure as Code tools such as Terraform
  • Background in multi-cloud Kubernetes environments including Amazon EKS and Google GKE
  • Understanding of Azure networking, including VPN, ExpressRoute, and network security
  • Familiarity with AI-assisted coding tools such as GitHub Copilot, ChatGPT, and Claude
  • Experience with hybrid (cloud and on-premises) scheduling and resource optimization

 

We offer

  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn

Key Skills

Ranked by relevance

Kubernetes, Linux, Python, AI, shell scripting, DevOps, cloud, UNIX, Infrastructure as Code, system administration, Prometheus, Grafana, EKS, VPN
Posted: Apr 02, 2026
Type: Full-time
Level: Associate
Location: Brazil

Industries

Software Development, IT Services, IT Consulting, Technology, Information, Internet

Categories

Engineering, Information Technology, Business Development
