EPAM Systems

DevOps Engineer

Argentina · Full-time · Associate

We are hiring a Middle DevOps Engineer to run Kubernetes GPU orchestration with Volcano and keep Linux compute platforms stable for AI and research teams. You will automate day-to-day operations with Python and UNIX shell scripting, tune scheduling and quotas, and work in a client-facing delivery setup. Apply now to help build efficient, dependable compute infrastructure.

 

Responsibilities

  • Provision and support GPU-capable Kubernetes clusters plus independent Linux compute nodes to maximize scheduling effectiveness and system performance
  • Operate Volcano scheduling by configuring queues, controlling Pod lifecycle, allocating GPU resources, and applying namespace quota controls
  • Maintain Kubernetes environments by managing namespaces, RBAC, resource quotas, and workload isolation mechanisms
  • Automate operational workflows by writing and maintaining Python and shell scripts for job submission, resource allocation, and monitoring
  • Partner with orchestration, optimization, and observability teams to improve scheduling performance, utilization, and researcher outcomes
  • Analyze and report on infrastructure health and resource usage to drive continuous optimization
  • Implement upgrades to infrastructure, tooling, and automation to improve scalability, performance, and user experience
  • Assist with operational processes that ensure researchers have an effective environment for AI and computational projects

Requirements

  • 2+ years of hands-on experience in DevOps or infrastructure engineering for complex, large-scale environments
  • Strong knowledge of Kubernetes operations, including namespaces, Pod placement and balancing, PVCs, NFS, and resource quota management
  • Practical experience operating Volcano for GPU workloads, including queue creation, priority handling, and Kubernetes integration
  • Demonstrated experience managing GPU clusters across Kubernetes and standalone Linux setups used for high-performance computing
  • Advanced Python scripting skills for automating infrastructure tasks, job processing, and monitoring workflows
  • Solid command of UNIX shell scripting (Bash or similar) to automate system routines and improve operations
  • Strong Linux administration skills with troubleshooting, performance tuning, and configuration management experience
  • Deep understanding of automation and orchestration concepts and tools for reliable, scalable infrastructure
  • Excellent English communication skills (spoken and written) for direct interaction with clients and cross-functional teams

Nice to have

  • Helm experience for Kubernetes application packaging and releases
  • Observability knowledge with Prometheus, Grafana, and Loki for infrastructure monitoring
  • Terraform familiarity for Infrastructure as Code and cloud resource automation
  • Experience with Amazon EKS and Google GKE in multi-cloud Kubernetes setups
  • Azure networking skills including VPN, ExpressRoute, and network security
  • Use of AI coding assistants such as GitHub Copilot, ChatGPT, and Claude to boost code quality and productivity
  • Knowledge of hybrid scheduling and optimization across cloud and on-premises compute

 

We offer

  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn

Key Skills

Kubernetes, Linux, Python, cloud, AI, shell scripting, DevOps, UNIX, configuration management, Infrastructure as Code, Prometheus, Grafana, Bash, Loki, EKS, VPN
Posted: May 27, 2026
Type: Full-time
Level: Associate
Location: Argentina

Industries

Software Development, IT Services, IT Consulting, Technology, Information, Internet

Categories

Engineering, Information Technology, Business Development
