EPAM Systems
Lead Site Reliability Engineer
Argentina, 23 hours ago
Full-time
Engineering, Information Technology +1

We are looking for an experienced Lead Site Reliability Engineer to join our team and drive the development of reliable and scalable infrastructure.

In this role, you will work closely with software and operations teams to ensure seamless integration between infrastructure and applications. You will be instrumental in maintaining high system reliability, optimizing scalability, and driving operational excellence using modern tools and technologies.

Responsibilities

  • Partner with software teams to ensure smooth integration of infrastructure and application systems
  • Implement SRE principles and engineering practices to build, monitor, and operate complex infrastructure solutions
  • Utilize automation tools to improve operational workflows and enhance system reliability
  • Architect and maintain scalable web systems and cloud-based platforms
  • Develop efficient and maintainable code using languages like Golang, Python, Ruby, and Scala
  • Diagnose and resolve issues promptly in high-pressure situations
  • Monitor system performance and implement measures to guarantee consistent uptime and reliability

Requirements

  • At least 5 years of experience in developing, managing, or supporting large-scale Linux-based web application systems
  • Minimum of one year of experience leading and managing development teams
  • Proficiency in UNIX systems administration with expertise in scripting languages such as Python, PHP, or Bash
  • Practical experience running Docker with orchestration tools like Nomad, Kubernetes, or Amazon ECS
  • Familiarity with configuration management tools like Ansible, Chef, or Puppet (Puppet experience preferred)
  • Strong communication skills and ability to work effectively with distributed teams
  • Ability to produce clean, well-documented, and easy-to-understand systems and scripts
  • Eagerness to continuously learn and work with new technologies and programming languages
  • Fluent English, both written and spoken, at B2+ level or higher

Nice to have

  • Knowledge of observability and performance monitoring tools such as ELK, Prometheus, New Relic, Sentry, or Lightstep
  • Proficiency in Ruby or Scala for development and scripting tasks

We offer

  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
