EPAM Systems
Senior Site Reliability Engineer (SRE)
EPAM Systems · Argentina · 5 days ago
Full-time · Engineering, Information Technology

We are seeking a talented and experienced Senior Site Reliability Engineer (SRE) to join our dynamic team.

As a Senior SRE, you will play a critical role in designing, developing, and maintaining highly reliable systems and processes to ensure optimal performance and scalability of applications and infrastructure across diverse environments.

 

Responsibilities

  • Build and containerize applications and deploy them using open-source container management tools such as Docker or Podman
  • Design and maintain Kubernetes resource manifests, deploying them into clusters on platforms like AKS or GKE
  • Configure and deploy Prometheus agents to monitor infrastructure and application behaviors, raising alerts when necessary
  • Create and manage continuous deployment pipelines using tools like Helm and Argo CD
  • Optimize observability by implementing monitoring, logging, and tracing solutions
  • Maintain and manage CI/CD processes within Azure DevOps or similar environments
  • Develop and implement solutions on cloud platforms, leveraging expertise in at least one provider (e.g., Microsoft Azure, GCP, AWS)
  • Troubleshoot infrastructure and application issues, using logs and traces to isolate events effectively

 

Requirements

  • At least 3 years of programming experience, preferably in Go
  • Hands-on experience with at least one scripting language (e.g., Bash or Python)
  • Proficiency with Kubernetes, with at least 3 years of practical expertise
  • Fundamental knowledge of observability tools, with a focus on Prometheus or similar monitoring platforms
  • Skills in configuring and managing CI/CD pipelines using Azure DevOps or tools like Helm and Argo CD for GitOps-style continuous deployment
  • Background in cloud platforms with competency in at least one provider (e.g., Microsoft Azure, Google Cloud, AWS)
  • Flexibility to use open-source tools like Docker or Podman to containerize applications and manage their runtime environments effectively

 

Nice to have

  • Familiarity with multiple cloud providers, including AWS and GCP alongside Azure
  • Expertise in GitOps packaging and deployment tools like Argo CD and Helm
  • Understanding of service meshes like Istio for Kubernetes-based microservices architectures
  • Competency in infrastructure-as-code tools such as Terraform
  • Background in software development with experience across multiple domains

 

We offer

  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Healthcare benefits
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
