We are seeking a talented and experienced Senior Site Reliability Engineer (SRE) to join our dynamic team.
As a Senior SRE, you will play a critical role in designing, developing, and maintaining highly reliable systems and processes to ensure optimal performance and scalability of applications and infrastructure across diverse environments.
Responsibilities
- Build and containerize applications and deploy them using open-source container management tools such as Docker or Podman
- Design and maintain Kubernetes resource manifests, deploying them into clusters on platforms like AKS or GKE
- Configure and deploy Prometheus agents to monitor infrastructure and application behavior, raising alerts when necessary
- Create and manage continuous deployment pipelines using tools like Helm and Argo CD
- Optimize observability by implementing monitoring, logging, and tracing solutions
- Maintain and manage CI/CD processes within Azure DevOps or similar environments
- Develop and implement solutions on cloud platforms, leveraging expertise in at least one provider (e.g., Microsoft Azure, GCP, AWS)
- Troubleshoot infrastructure and application issues, using logs and traces to isolate events effectively
Requirements
- At least 3 years of programming experience, preferably in Go
- Hands-on experience with at least one scripting language (e.g., Bash or Python)
- Proficiency with Kubernetes, with at least 3 years of practical expertise
- Fundamental knowledge of observability tools, with a focus on Prometheus or similar monitoring platforms
- Skills in configuring and managing CI/CD pipelines using Azure DevOps or tools like Helm and Argo CD for GitOps-style continuous deployment
- Background in cloud platforms with competency in at least one provider (e.g., Microsoft Azure, Google Cloud, AWS)
- Ability to containerize applications and manage their runtime environments effectively with open-source tools like Docker or Podman
Nice to have
- Familiarity with multiple cloud providers, including AWS and GCP alongside Azure
- Expertise in GitOps packaging and deployment tools like Argo CD and Helm
- Understanding of service meshes like Istio for Kubernetes-based microservices architectures
- Competency in infrastructure-as-code tools such as Terraform
- Background in software development with experience across multiple domains
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
Join EPAM Systems and take your career to the next level!

