We are looking for an experienced Senior Site Reliability Engineer (SRE) to join our Platform Engineering team and enhance the reliability, scalability, and observability of our systems.
You will work closely with cross-functional teams to implement best practices, improve Developer Experience, and ensure compliance with critical Service Level Management (SLM) and performance metrics. Your role will involve optimizing infrastructure, creating automation solutions, and collaborating on event-driven architectures using tools like Terraform, Kubernetes, AWS, Kafka, and New Relic.
Responsibilities
- Design and implement scalable and highly available systems using techniques such as load balancing, canary releases, blue-green deployments, and auto-scaling
- Develop and maintain monitoring, logging, and observability dashboards using tools like New Relic, Prometheus, Grafana, and Datadog
- Assist teams in determining appropriate settings and thresholds for alerts and automation, accounting for variance in application performance requirements
- Ensure compliance with SLAs, SLOs, SLIs, and DORA metrics by monitoring system performance and tracking targets such as uptime, response times, and incident resolution times
- Advocate for system resiliency by implementing and promoting "Chaos" engineering practices
- Collaborate with cross-functional teams to enhance platform engineering practices and guide the adoption of improved tooling and metrics analysis
- Analyze system performance and reliability metrics to drive data-informed improvements in platform infrastructure
- Improve performance and scalability of event-driven architectures using tools like Kafka
- Manage cloud infrastructure solutions across AWS, Azure, or GCP in line with business needs
Requirements
- 5+ years of experience with Infrastructure-as-Code tooling such as Terraform
- Extensive knowledge of DevOps metrics like DORA (e.g., deployment frequency, change failure rates) and Service Level Management (SLAs, SLOs, SLIs)
- Expertise in monitoring and observability tools such as New Relic, Prometheus, Grafana, or Datadog
- Strong experience in designing scalable architectures with load balancing, canary releases, and auto-scaling methodologies
- Proficiency in working with cloud platforms such as AWS, Azure, or GCP
- Background in CI/CD pipelines using tools like GitHub Actions, Jenkins, or GitLab CI
- Experience with Kafka for real-time event-driven data processing and performance improvement
Nice to have
- Understanding of SLM tooling and metrics platforms, such as Apache DevLake, Grafana, and New Relic
- Familiarity with Observability-as-Code practices and tooling
- Background in implementing "Chaos" engineering practices to validate system resiliency
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
Key Skills
Ranked by relevance
grafana
aws
prometheus
kafka
cloud
kubernetes
terraform
jenkins
devops
apache
gitlab
cicd
gcp
- Posted: Jun 06, 2025
- Type: Full-time
- Level: Mid-Senior
- Location: Ukraine
- Company: EPAM Systems
Industries
Software Development
IT Services
IT Consulting
Professional Training
Coaching
Categories
Engineering
Information Technology
Business Development