We are seeking a highly skilled Lead Site Reliability Engineer to join our team in driving system reliability, scalability, and performance in complex cloud and containerized environments.
This is a unique opportunity to lead critical infrastructure initiatives, foster operational excellence, and collaborate across teams to achieve business objectives.
The remote option applies only to the Candidates who will be working from any location in Ukraine.
Responsibilities
- Design comprehensive monitoring and logging systems using tools like DataDog, Dynatrace, Prometheus, Grafana, Zabbix, and ELK to ensure robust observability
- Define and manage SLIs and SLOs to measure and enhance system performance, reliability, and scalability
- Lead root cause analysis during incident responses, ensure detailed postmortem evaluations, and develop long-term preventive strategies
- Implement infrastructure as code (IaC) using Terraform and cloud CLIs (AWS, Azure, GCP) for streamlined, consistent infrastructure management
- Automate workflows and CI/CD pipelines leveraging tools such as Jenkins (Groovy SDK), GitLab CI, and Azure DevOps
- Manage containerized environments with expertise in Docker and Kubernetes orchestration for seamless application deployment
- Collaborate with engineering and DevOps teams to standardize observability practices and proactively address issues before they escalate
- Lead and facilitate post-incident reviews and operational drilling exercises to identify areas for improvement and increase system resilience
- Provide optional on-call support for rapid issue resolution and the maintenance of system stability
Requirements
- Residence in Ukraine, with remote work eligibility limited to candidates based within the country
- Advanced proficiency in scripting and automation with Python, Go, Bash, or PowerShell
- Strong knowledge of monitoring systems and tools like Prometheus, Grafana, DataDog, Dynatrace, Zabbix, or ELK
- Experience with cloud platforms (AWS, Azure, or GCP) and expertise in IaC with Terraform
- Solid understanding of configuration management systems like Ansible
- Background in automating CI/CD pipelines and delivery lifecycles using Jenkins, GitLab CI, and Azure DevOps
- Practical experience deploying and orchestrating applications in Docker and Kubernetes environments
- Exceptional problem-solving capability for incident reconstruction and identifying root causes
- Proven track record in leading post-incident reviews and operational improvement exercises
- Strong collaboration skills to work effectively with engineering teams and stakeholders to maintain reliability and performance
- English level B2 or higher
Nice to have
- Knowledge of advanced security and compliance strategies in observable environments
- Familiarity with chaos engineering approaches for resilience and fault tolerance testing
- Experience integrating observability into development workflows to accelerate issue resolution
- Familiarity with additional cloud monitoring services like AWS CloudWatch, Azure Monitor, or GCP Operations Suite
With us you can:
- Work on a flexible schedule remotely or from any of our comfortable offices or coworking spaces in Ukraine
- Receive the necessary equipment to perform your work tasks
- Change projects and technology stacks within EPAM
- Gain experience in various business domains (Insurance, E-commerce, Healthcare, Finance, Travelling, Media, Artificial Intelligence, and more)
- Relocation opportunities may be available for eligible candidates, depending on the role and openings at other EPAM locations
- Participate in volunteer and charity programs and in communities (both technical and interest-based)
We focus on your professional growth:
- You can plan your individual career path together with your manager
- Receive regular feedback from colleagues
- Improve your English for free with certified teachers (Speaking Clubs, client interview preparation courses, etc.)
- Get the opportunity to undergo free training and certification in AWS, GCP, or Azure Clouds
- Use the internal E-learn training program (18,200+ specialized training and mentoring programs)
- Access corporate accounts on LinkedIn Learning, Get Abstract and other partner resources
- Study at EPAM Solution Architecture School with instructors who are practicing architects
- Develop as a leader by joining the Delivery Management, Resource Management, or Leadership Essentials schools, and more
- Participate in internal communities (500+ meetups, technical discussions, brainstorming sessions, online events and conferences annually)
What we offer:
- Vacation and sick leave (including a sick leave without a medical certificate)
- A wide range of Voluntary Medical Insurance programs providing both medical treatment and various preventive options (including sports activities)
- Medical insurance for family members at corporate rates
- Company support during significant life events (childbirth or adoption, marriage, etc.)
- Support for psychological comfort: discounts on services from mental health specialists or coaches, thematic training
- E-kids program - a free programming language training program for EPAMers' children
Kindly be advised that the set of benefits, including learning, certification, and other opportunities, may vary depending on the role you apply for. Our recruiter will be able to share more details about the specific opportunity during your general interview.
Key Skills
cloud
aws
gcp
kubernetes
prometheus
gitlab ci
jenkins
grafana
datadog
docker
gitlab
cicd
configuration management
artificial intelligence
infrastructure as code
fault tolerance
terraform
python
devops
groovy
bash
elk
- Posted: Aug 15, 2025
- Type: Full-time
- Level: Mid-Senior
- Location: Ukraine
- Company: EPAM Systems
Industries
Software Development
IT Services
IT Consulting
Categories
Engineering
Information Technology
Business Development