We are seeking a skilled and experienced Lead Site Reliability Engineer to join our dynamic team, ensuring the performance, scalability, and reliability of our production systems and infrastructure.
If you're a proactive problem-solver with a strong background in monitoring, automation, and cloud technologies, we want to hear from you.
Responsibilities
- Provide L3 on-call support as needed
- Design and develop monitoring systems for infrastructure and products
- Define and implement SLI/SLOs for system reliability tracking
- Conduct thorough root cause analyses for incidents
- Lead postmortem procedures and drills for continuous improvement
- Analyze product performance, scalability, and reliability
- Automate operational tasks to enhance efficiency
- Implement and manage CI/CD pipelines following "as-Code" practices
- Oversee cloud infrastructure and configuration management using Infrastructure-as-Code principles
- Collaborate closely with cross-product teams and business stakeholders to align reliability objectives
Requirements
- 5+ years of relevant experience, including 1 year in a leadership role
- Advanced knowledge of scripting languages such as Python, Go, Bash, or PowerShell
- Expertise in any major cloud platform (AWS, GCP, or Azure)
- Proficiency in optimizing monitoring and logging tools like Datadog, Dynatrace, Prometheus, Grafana, Zabbix, or ELK
- Capability to manage cloud infrastructure using tools like Terraform and command-line interfaces (gcloud, az, aws)
- Competency in configuration management using Ansible
- Background in CI/CD toolchains such as Jenkins (Groovy SDK, Jenkinsfile), GitLab CI, or Azure DevOps
- Understanding of containerization technologies such as Docker and Kubernetes
- Exceptional troubleshooting and problem-solving abilities, including reconstructing incident conditions and flows based on root cause analysis
- B2-level English proficiency, both in speaking and writing
Nice to have
- Familiarity with multiple cloud-native monitoring tools
- Demonstrated experience leading cross-functional team collaborations
- Proficiency in advanced Kubernetes configurations
We offer
With us you can:
- Work on a flexible schedule remotely or from any of our comfortable offices or coworking spaces in Ukraine
- Receive the necessary equipment to perform your work tasks
- Change projects and technology stacks within EPAM
- Gain experience in various business domains (Insurance, E-commerce, Healthcare, Finance, Travelling, Media, Artificial Intelligence, and more)
- Explore relocation opportunities, which may be available for eligible candidates depending on the role and openings at other EPAM locations
- Participate in volunteer and charity programs and in communities (both technical and interest-based)
We focus on your professional growth:
- You can plan your individual career path together with your manager
- Receive regular feedback from colleagues
- Improve your English for free with certified teachers (Speaking Clubs, client interview preparation courses, etc.)
- Get the opportunity to undergo free training and certification in AWS, GCP, or Azure Clouds
- Use the internal E-learn training program (18,200+ specialized training and mentoring programs)
- Access corporate accounts on LinkedIn Learning, getAbstract, and other partner resources
- Study at EPAM Solution Architecture School with the instructors who are practicing architects
- Develop as a leader by joining the Delivery Management, Resource Management, or Leadership Essentials schools, and more
- Participate in internal communities (500+ meetups, technical discussions, brainstorming sessions, online events and conferences annually)
What we offer:
- Vacation and sick leave (including sick leave without a medical certificate)
- A wide range of Voluntary Medical Insurance programs providing both medical treatment and various preventive options (including sports activities)
- Medical insurance for family members at corporate rates
- Company support during significant life events (childbirth or adoption, marriage, etc.)
- Support for psychological comfort: discounts on services from mental health specialists or coaches, thematic training
- E-kids program: free programming language training for EPAMers' children
Kindly note that this role supports remote work, but only from within Ukraine.
Kindly be advised that the set of benefits, including learning, certification, and other opportunities, may vary depending on the role you apply for. Our recruiter will be able to share more details about the specific opportunity during your general interview.
EPAM strives to provide its global team of over 61,700 professionals in more than 55 countries with opportunities for professional growth from day one of collaboration. Our colleagues are the source of EPAM's success, so we value cooperation, strive to always understand our clients' business and aim for the highest quality standards. No matter where you are, you will join a dedicated, diverse community that will help you realize your potential to the fullest.

