We’re seeking a skilled DevOps/SRE with extensive expertise in designing, implementing, and maintaining observability platforms to ensure system reliability, performance, and scalability. As a vital member of our SRE team, you will promote the adoption of observability best practices, fostering proactive monitoring, swift incident resolution, and continuous enhancements to our software products and infrastructure.
This role emphasizes creating and refining observability solutions—including metrics, logs, and traces—to provide actionable insights into system health and performance. You'll also advance automation for deployment pipelines, oversee applications across various environments, and ensure our systems meet rigorous reliability and availability expectations. Collaboration will be essential as you engage closely with development teams to integrate observability into the software lifecycle, equipping them with the tools and practices for efficient debugging and iteration.
Responsibilities
- Architect and implement observability platforms using tools like Prometheus, Grafana, and OpenTelemetry to support our Next.js frontend and accompanying systems
- Design and maintain automated deployment pipelines focused on reliability, observability, and zero-downtime updates across multiple environments
- Collaborate with development teams to integrate observability into local workflows for accelerated debugging and iteration
- Optimize infrastructure and tools for scalability, fault tolerance, and performance, with the aim of reducing mean time to detection (MTTD) and mean time to resolution (MTTR)
- Mentor team members in SRE practices, including observability-driven development, incident management, and post-mortem analyses
Requirements
- Proficiency in scripting languages like Python for automation and observability tools
- Expertise in observability frameworks (e.g., Prometheus, Grafana, Loki, Jaeger) and logging solutions (e.g., ELK stack, Fluentd)
- Background in containerization technologies (e.g., Docker) and orchestration platforms (e.g., Kubernetes, AWS ECS)
- Knowledge of infrastructure as code tools (e.g., Terraform, Ansible) to provision and manage observable systems
- Familiarity with version control systems, especially Git, and integrating observability into CI/CD pipelines (e.g., Jenkins, GitHub Actions)
- Capability to define and measure service-level indicators (SLIs), service-level objectives (SLOs), and error budgets to ensure system reliability
- Competency in fostering collaboration and communication, with a strong commitment to nurturing a blameless culture of improvement
- Proficiency in the Polish language
- Proficiency in programming languages as applied to SRE, DevOps, or observability contexts
Nice to have
- Familiarity with cloud platforms, such as AWS, with a focus on observability services (e.g., CloudWatch, X-Ray)
- Understanding of distributed systems, chaos engineering, or security practices in observable environments
We offer
- Work on a flexible schedule remotely or from any of our comfortable offices or coworking spaces in Ukraine
- Receive the necessary equipment to perform your work tasks
- Change projects and technology stacks within EPAM
- Gain experience in various business domains (Insurance, E-commerce, Healthcare, Finance, Travelling, Media, Artificial Intelligence, and more)
- Consider relocation options in over 30 countries worldwide
- Participate in volunteer, charity programs and communities (both technical and interest-based)
- Plan your individual career path together with your manager
- Receive regular feedback from colleagues
- Improve your English for free with certified teachers (Speaking Clubs, client interview preparation courses, etc.)
- Get the opportunity to undergo free training and certification in AWS, GCP, or Azure Clouds
- Use the internal E-learn training program (18,200+ specialized training and mentoring programs)
- Access corporate accounts on LinkedIn Learning, getAbstract, and other partner resources
- Study at EPAM Solution Architecture School with the instructors who are practicing architects
- Develop as a leader, join Delivery Management, Resource Management, Leadership Essentials school and more
- Participate in internal communities (500+ meetups, technical discussions, brainstorming sessions, online events and conferences annually)
- Vacation and sick leave (including a sick leave without a medical certificate)
- A wide range of Voluntary Medical Insurance programs providing both medical treatment and various preventive options (including sports activities)
- Medical insurance for family members at corporate rates
- Company support during significant life events (childbirth or adoption, marriage, etc.)
- Support for psychological comfort: discounts on services from mental health specialists or coaches, thematic training
- E-kids program: a free programming language training program for EPAMers' children
EPAM strives to provide its global team of 52,800+ professionals in more than 55 countries with opportunities for professional growth from day one of collaboration. Our colleagues are the source of EPAM's success, so we value cooperation, strive to always understand our clients' business and aim for the highest quality standards. No matter where you are, you will join a dedicated, diverse community that will help you realize your potential to the fullest.
Key Skills
Ranked by relevance
aws
prometheus
grafana
artificial intelligence
infrastructure as code
security practices
containerization
fault tolerance
kubernetes
terraform
jenkins
ansible
python
docker
devops
cloud
loki
cicd
git
gcp
elk
- Posted: May 13, 2025
- Type: Full-time
- Level: Mid-Senior
- Location: Ukraine
- Company: EPAM Systems
- Industries: Software Development, IT Services, IT Consulting
- Categories: Engineering, Information Technology, Business Development