We are looking for a skilled and adaptable Senior Site Reliability Engineer (SRE) to join our team, specializing in advanced 3rd line support for essential enterprise systems hosted on Azure. Your role will be critical in maintaining the reliability, performance, and availability of our cloud infrastructure through your expertise in Azure, DevOps, observability, and automation.
If you thrive in fast-paced environments and enjoy solving complex technical challenges, we’d love to hear from you.
Feel free to work remotely from anywhere across Lithuania or connect with colleagues at our Vilnius and Kaunas offices.
Responsibilities
- Lead advanced troubleshooting and incident management for cloud-based systems, ensuring rapid resolution and root-cause analysis
- Maintain and enhance system reliability, performance, and uptime across Azure environments
- Implement and optimize observability, monitoring, and logging solutions using Azure Monitor, Application Insights, Log Analytics, and Prometheus
- Automate infrastructure provisioning and management using Infrastructure-as-Code (IaC) tools like Terraform and scripting languages (Bash, PowerShell, Python)
- Optimize deployment pipelines and ensure secure, scalable workflows in Azure DevOps
- Collaborate with cross-functional teams to drive service improvements and share best practices
- Proactively set up alerts and monitoring to prevent SLA degradation and ensure high availability
- Conduct post-incident reviews and implement long-term reliability solutions
- Support performance tuning and resource optimization for cloud workloads
- Communicate effectively with both technical and non-technical stakeholders
Requirements
- 3+ years of experience in DevOps or Site Reliability Engineering
- Proven expertise with Azure services, including AKS (Kubernetes), Azure Monitor, Application Insights, Log Analytics, Cosmos DB, PostgreSQL, and Azure DevOps
- Strong hands-on experience with observability and monitoring tools (Azure Monitor, Log Analytics, Application Insights, Prometheus, Grafana)
- Proficiency in Infrastructure-as-Code (Terraform) and scripting (Bash, PowerShell, Python)
- Demonstrated incident management skills, including root-cause analysis and postmortem processes
- Experience automating deployment pipelines and routine operational tasks
- Excellent problem-solving and debugging skills in complex, real-time environments
- Strong verbal and written communication skills for cross-team collaboration
- Ability to prioritize and manage multiple tasks in a fast-paced, Agile environment
- Minimum English language proficiency at the B2+ level
Nice to have
- Experience with AWS services (EKS, RDS, CloudWatch, X-Ray) and AWS monitoring tools
- Familiarity with distributed logging pipelines and resource optimization in AKS/EKS
- Knowledge of advanced Kubernetes use cases (service scaling, network configurations)
- Experience with incident automation tools and observability enhancements (Grafana, OpenSearch)
- Relevant certifications in Azure, AWS, or Kubernetes
We offer
- Engineering Heritage: Best-in-class experts sharing a culture of engineering excellence and tackling complex engineering challenges for over 30 years
- Advanced Tech Stack: Innovative projects where you can apply or enhance your expertise in Cloud, Data, AI, and other emerging technologies
- World-Class Clients: Work closely with 295+ of the Forbes Global 2000 on creating disruptive solutions that make a global impact
- Professional Growth: Exceptional support for career development with comprehensive resources for upskilling or reskilling in pioneering practices
- GenAI Community: Strong AI competencies with 600+ experts across 55+ locations driving GenAI-enabled transformation journeys
- Entrepreneurial Culture: If you're passionate and dedicated to improving business transformation, we provide the support you need to bring your ideas to life
- Hybrid Setup: The flexibility to work from any location in Lithuania, whether it's your home or our dynamic offices in Vilnius and Kaunas
- Other Benefits: Additional vacation and trust days, private health insurance, Employee Stock Purchase Plan and more
About EPAM
EPAM is a leading global provider of digital platform engineering and development services. For over 30 years, our team has helped leading brands navigate the waves of digital transformation, building solutions that help them stay competitive through constant market disruption.
With offices in 55+ countries, EPAM has grown in Lithuania to over 1,200 talented innovators in just 4 years. We foster creativity and unconventional ways of doing things, welcoming like-minded professionals to join us.
Salary range €3.8K-€5.5K gross, based on your experience and interview results.
Join our team in our cozy offices in Vilnius or Kaunas.

