Hybrid: 3 days in the office/week
As a Lead Site Reliability Engineer, you’ll be at the forefront of building scalable, resilient, and observable systems that power Tricentis SaaS products globally. This is a hands-on engineering leadership role—balancing technical delivery, process ownership, and team mentorship.
You will drive initiatives across multiple products, shape SRE standards, and serve as a trusted partner to both engineering and product leaders. You will be responsible for elevating engineering quality and reliability while enabling scale and speed.
Your Impact as an SRE 🚀
- Lead and deliver cross-cutting initiatives to improve platform scalability, resilience, and cost efficiency.
- Architect and implement cloud-native infrastructure that supports multi-region, multi-tenant deployments.
- Improve observability strategy across systems and teams—including SLOs, error budgets, and alerting standards.
- Coach and mentor engineers, guiding technical design reviews and promoting engineering excellence.
- Own post-incident analysis and ensure learning loops are completed with preventive action.
- Influence product reliability from early-stage design to production readiness reviews.
- Establish and evolve standards for deployments, operational readiness, and incident response.
- Serve as a technical advisor for engineering and product managers across the org.
- Drive architectural discussions and make decisions that influence the SRE org and wider engineering teams.
- Define and evolve technical roadmaps and execution plans aligned with company goals.
- Partner with peers in security, infrastructure, and product to drive platform-wide improvements.
- Lead incident response for high-impact outages and continuously reduce incident recurrence.
- Contribute to SRE hiring through interviews, onboarding, and process refinement.
- Guide the adoption of modern tooling and practices across teams (e.g., GitOps, self-service platforms, chaos engineering).
- Represent SRE in leadership forums, bringing insights, trade-offs, and forward-looking strategies.
Qualifications
- 6+ years of experience in SRE, Infrastructure, or DevOps roles, including technical leadership.
- Expertise in building and operating production systems in public cloud (Azure).
- Deep understanding of observability principles (SLOs, SLIs, metrics, traces, logs).
- Strong experience with infrastructure-as-code, container orchestration, and CI/CD (Terraform, K8s, GitHub Actions).
- Proven track record in leading technical projects, influencing architecture, and mentoring engineers.
- Excellent communication and cross-functional collaboration skills.
- Proactive, ownership-driven mindset with a passion for reliability and continuous improvement.
Tech stack: Azure, AWS, Terraform, GitHub Actions, Kubernetes, Datadog, Prometheus, Grafana, Better Stack, incident.io, Jira, and more
Our Culture 🦄
We don't just preach our values; we embody them in everything we do. We are committed to creating an environment that empowers, supports, and includes individuals, where trust, transparency, creativity, curiosity, and continuous improvement thrive on a daily basis.
Tricentis Core Values
Knowing what we need to achieve and how to achieve it is important. Tricentis' core values define our ways of working and the behaviors we model that create an enjoyable and successful Tricentis life.
- Demonstrate Self-Awareness: Own your strengths and limitations.
- Finish What We Start: Do what we say we are going to do.
- Move Fast: Create momentum and efficiency.
- Run Towards Change: Challenge the status quo.
- Serve Our Customers & Communities: Create a positive experience with each interaction.
- Solve Problems Together: We win or lose as one team.
- Think Big & Believe: Set extraordinary goals and believe you can achieve them.