Site Reliability Engineer (SRE) – Sovereign Cloud (CloudOps)
Location: Amsterdam (Hybrid)
Start Date: September 2026
Contract: Long-term contract
Language Requirement: Fluent Dutch (mandatory) and English
About the Role
We are looking for an experienced Site Reliability Engineer (SRE) to ensure the reliability, performance, and operational excellence of customer environments on the Internal Sovereign Cloud platform.
In this role, you will be responsible for defining reliability standards, building observability solutions, managing incidents, and driving continuous improvement through automation. You will play a key role in enabling stable, scalable, and highly available cloud environments within a 24/7 operational model.
Key Responsibilities
- Define, implement, and maintain SLIs and SLOs for customer environments
- Design and operate observability solutions (metrics, logs, traces, dashboards) using Prometheus, Grafana, ELK, and OpenTelemetry
- Configure intelligent alerting to reduce noise and prevent alert fatigue
- Own incident management processes, including P1/P2 escalations, root cause analysis, and post-incident reviews
- Correlate metrics, logs, and platform events to determine root causes in complex systems
- Create and maintain runbooks and escalation procedures
- Automate operational workflows, remediation actions, and self-healing mechanisms
- Drive continuous improvement based on SLO performance, error budgets, and incident trends
- Enable and support 24/7 operations through guidance and knowledge sharing
- Support customer-facing incident reporting and reliability reviews
- Collaborate with Platform Ops to integrate platform telemetry into customer dashboards
- Advise stakeholders on reliability, performance, and availability improvements
Required Skills & Experience
- 5–8 years of experience in SRE, platform operations, or reliability engineering roles
- Strong hands-on experience with SRE principles (SLI/SLO/SLA, error budgets, toil reduction)
- Expertise in observability tools such as Prometheus, Grafana, the ELK Stack, OpenTelemetry, and Loki
- Strong incident management and root cause analysis skills in distributed environments
- Experience with Kubernetes / OpenShift operations and troubleshooting
- Experience automating workflows using Infrastructure-as-Code and scripting (Python, Go, Bash)
- Solid understanding of performance, capacity, availability, and resilience engineering
- Strong decision-making skills under pressure with a structured, disciplined approach
- Fluent Dutch is mandatory
Preferred Certifications
- SRE Foundation / Practitioner (DevOps Institute)
- Certified Kubernetes Administrator (CKA)
- ITIL 4 Foundation
- Red Hat Certified Specialist in OpenShift Administration
Why Join?
- Work on mission-critical sovereign cloud platforms
- Take ownership of reliability and performance for high-impact customer environments
- Be part of a collaborative, automation-driven CloudOps team
- Hybrid working model in Amsterdam with long-term project stability
Interested? Apply now or reach out directly to learn more.
Posted: Jun 19, 2026
Type: Contract
Level: Mid-Senior
Location: Amsterdam Area
Company: Next Ventures