Site Reliability Engineer

Next Ventures

Netherlands · Contract · Mid-Senior

Site Reliability Engineer (SRE) – Sovereign Cloud (CloudOps)

Location: Amsterdam (Hybrid)

Start Date: September 2026

Contract: Long-term contract

Language Requirement: Fluent Dutch (mandatory) + English

About the Role

We are looking for an experienced Site Reliability Engineer (SRE) to ensure the reliability, performance, and operational excellence of customer environments on the Internal Sovereign Cloud platform.

In this role, you will be responsible for defining reliability standards, building observability solutions, managing incidents, and driving continuous improvement through automation. You will play a key role in enabling stable, scalable, and highly available cloud environments within a 24/7 operational model.

Key Responsibilities

Define, implement, and maintain SLIs and SLOs for customer environments
Design and operate observability solutions (metrics, logs, traces, dashboards) using Prometheus, Grafana, ELK, and OpenTelemetry
Configure intelligent alerting to reduce noise and prevent alert fatigue
Own incident management processes, including P1/P2 escalations, root cause analysis, and post-incident reviews
Correlate metrics, logs, and platform events to determine root causes in complex systems
Create and maintain runbooks and escalation procedures
Automate operational workflows, remediation actions, and self-healing mechanisms
Drive continuous improvement based on SLO performance, error budgets, and incident trends
Enable and support 24/7 operations through guidance and knowledge sharing
Support customer-facing incident reporting and reliability reviews
Collaborate with Platform Ops to integrate platform telemetry into customer dashboards
Advise stakeholders on reliability, performance, and availability improvements

Required Skills & Experience

5–8 years of experience in SRE, platform operations, or reliability engineering roles
Strong hands-on experience with SRE principles (SLI/SLO/SLA, error budgets, toil reduction)
Expertise in observability tools such as Prometheus, Grafana, ELK Stack, OpenTelemetry, and Loki
Strong incident management and root cause analysis skills in distributed environments
Experience with Kubernetes / OpenShift operations and troubleshooting
Experience automating workflows using Infrastructure-as-Code and scripting (Python, Go, Bash)
Solid understanding of performance, capacity, availability, and resilience engineering
Strong decision-making skills under pressure with a structured, disciplined approach
Fluent Dutch is mandatory

Preferred Certifications

SRE Foundation / Practitioner (DevOps Institute)
Certified Kubernetes Administrator (CKA)
ITIL 4 Foundation
Red Hat Certified Specialist in OpenShift Administration

Why Join?

Work on mission-critical sovereign cloud platforms
Take ownership of reliability and performance for high-impact customer environments
Be part of a collaborative, automation-driven CloudOps team
Hybrid working model in Amsterdam with long-term project stability

Interested? Apply now or reach out directly to learn more.

