Freelance Senior Site Reliability Engineer

Atos

Belgium · Contract · Mid-Senior

As part of the redesign of its execution model around the New platform, our client has established a Site Reliability Coordination (SRC) team within LEV1 to enhance reliability, responsiveness, and incident coordination in a multi-provider context (LOT1, LOT2, LOT3).

The Senior SysOps Engineer plays a key role in the technical analysis of incidents, event correlation, and performance optimization using Site Reliability Engineering (SRE) and ITIL practices. They serve as a technical facilitator between support teams (LEV1, LEV2, LEV3), providers, and technical governance.

Main Responsibilities

Supervision and Correlation of Technical Incidents

Conduct in-depth analysis of logs, metrics, and alerts from various components (middleware, infrastructure, applications).
Ensure proactive monitoring of service performance and availability (centralized monitoring).
Facilitate root cause identification by collaborating with LEV2/LEV3 teams from providers (LOT1, LOT2, LOT3).
Correlate incidents across different layers of the system (e.g., an application issue affecting infrastructure).
Alert and escalate to the appropriate teams when necessary.

Multi-Provider Technical Coordination

Participate in investigation meetings with technical experts from providers.
Ensure each party adheres to SLAs and contractual commitments.
Coordinate technical escalations and ensure clear tracking of actions taken.
Centralize and document technical exchanges in a structured way (runbooks, incident reports).

Continuous Improvement and Performance Optimization

Contribute to technical postmortems, analyzing causes and suggesting improvements.
Recommend monitoring and observability improvements to providers.
Track key performance indicators (SLI, SLO, SLA, MTTD, MTTR) and anticipate risks.
Keep up with developments in SRE/DevOps tools and practices to enhance diagnostic capabilities.

Documentation and Knowledge Sharing

Maintain and enrich incident management and escalation runbooks.
Write technical guides for LEV1 to improve initial diagnosis.
Participate in training sessions to enhance the skills of the LEV1 teams.
Help develop the skills of the junior SysOps engineer on the team.

Participation in Committees and ITSM Governance

Attend operational follow-up committees (CAB, Incident Review, Performance Review) as a technical expert.
Share recommendations on critical incident management and change management.
Propose adjustments to ITIL and SRE processes to improve coordination effectiveness.

Technical Skills

Systems: Strong knowledge of Linux environments.
5 to 10 years of experience in a similar role (SysOps, SRE, Operations Engineer, Incident Manager, Observability Engineer).
Virtualization & Containers: Experience with IaaS technologies, Kubernetes, Docker, OpenShift.
Middleware & Messaging: Knowledge of solutions such as Kafka, JBoss, Spring Boot, and HAProxy.
Observability and Monitoring: Proficiency with tools like Prometheus, Grafana, Loki.
Databases: Experience with diagnostics on Oracle and PostgreSQL.
Automation & Scripting: Strong practice in Bash, Python, Ansible, Terraform to analyze and optimize operations.
SRE Methodology: Good understanding of SLI, SLO concepts, postmortems, and advanced monitoring.
ITIL v4: Good understanding of Incident, Problem, and Change processes.

Organizational and Interpersonal Skills

Analytical and synthesis skills to correlate technical incidents and anticipate risks.
Collaborative mindset to facilitate communication between technical teams and providers.
Strong written and oral communication skills, especially for documenting and simplifying incidents.
Autonomy and proactivity in incident management and continuous improvement.
Stress resistance, ability to handle critical incidents and prioritize effectively.

Experience and Education

ITIL v3/v4 certification is appreciated.
Kubernetes certification (CKA, CKAD), AWS/GCP/Azure, or Red Hat is a plus.
Experience in critical environments (high availability, high volume, SLA constraints).
Language: Mission in a bilingual French/Dutch environment. Fluency in one of the local languages is required for this role.

Key Skills

Ranked by relevance

ITIL, SLA, high availability, virtualization, Kubernetes, Prometheus, Terraform, Ansible, Grafana, Python, Docker, Oracle, Kafka, Linux, Bash

Posted: Mar 05, 2025
Type: Contract
Level: Mid-Senior
Location: Brussels Metropolitan Area
Company: Atos

Industries

IT Services, IT Consulting
