As part of the redesign of its execution model around the New platform, our client has established a Site Reliability Coordination (SRC) team within LEV1 to enhance reliability, responsiveness, and incident coordination in a multi-provider context (LOT1, LOT2, LOT3).
The Senior SysOps Engineer plays a key role in the technical analysis of incidents, event correlation, and performance optimization using Site Reliability Engineering (SRE) and ITIL practices. They serve as a technical facilitator between support teams (LEV1, LEV2, LEV3), providers, and technical governance.
Main Responsibilities
Supervision and Correlation of Technical Incidents
- Conduct in-depth analysis of logs, metrics, and alerts from various components (middleware, infrastructure, applications).
- Ensure proactive monitoring of service performance and availability (centralized monitoring).
- Facilitate root cause identification by collaborating with LEV2/LEV3 teams from providers (LOT1, LOT2, LOT3).
- Correlate incidents across different layers of the system (e.g., an application issue affecting infrastructure).
- Alert and escalate to the appropriate teams when necessary.
Multi-Provider Technical Coordination
- Participate in investigation meetings with technical experts from providers.
- Ensure each party adheres to SLAs and contractual commitments.
- Coordinate technical escalations and ensure clear tracking of actions taken.
- Centralize and document technical exchanges in a structured way (runbooks, incident reports).
Continuous Improvement and Performance Optimization
- Contribute to technical postmortems, analyzing causes and suggesting improvements.
- Recommend monitoring and observability improvements to providers.
- Track key performance indicators (SLI, SLO, SLA, MTTD, MTTR) and anticipate risks.
- Keep up with developments in SRE/DevOps tools and practices to enhance diagnostic capabilities.
Documentation and Knowledge Sharing
- Maintain and enrich incident management and escalation runbooks.
- Write technical guides for LEV1 to improve initial diagnosis.
- Participate in training sessions to enhance the skills of the LEV1 teams.
- Help develop the skills of the junior SysOps engineer on the team.
Participation in Committees and ITSM Governance
- Attend operational follow-up committees (CAB, Incident Review, Performance Review) as a technical expert.
- Share recommendations on critical incident management and change management.
- Propose adjustments to ITIL and SRE processes to improve coordination effectiveness.
Technical Skills
- Systems: Strong knowledge of Linux environments.
- 5 to 10 years of experience in a similar role (SysOps, SRE, Operations Engineer, Incident Manager, Observability Engineer).
- Virtualization & Containers: Proven experience with IaaS technologies, Kubernetes, Docker, OpenShift.
- Middleware & Messaging: Knowledge of solutions such as Kafka, JBoss, Spring Boot, and HAProxy.
- Observability and Monitoring: Proficiency with tools like Prometheus, Grafana, Loki.
- Databases: Experience with diagnostics on Oracle and PostgreSQL.
- Automation & Scripting: Strong practice in Bash, Python, Ansible, Terraform to analyze and optimize operations.
- SRE Methodology: Good understanding of SLI and SLO concepts, postmortems, and advanced monitoring.
- ITIL v4: Good understanding of Incident, Problem, and Change processes.
Organizational and Interpersonal Skills
- Analytical and synthesis skills to correlate technical incidents and anticipate risks.
- Collaborative mindset to facilitate communication between technical teams and providers.
- Strong written and oral communication skills, especially for documenting and simplifying incidents.
- Autonomy and proactivity in incident management and continuous improvement.
- Stress resistance and the ability to handle critical incidents and prioritize effectively.
Experience and Education
- ITIL v3/v4 certification is appreciated.
- Kubernetes certification (CKA, CKAD), AWS/GCP/Azure, or Red Hat is a plus.
- Experience in critical environments (high availability, high volume, SLA constraints).
- Language: Mission in a bilingual French/Dutch environment. Fluency in one of the local languages is required for this role.
- Posted: Mar 05, 2025
- Type: Contract
- Level: Mid-Senior
- Location: Brussels Metropolitan Area
- Company: Atos