GeekSoft Consulting

Senior Site Reliability Engineer

Netherlands · Full-time · Mid-Senior

  • Help design, build, and continuously improve the client's online platform.
  • Research, suggest and implement new technology solutions following best practices/standards.
  • Take responsibility for the resiliency and availability of different products.
  • Be a productive member of the team.


Requirements
  • Design and implement strategies to ensure system uptime, fault tolerance, and performance optimization.
  • Define, track, and manage Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.
  • Create and maintain runbooks and automated recovery procedures to reduce manual effort and downtime.
  • Respond swiftly to system incidents and outages, and serve as an escalation point during critical events.
  • Lead post-incident investigations, conduct blameless post-mortems, and implement corrective actions to prevent recurrence.
  • Participate in the on-call rotation and collaborate across teams for quick resolutions.
  • Use tools like Terraform to provision and manage infrastructure.
  • Ensure infrastructure is version-controlled, reproducible, auditable, and adheres to compliance requirements.
  • Implement and manage observability platforms (e.g., Splunk, Prometheus, Grafana).
  • Create dashboards and configure alerts to monitor system health and performance metrics.
  • Automate operational workflows, including deployments, scaling, backups, and failover mechanisms.
  • Develop internal tools to support development, release pipelines, and operational processes.
  • Partner with development teams to build scalable, supportable, and secure systems.
  • Champion CI/CD, test automation, and modern release practices.
  • Proficient in Python, Bash, Ruby, or similar scripting languages.
  • Skilled in debugging and building tools to streamline operations.
  • Hands-on experience with GCP and Azure, including cloud-native services, networking, and security best practices.
  • Deep knowledge of Linux/Unix and Windows environments, including performance tuning and system diagnostics.
  • Solid experience with Docker and Kubernetes (or equivalent orchestration platforms).
  • Familiar with Jenkins, GitHub Actions, ArgoCD, or similar tools for building and managing deployment pipelines.
  • Proficient with observability tools and practices for metrics, logging, and alerting.
  • Understanding of system security, including access control, secret management, and audit logging.
  • Strong communication and collaboration skills in cross-functional teams.
  • Ability to coach and mentor junior engineers.
  • Comfortable working under pressure, especially during critical incidents.
  • Analytical and proactive in identifying root causes and long-term solutions.


Benefits
  • A challenging, innovative environment.
  • Opportunities for learning where needed.

Key Skills

Ranked by relevance

fault tolerance, Kubernetes, Prometheus, Terraform, Jenkins, Python, Docker, Splunk, cloud, Ruby, Bash, CI/CD, GCP
Posted: Apr 14, 2025
Type: Full-time
Level: Mid-Senior
Location: Veldhoven

Industries

IT Services, IT Consulting

Categories

Information Technology
