- Help design, build, and continuously improve the client's online platform.
- Research, suggest and implement new technology solutions following best practices/standards.
- Take responsibility for the resiliency and availability of different products.
- Be a productive member of the team.
- Design and implement strategies to ensure system uptime, fault tolerance, and performance optimization.
- Define, track, and manage Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.
- Create and maintain runbooks and automated recovery procedures to reduce manual effort and downtime.
- Respond swiftly to system incidents and outages, and serve as an escalation point during critical events.
- Lead post-incident investigations, conduct blameless post-mortems, and implement corrective actions to prevent recurrence.
- Participate in the on-call rotation and collaborate across teams for quick resolutions.
- Use tools like Terraform to provision and manage infrastructure.
- Ensure infrastructure is version-controlled, reproducible, auditable, and adheres to compliance requirements.
- Implement and manage observability platforms (e.g., Splunk, Prometheus, Grafana).
- Create dashboards and configure alerts to monitor system health and performance metrics.
- Automate operational workflows, including deployments, scaling, backups, and failover mechanisms.
- Develop internal tools to support development, release pipelines, and operational processes.
- Partner with development teams to build scalable, supportable, and secure systems.
- Champion CI/CD, test automation, and modern release practices.
- Proficient in Python, Bash, Ruby, or similar scripting languages.
- Skilled in debugging and building tools to streamline operations.
- Hands-on experience with GCP and Azure, including cloud-native services, networking, and security best practices.
- Deep knowledge of Linux/Unix and Windows environments, including performance tuning and system diagnostics.
- Solid experience with Docker and Kubernetes (or equivalent orchestration platforms).
- Familiar with Jenkins, GitHub Actions, ArgoCD, or similar tools for building and managing deployment pipelines.
- Proficient with observability tools and practices for metrics, logging, and alerting.
- Understanding of system security, including access control, secret management, and audit logging.
- Strong communication and collaboration skills in cross-functional teams.
- Ability to coach and mentor junior engineers.
- Comfortable working under pressure, especially during critical incidents.
- Analytical and proactive in identifying root causes and long-term solutions.
- A challenging, innovative environment.
- Opportunities for learning where needed.
Key Skills
Ranked by relevance
fault tolerance
kubernetes
prometheus
terraform
jenkins
python
docker
splunk
cloud
ruby
bash
cicd
gcp
- Posted: Apr 14, 2025
- Type: Full-time
- Level: Mid-Senior
- Location: Veldhoven
- Company: GeekSoft Consulting
- Industries: IT Services, IT Consulting
- Categories: Information Technology