● Design, deploy, and maintain Kubeflow (or equivalent) for pipeline orchestration, model training, evaluation, and serving on large image datasets; ensure reliability, security, and cost efficiency.
● Manage and tune Kubernetes clusters (EKS/GKE/AKS), set up namespaces, RBAC, autoscaling, network policies, and service meshes where appropriate; keep upgrades and operations predictable.
● Define infrastructure-as-code with Terraform; implement repeatable environment provisioning, configuration management, and golden paths for teams.
● Establish CI/CD workflows (GitHub Actions/Jenkins/GitLab CI), build/test standards, and progressive delivery patterns that keep releases fast and low-risk.
● Implement logging, metrics, and tracing (e.g., Prometheus, Grafana, CloudWatch, Splunk/New Relic) with actionable SLOs, alerts, and runbooks; embed security and compliance by design.
● Collaborate closely with product and science teams to remove bottlenecks, eliminate manual steps, and evolve service and data interfaces that make operating image pipelines simple and reliable.
● Contribute to future-state architectures that improve scalability, resiliency, and operational efficiency; lead targeted refactors and platform improvements.
● Manage core automation and tooling, and educate teams on platform capabilities, CI/CD, configuration management, and infrastructure automation best practices.
Required (Must-have):
● M.Sc. in Computer Science/Engineering (or equivalent) or comparable industry experience.
● Practical, production experience operating Kubeflow Pipelines for reproducible ML workflows at scale.
● Proven experience deploying and operating workloads on Kubernetes (EKS/GKE/AKS), including upgrades, autoscaling, RBAC, networking, and reliability; strong Unix/Linux fundamentals.
● Hands-on experience with AWS services (EKS, EC2, S3, IAM, CloudWatch; RDS a plus) and the ability to design secure, cost-aware architectures.
● Strong Terraform skills and Git-based workflows for repeatable infrastructure provisioning and configuration management.
● Practical experience with CI/CD platforms (GitHub Actions/Jenkins/GitLab CI), including artifact management, environment promotion, and progressive delivery.
● Solid Python and/or shell scripting for platform automation and toil reduction.
● Experience implementing logging, metrics, and tracing with SLOs, alerts, and runbooks (e.g., Prometheus, Grafana, CloudWatch, Splunk/New Relic) and a security-first mindset.
● Ability to lead technical initiatives, communicate trade-offs clearly, and collaborate effectively with engineering and science teams.
- Posted: Apr 15, 2026
- Type: Contract
- Level: Mid-Senior
- Location: Greater Munich Metropolitan Area
- Company: KBC Technologies Group