Role: Platform DevOps Engineer
Location: Munich, Germany
Job Type: Contract - 2 Days Onsite
Note: No Sponsorship Provided
Main Duties and Responsibilities:
You’ll shape our cloud-native architecture across AWS and Kubernetes, driving best-in-class infrastructure-as-code, CI/CD, and automation to accelerate model development and deployment. You’ll design and operate Kubeflow (or similar) pipelines and supporting services, streamline developer workflows, and raise the reliability and efficiency of the platform. Day to day, you’ll partner with product engineering, data scientists, and imaging teams to translate scientific and product needs into secure, production-grade infrastructure that scales.
- Design, deploy, and maintain Kubeflow (or equivalent) for pipeline orchestration, model training, evaluation, and serving on large image datasets; ensure reliability, security, and cost efficiency.
- Manage and tune Kubernetes clusters (EKS/GKE/AKS), set up namespaces, RBAC, autoscaling, network policies, and service meshes where appropriate; keep upgrades and operations predictable.
- Define infrastructure-as-code with Terraform; implement repeatable environment provisioning, configuration management, and golden paths for teams.
- Establish CI/CD workflows (GitHub Actions/Jenkins/GitLab CI), build/test standards, and progressive delivery patterns that keep releases fast and low-risk.
- Implement logging, metrics, and tracing (e.g., Prometheus, Grafana, CloudWatch, Splunk/New Relic) with actionable SLOs, alerts, and runbooks; embed security and compliance by design.
- Collaborate closely with product and science teams to remove bottlenecks, eliminate manual steps, and evolve service and data interfaces that make operating image pipelines simple and reliable.
- Contribute to future-state architectures that improve scalability, resiliency, and operational efficiency; lead targeted refactors and platform improvements.
- Manage core automation and tooling, and educate teams on platform capabilities, CI/CD, configuration management, and infrastructure automation best practices.
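The observability bullet above calls for actionable SLOs and alerts. One common building block behind such alerts is an error-budget calculation: given an SLO target, how much of the allowed failure budget has been spent? A minimal stdlib-only sketch (the helper name and signature are illustrative, not taken from any listed tool):

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent.

    slo_target: success-rate goal, e.g. 0.999 for "99.9% of requests succeed".
    A negative result means the budget is overspent.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        # A 100% SLO leaves no budget at all.
        return 0.0 if failed_requests == 0 else -1.0
    return 1.0 - failed_requests / allowed_failures

# Example: a 99.9% SLO over 1_000_000 requests allows 1_000 failures.
# With 250 failures observed, 75% of the budget remains.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
```

In practice the same ratio would be computed from Prometheus counters and fed into burn-rate alert rules rather than a standalone function.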
Required (Must-have):
- M.Sc. in Computer Science/Engineering (or equivalent) or comparable industry experience.
- Practical, production experience operating Kubeflow Pipelines for reproducible ML workflows at scale.
- Proven experience deploying and operating workloads on Kubernetes (EKS/GKE/AKS), including upgrades, autoscaling, RBAC, networking, and reliability; strong Unix/Linux fundamentals.
- Hands-on experience with AWS services (EKS, EC2, S3, IAM, CloudWatch; RDS a plus) and the ability to design secure, cost-aware architectures.
- Strong Terraform skills and Git-based workflows for repeatable infrastructure provisioning and configuration management.
- Practical experience with CI/CD platforms (GitHub Actions/Jenkins/GitLab CI), including artifact management, environment promotion, and progressive delivery.
- Solid Python and/or shell scripting for platform automation and toil reduction.
- Experience implementing logging, metrics, and tracing with SLOs, alerts, and runbooks (e.g., Prometheus, Grafana, CloudWatch, Splunk/New Relic) and a security-first mindset.
- Ability to lead technical initiatives, communicate trade-offs clearly, and collaborate effectively with engineering and science teams.
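The "Python and/or shell scripting for platform automation and toil reduction" requirement above is the kind of thing often demonstrated with small log-triage utilities. A minimal stdlib-only sketch (the log format and component names are assumptions for illustration):

```python
import re
from collections import Counter

# Assumed log shape: "LEVEL  component-name: message"
LOG_LINE = re.compile(r"^(?P<level>[A-Z]+)\s+(?P<component>[\w-]+):")

def count_errors(lines):
    """Count ERROR-level log lines per component, ignoring unparseable lines."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("level") == "ERROR":
            counts[m.group("component")] += 1
    return dict(counts)

sample = [
    "INFO  api-gateway: request served",
    "ERROR model-server: OOMKilled",
    "ERROR model-server: probe failed",
    "ERROR ingress: 502 upstream",
]
```

A script like this replaces repetitive manual grepping; the same counts could equally be exported as metrics and alerted on.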
Desirable (Nice to have):
- Experience with MLflow, Feast, Argo, Airflow, Ray, and model versioning/monitoring.
- Familiarity with S3/object storage, artifact registries, and handling large image datasets; basic SQL/NoSQL exposure.
- Experience with digital pathology or large-scale image processing (e.g., whole-slide images) and tools like OpenSlide, scikit-image, or OpenCV.
- Experience tuning high-throughput pipelines, concurrency, memory usage, and integrating GPUs/accelerators.
- Experience with VPC design, ingress/egress, service meshes, secrets management, IAM, and policy as code.
- Experience in regulated environments (e.g., GxP), including data governance, privacy, and building software under regulated processes.
- Experience with Jira/Zendesk and with JavaScript/TypeScript for internal tools or dashboards.
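The "tuning high-throughput pipelines, concurrency, memory usage" item above often comes down to fanning per-tile work out over a bounded worker pool. A stdlib-only sketch, where process_tile is a hypothetical stand-in for real decode/transform work on whole-slide image tiles:

```python
from concurrent.futures import ThreadPoolExecutor

def process_tile(tile_id: int) -> int:
    # Placeholder for real per-tile work (decode, normalize, run inference);
    # here it just squares the id so the sketch stays self-contained.
    return tile_id * tile_id

def process_slide(tile_ids, max_workers: int = 4):
    """Process tiles concurrently with a bounded pool, preserving input order.

    Capping max_workers bounds memory use; pool.map keeps results aligned
    with the input sequence.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_tile, tile_ids))
```

For CPU-bound tile work a ProcessPoolExecutor (or GPU batching) would be the usual next step; threads suit I/O-bound decode-and-fetch stages.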
Join PURVIEW and take your career to the next level!

