At Sonia, we help doctors be more successful doctors. We create and deploy AI-enhanced solutions that make doctors’ lives easier, patients’ care better, and healthcare systems more efficient. If you’re an intrinsically motivated self-starter who values impactful work, join us in revolutionizing healthcare.
We’re looking for an experienced ML Platform Engineer (all genders) with deep Kubernetes expertise to support the infrastructure powering our AI and ML workloads.
You’ll work closely with ML engineers on everything from deploying cutting-edge LLM inference to refining observability and automating workflows—always with reliability, scalability, and performance as your guiding principles.
This role can be performed remotely from anywhere in Germany or Luxembourg, or in a hybrid setup from our offices in Luxembourg or Berlin.
This is what you’ll own
- Support and enhance our Kubernetes-based infrastructure in cloud environments, running both ML/LLM workloads and general applications
- Deploy and optimize LLM inference systems
- Design, build, and improve MLOps/DevOps pipelines to support the entire development lifecycle
- Manage GPU scheduling and autoscaling with Kubernetes-native tooling
- Ensure observability and alerting across the platform
- Operate and troubleshoot supporting infrastructure
- Contribute to platform reliability, security, and performance through automation and best practices
You’ll thrive in this role if you bring
- 5+ years of experience in MLOps or SRE
- Strong hands-on Kubernetes experience, including GitOps (Flux or ArgoCD), Kustomize, Helm and production troubleshooting
- Familiarity with LLM inference deployment and optimization in Kubernetes (e.g., vLLM, LMCache, llm-d)
- Experience with MLOps supporting tools such as MLflow or Argo Workflows
- Understanding of GPU resource orchestration in Kubernetes environments
- Profound knowledge of observability tools, such as VictoriaMetrics, VictoriaLogs and Grafana
- Knowledge of database and broker administration (PostgreSQL, Redis and RabbitMQ)
- Solid scripting skills in Python
- Comfortable working with cloud platforms (OVHcloud, AWS, GCP or Azure)
Nice-to-Haves
- Experience with audio ML models or real-time inference
- Exposure to CI/CD practices tailored for ML systems
- Familiarity with Kubernetes networking, security, or performance tuning
Why you’ll love working with us
- Full ownership of a mission-critical platform
- A team that values curiosity, learning, and experimentation
- Remote-first setup with the option to work in our Berlin office
- Competitive salary depending on experience
- Work on AI infrastructure that directly impacts healthcare innovation
Ready to apply?
If you're passionate about ML infrastructure and want to work with cutting-edge technologies, we'd love to hear from you!
I'm Margarita, and I'll be guiding you through the application process.
Join Sonia and take your career to the next level!
Application takes less than 5 minutes