microagi
Machine Learning Engineer
microagi · Germany · 1 day ago
Full-time · Engineering, Information Technology
Machine Learning Engineer — Automated Labeling & Data Engine

Location: Munich / Aachen

Team: Model & Data

Type: Full-time

Why this role

Great models come from great data engines. You’ll design and build an automated labeling stack that turns large unlabeled video/image corpora into high-quality training signals without human annotation. Your work will directly increase model accuracy, stability, and iteration speed.

What you’ll do
  • Build the automated labeling pipeline: design methods to extract training signals from raw data (e.g., self/weak supervision, pseudo-labeling, teacher-student distillation, synthetic augmentation).
  • Reliability & QA without humans: develop confidence calibration, uncertainty estimation, consensus/ensemble checks, and automatic error detectors; create metrics that predict downstream model lift.
  • Temporal & spatial consistency: enforce cross-frame consistency, track identities/structures over time, and design auto-repair strategies for drift and occlusions.
  • Active data selection: rank and curate raw data via informativeness/novelty/scarcity criteria; implement scalable sampling and replay policies.
  • Model-in-the-loop training: wire the pipeline so models continuously improve with fresh pseudo-labels; automate evaluation gates and rollback policies.
  • Tooling & infra: stand up robust ETL, versioned datasets, lineage tracking, and lightweight dashboards for health/coverage metrics.
  • Documentation: maintain specs for assumptions, failure modes, and decision rules; write crisp research-to-production handoffs.

What you’ve done
  • 2+ years in ML with a focus on computer vision or representation learning; strong Python plus PyTorch, TensorFlow, or JAX.
  • Shipped at least one system using self-supervised, weakly supervised, or pseudo-label methods at scale.
  • Hands-on with uncertainty estimation / calibration (e.g., MC dropout, ensembles, temperature scaling) and automatic quality filters.
  • Experience designing data selection or active learning loops and measuring their impact on downstream metrics.
  • Solid software practice: reproducible training, data versioning and version control, CI for ML (unit/integration tests for data & metrics).

Nice to have
  • Video understanding experience (temporal models, tracking, cross-frame consistency).
  • Experience with synthetic data, teacher-student distillation, or constraint-based labeling.
  • Data lineage/observability stacks (e.g., Weights & Biases, Neptune, MLflow + custom metadata).
  • Basic CUDA/ONNX/TensorRT knowledge for efficient inference at scale.

Working with us
  • Tight build-measure-learn cycles, high ownership, pragmatic research mindset.
  • Learn from a team with a track record of world-class execution and speed.
  • Freedom to choose methods and tooling that maximize signal quality and iteration speed.
  • Competitive compensation and meaningful equity.
