Machine Learning Engineer

microagi

Germany · Full-time · Entry

Machine Learning Engineer — Automated Labeling & Data Engine

Location: Munich / Aachen

Team: Model & Data

Type: Full-time

Why this role

Great models come from great data engines. You’ll design and build an automated labeling stack that turns large unlabeled video/image corpora into high-quality training signals without human annotation. Your work will directly increase model accuracy, stability, and iteration speed.

What you’ll do

Build the automated labeling pipeline: design methods to extract training signals from raw data (e.g., self/weak supervision, pseudo-labeling, teacher-student distillation, synthetic augmentation).
Reliability & QA without humans: develop confidence calibration, uncertainty estimation, consensus/ensemble checks, and automatic error detectors; create metrics that predict downstream model lift.
Temporal & spatial consistency: enforce cross-frame consistency, track identities/structures over time, and design auto-repair strategies for drift and occlusions.
Active data selection: rank and curate raw data via informativeness/novelty/scarcity criteria; implement scalable sampling and replay policies.
Model-in-the-loop training: wire the pipeline so models continuously improve with fresh pseudo-labels; automate evaluation gates and rollback policies.
Tooling & infra: stand up robust ETL, versioned datasets, lineage tracking, and lightweight dashboards for health/coverage metrics.
Documentation: maintain specs for assumptions, failure modes, and decision rules; write crisp research-to-prod handoffs.

What you’ve done

2+ years in ML with a focus on computer vision or representation learning; strong Python and PyTorch/TensorFlow/JAX.
Shipped at least one system using self-supervised, weakly supervised, or pseudo-label methods at scale.
Hands-on with uncertainty estimation / calibration (e.g., MC dropout, ensembles, temperature scaling) and automatic quality filters.
Experience designing data selection or active learning loops and measuring their impact on downstream metrics.
Solid software practice: reproducible training, data/version control, CI for ML (unit/integration tests for data & metrics).

Nice to have

Video understanding experience (temporal models, tracking, cross-frame consistency).
Experience with synthetic data, teacher-student distillation, or constraint-based labeling.
Data lineage/observability stacks (e.g., Weights & Biases, Neptune, MLflow + custom metadata).
Basic familiarity with CUDA/ONNX/TensorRT for efficient inference at scale.

Working with us

Tight build-measure-learn cycles, high ownership, pragmatic research mindset.
Learn from a team with a world-class record of execution and speed.
Freedom to choose methods and tooling that maximize signal quality and iteration speed.
Competitive compensation and meaningful equity.

Key Skills

Machine learning, computer vision, Python, MLflow, ETL
