Location: Munich / Aachen
Team: Model & Data
Type: Full-time
Why this role
Great models come from great data engines. You’ll design and build an automated labeling stack that turns large unlabeled video/image corpora into high-quality training signals without human annotation. Your work will directly increase model accuracy, stability, and iteration speed.
What you’ll do
- Build the automated labeling pipeline: design methods to extract training signals from raw data (e.g., self/weak supervision, pseudo-labeling, teacher-student distillation, synthetic augmentation).
- Reliability & QA without humans: develop confidence calibration, uncertainty estimation, consensus/ensemble checks, and automatic error detectors; create metrics that predict downstream model lift.
- Temporal & spatial consistency: enforce cross-frame consistency, track identities/structures over time, and design auto-repair strategies for drift and occlusions.
- Active data selection: rank and curate raw data via informativeness/novelty/scarcity criteria; implement scalable sampling and replay policies.
- Model-in-the-loop training: wire the pipeline so models continuously improve with fresh pseudo-labels; automate evaluation gates and rollback policies.
- Tooling & infra: stand up robust ETL, versioned datasets, lineage tracking, and lightweight dashboards for health/coverage metrics.
- Documentation: maintain specs for assumptions, failure modes, and decision rules; write crisp research-to-prod handoffs.
Requirements
- 2+ years in ML with a focus on computer vision or representation learning; strong Python + PyTorch/TF/JAX.
- Shipped at least one system using self-supervised, weakly supervised, or pseudo-label methods at scale.
- Hands-on with uncertainty estimation / calibration (e.g., MC dropout, ensembles, temperature scaling) and automatic quality filters.
- Experience designing data selection or active learning loops and measuring their impact on downstream metrics.
- Solid software practice: reproducible training, data/version control, CI for ML (unit/integration tests for data & metrics).
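For candidates less familiar with the calibration methods named above, temperature scaling can be sketched in a few lines. This is a simplified illustration: it fits the temperature by grid search over held-out logits instead of the gradient-based fit typically used, and the helper names (`softmax`, `nll`, `fit_temperature`) and the toy data are assumptions for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logits_batch, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [
        -math.log(softmax(logits, temperature)[y])
        for logits, y in zip(logits_batch, labels)
    ]
    return sum(losses) / len(losses)


def fit_temperature(logits_batch, labels, grid=None):
    """Pick the temperature on a grid that minimizes held-out NLL."""
    if grid is None:
        grid = [0.5 + 0.1 * k for k in range(46)]  # 0.5 to 5.0 in steps of 0.1
    return min(grid, key=lambda t: nll(logits_batch, labels, t))


# Toy held-out set: the model is very confident but wrong on one of four samples.
val_logits = [[5.0, 0.0], [5.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
val_labels = [0, 0, 1, 1]
temperature = fit_temperature(val_logits, val_labels)
```

A fitted temperature above 1 softens overconfident predictions, which makes a downstream confidence threshold for pseudo-labels more meaningful.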
Nice to have
- Video understanding experience (temporal models, tracking, cross-frame consistency).
- Experience with synthetic data, student-teacher distillation, or constraint-based labeling.
- Data lineage/observability stacks (e.g., Weights & Biases, Neptune, MLflow + custom metadata).
- Basic CUDA/ONNX/TensorRT for efficient inference at scale.
What we offer
- Tight build-measure-learn cycles, high ownership, pragmatic research mindset.
- Learn from a team with a world-class record of execution and speed.
- Freedom to choose methods and tooling that maximize signal quality and iteration speed.
- Competitive compensation and meaningful equity.
Join microagi and take your career to the next level!

