microagi
Machine Learning Engineer — CV Annotation
microagi · Germany · Posted 7 hours ago
Full-time · Engineering, Information Technology

Why this role

Exceptional models are built on strong data foundations. In this position, you will design and develop an automated labeling system that transforms large collections of unlabeled video and image data into high-quality training signals without relying on human annotation. Your work will have a direct impact on model accuracy, robustness, and iteration speed.

What you will do

• Build an automated labeling pipeline by creating methods that extract training signals from raw data, such as self-supervision, weak supervision, pseudo-labeling, teacher-student distillation, and synthetic augmentation.

• Develop reliability and quality assurance techniques without human input, including confidence calibration, uncertainty estimation, ensemble and consensus checks, automatic error detection, and metrics that forecast downstream model improvements.

• Ensure temporal and spatial consistency by enforcing cross-frame alignment, tracking identities and structures over time, and designing automatic repair strategies for drift and occlusions.

• Create active data-selection strategies that rank and curate raw data based on informativeness, novelty, or scarcity, and implement scalable sampling and replay policies.

• Integrate models into the training loop so they improve continuously with updated pseudo-labels, while also automating evaluation gates and rollback procedures.

• Build supporting tools and infrastructure, including reliable ETL processes, versioned datasets, lineage tracking, and lightweight dashboards for monitoring data health and coverage.

• Produce clear documentation covering assumptions, failure modes, decision rules, and research-to-production handoff details.

What you have accomplished

• At least three years of experience in machine learning with a focus on computer vision or representation learning, along with strong skills in Python and at least one of PyTorch, TensorFlow, or JAX.

• Delivered at least one system that uses self-supervised, weakly supervised, or pseudo-labeling methods at scale.

• Practical experience with uncertainty estimation and calibration methods such as Monte Carlo dropout, ensembles, and temperature scaling, as well as with automated quality filters.

• Experience designing data-selection or active-learning loops and evaluating their effect on downstream metrics.

• Strong software engineering habits, including reproducible training workflows, data and version control, and CI practices for machine learning such as unit and integration tests for data and metrics.

Nice to have

• Experience with video understanding, including temporal models, tracking, and cross-frame consistency.

• Background in synthetic data, teacher-student distillation, or constraint-based labeling.
