ABOUT DATAMENTORS
Datamentors is a robotics start-up building an agnostic orchestration platform and a family of proprietary robots — quadruped, mobile upper-body and humanoid — assembled at our own factory in Caniçal, Madeira. We also develop custom Vision-Language-Action (VLA) models that give our robots natural-language reasoning and autonomy, deployable across sectors from hospitality and healthcare to defence and logistics. With prototypes running and a clear path to 600 units per year, the engineers joining now will shape both the hardware and the intelligence behind Ardia for years to come.
ROLE OVERVIEW
Own the post-training stage of the Ardia humanoid platform — taking pretrained, behaviour-cloned Vision-Language-Action policies and systematically lifting their success rate, precision, and robustness through RL fine-tuning, reward modelling, and a tight sim-to-real loop. This hands-on leadership role sets the technical direction for RL across the platform, builds the training and evaluation infrastructure, and grows a small team around it. You’ll be the person who turns a capable-but-inconsistent humanoid into a dependable one — the difference between a research demo and a product.
WHAT YOU'LL DO
- Post-VLA RL fine-tuning — design and run the pipeline that takes a pretrained/imitation-learned VLA policy and improves it with reinforcement learning (online and offline RL, RLHF/RL-from-feedback, preference and reward-model approaches) to lift task success rates and reduce failure modes.
- Reward design & evaluation — define reward signals, success criteria, and automated evaluation harnesses that actually correlate with real-world manipulation and whole-body performance, and that catch regressions before deployment.
- Sim-to-real — build and tune the simulation-to-hardware transfer loop (domain randomization, residual policies, real-world fine-tuning) so gains in simulation hold up on the physical robot.
- Data flywheel — establish the loop that turns robot rollouts and teleop data into better policies: autonomous data collection, filtering, labelling, and continual retraining.
- Integration across the stack — work with the teams owning the orchestration layer, the VLA policy, and whole-body control so RL improvements compose cleanly rather than fighting the rest of the system.
- Technical leadership — set the RL roadmap, make build-vs-adopt calls on frameworks and tooling, mentor engineers, and represent the work to the wider team and external partners.
WHAT WE'RE LOOKING FOR
- Strong, demonstrable RL expertise: PPO/GRPO and related policy-gradient methods, offline RL, RLHF/preference-based RL, and reward modelling — you understand where each breaks and why.
- Robot learning / embodied AI experience — ideally hands-on with VLA models (e.g. OpenVLA, π0, GR00T-class, or comparable manipulation/locomotion policies).
- Practical sim-to-real experience and fluency with at least one major simulator (Isaac Lab / Isaac Gym, MuJoCo, or similar).
- Solid ML engineering: PyTorch, distributed training, GPU-efficient pipelines, and the discipline to build reproducible experiments and rigorous evaluation.
- Evidence of shipping — you’ve taken a policy from “works in a demo” to “works reliably,” not just published benchmarks.
Nice to have
- Direct experience with humanoid or whole-body control (locomotion + manipulation), and with the realities of training on real hardware.
- Familiarity with VLA post-training specifically — fine-tuning, distillation, or RL on top of large pretrained action models.
- Comfort working close to perception (vision, point clouds) and to low-level control.
- Open-source contributions to robot learning or RL frameworks.
- Experience standing up data-collection / teleoperation pipelines.
- Publications or applied work at the VLA / robot-learning frontier (ICRA / CoRL / RSS-level).
WHY DATAMENTORS
- Own the post-training layer end to end, with the autonomy to build it your way.
- The architecture, hardware, and base policies are already in place — your RL work is the leap that turns a capable demo into a dependable product.
- Hands-on technical leadership: set the RL roadmap, make the tooling calls, and grow a small team around the work.
- We assess candidates on demonstrated ability, not credentials alone — if you’ve done the work and can show it, we want to talk.
From research demo to dependable product — own it.
Related Jobs
3 roles aligned with this opportunity
AI Research Engineer
2026-05-23
Software Engineer, Robot Learning & Interfaces
2026-05-23
Artificial Intelligence Engineer
2026-06-17
- Posted: Jun 16, 2026
- Type: Full-time
- Level: Mid-Senior
- Location: Madeira Island
- Company: Datamentors