Datamentors

Reinforcement Learning Lead

Portugal · Full-time · Mid-Senior

ABOUT DATAMENTORS

Datamentors is a robotics start-up building an agnostic orchestration platform and a family of proprietary robots — quadruped, mobile upper-body and humanoid — assembled at our own factory in Caniçal, Madeira. We also develop custom Vision-Language-Action (VLA) models that give our robots natural-language reasoning and autonomy, deployable across sectors from hospitality and healthcare to defence and logistics. With prototypes running and a clear path to 600 units per year, the engineers joining now will shape both the hardware and the intelligence behind Ardia for years to come.


ROLE OVERVIEW

Own the post-training stage of the Ardia humanoid platform — taking pretrained, behaviour-cloned Vision-Language-Action policies and systematically lifting their success rate, precision, and robustness through RL fine-tuning, reward modelling, and a tight sim-to-real loop. This hands-on leadership role sets the technical direction for RL across the platform, builds the training and evaluation infrastructure, and grows a small team around it. You’ll be the person who turns a capable-but-inconsistent humanoid into a dependable one — the difference between a research demo and a product.


WHAT YOU'LL DO

  • Post-VLA RL fine-tuning — design and run the pipeline that takes a pretrained/imitation-learned VLA policy and improves it with reinforcement learning (online and offline RL, RLHF/RL-from-feedback, preference and reward-model approaches) to lift task success rates and reduce failure modes.
  • Reward design & evaluation — define reward signals, success criteria, and automated evaluation harnesses that actually correlate with real-world manipulation and whole-body performance, and that catch regressions before deployment.
  • Sim-to-real — build and tune the simulation-to-hardware transfer loop (domain randomization, residual policies, real-world fine-tuning) so gains in simulation hold up on the physical robot.
  • Data flywheel — establish the loop that turns robot rollouts and teleop data into better policies: autonomous data collection, filtering, labelling, and continual retraining.
  • Integration across the stack — work with the teams owning the orchestration layer, the VLA policy, and whole-body control so RL improvements compose cleanly rather than fighting the rest of the system.
  • Technical leadership — set the RL roadmap, make build-vs-adopt calls on frameworks and tooling, mentor engineers, and represent the work to the wider team and external partners.


WHAT WE'RE LOOKING FOR

  • Strong, demonstrable RL expertise: PPO/GRPO and related policy-gradient methods, offline RL, RLHF/preference-based RL, and reward modelling — you understand where each breaks and why.
  • Robot learning / embodied AI experience — ideally hands-on with VLA models (e.g. OpenVLA, π0, GR00T-class, or comparable manipulation/locomotion policies).
  • Practical sim-to-real experience and fluency with at least one major simulator (Isaac Lab / Isaac Gym, MuJoCo, or similar).
  • Solid ML engineering: PyTorch, distributed training, GPU-efficient pipelines, and the discipline to build reproducible experiments and rigorous evaluation.
  • Evidence of shipping — you’ve taken a policy from “works in a demo” to “works reliably,” not just published benchmarks.


NICE TO HAVE

  • Direct experience with humanoid or whole-body control (locomotion + manipulation), and with the realities of training on real hardware.
  • Familiarity with VLA post-training specifically — fine-tuning, distillation, or RL on top of large pretrained action models.
  • Comfort working close to perception (vision, point clouds) and to low-level control.
  • Open-source contributions to robot learning or RL frameworks.
  • Experience standing up data-collection / teleoperation pipelines.
  • Publications or applied work at the VLA / robot-learning frontier (ICRA / CoRL / RSS-level).


WHY DATAMENTORS

  • Own the post-training layer end to end, with the autonomy to build it your way.
  • The architecture, hardware, and base policies are already in place — your RL work is the leap that turns a capable demo into a dependable product.
  • Hands-on technical leadership: set the RL roadmap, make the tooling calls, and grow a small team around the work.
  • We assess candidates on demonstrated ability, not credentials alone — if you’ve done the work and can show it, we want to talk.



From research demo to dependable product — own it.

Key Skills: simulation, prototypes, PyTorch, AI
Posted: Jun 16, 2026
Type: Full-time
Level: Mid-Senior
Location: Madeira Island

Industries: Robotics Engineering, Artificial Intelligence
Categories: Engineering
