Reinforcement Learning Lead

Datamentors

Portugal · Full-time · Mid-Senior

ABOUT DATAMENTORS

Datamentors is a robotics start-up building an agnostic orchestration platform and a family of proprietary robots — quadruped, mobile upper-body and humanoid — assembled at our own factory in Caniçal, Madeira. We also develop custom Vision-Language-Action (VLA) models that give our robots natural-language reasoning and autonomy, deployable across sectors from hospitality and healthcare to defence and logistics. With prototypes running and a clear path to 600 units per year, the engineers joining now will shape both the hardware and the intelligence behind Ardia for years to come.

ROLE OVERVIEW

Own the post-training stage of the Ardia humanoid platform — taking pretrained, behaviour-cloned Vision-Language-Action policies and systematically lifting their success rate, precision, and robustness through RL fine-tuning, reward modelling, and a tight sim-to-real loop. This hands-on leadership role sets the technical direction for RL across the platform, builds the training and evaluation infrastructure, and grows a small team around it. You’ll be the person who turns a capable-but-inconsistent humanoid into a dependable one — the difference between a research demo and a product.

WHAT YOU'LL DO

Post-VLA RL fine-tuning — design and run the pipeline that takes a pretrained/imitation-learned VLA policy and improves it with reinforcement learning (online and offline RL, RLHF/RL-from-feedback, preference and reward-model approaches) to lift task success rates and reduce failure modes.
Reward design & evaluation — define reward signals, success criteria, and automated evaluation harnesses that actually correlate with real-world manipulation and whole-body performance, and that catch regressions before deployment.
Sim-to-real — build and tune the simulation-to-hardware transfer loop (domain randomization, residual policies, real-world fine-tuning) so gains in simulation hold up on the physical robot.
Data flywheel — establish the loop that turns robot rollouts and teleop data into better policies: autonomous data collection, filtering, labelling, and continual retraining.
Integration across the stack — work with the teams owning the orchestration layer, the VLA policy, and whole-body control so RL improvements compose cleanly rather than fighting the rest of the system.
Technical leadership — set the RL roadmap, make build-vs-adopt calls on frameworks and tooling, mentor engineers, and represent the work to the wider team and external partners.

WHAT WE'RE LOOKING FOR

Strong, demonstrable RL expertise: PPO/GRPO and related policy-gradient methods, offline RL, RLHF/preference-based RL, and reward modelling — you understand where each breaks and why.
Robot learning / embodied AI experience — ideally hands-on with VLA models (e.g. OpenVLA, π0, GR00T-class, or comparable manipulation/locomotion policies).
Practical sim-to-real experience and fluency with at least one major simulator (Isaac Lab / Isaac Gym, MuJoCo, or similar).
Solid ML engineering: PyTorch, distributed training, GPU-efficient pipelines, and the discipline to build reproducible experiments and rigorous evaluation.
Evidence of shipping — you’ve taken a policy from “works in a demo” to “works reliably,” not just published benchmarks.

NICE TO HAVE

Direct experience with humanoid or whole-body control (locomotion + manipulation), and with the realities of training on real hardware.
Familiarity with VLA post-training specifically — fine-tuning, distillation, or RL on top of large pretrained action models.
Comfort working close to perception (vision, point clouds) and to low-level control.
Open-source contributions to robot learning or RL frameworks.
Experience standing up data-collection / teleoperation pipelines.
Publications or applied work at the VLA / robot-learning frontier (ICRA / CoRL / RSS-level).

WHY DATAMENTORS

Own the post-training layer end to end, with the autonomy to build it your way.
The architecture, hardware, and base policies are already in place — your RL work is the leap that turns a capable demo into a dependable product.
Hands-on technical leadership: set the RL roadmap, make the tooling calls, and grow a small team around the work.
We assess candidates on demonstrated ability, not credentials alone — if you’ve done the work and can show it, we want to talk.

From research demo to dependable product — own it.


Posted: Jun 16, 2026
Type: Full-time
Level: Mid-Senior
Location: Madeira Island
Company: Datamentors

Industries: Robotics Engineering, Artificial Intelligence
