Keystone Recruitment

Senior Software Engineer – LLM Evaluation (Remote)

United Arab Emirates · Contract · Not Applicable

  • Title: Senior Software Engineer – LLM Evaluation (Remote)
  • Engagement: Hourly contract (independent contractor)
  • Location: Remote


About the Opportunity

One of our global AI research clients is building advanced evaluation and training datasets to improve large language models on realistic software engineering tasks. This project focuses on creating verifiable software engineering challenges derived from public repository histories using a structured, human-in-the-loop approach. The goal is to expand dataset coverage across programming languages, complexity levels, and real-world development scenarios.


Role Overview

We are seeking experienced, tech lead–level software engineers who are comfortable working with high-quality public GitHub repositories (500+ stars). This role combines hands-on engineering work with AI model evaluation, contributing directly to how AI systems interact with real-world codebases.


What You’ll Do

  • Analyze and triage GitHub issues across widely used open-source repositories
  • Set up and configure repositories, including Dockerization and development environment automation
  • Evaluate unit test coverage, quality, and reliability
  • Run, modify, and debug real-world codebases locally to assess AI model performance in bug-fixing and implementation tasks
  • Collaborate with AI researchers to identify challenging repositories and issue types for LLM evaluation
  • Contribute to designing structured, verifiable software engineering tasks
  • Potentially lead and mentor junior engineers on repository validation projects


Required Skills

  • 5+ years of professional software engineering experience
  • Strong expertise in at least one of the following: Python, JavaScript, Java, Go, Rust, C/C++, C#, or Ruby
  • Deep understanding of software architecture, debugging, and code quality standards
  • Proficiency with Git, Docker, and development pipeline setup
  • Ability to navigate and evaluate complex, production-grade codebases


Nice to Have

  • Experience contributing to or reviewing open-source projects
  • Experience participating in AI/LLM evaluation or research initiatives
  • Background in building developer tools, automation systems, or code verification agents
  • Experience leading small engineering teams


Engagement Details

  • Contractor assignment (no medical benefits or paid leave)
  • 20 hours per week, with partial overlap with PST working hours
  • Duration: 3 months
  • Expected start date: Next week
  • Fully remote


This role offers a unique opportunity to combine deep software engineering expertise with frontier AI research, directly influencing how large language models understand and solve real-world coding problems.



  • Posted: Feb 17, 2026
  • Type: Contract
  • Level: Not Applicable
  • Location: United Arab Emirates
  • Industries: Technology Information Media
  • Categories: Education Research
