AI Evaluator / Annotator (Remote, freelance, 100+ openings)

Braintrust

Argentina · Full-time · Mid-Senior

Job Description

Position Overview:

iMerit seeks detail-oriented and analytically minded Multimodal GenAI Evaluation Analysts to perform highly nuanced evaluations of AI system outputs across different modalities: text, image, video, and multimodal interactions. Analysts will assess the accuracy, appropriateness, quality, clarity, and cultural alignment of model outputs against complex guidelines, ensuring that results align with project standards and real-world use cases. These evaluations will directly inform the development and fine-tuning of advanced large language models (LLMs), vision models (LVMs), and multimodal AI systems.

Role Responsibilities:

Evaluate outputs generated by LLMs across multiple modalities (text, image captions, video descriptions, and multimodal prompts).
Assess quality against project-specific criteria such as correctness, coherence, completeness, style, cultural appropriateness, and safety.
Identify subtle errors, hallucinations, or biases in AI responses.
Apply domain expertise and logical reasoning to resolve ambiguous or unclear outputs.
Provide detailed written feedback, tagging, and scoring of outputs to ensure consistency across the evaluation team.
Escalate unclear cases and contribute to refining evaluation guidelines.
Collaborate with Project Managers and Quality Leads to meet accuracy, reliability, and turnaround benchmarks.

Skills & Competencies:

Strong critical reading, observational, and evaluative skills across different modalities.
Ability to articulate nuanced judgments with precision and clarity.
Excellent English comprehension (CEFR B2 or above); additional languages a plus.
Familiarity with LLMs, generative AI, and multimodal systems.
Strong attention to detail and ability to apply guidelines consistently.
Awareness of cultural and linguistic nuances, including potential bias and harm in AI outputs.
Comfort with evolving workflows, rapid feedback cycles, and complex quality frameworks.

Requirements:

Bachelor's degree/diploma or equivalent educational qualification.
1+ years of experience in data annotation, LLM evaluation, content moderation, or related AI/ML domains.
Demonstrated experience working with data annotation tools and software platforms.
Strong understanding of language and multimodal communication (instruction following in image generation, fact-checking, narrative coherence in video, etc.).
Ability to adapt quickly to changing project directions and fast-paced work environments.
Previous experience creating or annotating complex data specifically for Large Language Model (LLM) training.
Prior exposure to generative AI, prompt engineering, or LLM fine-tuning workflows is a plus.

While moderation of high-harm/high-risk material is not part of this role, candidates should be aware that occasional exposure to NSFW or otherwise sensitive content may occur due to imperfections in client-provided datasets. Applicants should indicate that they are comfortable working in environments where such incidental exposure is a possibility.

What We Offer:

Opportunities to shape the evaluation standards for next-generation multimodal AI systems.

Innovative and supportive global working environment.
Competitive compensation and flexible remote working arrangements.
Continuous learning and growth in applied AI evaluation.

Please acknowledge that you agree to the selection process below:

You will receive an iMerit platform assessment (15–30 minutes). If successfully completed, you’ll be invited to join the first project.
After onboarding, once you’ve completed 10 hours of work, a quality test will be conducted.
If you pass the quality test, you’ll continue on a 3-month project and will be invited to participate in upcoming projects.

Note:

You will complete a quick 15–30 minute assessment. This requires downloading a browser extension, which can be removed once the assessment is completed.
ID verification and background check are required.
Onboarding will be completed through iMerit’s platform.

For Digital Nomads: If you are currently traveling, please let us know. This ensures any discrepancies between your current location and your work authorization location do not affect your application.

Commitment:

Minimum 20 hours per week (flexible schedule).
You may work more hours if desired.

Hourly rates:

Malaysia – $5/hr
Mexico, Colombia, Brazil, Costa Rica – $8.50/hr
Argentina, Poland, Bulgaria, Romania, Malta, Latvia, Lithuania, UAE – $13/hr
Portugal, Italy, Greece, Spain – $15.50/hr
Canada, Australia, New Zealand, United Kingdom, Ireland, US, Finland, France, Sweden, Belgium, Austria, Denmark, Germany, Luxembourg, Estonia – $22/hr

Posted: Oct 29, 2025
Type: Full-time
Level: Mid-Senior
Location: Argentina
Company: Braintrust

Industries

Technology Information Internet
