Braintrust
AI Evaluator / Annotator (Remote, freelance, 100+ openings)
Braintrust | Argentina | 1 day ago
Full-time | Remote Friendly | Education, Training
Job Description

Position Overview:

iMerit seeks detail-oriented and analytically minded Multimodal GenAI Evaluation Analysts to perform highly nuanced evaluations of AI system outputs across different modalities: text, image, video, and multimodal interactions. Analysts will assess the accuracy, appropriateness, quality, clarity, and cultural alignment of model outputs against complex guidelines, ensuring that results align with project standards and real-world use cases. These evaluations will directly inform the development and fine-tuning of advanced large language models (LLMs), large vision models (LVMs), and multimodal AI systems.

Role Responsibilities:

  • Evaluate outputs generated by LLMs across multiple modalities (text, image captions, video descriptions, and multimodal prompts).

  • Assess quality against project-specific criteria such as correctness, coherence, completeness, style, cultural appropriateness, and safety.

  • Identify subtle errors, hallucinations, or biases in AI responses.
  • Apply domain expertise and logical reasoning to resolve ambiguous or unclear outputs.
  • Provide detailed written feedback, tagging, and scoring of outputs to ensure consistency across the evaluation team.

  • Escalate unclear cases and contribute to refining evaluation guidelines.
  • Collaborate with Project Managers and Quality Leads to meet accuracy, reliability, and turnaround benchmarks.

Skills & Competencies:

  • Strong critical reading, observational, and evaluative skills across different modalities.
  • Ability to articulate nuanced judgments with precision and clarity.
  • Excellent English comprehension (CEFR B2 or above); additional languages a plus.
  • Familiarity with LLMs, generative AI, and multimodal systems.
  • Strong attention to detail and ability to apply guidelines consistently.
  • Awareness of cultural and linguistic nuances, including potential bias and harm in AI outputs.

  • Comfort with evolving workflows, rapid feedback cycles, and complex quality frameworks.

Requirements:

  • Bachelor's degree, diploma, or equivalent educational qualification.
  • 1+ years of experience in data annotation, LLM evaluation, content moderation, or related AI/ML domains.

  • Demonstrated experience working with data annotation tools and software platforms.
  • Strong understanding of language and multimodal communication (instruction following in image generation, fact-checking, narrative coherence in video, etc.).

  • Ability to adapt quickly to changing project directions and fast-paced work environments.
  • Previous experience creating or annotating complex data specifically for Large Language Model (LLM) training.

  • Prior exposure to generative AI, prompt engineering, or LLM fine-tuning workflows is a plus.

While moderation of high-harm/high-risk material is not part of this role, candidates should be aware that occasional exposure to NSFW or otherwise sensitive content may occur due to imperfections in client-provided datasets. Applicants should indicate that they are comfortable working in environments where such incidental exposure is a possibility.

What We Offer:

  • Opportunities to shape the evaluation standards for next-generation multimodal AI systems.

  • Innovative and supportive global working environment.
  • Competitive compensation and flexible remote working arrangements.
  • Continuous learning and growth in applied AI evaluation.

Please acknowledge that you agree to the selection process below:

  • You will receive an iMerit platform assessment (15–30 minutes). If successfully completed, you’ll be invited to join the first project.
  • After onboarding, once you’ve completed 10 hours of work, a quality test will be conducted.
  • If you pass the quality test, you’ll continue on a 3-month project and will be invited to participate in upcoming projects.

Note:

  • You will complete a quick 15–30 minute assessment. This requires downloading a browser extension, which can be removed once the assessment is completed.
  • ID verification and background check are required.
  • Onboarding will be completed through iMerit’s platform.

For Digital Nomads: If you are currently traveling, please let us know. This ensures any discrepancies between your current location and your work authorization location do not affect your application.

Commitment:

  • Minimum 20 hours per week (flexible schedule).
  • You may work more hours if desired.

Hourly rates:

  • Malaysia – $5/hr
  • Mexico, Colombia, Brazil, Costa Rica – $8.50/hr
  • Argentina, Poland, Bulgaria, Romania, Malta, Latvia, Lithuania, UAE – $13/hr
  • Portugal, Italy, Greece, Spain – $15.50/hr
  • Canada, Australia, New Zealand, United Kingdom, Ireland, US, Finland, France, Sweden, Belgium, Austria, Denmark, Germany, Luxembourg, Estonia – $22/hr
