Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.
Position: AI Task Evaluation & Statistical Analysis Specialist
Type: Contract
Compensation: $100–$120/hour
Location: Remote
Role Responsibilities
- Conduct comprehensive statistical failure analysis to identify patterns in AI agent failures across task components such as prompts, rubrics, and templates.
- Perform root cause analysis to determine whether failures stem from task design, rubric clarity, file complexity, or agent limitations.
- Analyze performance variations across finance sub-domains, file types, and task categories to enhance understanding of AI model performance.
- Create dashboards and reports to highlight failure clusters, edge cases, and improvement opportunities.
- Recommend improvements to task design, rubric structure, and evaluation criteria based on statistical findings.
- Present insights to data labeling experts and technical teams to foster collaboration and drive improvements.
Must-Have
- Statistical Expertise: Strong foundation in statistical analysis, hypothesis testing, and pattern recognition.
- Programming: Proficiency in Python (pandas, scipy, matplotlib/seaborn) or R for data analysis.
- Data Analysis: Experience with exploratory data analysis and creating actionable insights from complex datasets.
- AI/ML Familiarity: Understanding of LLM evaluation methods and quality metrics.
- Tools: Comfortable working with Excel, data visualization tools (Tableau/Looker), and SQL.
- Experience with AI/ML model evaluation or quality assurance.
- Background in finance or willingness to learn finance domain concepts.
- Experience with multi-dimensional failure analysis.
- Familiarity with benchmark datasets and evaluation frameworks.
- 2–4 years of relevant experience.
- Upload resume
- AI interview based on your resume
- Submit form
- For details about the interview process and platform information, please check: https://talent.docs.mercor.com/welcome/welcome
- For any help or support, reach out to: [email protected]

