Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.
Position: SWE Expert
Type: Contract
Compensation: $70–$150/hour
Role Responsibilities
- Convert high-level objectives into tightly scoped, testable deliverables with clear inputs/outputs and measurable success criteria.
- Create structured documentation defining expected behavior, constraints, and edge cases for reuse by other evaluators.
- Build lightweight automation scripts to support evaluation flows, such as generating required artifacts and validating outputs.
- Write deterministic Python verifier scripts for completion checks via final state or output validation.
- Design prompts/tasks to reliably elicit target workflow behavior while avoiding leakage of internal instructions.
- Implement robust error handling and actionable failure messages in verification tooling.
- Develop plausible but ineffective “baseline” or “distractor” approaches to confirm evaluation discrimination.
- Maintain clean artifact hygiene with versionable structure, consistent naming, and reproducible execution.
Must-Have
- Strong Python skills in file system operations, parsing, validation, and deterministic execution.
- Experience with evaluation harnesses, automated grading, or QA-style verification.
- Familiarity with prompt design and LLM evaluation methodologies.
- Comfort with structured specs and documentation conventions like Markdown and YAML.
- Working knowledge of Git, CLI workflows, virtual environments, and dependency management.
- Knowledge of embeddings/similarity concepts like cosine similarity for negative-control design.
- Ability to communicate clearly and control scope without relying on domain-specific context.
Application Process
- Upload resume
- AI interview based on your resume
- Submit form
- For details about the interview process and platform information, please check: https://talent.docs.mercor.com/welcome/welcome
- For any help or support, reach out to: [email protected]

