Position: Senior Software Engineer – LLM Evaluation
Location: Remote
Engagement Length: 1 Month
Rate: $80–$125/hr
About the Role
We are seeking an experienced Senior Software Engineer to support the development and evaluation of advanced datasets for training and benchmarking Large Language Models (LLMs). This role offers the opportunity to work closely with researchers and engineers to contribute to cutting-edge AI-driven solutions. You will be responsible for curating code examples, refining AI-generated code, identifying issues, and developing performance benchmarks.
Key Responsibilities:
- Curate and build software solutions in multiple programming languages.
- Evaluate and refine AI-generated code for efficiency and production readiness.
- Contribute to performance benchmarks and engineering workflow evaluations.
- Develop agents and verification mechanisms to assess code quality.
- Collaborate with cross-functional engineering teams to support code design, development, debugging, and architecture improvements.
Required Candidate Background:
- Minimum 5 years of software engineering experience.
- At least 2 years of continuous full-time experience at top-tier product companies (e.g., Google, Amazon, Meta, Netflix, Microsoft, Stripe, Dropbox, Shopify, Palantir, Datadog).
- Strong hands-on full-stack experience and production-grade deployment skills.
- Expertise in modern development languages and frameworks (Python, JavaScript (ReactJS), C/C++, Java, Rust, Go).
- Strong communication skills with the ability to provide structured evaluation feedback.
Additional Requirements:
- Minimum 10 hours/week commitment, with the possibility of up to 40 hours/week.
- Partial overlap with PST time zone preferred.
- Completion of mandatory vetting steps, including:
  - ICF (Candidate Interest Form).
  - Automated coding challenge.
  - AI interview on Qode.
Preferred Companies:
Google, Apple, Amazon, Meta, Netflix, Microsoft, Tesla, NVIDIA, Adobe, Salesforce, GitHub, Atlassian, Databricks, Palantir, Stripe, Uber, Lyft, Square (Block), and other top-tier tech firms.
To Apply:
- Submit your resume.
- Complete the ICF and automated coding challenge.
- Proceed to the AI interview on Qode.
Join Crossing Hurdles and take your career to the next level!