Job Description:
We are seeking highly analytical and detail-oriented professionals with hands-on experience in Red Teaming, Prompt Evaluation, and AI/LLM Quality Assurance. The ideal candidate will help us rigorously test and evaluate AI-generated content to identify vulnerabilities, assess risks, and ensure compliance with safety, ethical, and quality standards.
Key Responsibilities:
- Conduct Red Teaming exercises to identify adversarial, harmful, or unsafe outputs from large language models (LLMs).
- Evaluate and stress-test AI prompts across multiple domains (e.g., finance, healthcare, security) to uncover potential failure modes.
- Develop and apply test cases to assess accuracy, bias, toxicity, hallucinations, and misuse potential in AI-generated responses.
- Collaborate with data scientists, safety researchers, and prompt engineers to report risks and suggest mitigations.
- Perform manual QA and content validation across model versions, ensuring factual consistency, coherence, and guideline adherence.
- Create evaluation frameworks and scoring rubrics for prompt performance and safety compliance.
- Document findings, edge cases, and vulnerability reports with high clarity and structure.
Requirements:
- Proven experience in AI red teaming, LLM safety testing, or adversarial prompt design.
- Familiarity with prompt engineering, NLP tasks, and ethical considerations in generative AI.
- Strong background in Quality Assurance, content review, or test case development for AI/ML systems.
- Understanding of LLM behaviors, failure modes, and model evaluation metrics.
- Excellent critical thinking, pattern recognition, and analytical writing skills.
- Ability to work independently, follow detailed evaluation protocols, and meet tight deadlines.
Preferred Qualifications:
- Prior work with teams like OpenAI, Anthropic, Google DeepMind, or other LLM safety initiatives.
- Experience in risk assessment, red team security testing, or AI policy & governance.
- Background in linguistics, psychology, or computational ethics is a plus.
Next Steps
To proceed further in the evaluation process, you will need to complete two assessments:
- Assessment Test
  - Evaluates your linguistic and analytical skills
  - Link: https://icap.innodata.com/registerfreelancer?enc=oUTZVsr/Pnz/0Xygc2EK32MdtinqnjC9vy8RU3Ha4EOAPwT2LJJQDD68MkY6jszYhhsYYecqmKWja8eKXV801gezikielezikiel
- Versant English Proficiency Test
  - Focuses on assessing your spoken and written English proficiency
  - A C1 or C2 level is required to qualify
Once both assessments are successfully completed, you will be eligible for onboarding.
Action Required: XConnect Registration
You will also receive an invitation to our internal job platform, XConnect. Please take a few minutes to register and complete your profile. All project onboarding, communication, and documentation are managed through this platform.
If interested, kindly share your resume at: [email protected]
Join Innodata Inc. and take your career to the next level!