AI Evaluation Engineer

RYZ Labs is looking for an experienced AI Evaluation Engineer to join one of our clients’ teams.

Responsibilities

Design and implement evaluation pipelines to measure the performance and reliability of AI models.
Develop automated testing frameworks to assess model outputs at scale.
Analyze model performance using both traditional statistical metrics and AI-specific evaluation methods.
Evaluate AI systems built on modern architectures such as LLM-based applications and Retrieval-Augmented Generation (RAG).
Identify potential issues related to accuracy, hallucinations, bias, safety, and model drift.
Conduct adversarial testing to uncover vulnerabilities and ensure safe model behavior.
Collaborate with engineering and AI teams to improve prompt design, model outputs, and system performance.
Monitor model performance in production and help define best practices for AI evaluation and observability.

Requirements

Proficiency in Python and experience building scripts or pipelines to evaluate model outputs.
Experience working with AI/ML systems, particularly large language models (LLMs) or generative AI applications.
Familiarity with concepts such as prompt engineering, prompt optimization, and LLM evaluation.
Understanding of evaluation metrics such as precision, recall, F1-score, and AI-specific metrics related to model quality and safety.
Experience evaluating RAG systems or knowledge retrieval pipelines is a plus.
Experience with modern AI evaluation or observability tools (e.g., DeepEval, Promptfoo, RAGAS, LangSmith, Arize, Weights & Biases) is a plus.
Strong analytical mindset with the ability to interpret model behavior and propose improvements.

Nice to Have

Experience performing adversarial testing or red-teaming of AI systems.
Familiarity with AI safety, bias detection, and model alignment practices.
Experience working in production environments deploying or monitoring AI systems.

About RYZ Labs:

RYZ Labs is a startup studio founded in 2021 by two lifelong entrepreneurs. The founders have worked at some of the world's largest tech companies and most iconic consumer brands. They have lived and worked in Argentina for many years and have decades of experience in Latam. What brought them together was a passion for the early phases of company creation and the idea of attracting the brightest talent to build industry-defining companies in a post-pandemic world.

Our teams are remote and distributed throughout the US and Latam. They use cutting-edge cloud computing technologies to build scalable, resilient applications. We aim to provide diverse product solutions across industries and plan to build a large number of startups in the coming years.

At RYZ, you will work with autonomy and efficiency, owning every step of the development process. We provide an environment of opportunity, learning, growth, and expansion, with challenging projects. You will deepen your experience while sharing knowledge with, and learning from, a team of great professionals and specialists.

Our values and what to expect:

Customer-First Mentality - Every decision we make should be made through the lens of the customer.
Bias for Action - Urgency is critical; expect accelerated timelines for getting things done.
Ownership - Step up if you see an opportunity to help, even if it's not your core responsibility.
Humility and Respect - Be willing to learn, be vulnerable, and treat everyone who interacts with RYZ with respect.
Frugality - Being frugal and cost-conscious helps us do more with less.
Deliver Impact - Get things done as efficiently as possible.
Raise our Standards - Always look to improve our processes, our team, and our expectations. The status quo is not good enough and never should be.
