hackajob is collaborating with Leo Technologies to connect them with exceptional tech professionals for this role.
Core Responsibilities
- Build and maintain evaluation frameworks for LLMs and generative AI systems tailored to public safety and intelligence use cases.
- Design guardrails and alignment strategies to minimize bias, toxicity, hallucinations, and other ethical risks in production workflows.
- Partner with AI engineers and data scientists to define online and offline evaluation metrics (e.g., model drift, data drift, factual accuracy, consistency, safety, interpretability).
- Implement continuous evaluation pipelines for AI models, integrated into CI/CD and production monitoring systems.
- Collaborate with stakeholders to stress-test models against edge cases, adversarial prompts, and sensitive-data scenarios.
- Research, integrate, and adapt third-party evaluation frameworks and solutions for our regulated, high-stakes environment.
- Work with product and customer-facing teams to ensure explainability, transparency, and auditability of AI outputs.
- Provide technical leadership in responsible AI practices, influencing standards across the organization.
- Contribute to DevOps/MLOps workflows for deployment, monitoring, and scaling of AI evaluation and guardrail systems (experience with Kubernetes is a plus).
- Document best practices and findings, and share knowledge across teams to foster a culture of responsible AI innovation.
Qualifications
- Bachelor's or Master's degree in Computer Science, Artificial Intelligence, Data Science, or a related field.
- 3-5+ years of hands-on experience in ML/AI engineering, with at least 2 years working directly on LLM evaluation, QA, or safety.
- Strong familiarity with evaluation techniques for generative AI: human-in-the-loop evaluation, automated metrics, adversarial testing, red-teaming.
- Experience with bias detection, fairness approaches, and responsible AI design.
- Knowledge of LLM observability, monitoring, and guardrail frameworks (e.g., Langfuse, LangSmith).
- Proficiency with Python and modern AI/ML, LLM, and agentic AI libraries (LangGraph, Strands Agents, Pydantic AI, LangChain, Hugging Face, PyTorch, LlamaIndex).
- Experience integrating evaluations into DevOps/MLOps pipelines, preferably with Kubernetes, Terraform, ArgoCD, or GitHub Actions.
- Understanding of cloud AI platforms (AWS, Azure) and deployment best practices.
- Strong problem-solving skills, with the ability to design practical evaluation systems for real-world, high-stakes scenarios.
- Excellent communication skills to translate technical risks and evaluation results into insights for both technical and non-technical stakeholders.
Key Skills
Ranked by relevance
ai
kubernetes
artificial intelligence
terraform
pytorch
python
cloud
cicd
aws
Related Jobs
3 roles aligned with this opportunity
Staff Frontend Engineer
2026-05-28
Full-time
Not Applicable
Ireland
Software Development
Engineering
Software Development Engineer - Kubernetes Service Mesh
2026-05-28
Full-time
Not Applicable
Ireland
Software Development
Engineering
Cloud Software Engineer
2026-05-22
Full-time
Not Applicable
Lithuania
Software Development
Engineering
- Posted: Oct 16, 2025
- Type: Full-time
- Level: Entry
- Location: United States
- Company: hackajob
Industries
Software Development
Categories
Engineering
Information Technology