We are looking for a hands‑on AI/ML Engineer to own and execute MLOps, evaluation, and deployment practices for a production AI platform built on LLMs, agentic workflows, vision, and voice AI.
This role is strongly execution‑focused. You will work across the entire AI lifecycle, from evaluation and observability to RLHF, deployment in constrained environments, and production readiness sign‑off, while collaborating with internal teams and directing external vendors.
Key Responsibilities
MLOps & Deployment Ownership
- Define and oversee MLOps practices, including:
  - Agent and model versioning
  - Evaluation tracking
  - Deployment gating and promotion workflows
  - Rollback and recovery procedures
- Collaborate with internal stakeholders and external delivery teams to ensure reliable production deployments.
Evaluation, Monitoring & Observability
- Own the evaluation framework for:
  - LLM‑based agents
  - RAG pipelines
  - Vision Language Models (VLMs)
  - Voice AI models (OpenAI Whisper, Chatterbox, VibeVoice, or equivalent)
- Define and maintain:
  - Offline evaluation methodologies
  - Online monitoring and regression detection thresholds
  - Human‑in‑the‑loop review processes
- Set up and manage AI observability tooling (e.g., Langfuse or equivalent) across all environments.
Performance Reporting & Insights
- Build and maintain product performance reporting, covering:
  - Model accuracy and agent effectiveness
  - Latency and cost‑per‑interaction
  - Bias, quality trends, and stability across markets
- Provide clear technical insights to non‑technical stakeholders.
RLHF & Continuous Improvement
- Design and oversee RLHF (Reinforcement Learning from Human Feedback) pipelines:
  - Data collection and feedback ingestion
  - Annotation guidelines and reward criteria
  - Feedback loops for continuous improvement
- Direct implementation by external teams and monitor quality improvements over time.
Agent Memory Systems
- Own the design and validation of agent memory architectures, including:
  - Short‑term context windows
  - Long‑term retrieval
  - Episodic memory across sessions
  - Memory lifecycle policies (retention, expiry, cost control)
- Define test criteria to ensure consistency across deployment environments.
Model Benchmarking & Optimization
- Evaluate and benchmark VLMs and voice models under constrained infrastructure.
- Recommend optimization strategies:
  - Quantization
  - Distillation
  - Runtime and model selection per jurisdiction
- Validate production readiness in on‑prem or sovereign environments.
Production Readiness & Rollouts
- Oversee production deployments executed by vendor teams.
- Run final validation checks and sign off on production readiness.
- Document deployment patterns, baselines, and environment‑specific configurations to accelerate future market rollouts.
Privacy & Data Residency
- Evaluate and recommend privacy‑preserving deployment patterns, including:
  - On‑device inference
  - Data isolation
  - Local or sovereign model hosting
- Ensure compliance with jurisdictional data residency requirements.
Technical Requirements
- 3–5 years of experience in applied AI, LLMOps, MLOps, or similar technical AI roles.
- Strong Python expertise:
  - Type hints, async programming, FastAPI
  - Code reviews, evaluation scripts, prototyping pipelines
- Experience with LLM application patterns:
  - RAG pipelines
  - Prompt engineering
  - Multi‑agent orchestration
- Solid background in supervised ML (scikit‑learn, XGBoost, LightGBM, or equivalent).
- Strong understanding of MLOps fundamentals:
  - Model versioning
  - Experiment tracking
  - CI/CD deployment pipelines
  - Monitoring and rollback strategies
- Hands‑on experience with:
  - RLHF or human‑feedback‑driven improvement loops
  - LLM/VLM/voice AI evaluation frameworks
  - Agent memory architectures
- Working knowledge of:
  - Vision Language Models (VLMs)
  - Voice AI systems across latency, language, and hosting constraints
- Understanding of model optimization techniques (quantization, distillation, ONNX).
- Experience using AI observability tools (Langfuse, LangSmith, or equivalent).
- Comfortable directing or overseeing external/vendor engineering teams.
- Ability to work independently in ambiguous and non‑standard infrastructure environments.
Good to Have
- Experience with sovereign cloud or government‑regulated infrastructure.
- Familiarity with agentic AI frameworks (LangChain/LangGraph, CrewAI, PydanticAI).
- Exposure to federated learning or privacy‑preserving inference.
- Background in healthcare, insurance, or regulated domains.
- Experience building performance dashboards for non‑technical audiences.
Immediate joiner required.
- Posted: May 05, 2026
- Type: Contract
- Level: Mid-Senior
- Location: Dubai
- Company: Hays