TAT IT Technologies
AI / ML Ops Engineer (Infrastructure, Monitoring & Deployment)
United Arab Emirates · Full-time · Mid-Senior
We have an urgent requirement for an AI / ML Ops Engineer (Infrastructure, Monitoring & Deployment) for one of our clients in Abu Dhabi.
Core Responsibilities
- Manage HGX nodes (OS, drivers, GPU allocation)
- Set up and manage OpenShift/K8s clusters
- Deploy models to inference servers (Triton, TensorRT, etc.)
- Automate fine-tuning pipelines (PyTorch/TensorFlow)
- Handle CI/CD for models (training -> serving)
- Basic scripting (Python/Bash) for ops automation
- Manage artifacts (model checkpoints, fine-tuned versions)
- Validate fine-tuned models (accuracy, fairness, drift)
- Monitor model behavior in production
- Alert on anomalies
- Manage model registry (track model versions, fine-tuning metadata)
Skills and Qualifications
- Kubernetes (mandatory)
- OpenShift (bonus)
- DevOps (CI/CD)
- Python
- PyTorch/TensorFlow familiarity
- Triton Server or similar deployment tool
- Triton Inference Server
- MLflow/Kubeflow
- Understanding of AI model validation
- Monitoring tools (Prometheus, Grafana)
- Basic ML performance metrics
- Good scripting skills
- Posted: May 09, 2025
- Type: Full-time
- Level: Mid-Senior
- Location: Abu Dhabi
- Company: TAT IT Technologies
- Industries: Technology, Information, Internet
- Categories: Engineering, Information Technology