We are seeking a highly skilled and experienced DevOps Engineer to join our team.
This role is ideal for someone with a strong background in automating and optimizing infrastructure, deploying scalable systems, and ensuring reliability across complex environments. You’ll play a critical role in shaping the tools, practices, and standards that drive our CI/CD pipelines, cloud infrastructure, and overall system performance. As a key member of our team, you’ll collaborate closely with engineering, operations, and product teams to ensure seamless delivery and high uptime of our services.
Responsibilities
Implement and Maintain CI/CD Pipelines
• Establish and maintain CI/CD pipelines to ensure smooth operation for AI and machine learning projects.
Ensure Robust Deployment Strategies
• Develop and implement reliable deployment strategies for AI models, with a focus on LLMs and Retrieval-Augmented Generation (RAG).
Monitor AI Application Performance
• Monitor the performance, reliability, and availability of deployed AI models to ensure optimal operation.
Collaborate with AI Research Teams
• Assist in transitioning machine learning models and algorithms into production environments efficiently.
Develop and Enforce Best Practices
• Develop and enforce best practices for version control, configuration management, and testing of AI-driven solutions.
Utilize MLOps Tools
• Leverage frameworks such as Kubeflow, MLflow, or TensorFlow Extended (TFX) to manage the ML lifecycle from experimentation to production.
Implement Monitoring Solutions
• Implement monitoring solutions for infrastructure metrics and AI model performance to enable proactive issue detection.
Participate in On-Call Rotations
• Participate in on-call rotations to ensure uptime and meet service-level objectives (SLOs).
Requirements
Bachelor’s Degree
• A degree in Computer Science, Engineering, or a related field, or equivalent experience.
Experience in DevOps or SRE
• Proven experience as a DevOps Engineer or Site Reliability Engineer (SRE), with a strong emphasis on software development and automation.
Expertise in LLM Deployment
• Proven expertise in deploying and managing Large Language Models (LLMs), particularly in systems that use Retrieval-Augmented Generation (RAG).
Proficiency in CI/CD Tools
• Hands-on experience with CI/CD tools such as Jenkins, GitLab CI, CircleCI, Travis CI, Bamboo, or TeamCity.
• Experience with Infrastructure as Code (IaC) tools like Terraform, Ansible, or Chef.
Container Orchestration Knowledge
• Strong understanding of Docker, Kubernetes, and other container and orchestration technologies.
Familiarity with MLOps
• Competence with MLOps tools such as Kubeflow, MLflow, or TensorFlow Extended (TFX) to support machine learning development and deployment.
Cloud Service Expertise
• Experience with cloud platforms such as AWS, GCP, or Azure, particularly in AI/ML environments.
Monitoring Tools Experience
• Skilled in monitoring tools like Prometheus, Grafana, and the ELK (Elasticsearch, Logstash, and Kibana) stack.
Python Knowledge
• Strong understanding of Python, particularly in the context of data science and machine learning.
Relevant Certifications
• Certifications in Kubernetes, AWS/GCP/Azure, or related technologies are a plus.
If you're passionate about automation, cloud infrastructure, and building reliable, scalable systems, and eager to work in a dynamic, fast-paced environment, we encourage you to apply!
- Posted: Nov 09, 2024
- Type: Full-time
- Level: Entry
- Location: Dubai
- Company: Identity AI Labs