Senior Site Reliability Engineer

ai71
United Arab Emirates · Full-time · Mid-Senior

Site Reliability Engineer

Location: Abu Dhabi, UAE

Company: AI71


About us:


AI71 is an applied research team dedicated to creating helpful and responsible AI agents for knowledge workers. Working closely with our industry partners, our cross-functional teams of AI experts build products grounded in the cutting-edge research of our colleagues from the Technology Innovation Institute (TII).


We are seeking a highly motivated and skilled DevOps/Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a passion for building, deploying, and maintaining scalable, reliable systems and infrastructure. You will work closely with development teams, ensuring smooth deployment pipelines, system stability, and operational efficiency.


Key Responsibilities:

Infrastructure Automation & Management
  • Design, implement, and maintain CI/CD pipelines to streamline development workflows.
  • Design and implement scalable infrastructure for AI model deployment and management.
  • Automate infrastructure provisioning and management using tools like Terraform, Ansible, or CloudFormation.
  • Optimize cloud-based and on-premises resources to improve system scalability and cost efficiency.
  • Manage and optimize queuing systems and real-time streaming architectures.

System Reliability & Monitoring
  • Monitor and troubleshoot production systems to maintain uptime and performance.
  • Implement robust logging and alerting solutions using tools like Prometheus, Grafana, the ELK stack, or similar.
  • Implement comprehensive monitoring for both system metrics and ML model performance.
  • Conduct root cause analyses and post-mortem reviews to improve system reliability.

Collaboration & Support
  • Work with development and QA teams to integrate new features seamlessly into production environments.
  • Advocate for best practices in system architecture, security, and performance optimization.
  • Provide on-call support for critical production systems as part of a rotation schedule.

Security & Compliance
  • Ensure infrastructure meets security and compliance requirements (e.g., SOC 2, ISO/IEC 27001).
  • Manage secrets and credentials securely using tools like HashiCorp Vault or AWS Secrets Manager.


Required Qualifications:

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • Strong proficiency in at least one scripting or programming language (e.g., Python, Bash, or Go).
  • Hands-on experience with cloud platforms like AWS, Azure, or Google Cloud.
  • Proficiency with containerization and orchestration tools (Docker, Kubernetes).
  • Experience with CI/CD tools such as Azure DevOps, Jenkins, GitLab CI/CD, or CircleCI.
  • Knowledge of monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, New Relic, or PagerDuty).
  • Understanding of networking concepts (DNS, load balancing, firewalls).
  • Understanding of streaming architectures for real-time AI applications.


Preferred Qualifications:

  • Experience with Infrastructure as Code (IaC) tools like Terraform or Pulumi.
  • Knowledge of service mesh technologies (e.g., Istio, Linkerd).
  • Familiarity with database administration and scaling (SQL, NoSQL, and vector databases).
  • Previous experience in a similar role in a high-traffic production environment.


Why Join Us?

  • Opportunity to work on cutting-edge technology and challenging problems.
  • Collaborative work environment that values innovation and growth.
  • Competitive salary, benefits, and learning opportunities.

Key Skills

AI, CI/CD, Prometheus, Terraform, Grafana, cloud, AWS, infrastructure as code, containerization, PagerDuty, Jenkins, Ansible, Datadog, Python, Docker, GitLab, Istio, Vault, Bash, SQL, ELK, DNS

Posted: Feb 06, 2025
Type: Full-time
Level: Mid-Senior
Location: Abu Dhabi
Company: ai71

Industries: Technology, Information and Internet
Categories: Information Technology
