Senior DevOps Engineer
About the Role:
We are seeking a highly motivated and skilled DevOps/Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a passion for building, deploying, and maintaining scalable, reliable systems and infrastructure. You will work closely with development teams to ensure smooth deployment pipelines, system stability, and operational efficiency.
Key Responsibilities:
Infrastructure Automation & Management
- Design, implement, and maintain CI/CD pipelines to streamline development workflows.
- Design and implement scalable infrastructure for AI model deployment and management.
- Automate infrastructure provisioning and management using tools like Terraform, Ansible, or CloudFormation.
- Optimize cloud-based and on-premises resources to improve system scalability and cost efficiency.
- Manage and optimize queuing systems and real-time streaming architectures.
System Reliability & Monitoring
- Monitor and troubleshoot production systems to maintain uptime and performance.
- Implement robust logging and alerting solutions using tools like Prometheus, Grafana, ELK stack, or similar.
- Implement comprehensive monitoring of both system metrics and ML model performance.
- Conduct root cause analyses and post-mortem reviews to improve system reliability.
Collaboration & Support
- Work with development and QA teams to seamlessly integrate new features into production environments.
- Advocate for best practices in system architecture, security, and performance optimization.
- Provide on-call support for critical production systems as part of a rotation schedule.
Security & Compliance
- Ensure infrastructure meets security and compliance requirements (e.g., SOC 2, ISO/IEC 27001).
- Manage secrets and credentials securely using tools like Vault or AWS Secrets Manager.
Required Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
- Strong proficiency in at least one scripting or programming language (e.g., Python, Bash, or Go).
- Hands-on experience with cloud platforms like AWS, Azure, or Google Cloud.
- Proficiency with containerization and orchestration tools (Docker, Kubernetes).
- Experience with CI/CD tools such as Azure DevOps, Jenkins, GitLab CI/CD, or CircleCI.
- Knowledge of monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, New Relic, or PagerDuty).
- Understanding of networking concepts (DNS, load balancing, firewalls).
- Understanding of streaming architectures for real-time AI applications.
Preferred Qualifications:
- Experience with Infrastructure as Code (IaC) tools like Terraform or Pulumi.
- Knowledge of service mesh technologies (e.g., Istio, Linkerd).
- Familiarity with database administration and scaling (vector databases, SQL, and NoSQL).
- Previous experience in a similar role in a high-traffic production environment.
Why Join Us?
- Opportunity to work on cutting-edge technology and challenging problems.
- Collaborative work environment that values innovation and growth.
- Competitive salary, benefits, and learning opportunities.
- Posted: Mar 10, 2025
- Type: Full-time
- Level: Mid-Senior
- Location: Abu Dhabi Emirate
- Company: Technology Innovation Institute