Site Reliability Engineer (SRE)
From designing fault-tolerant architectures to leading incident responses, you’ll have the freedom to
shape how we deliver stable, secure, and high-performance banking services.
About the Role
We’re looking for a talented Site Reliability Engineer (SRE) to keep our systems running smoothly,
reliably, and at scale. Through smart automation, deep observability, and a calm head in a crisis, you’ll
help us balance speed, compliance, and stability, working alongside DevOps, Cloud, Quality
Engineering, and Product teams to drive continuous improvements in performance, security, and
resilience.
You’ll play a key role in enhancing reliability, accelerating delivery, and ensuring seamless digital
experiences for ADCB customers.
This role reports directly to our Lead SRE / Tribe Executive Manager.
What You Will Be Doing
• Define and implement SLIs / SLOs and error budgets for business-critical digital
banking services.
• Build actionable observability (metrics, logs, traces, dashboards, and alerts) using
Dynatrace, Prometheus, Grafana, and ELK, while reducing alert fatigue.
• Leverage AI-driven insights and anomaly detection (Dynatrace Davis AI or equivalent
AIOps platform) to proactively predict and resolve reliability issues before impact.
• Lead incident management — from on-call triage and root-cause analysis to blameless
postmortems with actionable follow-ups.
• Improve deployment safety with robust rollout / rollback strategies, canary and
blue-green deployments, and production readiness reviews.
• Support and optimize microservices-based architectures, ensuring service reliability,
scalability, and inter-service resilience.
• Conduct capacity planning, performance tuning, and resilience testing, optimizing for
both reliability and cost efficiency.
• Automate operational toil — from runbooks and remediation scripts to proactive health
checks and self-healing workflows.
• Collaborate with DevOps to embed reliability gates and validations into CI / CD
pipelines (GitHub Actions, Jenkins, GitLab CI / CD, or Azure DevOps).
• Own and evolve the observability and AIOps stack, driving intelligent automation and
predictive alerting capabilities.
• Maintain high-quality documentation, playbooks, and operational standards across
environments.
• Ensure operational compliance and security alignment with internal controls and
regulatory standards.
• Analyze system performance, availability, and cost data to continually optimize
operations.
• Provide reliability support and escalation guidance for critical production systems
during major incidents.
Experience and Qualifications
• 5+ years of experience in SRE or DevOps roles, building and managing large-scale,
high-availability systems across banking, fintech, e-commerce, or other data-intensive
digital ecosystems.
• Bachelor’s degree in Computer Science or equivalent technical experience.
• Strong experience with Linux environments and performance troubleshooting.
• Proven expertise in Terraform and Infrastructure as Code (IaC) methodologies.
• Proficiency with Kubernetes and container orchestration in microservices
environments.
• Hands-on experience with AWS (preferred); exposure to Azure or GCP is an advantage.
• Deep knowledge of Dynatrace (AIOps, Davis AI), Prometheus, Grafana, and the ELK
stack.
• Experience implementing AI / ML-driven reliability or automation solutions (AIOps,
anomaly detection, predictive alerting).
• Practical understanding of CI / CD pipelines (GitHub Actions, Jenkins, GitLab CI / CD,
or Azure DevOps).
• Experience with messaging and data platforms such as Kafka, RabbitMQ, Redis, Aurora, and RDS.
• Strong scripting or programming skills in Python, Bash, or Go.
The Ideal Candidate
• Organized, structured, and meticulous in approach.
• Experienced in cross-functional collaboration and working with distributed teams.
• Strong analytical mindset with excellent troubleshooting skills for complex production
systems.
• Calm and composed communicator under pressure, capable of leading during
high-impact incidents.
• Proactive problem-solver who anticipates issues and drives preventive improvements.
• Passionate about AI-driven automation, observability, and reliability engineering.
• Continuously learning, keeping up to date with cloud-native, microservices, and SRE
best practices.
• A collaborative and adaptable team player who thrives in a fast-paced, regulated
environment and is passionate about building reliable, scalable systems that empower
digital banking innovation.
Posted: Nov 06, 2025
Type: Contract
Level: Mid-Senior
Location: Abu Dhabi Emirate
Company: Ampstek