DevOps SRE Engineer - Observability & Automation - Bank

TAT IT Technolgies

United Arab Emirates · Contract · Mid-Senior

Urgent requirement: a DevOps SRE Engineer (Observability & Automation) is required for our banking clients in Abu Dhabi, UAE.

Strong experience in Kafka, RabbitMQ, Redis, and RDS/Aurora is a must.
Strong experience in observability (metrics, logs, traces, dashboards, and alerts) is a must.
Strong experience in Kubernetes, Docker, container orchestration, and microservices support is a must.
Strong experience in Terraform and IaC practices is a must.
Strong experience in Linux environments and performance troubleshooting is a must.
Strong experience in banking is a must.

We’re looking for a talented Site Reliability Engineer (SRE) to keep our systems running smoothly, reliably, and at scale. Through smart automation, deep observability, and a calm head in a crisis, you’ll help us balance speed, compliance, and stability, working alongside DevOps, Cloud, Quality Engineering, and Product teams to drive continuous improvements in performance, security, and resilience.

Define and implement SLIs/SLOs and error budgets for business-critical digital banking services.

Build actionable observability (metrics, logs, traces, dashboards, and alerts) using Dynatrace, Prometheus, Grafana, and ELK, while reducing alert fatigue.

Leverage AI-driven insights and anomaly detection (Dynatrace Davis AI or an equivalent AIOps platform) to proactively predict and resolve reliability issues before impact.

Lead incident management, from on-call triage and root-cause analysis to blameless postmortems with actionable follow-ups.

Improve deployment safety with robust rollout/rollback strategies, canary and blue-green deployments, and production readiness reviews.

Support and optimize microservices-based architectures, ensuring service reliability, scalability, and inter-service resilience.

Conduct capacity planning, performance tuning, and resilience testing, optimizing for both reliability and cost efficiency.

Automate operational toil, from runbooks and remediation scripts to proactive health checks and self-healing workflows.

Collaborate with DevOps to embed reliability gates and validations into CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI/CD, or Azure DevOps).

Own and evolve the observability and AIOps stack, driving intelligent automation and predictive alerting capabilities.

Maintain high-quality documentation, playbooks, and operational standards across environments.

Ensure operational compliance and security alignment with internal controls and regulatory standards.

Analyze system performance, availability, and cost data to continually optimize operations.
Provide reliability support and escalation guidance for critical production systems during major incidents.

5+ years of experience in SRE or DevOps roles, building and managing large-scale, high-availability systems across banking, fintech, e-commerce, or other data-intensive digital ecosystems.

Bachelor’s degree in Computer Science or equivalent technical experience.
Strong experience with Linux environments and performance troubleshooting.
Proven expertise in Terraform and Infrastructure as Code (IaC) methodologies.
Proficiency with Kubernetes and container orchestration in microservices environments.
Hands-on experience with AWS (preferred); exposure to Azure or GCP is an advantage.
Deep knowledge of Dynatrace (AIOps, Davis AI), Prometheus, Grafana, and the ELK stack.
Experience implementing AI/ML-driven reliability or automation solutions (AIOps, anomaly detection, predictive alerting).

Practical understanding of CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI/CD, or Azure DevOps).

Experience with Kafka, RabbitMQ, Redis, Aurora, and RDS databases.
Strong scripting or programming skills in Python, Bash, or Go.

Skills: reliability, DevOps, SRE, banking, automation

Key Skills

Ranked by relevance

AI, microservices, Kubernetes, GitLab CI, Terraform, RabbitMQ, Jenkins, Grafana, DevOps, GitLab, Redis, Kafka, Linux, ELK, infrastructure as code, Prometheus, Python, Docker, Bash, AWS, GCP

Posted: Oct 17, 2025
Type: Contract
Level: Mid-Senior
Location: Abu Dhabi
Company: TAT IT Technolgies

Industries

Technology Information Internet
