RP International

Product Site Reliability Engineer

United Arab Emirates · Full-time · Mid-Senior

We are looking for an experienced Product Site Reliability Engineer (SRE) to help ensure the performance, scalability, and reliability of our customer-facing products and platforms. As a critical link between product development and operations, the SRE designs resilient systems, automates workflows, and builds observability into the product lifecycle to enable fast-paced innovation without compromising stability.


Key Responsibilities

System Reliability & Performance

  • Ensure availability, latency, scalability, and overall system health align with SLAs and SLOs.
  • Continuously improve monitoring, alerting, and observability capabilities.

Incident Management

  • Lead root cause analysis and conduct blameless postmortems.
  • Develop and maintain incident response playbooks to reduce MTTD and MTTR.

Automation & Tooling

  • Automate operational tasks to reduce manual work and improve efficiency.
  • Build and maintain CI/CD pipelines and infrastructure as code (IaC) for seamless product delivery.

Collaboration with Product & Engineering

  • Work closely with engineering teams to embed reliability into product design.
  • Promote best practices such as chaos testing, capacity planning, and progressive deployment strategies (blue/green, canary releases).

Continuous Improvement

  • Define, measure, and track key reliability metrics (SLIs, SLOs, error budgets).
  • Identify and implement infrastructure and architectural improvements to enhance system resilience.


Required Skills & Experience

Technical Skills

  • Deep knowledge of cloud platforms (AWS, GCP, or Azure).
  • Experience with containerization and orchestration (Docker, Kubernetes).
  • Proficiency in Infrastructure as Code tools (Terraform, Ansible, or similar).
  • Expertise in CI/CD tools (e.g., Jenkins, GitHub Actions, GitLab CI).
  • Familiarity with observability and monitoring tools (Prometheus, Grafana, Datadog, New Relic).
  • Strong scripting and programming skills (Python, Go, Bash, or similar).
  • Understanding of distributed systems, networking, and database reliability (SQL/NoSQL).


Professional Skills

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Production Engineering.
  • Strong analytical and problem-solving mindset.
  • Excellent communication and collaboration skills across cross-functional teams.
  • Demonstrated experience in incident management and conducting postmortems.

Key Skills

Ranked by relevance

infrastructure as code, CI/CD, incident response, containerization, Prometheus, Terraform, Jenkins, Ansible, Grafana, Datadog, Python, Docker, DevOps, GitLab, cloud, Bash, AWS, GCP
Posted: Aug 20, 2025
Type: Full-time
Level: Mid-Senior
Location: Abu Dhabi Emirate

Industries

Staffing, Recruiting

Categories

Engineering, Information Technology
