Kingston Stanley
Site Reliability Engineer
Kingston Stanley | United Arab Emirates | 2 days ago
Full-time | Information Technology

Lead Site Reliability Engineer – Dubai, Full Time


Our client, a technology organization and a leader in its field, is seeking an experienced Lead Site Reliability Engineer to join the team.


As Lead Site Reliability Engineer, you will take ownership of their production environment, which processes hundreds of billions of data points every year. This is a senior opportunity for a Kubernetes expert with strong experience in distributed data systems, automation, and infrastructure security.


Key Responsibilities:

  • Design and manage scalable Kubernetes-based infrastructure.
  • Automate provisioning, configuration management, and deployments.
  • Optimise and administer distributed databases including Cassandra, ScyllaDB, PostgreSQL, MongoDB, and Elasticsearch.
  • Build and maintain advanced monitoring, logging, and alerting systems.
  • Manage Linux-based systems with a focus on security, patching, and hardening.
  • Develop and test disaster recovery and business continuity strategies.
  • Lead performance tuning, capacity planning, and cost optimisation.
  • Act as the highest escalation point for complex infrastructure incidents.
  • Collaborate with development teams to refine CI/CD pipelines.
  • Produce clear documentation of systems, processes, and architecture.
  • Provide mentorship and technical guidance to the wider technology team.


Qualifications & Experience:

  • Degree in Computer Science, Engineering, or a related field.
  • 8+ years in SRE, DevOps, or Systems Engineering across large-scale production environments.
  • Expert in Kubernetes cluster design, scaling, and security.
  • Proven experience with both NoSQL and relational databases.
  • Strong Linux administration skills (Ubuntu/SUSE) with system hardening expertise.
  • Proficiency in scripting (Bash, Python, Go) and Infrastructure as Code tools (Terraform, Ansible, Pulumi).
  • Knowledge of load balancing, networking, and storage solutions.


Nice to Have:

  • Strong understanding of infrastructure and data security.
  • Hands-on disaster recovery and resilience planning.
  • Advanced database tuning and recovery skills.
  • Familiarity with Go-based application support.
  • Excellent troubleshooting and analytical skills.
  • Proactive, accountable, and effective in high-pressure environments.
  • Strong collaboration with cross-functional engineering teams.


Salary: Competitive

Benefits: Medical insurance and visa (family), flight ticket allowance (employee only)
