MultiBank Group

SRE Lead

United Arab Emirates · Full-time · Mid-Senior

Welcome to MultiBank Group, a global financial pioneer established in 2005 in California and now proudly headquartered in Dubai, UAE. We excel in providing cutting-edge trading technology, unparalleled liquidity, and exceptional customer service, offering an extensive range of financial products such as Forex, Metals, Shares, Indices, Commodities, and Cryptocurrency CFDs.


Join our thriving community of over 1 million clients across 90 countries, contributing to a daily trading volume exceeding US$18.1 billion. As an award-winning, reliable, and heavily regulated financial institution (overseen by 16+ financial regulators across 5 continents), MultiBank Group is devoted to innovation, excellence, and empowering our clients to achieve their financial goals. Seize the opportunity to work with a rapidly growing, world-class team spanning more than 20 countries, driven by innovation, collaboration, and customer focus.


Role Overview

The SRE Lead manages the daily operations of the Site Reliability Engineering (SRE) team and oversees the reliability, scalability, and performance of the infrastructure and services. The role also involves defining strategies for improving system reliability and ensuring the team adopts best practices in automation, incident response, and infrastructure management.


Key Responsibilities


Leadership and Team Management

  • Manage the daily operations of the SRE team, including scheduling, assignment of tasks, and performance tracking
  • Provide technical guidance, mentorship, and feedback to individual team members, and promote two-way feedback within the team (using feedback tools where possible)
  • Conduct regular performance reviews, set goals, and develop individual growth plans for SRE team members in coordination with the TM Specialist


SRE Practices

  • Implement the SRE strategy, processes, and practices defined by the organization, and ensure the team adheres to them


Strategic Planning and Roadmap Development

  • Develop and execute a roadmap for improving observability, scalability, disaster recovery, and performance optimization


System Reliability and Incident Management

  • Oversee system health, ensuring a high level of reliability, uptime, and performance across production environments
  • Lead incident management efforts, including response, resolution, and post-mortem reviews, ensuring root causes are identified and mitigated
  • Drive the development of incident response protocols and on-call rotations to ensure 24/7 support and quick resolution of critical issues


Automation and Infrastructure Optimization

  • Drive the adoption and scaling of automation practices across the team, reducing manual tasks related to deployments, scaling, and monitoring
  • Ensure the team implements Infrastructure as Code (IaC) and continuously refines CI/CD pipelines to support efficient, repeatable, and reliable infrastructure management
  • Lead initiatives to optimize cloud infrastructure and resource usage, ensuring performance meets business needs while controlling costs


Production Release Support

  • Oversee and support the deployment of new features and updates to production, ensuring minimal downtime and maximum reliability
  • Collaborate with development and management teams to ensure a smooth and efficient release process, adhering to established release procedures
  • Monitor production environments during and after releases, ready to address any issues or roll back if necessary


Monitoring, Observability, and Performance Tuning

  • Oversee the development and maintenance of monitoring and observability systems, ensuring they provide real-time insights into system performance and reliability
  • Ensure that system metrics are regularly reviewed and that performance-tuning efforts are prioritized based on system bottlenecks and resource usage patterns
  • Work with development teams to ensure observability is integrated into the design and development of applications and services


Cross-Functional Collaboration and Communication

  • Serve as the point of contact for reliability-related matters, providing regular updates on system health, incident trends, and improvement plans
  • Foster a culture of shared responsibility between SRE and development teams, encouraging collaboration on building reliable, scalable, and performant systems


Continuous Improvement and Innovation

  • Promote the adoption of new technologies, frameworks, and tools that enhance system resilience, scalability, and automation
  • Regularly review and refine processes to increase the efficiency and effectiveness of incident response, system monitoring, and infrastructure management


Security, Compliance, and Risk Management

  • Ensure that security best practices are integrated into all aspects of infrastructure management, including access control, vulnerability management, and data protection
  • Collaborate with security teams to ensure compliance with industry standards and regulations while maintaining system availability and performance
  • Proactively manage risks related to system reliability and availability, identifying and mitigating potential threats before they impact production environments


Reporting and Metrics

  • Define, track, and report on key metrics related to system performance, uptime, and incident response, providing insights to both the engineering team and leadership
  • Lead efforts to use data-driven insights to guide system improvements and to measure the impact of changes on reliability and performance
  • Present regular reports on the state of system reliability, key incidents, and ongoing improvement initiatives to leadership and stakeholders


Collaboration in Hiring

  • Participate in the hiring process for SRE roles, evaluating candidates and helping build a strong, capable team

Posted: Feb 14, 2025
Type: Full-time
Level: Mid-Senior
Location: Dubai

Industries: Financial Services; Capital Markets; Technology, Information and Media

Categories: Engineering; Information Technology; Product Management
