cander
Head of Infrastructure
United Arab Emirates | 3 hours ago
Full-time | Remote Friendly | Information Technology


📍 Dubai, UAE

💰 55,000–60,000 AED per month + bonus + family medical cover and insurance


We are seeking an exceptional Head of Infrastructure to lead and evolve a global technology platform supporting large-scale, distributed operations.

This role owns the reliability, scalability, security, and performance of production systems across cloud, edge, and microservices environments.

Reporting to the CTO and sitting on the Technology Leadership Team, you will inherit a mature, complex ecosystem and drive its evolution to operate at global scale across 50+ markets.


Requirements

  • 12+ years in infrastructure, platform, or systems engineering
  • 5+ years in senior leadership roles such as Head of Infrastructure, SRE, or Platform Engineering
  • Proven experience managing large-scale, distributed systems across cloud and edge
  • Strong background in microservices and real-time data processing
  • Deep Kubernetes expertise in production, including multi-cluster environments
  • Experience across multi-cloud environments such as AWS, Azure, and GCP
  • Track record managing high-throughput messaging systems such as Kafka or RabbitMQ
  • Experience with stream processing frameworks such as Flink or Spark
  • Exposure to AI and ML infrastructure including GPU environments and model deployment
  • Experience operating across multiple regions, ideally APAC, Middle East, and Europe


Key Responsibilities

Platform Architecture and Deployment

  • Own end-to-end infrastructure across cloud and edge environments
  • Lead Kubernetes and container orchestration strategy across hybrid deployments
  • Define and execute multi-cloud strategy across AWS, Azure, GCP, and regional providers
  • Build infrastructure as code and automated deployment pipelines
  • Ensure compliance with global data residency and sovereignty requirements

Database and Storage

  • Manage a polyglot database environment across relational, time-series, graph, and cache layers
  • Design scaling, replication, backup, and disaster recovery strategies
  • Drive zero-downtime migrations and performance optimisation

Messaging and Microservices

  • Scale high-throughput messaging systems handling large IoT data volumes
  • Own service governance, API management, and microservices communication
  • Implement monitoring, alerting, and capacity planning

Real-Time and Data Processing

  • Lead stream processing infrastructure for real-time data and analytics
  • Optimise latency and throughput across distributed systems
  • Manage batch processing and distributed task scheduling

AI and Compute Infrastructure

  • Own infrastructure for AI model training and real-time inference
  • Design GPU and accelerator strategy for cost and performance
  • Support LLM deployment and simulation workloads

Frontend and Edge Delivery

  • Optimise global content delivery and WebGL performance
  • Manage CDN and caching strategies

IoT and Connectivity

  • Scale infrastructure supporting millions of connected devices
  • Manage edge gateways and protocol integrations
  • Implement device lifecycle management at scale

Observability and Reliability

  • Own monitoring, logging, tracing, and performance management
  • Establish incident response, on-call, and SLA frameworks
  • Improve system reliability through proactive alerting

Security and Compliance

  • Implement secure infrastructure across all environments
  • Ensure compliance with global data protection regulations
  • Lead vulnerability management, disaster recovery, and audit readiness

AI-Driven Operations

  • Embed AI into infrastructure operations and automation
  • Enable predictive monitoring, cost optimisation, and remediation

Leadership and Team Building

  • Build and lead a global infrastructure team across SRE, DevOps, Data, and Security
  • Create scalable team structures and clear ownership models
  • Drive a culture of reliability and continuous improvement

Stakeholder Collaboration

  • Partner with leadership on strategy, cost, and global expansion
  • Align infrastructure with business, legal, and commercial priorities


Technical Expertise

  • Kubernetes, container orchestration, and cluster management
  • Polyglot databases across relational, time-series, graph, and distributed storage
  • Messaging and streaming systems focused on scale, latency, and reliability
  • Service governance, API management, and microservices architecture
  • Observability across metrics, logging, and distributed tracing
  • Infrastructure as code using Terraform, Ansible, or similar
  • CI/CD pipelines for distributed systems
  • Security including zero trust, network segmentation, and compliance frameworks
  • Strong understanding of IoT and edge computing environments


Preferred Background

  • Experience within IoT, PropTech, energy, or complex platform environments
  • Exposure to real-time sensor data platforms and edge deployments
  • Experience with Chinese cloud platforms or global infrastructure expansion
  • Background supporting AI platforms including LLM deployment
  • Experience scaling infrastructure through high growth or international expansion
  • Exposure to 3D visualisation or simulation platforms
