NEXUS

DevOps Engineer - AI Infrastructure & GPU Orchestration

United Arab Emirates · Full-time · Entry

Company Description

NEXUS is revolutionizing the data center industry with the first AI-native Data Center Operating System. Addressing the growing complexity of AI-driven workloads and infrastructure, our platform unifies DCIM, APM, FinOps, Kubernetes orchestration, AI workload management, and full-stack observability into one intelligent, real-time system. With cutting-edge predictive intelligence and automated remediation, the platform ensures optimized performance, cost efficiency, and seamless AI deployment. At NEXUS, we are shaping a future with autonomous infrastructure intelligence for smarter, more efficient decisions.


Role Description

This is a full-time hybrid role for a DevOps Engineer specializing in AI Infrastructure and GPU Orchestration. The DevOps Engineer will be responsible for building and maintaining scalable infrastructure, implementing infrastructure as code (IaC), developing automation scripts, streamlining continuous integration workflows, and managing Linux-based systems. The role also involves optimizing GPU clusters, collaborating with software developers, and ensuring high system performance to support innovative AI-driven workloads.


Key Responsibilities

  • GPU Workload Orchestration: Design and manage complex Kubernetes environments (EKS, AKS, GKE, or bare metal) specifically tuned for AI/ML workloads, including GPU scheduling, device plugins, and node affinity.
  • DCIM Integration: Build and maintain infrastructure pipelines that interface with Data Center Infrastructure Management (DCIM) systems to monitor power, cooling, and hardware health at the rack level.
  • Advanced APM & Telemetry: Implement deep Application Performance Monitoring (APM) and observability stacks (Prometheus, Grafana, Datadog) to track GPU utilization, memory bandwidth, and workload latency in real time.
  • Infrastructure as Code (IaC): Architect and deploy scalable, multi-cloud and hybrid environments using Terraform or equivalents, ensuring our platform can deploy rapidly into diverse enterprise environments.
  • CI/CD for AI Infrastructure: Own the CI/CD pipelines (GitHub Actions, GitLab CI) that deliver our orchestration software, ensuring zero-downtime deployments for mission-critical AI systems.
  • Performance Tuning: Work closely with the core engineering team to optimize network routing, storage I/O, and compute resource allocation for heavy AI training and inference workloads.


Qualifications

  • 3-5 years of professional experience in DevOps, SRE, or Infrastructure Engineering, with a strong focus on high-performance computing or AI infrastructure.
  • Expert-level skills in Terraform, Ansible, or similar technologies and CI/CD automation, coupled with strong scripting abilities in Python, Go, or Bash.
  • Strong knowledge of Continuous Integration tools (e.g., Jenkins, GitHub Actions, GitLab CI/CD).
  • Background in System Administration and expertise in managing multi-OS environments.
  • Understanding of GPU clusters and experience handling modern AI workloads.
  • Deep, hands-on experience with Kubernetes, specifically managing stateful workloads, custom resource definitions (CRDs), and GPU node provisioning.
  • Proven ability to design and implement comprehensive APM and telemetry solutions for complex, distributed systems.
  • Understanding of data center operations, including power, thermal management, and hardware-level monitoring.
  • Multi-cloud infrastructure experience is a plus.
  • Ability to troubleshoot and optimize performance across complex infrastructure.
  • Strong problem-solving abilities and a collaborative mindset.


Key Skills

Ranked by relevance

ai, kubernetes, devops, continuous integration, gitlab, cloud, cicd, infrastructure as code, system administration, prometheus, gitlab ci, terraform, jenkins, grafana, storage, datadog, python, linux, nexus, eks
Posted: May 08, 2026
Type: Full-time
Level: Entry
Location: Dubai
Company: NEXUS

Industries

Technology, Information and Internet

Categories

Engineering, Information Technology
