DevOps Engineer - AI Infrastructure & GPU Orchestration

NEXUS

United Arab Emirates · Full-time · Entry

Company Description

NEXUS is revolutionizing the data center industry with the first AI-native Data Center Operating System. To address the growing complexity of AI-driven workloads and infrastructure, our platform unifies DCIM, APM, FinOps, Kubernetes orchestration, AI workload management, and full-stack observability into one intelligent, real-time system. With predictive intelligence and automated remediation, the platform delivers optimized performance, cost efficiency, and seamless AI deployment. At NEXUS, we are shaping a future in which autonomous infrastructure intelligence drives smarter, more efficient decisions.

Role Description

This is a full-time hybrid role for a DevOps Engineer specializing in AI Infrastructure and GPU Orchestration. The DevOps Engineer will be responsible for building and maintaining scalable infrastructure, implementing infrastructure as code (IaC), developing automation scripts, streamlining continuous integration workflows, and managing Linux-based systems. The role also involves optimizing GPU clusters, collaborating with software developers, and ensuring high system performance to support innovative AI-driven workloads.

Key Responsibilities

GPU Workload Orchestration: Design and manage complex Kubernetes environments (EKS, AKS, GKE, or bare metal) specifically tuned for AI/ML workloads, including GPU scheduling, device plugins, and node affinity.
DCIM Integration: Build and maintain infrastructure pipelines that interface with Data Center Infrastructure Management (DCIM) systems to monitor power, cooling, and hardware health at the rack level.
Advanced APM & Telemetry: Implement deep Application Performance Monitoring (APM) and observability stacks (Prometheus, Grafana, Datadog) to track GPU utilization, memory bandwidth, and workload latency in real time.
Infrastructure as Code (IaC): Architect and deploy scalable multi-cloud and hybrid environments using Terraform or equivalent tools, ensuring our platform can be deployed rapidly into diverse enterprise environments.
CI/CD for AI Infrastructure: Own the CI/CD pipelines (GitHub Actions, GitLab CI) that deliver our orchestration software, ensuring zero-downtime deployments for mission-critical AI systems.
Performance Tuning: Work closely with the core engineering team to optimize network routing, storage I/O, and compute resource allocation for heavy AI training and inference workloads.

Qualifications

Minimum of 3-5 years of professional experience in DevOps, SRE, or Infrastructure Engineering, with a strong focus on high-performance computing or AI infrastructure.
Expert-level skills in Terraform, Ansible, or similar IaC tools and in CI/CD automation, along with strong scripting abilities in Python, Go, or Bash.
Strong knowledge of continuous integration tools (e.g., Jenkins, GitHub Actions, GitLab CI/CD).
Background in system administration and expertise in managing multi-OS environments.
Understanding of GPU clusters and experience handling modern AI workloads.
Deep, hands-on experience with Kubernetes, specifically managing stateful workloads, custom resource definitions (CRDs), and GPU node provisioning.
Proven ability to design and implement comprehensive APM and telemetry solutions for complex, distributed systems.
Understanding of data center operations, including power, thermal management, and hardware-level monitoring.
Multi-cloud infrastructure experience is a plus.
Ability to troubleshoot and optimize performance across complex infrastructure.
Strong problem-solving abilities and a collaborative mindset.

Key Skills

AI, Kubernetes, DevOps, continuous integration, GitLab, cloud, CI/CD, infrastructure as code, system administration, Prometheus, GitLab CI, Terraform, Jenkins, Grafana, storage, Datadog, Python, Linux, EKS

Posted: May 08, 2026
Type: Full-time
Level: Entry
Location: Dubai
Company: NEXUS

Industries

Technology, Information and Internet