AI Infrastructure Engineer - Opus

AppliedAI

United Arab Emirates · Full-time · Mid-Senior

As an Opus AI Infrastructure Engineer, you will lead the optimization and scaling of AI pipelines that serve foundational models in live production environments. You will focus on evolving real-time and batch inference systems for reliability, low latency, and seamless integration with product logic. This senior engineering role operates at the core of AI delivery, requiring strong system design, infrastructure fluency, and a deep commitment to performance and operational excellence.

You will work across modern cloud environments and manage a diverse and evolving portfolio of LLMs, both proprietary and open-source. You will play a key role in evaluating model trade-offs, adapting to rapid model iteration, and ensuring smooth transitions as providers update APIs, capabilities, and service tiers. You will also coordinate directly with foundational model vendors to align on roadmap requirements, resolve performance issues, and plan deployment optimizations.

Key Responsibilities

AI Serving Pipeline Optimization

* Design, rewrite, and mature inference pipelines for real-time, streaming, and batch workloads

* Optimize throughput, latency, and reliability via architectural evolution and model-specific strategies

* Manage orchestration of heterogeneous LLMs with varying performance, cost, and response profiles

* Implement fallback logic, request routing, and intelligent retry systems for availability and graceful degradation

* Build tooling for profiling and benchmarking pipelines involving LLMs and agentic orchestration frameworks

* Adapt infrastructure and integrations to support rapidly changing LLM APIs, model versions, and provider behavior

* Design and deploy self-hosted LLM inference pipelines, including model loading, quantization, batching, and runtime optimization on GPU/TPU environments

Production Infrastructure & Runtime Efficiency

* Own the live AI execution layer: coordinate model calls, resource scheduling, and latency-critical paths

* Monitor and improve key metrics: latency, token throughput, error rates, and autoscaling responsiveness

* Deploy and scale LLM services across cloud (AWS, GCP, Azure) and on-prem environments, optimizing for regional availability and regulatory constraints

* Ensure robust observability, failover, rollback, and health monitoring across all deployed models

* Collaborate with infra teams to maximize compute efficiency across CPU/GPU/TPU backends

Model Vendor Coordination & External Integrations

* Serve as a technical counterpart to foundational model providers, communicating product needs, debugging issues, and tracking performance updates

* Maintain high reliability across provider transitions, including model deprecations, quota shifts, and new capability rollouts

* Evaluate and experiment with emerging models across different providers, providing comparative benchmarks and integration plans

System Integration & Engineering Excellence

* Integrate pipelines cleanly with APIs, orchestration layers, and application logic

* Refactor legacy systems for modularity, observability, and performance

* Promote reusable, maintainable infrastructure via tooling and shared abstractions

* Uphold engineering standards through code reviews, performance audits, and technical mentorship

Qualifications

Education

* Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field

Experience

* 5+ years in backend, ML, or infrastructure engineering with a focus on live AI systems

* Demonstrated experience building and scaling real-time inference infrastructure

* Proven track record in latency optimization, fault tolerance, and production observability

Skills

* Proficient in Python (optionally Go or Rust); strong software design and debugging skills

* Experience with orchestration and serving tools

* Deep familiarity with containerization, Kubernetes, and cloud-native deployment (EKS, GKE, etc.)

* Hands-on with observability stacks (Prometheus, Grafana, etc.)

* Understanding of inference-level optimizations: batching, quantization, caching, and sharding

* Operational experience with LLMs (OpenAI, Anthropic, open-weight models) in both hosted and self-managed setups

* Experience building and maintaining self-hosted inference stacks using frameworks such as vLLM, HuggingFace Transformers, or DeepSpeed-Inference

* Familiarity with agentic AI systems and tooling (LangGraph, Semantic Kernel, CrewAI)

* Cross-cloud deployment experience (AWS, GCP, Azure) and awareness of compliance/latency trade-offs

* Comfortable managing technical communication with external vendors and adapting to fast-moving dependencies

Key Skills

Ranked by relevance

AI, cloud, AWS, GCP, vendor coordination, containerization, fault tolerance, Kubernetes, Prometheus, DeepSpeed, Grafana, Python, Rust, EKS

Posted: Aug 11, 2025
Type: Full-time
Level: Mid-Senior
Location: Abu Dhabi Emirate
Company: AppliedAI

Industries

Technology, Information and Internet
