AppliedAI

AI Infrastructure Engineer - Opus

United Arab Emirates · Full-time · Mid-Senior

As an Opus AI Infrastructure Engineer, you will lead the optimization and scaling of AI pipelines that serve foundational models in live production environments. You will focus on evolving real-time and batch inference systems for reliability, low latency, and seamless integration with product logic. This senior engineering role operates at the core of AI delivery, requiring strong system design, infrastructure fluency, and a deep commitment to performance and operational excellence.


You will work across modern cloud environments and manage a diverse and evolving portfolio of LLMs, both proprietary and open-source. You will play a key role in evaluating model trade-offs, adapting to rapid model iteration, and ensuring smooth transitions as providers update APIs, capabilities, and service tiers. You will also coordinate directly with foundational model vendors to align roadmap requirements, performance issues, and deployment optimizations.


Key Responsibilities


AI Serving Pipeline Optimization

* Design, rewrite, and mature inference pipelines for real-time, streaming, and batch workloads

* Optimize throughput, latency, and reliability via architectural evolution and model-specific strategies

* Manage orchestration of heterogeneous LLMs with varying performance, cost, and response profiles

* Implement fallback logic, request routing, and intelligent retry systems for availability and graceful degradation

* Build tooling for profiling and benchmarking pipelines involving LLMs and agentic orchestration frameworks

* Adapt infrastructure and integrations to support rapidly changing LLM APIs, model versions, and provider behavior

* Design and deploy self-hosted LLM inference pipelines, including model loading, quantization, batching, and runtime optimization on GPU/TPU environments


Production Infrastructure & Runtime Efficiency

* Own the live AI execution layer: coordinate model calls, resource scheduling, and latency-critical paths

* Monitor and improve key metrics: latency, token throughput, error rates, and autoscaling responsiveness

* Deploy and scale LLM services across cloud (AWS, GCP, Azure) and on-prem environments, optimizing for regional availability and regulatory constraints

* Ensure robust observability, failover, rollback, and health monitoring across all deployed models

* Collaborate with infra teams to maximize compute efficiency across CPU/GPU/TPU backends


Model Vendor Coordination & External Integrations

* Serve as a technical counterpart to foundational model providers, communicating product needs, debugging issues, and tracking performance updates

* Maintain high reliability across provider transitions, including model deprecations, quota shifts, and new capability rollouts

* Evaluate and experiment with emerging models across different providers, providing comparative benchmarks and integration plans


System Integration & Engineering Excellence

* Integrate pipelines cleanly with APIs, orchestration layers, and application logic

* Refactor legacy systems for modularity, observability, and performance

* Promote reusable, maintainable infrastructure via tooling and shared abstractions

* Uphold engineering standards through code reviews, performance audits, and technical mentorship


Qualifications


Education

* Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field


Experience

* 5+ years in backend, ML, or infrastructure engineering with a focus on live AI systems

* Demonstrated experience building and scaling real-time inference infrastructure

* Proven track record in latency optimization, fault tolerance, and production observability


Skills

* Proficient in Python (optionally Go or Rust); strong software design and debugging skills

* Experience with orchestration and serving tools

* Deep familiarity with containerization, Kubernetes, and cloud-native deployment (EKS, GKE, etc.)

* Hands-on experience with observability stacks (Prometheus, Grafana, etc.)

* Understanding of inference-level optimizations: batching, quantization, caching, and sharding

* Operational experience with LLMs (OpenAI, Anthropic, open-weight models) in both hosted and self-managed setups

* Experience building and maintaining self-hosted inference stacks using frameworks such as vLLM, Hugging Face Transformers, or DeepSpeed-Inference

* Familiarity with agentic AI systems and tooling (LangGraph, Semantic Kernel, CrewAI)

* Cross-cloud deployment experience (AWS, GCP, Azure) and awareness of compliance/latency trade-offs

* Comfortable managing technical communication with external vendors and adapting to fast-moving dependencies

Key Skills

ai, cloud, aws, gcp, vendor coordination, containerization, fault tolerance, kubernetes, prometheus, deepspeed, grafana, python, rust, eks
Posted: Aug 11, 2025
Type: Full-time
Level: Mid-Senior
Location: Abu Dhabi Emirate
Company: AppliedAI
Industries: Technology Information Internet
Categories: Engineering
