Infrastructure Specialist (AI Networking)
The Infrastructure Specialist (AI Networking) is responsible for designing, implementing, operating, and securing high-performance network infrastructure that supports the Client’s AI/ML, cloud, and digital operations workloads. The role works closely with the Client’s Digital Operations team and AI infrastructure teams to deliver low-latency, high-bandwidth network fabrics for GPU clusters, hybrid/multi-cloud connectivity, network automation, and Zero Trust security. The role is based on-site at the Client’s premises in Abu Dhabi.
Roles & Responsibilities
Network Design & Architecture
- Design and implement high-bandwidth, low-latency network topologies to support AI/ML workloads and GPU clusters.
- Architect LAN, WAN, SD-WAN and data center fabrics optimized for distributed AI training and inference workloads.
- Evaluate and deploy spine-leaf architectures with RDMA over Converged Ethernet (RoCE) or InfiniBand for AI compute networks.
AI & Cloud Infrastructure Support
- Support hybrid and multi-cloud network connectivity (AWS, Azure, GCP) for AI pipeline integration.
- Configure and manage network segmentation, VLANs and QoS policies to prioritize AI/ML traffic flows.
- Work with AI infrastructure teams to optimize network throughput for model training, data ingestion and inference serving.
Operations & Monitoring
- Implement AI-driven network monitoring and anomaly detection tools to proactively identify and resolve issues.
- Maintain and troubleshoot firewalls, load balancers, routers and switches in AI production environments.
- Develop automation scripts and workflows using Python, Ansible or Terraform for network provisioning and configuration management.
- Own incident response for network-related outages affecting AI services, ensuring swift resolution and root cause analysis (RCA) documentation.
Security & Compliance
- Enforce network security policies aligned with cybersecurity and data governance standards.
- Implement Zero Trust Network Access (ZTNA) principles across AI and data platform environments.
- Conduct regular network vulnerability assessments and participate in security audits.
Operations, Documentation & Stakeholder Management
- Support day-to-day infrastructure operations, troubleshoot incidents and provide technical support to end-users.
- Maintain accurate infrastructure documentation, system records and configuration baselines.
- Collaborate with senior specialists, the Client’s departments and external vendors on infrastructure projects, upgrades and migrations.
- Provide accurate and timely updates to the line manager and prepare periodic reports per the Client’s reporting standards.
- Adhere to the Client’s governance, health, safety and environment (HSE) policies, code of conduct and applicable compliance requirements.
- Stay informed of industry trends and emerging AI networking technologies to continuously enhance skills and service quality.
Qualification and Education Requirements
- Bachelor’s degree in Information Technology, Computer Science, Digital Operations or a related field.
- Certifications (Preferred): CCIE or equivalent (Cisco, Juniper, Arista); cloud certifications (AWS Certified Advanced Networking - Specialty, Microsoft Certified: Azure Network Engineer Associate); NVIDIA DGX networking or AI infrastructure-specific credentials are a plus.
Minimum Experience / Skills Required
- 10+ years of experience in networking and security or a closely related field.
- Minimum 2 years supporting AI / GPU infrastructure projects.
- Routing & Switching: Cisco and Juniper.
- AI Networking: RDMA, RoCE, InfiniBand, NVIDIA Networking (preferred).
- Cloud Networking: AWS VPC, Azure VNet, GCP VPC, Direct Connect / ExpressRoute.
- Security: Cisco, Palo Alto, Aruba ClearPass and similar platforms.
- Automation: Python, Ansible, Terraform, Netmiko, NAPALM.
- Monitoring: NetFlow, SNMP, Grafana, Prometheus and AI-based monitoring platforms.
- SD-WAN and Software-Defined Networking (SDN) platforms.
- Telephony: MS Teams Direct Routing.
- Network load balancing and application security: F5.
- Wireless Networking: Aruba.
- Strong analytical and problem-solving skills, with the ability to work in a fast-paced AI-driven environment.
- Excellent communication and stakeholder management skills.
- Posted: May 09, 2026
- Type: Full-time
- Level: Mid-Senior
- Location: Abu Dhabi
- Company: Omnix International