At IBM, we’re reimagining how data centers think, reason, and optimize themselves for the age of artificial intelligence. Our Network Intelligence (INI) product brings AI-driven reasoning and automation to enterprise and telecom operations—combining time-series foundation models, agentic frameworks, and domain-aware knowledge graphs. Building on this foundation, IBM’s AI Data Centre Networking initiative applies the same intelligence to network infrastructure, enabling high-performance, self-optimizing fabrics for GPU/TPU clusters and distributed AI workloads. The result is a new class of AI-native data centers—resilient, adaptive, and designed to power the world’s most demanding AI systems.
We’re looking for an AI Data Centre Networking Subject Matter Expert (SME) to help shape and guide this transformation. You will collaborate with IBM’s engineering, research, and product teams to identify, validate, and architect solutions for next-generation networking use cases — including AI inference traffic optimization, priority-based flow control for GPU/TPU clusters, multi-tenant isolation, and fabric scalability challenges. Your insights will directly influence how IBM builds and evolves intelligent infrastructure for AI at scale.
Your Role And Responsibilities
Position Summary
The ideal candidate brings hands-on experience operating AI/ML infrastructure and understands the networking challenges of high-performance computing workloads. While deep expertise across all domains is valuable, we welcome emerging practitioners with strong foundational knowledge and genuine curiosity about solving real-world problems. You will serve as the bridge between real-world data centre operations and our product roadmap, ensuring we build solutions that address genuine infrastructure challenges faced by organizations running distributed AI workloads at scale.
You will help define next-generation networking architectures that enable efficient, scalable, and resilient AI compute fabrics across distributed environments.
Key Responsibilities
- Product Advisory & Strategy: Act as the technical voice of the customer, translating operational pain points from AI data centre environments into actionable product requirements.
- Cross-Functional Collaboration: Work with engineering and product teams to validate use cases, refine features, and align solutions with real-world AI infrastructure needs.
- Technical Validation: Evaluate networking architectures for distributed AI workloads including training, inference, and GPU/TPU communication.
- Use Case Development: Define reference architectures for high-bandwidth interconnects, congestion management, workload isolation, and multi-tenant segmentation (a simplified buffer-sizing sketch follows this list).
- Technology Assessment: Track emerging AI networking technologies, standards, and trends — focusing on performance and energy efficiency.
- Requirements Translation: Convert operational insights into clear, implementable technical specifications.
- Customer & Partner Engagement: Support customer discussions, PoC validations, and feedback loops.
- Benchmarking & Market Insight: Analyze competitor and hyperscaler AI DC architectures to guide product differentiation.
- Knowledge Sharing: Contribute to technical documentation and internal knowledge bases.
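To ground the congestion-management work above, here is a simplified, illustrative sketch of one calculation that comes up when designing lossless RoCE fabrics: sizing the per-priority PFC (IEEE 802.1Qbb) headroom buffer a switch needs so that traffic already in flight does not overflow after a pause frame is sent. The propagation constant, response time, and formula are assumptions for illustration only, not a statement of how any IBM product sizes buffers.

```python
# Simplified, illustrative estimate of PFC headroom per lossless priority.
# Constants and formula are assumptions; switch vendors publish their own sizing guidance.

def pfc_headroom_bytes(link_gbps: float, cable_m: float, mtu_bytes: int = 9216,
                       response_time_us: float = 1.0) -> int:
    """Rough upper bound on bytes that can still arrive after a PAUSE is triggered."""
    prop_delay_s = cable_m * 5e-9          # ~5 ns/m propagation in fiber (assumed)
    response_s = response_time_us * 1e-6   # sender reaction time (assumed)
    line_rate_bps = link_gbps * 1e9 / 8    # link rate in bytes per second
    # Bytes in flight both directions on the cable plus the sender's reaction
    # window, plus up to one full frame in serialization at each end.
    in_flight = (2 * prop_delay_s + response_s) * line_rate_bps
    return int(in_flight + 2 * mtu_bytes)

if __name__ == "__main__":
    # Example: 400 GbE link over a 100 m run with 9216-byte jumbo frames.
    print(f"~{pfc_headroom_bytes(400, 100) / 1024:.0f} KiB headroom per lossless priority")
```

The same style of back-of-the-envelope analysis extends to ECN thresholds and multi-tenant buffer partitioning when validating candidate fabric designs.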
Education: Master's Degree
Required Technical And Professional Expertise
- 10+ years’ experience in data centre networking, HPC systems, or AI infrastructure operations.
- Understanding of AI/ML workload behavior including distributed training, inference, and data pipelines.
- Familiarity with RDMA (RoCE/InfiniBand), VXLAN, EVPN, and Data Center Bridging (DCB).
- Experience with GPU clusters or NVIDIA DGX-class AI infrastructure preferred.
- Ability to articulate technical concepts clearly to technical and non-technical audiences.
- Demonstrated capability to influence product design and translate operational experience into actionable insights.
- Experience with Kubernetes, Kubeflow, or MLOps environments for AI workloads (see the manifest sketch after this list).
- Knowledge of telemetry, observability, and automation tools in DC environments.
- Experience in product development or solutions architecture roles.
- Familiarity with SDN, composable infrastructure, or disaggregated data centre architectures.
- Exposure to AI fabric orchestration and high-performance interconnect management.
- Contributions to AI/networking technical communities or open-source initiatives.
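As a companion illustration for the Kubernetes- and RDMA-related items above, the sketch below builds a minimal Pod manifest (expressed as JSON, which the Kubernetes API also accepts) for a distributed-training worker that requests GPUs and an RDMA-capable secondary network. The network name "roce-net" and the resource name "rdma/roce_shared" are placeholders; actual names depend on the cluster's CNI and device-plugin configuration.

```python
# Illustrative sketch only: a minimal Pod manifest for a training worker that
# consumes GPUs and an RDMA-backed fabric. Names marked "assumed" are placeholders.
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "trainer-worker-0",
        "annotations": {
            # Multus-style attachment of a secondary RoCE network (network name assumed).
            "k8s.v1.cni.cncf.io/networks": "roce-net",
        },
    },
    "spec": {
        "containers": [{
            "name": "trainer",
            "image": "example.com/ai/trainer:latest",  # placeholder image
            "resources": {
                "limits": {
                    "nvidia.com/gpu": "8",        # GPUs via the NVIDIA device plugin
                    "rdma/roce_shared": "1",      # RDMA device (plugin-specific name, assumed)
                },
            },
        }],
    },
}

print(json.dumps(pod, indent=2))
```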
Ready to apply?
Join IBM and take your career to the next level!
Application takes less than 5 minutes

