AI Engineer

IBM

Ireland · Full-time · Mid-Senior

Introduction

At IBM, work is more than a job — it’s a calling to build, design, code, and make things better for people around the world. IBM Infrastructure is seeking an experienced AI Engineer to help bring Large Language Models (LLMs) to IBM Z (System z), one of the most secure and reliable enterprise computing platforms in the world.

This role is intended for professionals with 3+ years of experience in AI/ML systems, performance engineering, or accelerator‑based inference who are interested in working close to the hardware and across multiple layers of the technology stack. You will help enable generative AI for mission‑critical workloads used by banks, healthcare providers, and government agencies worldwide.

Who This Role Is For

This role is ideal for engineers who:

Have delivered or supported production AI or ML systems
Enjoy working across hardware, system software, and applications
Are motivated by solving performance‑ and reliability‑critical problems
Want to help define how enterprise‑scale AI runs on mission‑critical platforms

Your Role And Responsibilities

As an AI Engineer on the IBM Z team, you will contribute directly to the design, integration, and operation of LLM workloads on enterprise infrastructure. This role is suited to engineers who enjoy solving complex system‑level problems and collaborating across hardware and software domains.

LLM Integration and Deployment

Develop and integrate LLM inference workloads on IBM Z using Spyre hardware accelerator cards.
Implement model loading, runtime integration, memory management, and resource allocation strategies optimized for the IBM Z architecture.
Enable both traditional mainframe applications and modern cloud‑native services to access LLM capabilities through well‑defined APIs.

Performance Profiling and Optimization

Profile LLM inference workloads to measure latency, throughput, memory usage, and power efficiency.
Analyze performance data to identify bottlenecks and optimization opportunities across hardware utilization, kernels, memory access patterns, and batching strategies.
Document findings and contribute to performance best practices and internal guidance.

Failure Analysis and Debugging

Diagnose and resolve inference errors, performance regressions, and system‑level issues across firmware, drivers, runtimes, and applications.
Collaborate with hardware engineers, firmware developers, and system architects to identify root causes and implement durable solutions.
Contribute to automated testing and regression detection to improve system reliability.

Observability and Telemetry

Design and implement monitoring and telemetry for production LLM workloads.
Instrument systems and deploy logging to capture model performance, hardware utilization, error rates, and system health.
Create dashboards and alerts to support operational teams with real‑time visibility and historical analysis.

Collaboration and Technical Leadership

Participate in architecture reviews and technical discussions across AI, hardware, firmware, and system software teams.
Produce clear technical documentation and share knowledge across the organization.
Stay current with advances in LLMs, hardware acceleration, and inference optimization, and apply learnings to improve IBM Z AI capabilities.

Preferred Education

Bachelor's Degree

Required Technical And Professional Expertise

Demonstrated professional experience in AI/ML engineering, ML systems, platform engineering, or performance‑focused software development.
Strong programming skills in Python and working experience with C/C++.
Solid understanding of machine learning fundamentals, particularly transformer‑based models and inference workflows.
Knowledge of computer architecture, including memory hierarchies, parallel processing, and I/O systems.
Experience working in Linux environments, using command‑line tools and scripting.
Hands‑on experience with profiling, performance analysis, and debugging of complex systems.
Familiarity with monitoring, logging, and observability concepts.
Strong problem‑solving skills and the ability to communicate technical concepts clearly.

Preferred Technical And Professional Experience

Experience with PyTorch, TensorFlow, or Hugging Face Transformers.
Exposure to hardware acceleration technologies such as GPUs or AI accelerators.
Familiarity with model optimization techniques (quantization, pruning, knowledge distillation).
Knowledge of inference frameworks such as ONNX Runtime, TensorRT, or TorchServe.
Experience with observability platforms including Prometheus, Grafana, ELK, or Splunk.
Understanding of distributed tracing (OpenTelemetry, Jaeger).
Working knowledge of Docker, Kubernetes, and CI/CD pipelines.
Exposure to IBM Z, z/OS, or enterprise computing environments (beneficial but not required).
Experience working in environments with high requirements for reliability, security, and performance.

Key Skills

AI, machine learning, Kubernetes, Prometheus, TensorFlow, Grafana, PyTorch, Python, Docker, Linux, CI/CD, ELK

Posted: Feb 03, 2026
Type: Full-time
Level: Mid-Senior
Location: Waterford
Company: IBM

Industries

IT Services, IT Consulting