IBM

AI Engineer

Ireland · Full-time · Mid-Senior

Introduction

At IBM, work is more than a job — it’s a calling to build, design, code, and make things better for people around the world. IBM Infrastructure is seeking an experienced AI Engineer to help bring Large Language Models (LLMs) to IBM Z (System z), one of the most secure and reliable enterprise computing platforms in the world.

This role is intended for professionals with 3+ years of experience in AI/ML systems, performance engineering, or accelerator‑based inference who are interested in working close to the hardware and across multiple layers of the technology stack. You will help enable generative AI for mission‑critical workloads used by banks, healthcare providers, and government agencies worldwide.

Who This Role Is For

This role is ideal for engineers who:

  • Have delivered or supported production AI or ML systems
  • Enjoy working across hardware, system software, and applications
  • Are motivated by solving performance‑ and reliability‑critical problems
  • Want to help define how enterprise‑scale AI runs on mission‑critical platforms

Your Role And Responsibilities

As an AI Engineer on the IBM Z team, you will contribute directly to the design, integration, and operation of LLM workloads on enterprise infrastructure. This role is suited to engineers who enjoy solving complex system‑level problems and collaborating across hardware and software domains.

LLM Integration and Deployment

  • Develop and integrate LLM inference workloads on IBM Z using Spyre hardware accelerator cards.
  • Implement model loading, runtime integration, memory management, and resource allocation strategies optimized for the IBM Z architecture.
  • Enable both traditional mainframe applications and modern cloud‑native services to access LLM capabilities through well‑defined APIs.

Performance Profiling and Optimization

  • Profile LLM inference workloads to measure latency, throughput, memory usage, and power efficiency.
  • Analyze performance data to identify bottlenecks and optimization opportunities across hardware utilization, kernels, memory access patterns, and batching strategies.
  • Document findings and contribute to performance best practices and internal guidance.

Failure Analysis and Debugging

  • Diagnose and resolve inference errors, performance regressions, and system‑level issues across firmware, drivers, runtimes, and applications.
  • Collaborate with hardware engineers, firmware developers, and system architects to identify root causes and implement durable solutions.
  • Contribute to automated testing and regression detection to improve system reliability.

Observability and Telemetry

  • Design and implement monitoring and telemetry for production LLM workloads.
  • Instrument systems and deploy logging to capture model performance, hardware utilization, error rates, and system health.
  • Create dashboards and alerts to support operational teams with real‑time visibility and historical analysis.

Collaboration and Technical Leadership

  • Participate in architecture reviews and technical discussions across AI, hardware, firmware, and system software teams.
  • Produce clear technical documentation and share knowledge across the organization.
  • Stay current with advances in LLMs, hardware acceleration, and inference optimization, and apply learnings to improve IBM Z AI capabilities.

Preferred Education

Bachelor's Degree

Required Technical And Professional Expertise

  • Demonstrated professional experience in AI/ML engineering, ML systems, platform engineering, or performance‑focused software development.
  • Strong programming skills in Python and working experience with C/C++.
  • Solid understanding of machine learning fundamentals, particularly transformer‑based models and inference workflows.
  • Knowledge of computer architecture, including memory hierarchies, parallel processing, and I/O systems.
  • Experience working in Linux environments, using command‑line tools and scripting.
  • Hands‑on experience with profiling, performance analysis, and debugging of complex systems.
  • Familiarity with monitoring, logging, and observability concepts.
  • Strong problem‑solving skills and the ability to communicate technical concepts clearly.

Preferred Technical And Professional Experience

  • Experience with PyTorch, TensorFlow, or Hugging Face Transformers.
  • Exposure to hardware acceleration technologies such as GPUs or AI accelerators.
  • Familiarity with model optimization techniques (quantization, pruning, knowledge distillation).
  • Knowledge of inference frameworks such as ONNX Runtime, TensorRT, or TorchServe.
  • Experience with observability platforms including Prometheus, Grafana, ELK, or Splunk.
  • Understanding of distributed tracing (OpenTelemetry, Jaeger).
  • Working knowledge of Docker, Kubernetes, and CI/CD pipelines.
  • Exposure to IBM Z, z/OS, or enterprise computing environments (beneficial but not required).
  • Experience working in environments with high requirements for reliability, security, and performance.

Key Skills

Ranked by relevance

AI, machine learning, Kubernetes, Prometheus, TensorFlow, Grafana, PyTorch, Python, Docker, Linux, CI/CD, ELK
Posted: Feb 03, 2026
Type: Full-time
Level: Mid-Senior
Location: Waterford
Company: IBM

Industries

IT Services, IT Consulting

Categories

Engineering, Information Technology
