Tensorix

Senior ML Systems Engineer (Inference)

Ireland · Full-time · Mid-Senior

Overview

Tensorix is a sovereign AI infrastructure platform headquartered in Dublin. We deploy and operate open-source large language models on EU-sovereign infrastructure across Europe, providing private, zero-retention inference for regulated industries including finance, healthcare and government. Our platform offers drop-in OpenAI-compatible APIs, enabling developers and enterprises alike to adopt AI without compromising on data privacy, compliance or performance.


We are looking for a Senior ML Systems Engineer (Inference) to join our growing engineering team. Reporting to the CTO, you will act as the technical owner of the model serving layer at the heart of our platform, from selecting and evaluating new open-source models to deploying, tuning and operating them in production on our on-prem GPU fleet. This is a deeply hands-on role in a fast-moving scaleup where your work will directly shape the performance, cost and reliability of every token we serve.


You will work primarily with modern inference frameworks such as vLLM, SGLang and TensorRT-LLM, running on NVIDIA hardware across our on-prem estate, with supporting workloads on AWS. You will benchmark frontier open-weight models as they release, quantify performance and cost trade-offs, and lead the technical side of our GPU procurement and capacity planning. We are an AI-native team: tools such as Claude Code and Codex are part of our daily workflow and materially accelerate how we build and operate systems. We value engineers who combine deep systems intuition with a pragmatic, research-aware mindset.


This is a high-impact senior individual contributor role spanning model serving, performance engineering and GPU infrastructure strategy.


Responsibilities

  • Model Deployment & Serving: Deploy and operate open-source large language models in production using vLLM, SGLang, TensorRT-LLM and other high-performance serving frameworks. Own the full lifecycle from model selection through to production rollout.
  • Performance Optimisation: Profile and tune inference workloads for latency, throughput, memory efficiency and GPU utilisation. Work across quantisation, batching strategies, KV cache management, tensor and pipeline parallelism and attention kernel selection.
  • Model Evaluation: Benchmark new open-weight models as they release, running performance, quality and cost evaluations to inform which models we productise. Maintain internal benchmarking tooling and an evidence-based view of the model landscape.
  • Hardware Planning & Procurement: Lead capacity planning, GPU procurement and future hardware roadmap decisions. Translate workload requirements into concrete hardware specifications, including GPU, interconnect, networking and storage.
  • Infrastructure & Operations: Build and maintain the infrastructure that runs our model fleet, spanning containerised GPU workloads, orchestration, observability and autoscaling. Operate across on-prem GPU clusters and AWS where appropriate.
  • Reliability & Observability: Instrument the serving stack with meaningful metrics covering tokens per second, time-to-first-token, tail latency, GPU utilisation and cost per token. Drive incident response and post-incident improvements.
  • Research & Experimentation: Track developments in inference optimisation, serving architectures and model efficiency. Prototype new techniques such as speculative decoding, prefix caching, disaggregated prefill/decode and emerging quantisation methods.
  • Collaboration & Knowledge Sharing: Partner closely with platform, product and customer-facing teams. Participate in architectural decisions, share findings and help raise the collective bar across the engineering team.


Skills & Experience

  • 5+ years of professional experience (or equivalent depth of expertise) in ML infrastructure, systems engineering or a closely related discipline, with a meaningful portion focused on production ML workloads
  • Hands-on experience deploying and tuning large language models with modern inference frameworks such as vLLM, SGLang and TensorRT-LLM, or similar high-performance inference systems
  • Strong working knowledge of GPU architecture, CUDA fundamentals and the performance characteristics of modern NVIDIA hardware (H100, H200 and B300-class GPUs)
  • Practical experience with inference optimisation techniques including quantisation (AWQ, GPTQ, FP8), continuous batching, KV cache strategies and tensor/pipeline parallelism
  • Proficiency in Python and comfort reading and contributing to systems-level code in the broader inference ecosystem
  • Solid experience with Linux, containerisation and orchestration of GPU workloads
  • Familiarity with benchmarking methodology and the ability to design experiments that produce defensible, reproducible results
  • Comfortable using AI-assisted development tools (e.g. Claude Code, Codex) as part of your daily workflow
  • A clear and concise communicator who thrives in ambiguity and can articulate technical decisions to both technical and non-technical audiences


Nice to Have

  • Experience with Kubernetes and GPU scheduling in multi-tenant environments
  • Exposure to distributed training or fine-tuning workflows, even if your primary focus is inference
  • Experience with AWS infrastructure and related services (e.g. EC2, ECS, EKS, S3)
  • Familiarity with alternative accelerators (AMD Instinct, Intel Gaudi) or emerging inference hardware
  • Contributions to open-source inference projects such as vLLM, SGLang or related tooling
  • Exposure to Golang or Rust for systems-level work


Education & Qualifications

  • BSc/MSc in Computer Science, Software Engineering, Electrical Engineering or a related technical discipline, or equivalent practical experience


Remuneration

  • Highly competitive package, dependent on experience
  • 25 days paid annual leave
  • Hybrid working from our centrally located Dublin office, with remote flexibility
  • Free inference tokens!


******* NO AGENCY ASSISTANCE REQUIRED *******

Posted: May 11, 2026
Type: Full-time
Level: Mid-Senior
Location: Greater Dublin
Company: Tensorix
Industries: Technology, Information, Internet
Categories: Information Technology
