Overview
Tensorix is a sovereign AI infrastructure platform headquartered in Dublin. We deploy and operate open-source large language models on EU-sovereign infrastructure across Europe, providing private, zero-retention inference for regulated industries including finance, healthcare and government. Our platform offers drop-in OpenAI-compatible APIs, enabling developers and enterprises alike to adopt AI without compromising on data privacy, compliance or performance.
We are looking for a Senior ML Systems Engineer (Inference) to join our growing engineering team. Reporting to the CTO, you will act as the technical owner of the model serving layer at the heart of our platform, from selecting and evaluating new open-source models to deploying, tuning and operating them in production on our on-prem GPU fleet. This is a deeply hands-on role in a fast-moving scaleup where your work will directly shape the performance, cost and reliability of every token we serve.
You will work primarily with modern inference frameworks such as vLLM, SGLang and TensorRT-LLM, running on NVIDIA hardware across our on-prem estate, with supporting workloads on AWS. You will benchmark frontier open-weight models as they release, quantify performance and cost trade-offs, and lead the technical side of our GPU procurement and capacity planning. We are an AI-native team - tools such as Claude Code and Codex are part of our daily workflow and materially accelerate how we build and operate systems. We value engineers who combine deep systems intuition with a pragmatic, research-aware mindset.
This is a high-impact senior individual contributor role spanning model serving, performance engineering and GPU infrastructure strategy.
Responsibilities
- Model Deployment & Serving: Deploy and operate open-source large language models in production using vLLM, SGLang, TensorRT-LLM and other high-performance serving frameworks. Own the full lifecycle from model selection through to production rollout.
- Performance Optimisation: Profile and tune inference workloads for latency, throughput, memory efficiency and GPU utilisation. Work across quantisation, batching strategies, KV cache management, tensor and pipeline parallelism and attention kernel selection.
- Model Evaluation: Benchmark new open-weight models as they release, running performance, quality and cost evaluations to inform which models we productise. Maintain internal benchmarking tooling and an evidence-based view of the model landscape.
- Hardware Planning & Procurement: Lead capacity planning, GPU procurement and future hardware roadmap decisions. Translate workload requirements into concrete hardware specifications, including GPU, interconnect, networking and storage.
- Infrastructure & Operations: Build and maintain the infrastructure that runs our model fleet, spanning containerised GPU workloads, orchestration, observability and autoscaling. Operate across on-prem GPU clusters and AWS where appropriate.
- Reliability & Observability: Instrument the serving stack with meaningful metrics covering tokens per second, time-to-first-token, tail latency, GPU utilisation and cost per token. Drive incident response and post-incident improvements.
- Research & Experimentation: Track developments in inference optimisation, serving architectures and model efficiency. Prototype new techniques such as speculative decoding, prefix caching, disaggregated prefill/decode and emerging quantisation methods.
- Collaboration & Knowledge Sharing: Partner closely with platform, product and customer-facing teams. Participate in architectural decisions, share findings and help raise the collective bar across the engineering team.
Skills & Experience
- 5+ years of professional experience (or equivalent depth of expertise) in ML infrastructure, systems engineering or a closely related discipline, with a meaningful portion focused on production ML workloads
- Hands-on experience deploying and tuning large language models with modern inference frameworks such as vLLM, SGLang and TensorRT-LLM, or similar high-performance inference systems
- Strong working knowledge of GPU architecture, CUDA fundamentals and the performance characteristics of modern NVIDIA hardware (H100, H200 and B300-class)
- Practical experience with inference optimisation techniques including quantisation (AWQ, GPTQ, FP8), continuous batching, KV cache strategies and tensor/pipeline parallelism
- Proficiency in Python and comfort reading and contributing to systems-level code in the broader inference ecosystem
- Solid experience with Linux, containerisation and orchestration of GPU workloads
- Familiarity with benchmarking methodology and the ability to design experiments that produce defensible, reproducible results
- Comfortable using AI-assisted development tools (e.g. Claude Code, Codex) as part of your daily workflow
- A clear and concise communicator who thrives in ambiguity and can articulate technical decisions to both technical and non-technical audiences
Nice to Have
- Experience with Kubernetes and GPU scheduling in multi-tenant environments
- Exposure to distributed training or fine-tuning workflows, even if your primary focus is inference
- Experience with AWS infrastructure and related services (e.g. EC2, ECS, EKS, S3)
- Familiarity with alternative accelerators (AMD Instinct, Intel Gaudi) or emerging inference hardware
- Contributions to open-source inference projects such as vLLM, SGLang or related tooling
- Exposure to Golang or Rust for systems-level work
Education & Qualifications
- BSc/MSc in Computer Science, Software Engineering, Electrical Engineering or a related technical discipline, or equivalent practical experience
Remuneration
- Highly competitive package, dependent on experience
- 25 days paid annual leave
- Hybrid working from our centrally located Dublin office, with remote flexibility
- Free inference tokens!
******* NO AGENCY ASSISTANCE REQUIRED *******
- Posted: May 11, 2026
- Type: Full-time
- Level: Mid-Senior
- Location: Greater Dublin
- Company: Tensorix