An internationally recognized technology research organization is expanding its advanced AI systems team in Zurich and is looking for a Senior Researcher in LLM Systems & Inference Architecture.
Based in one of Europe’s most dynamic technology hubs, the Zurich research centre brings together world-class scientists and engineers working on next-generation AI infrastructure. The team focuses on building the underlying systems that enable large-scale AI models to run faster, more efficiently, and at global scale across modern heterogeneous hardware platforms.
This is a rare opportunity to work at the cutting edge of LLM infrastructure, contributing to innovations that directly influence the future of generative AI systems.
Why This Role Is Interesting
Large language models are transforming how software is built and how people interact with technology. However, scaling them efficiently remains one of the most challenging problems in modern computing.
In this role, you will tackle those challenges head-on — working on inference efficiency, system architecture, and hardware–software co-design to enable faster, more scalable AI systems. You’ll collaborate with leading researchers and engineers, contribute to open research, and help shape the infrastructure powering the next generation of AI.
The Zurich research environment offers a strong academic connection, access to cutting-edge hardware platforms, and the freedom to pursue impactful research while building production-grade systems.
The Role
LLM Inference Engine Development
- Design and implement high-performance inference engines for large language models.
- Optimize existing open-source and internal inference frameworks.
- Develop advanced model optimization techniques, including:
  - Quantization
  - Sparse attention
  - Key-value cache reuse and memory optimization
- Improve throughput and latency for transformer and generative AI workloads.
Hardware–Software Co-Design
- Develop optimized compute kernels targeting heterogeneous accelerator platforms.
- Enable efficient execution of LLM workloads across specialized AI hardware.
- Profile end-to-end AI pipelines to identify performance bottlenecks across frameworks, runtime layers, and hardware.
- Bridge Python-based ML frameworks with high-performance accelerator backends.
Performance Engineering & System Optimization
- Analyse and improve large-scale AI inference pipelines.
- Optimize system performance with a focus on:
  - Memory bandwidth utilization
  - Compute efficiency
  - Execution scheduling
- Improve end-to-end system performance across distributed and heterogeneous computing environments.
Research & Ecosystem Contribution
- Publish research in leading machine learning and systems conferences (e.g., ISCA, ASPLOS, MLSys).
- Contribute to open-source AI infrastructure and tooling ecosystems.
- Support adoption of advanced AI infrastructure through developer tooling, documentation, and collaboration with external research partners.
What We’re Looking For
Required
- PhD or Master’s degree in Computer Science, Computer Engineering, or a related discipline.
- Strong programming skills in Python and C/C++ for system-level development.
- Hands-on experience with machine learning frameworks such as PyTorch or TensorFlow.
- Deep understanding of optimization techniques for large-scale neural networks.
- Experience profiling and optimizing system performance in large AI workloads.
Preferred
- Experience with heterogeneous computing or accelerator programming (GPU, NPU, or other xPU architectures).
- Background in kernel development, compiler optimizations, or custom operator implementation.
- Experience working with multimodal or vision-language models.
- Publications in top-tier ML or systems conferences or contributions to open-source AI infrastructure.
Personal Profile
- Strong systems thinker who can bridge AI models with underlying hardware platforms.
- Research-driven mindset combined with strong engineering execution.
- Ability to work independently while collaborating with multidisciplinary teams.
- Clear and effective technical communication skills.
If you are passionate about building the next generation of high-performance AI systems and want to work in a world-class research environment in Zurich, we would love to hear from you.
Qualified candidates are encouraged to apply now or email [email protected].
By applying to this role, you understand that we may collect your personal data and store and process it on our systems. For more information, please see our Privacy Notice (https://eu-recruit.com/about-us/privacy-notice/).

