kaiko.ai

Machine Learning Engineer – GPU Acceleration & Distributed Training

Netherlands · Full-time · Entry

What we build at Kaiko

The core of Kaiko’s vision is Kaiko’s data & compute platform (KDCP): a distributed information system that brings together hospitals and labs. It provides data ingestion and processing, analysis and modeling, reporting and intelligence, and distribution and sharing for a multitude of complex sources of structured and unstructured data, including genomics, imaging, and clinical data. It is delivered as a multi-tenant SaaS platform in the cloud.

We are working in close collaboration with cancer hospitals and research institutes to integrate KDCP on their premises. Our vision is to explore new frontiers in supporting medical doctors in their decision-making process, and to enable researchers to run complex machine learning pipelines and use the resulting models.


About the Role

As a Machine Learning Engineer specializing in GPU acceleration and distributed training, you will focus on making Transformers, State Space Models (SSMs) and other architectures handle very long sequence lengths more efficiently, using CUDA, Triton and Torch. You will also scale training processes across multi-node distributed systems to ensure robust and efficient model development, and work closely with our ML Research teams to build and maintain high-performance training pipelines.


How you’ll contribute

  • Efficiency Optimization: Leverage CUDA, Triton and Torch to improve the efficiency of Transformers, SSMs and other architectures for very long sequence lengths.
  • Distributed Training: Scale custom machine learning training pipelines efficiently across multi-node GPU clusters.
  • Collaboration: Work with ML Researchers and Engineering teams to integrate optimized training solutions into the development lifecycle.


What you'll bring

  • Master's degree in Computer Science, Engineering, or a related field. A Ph.D. is a plus.
  • Proficiency in Python, with extensive experience in PyTorch.
  • Deep expertise with CUDA and/or Triton for optimizing GPU performance, specifically for large-scale sequence processing.
  • Proven experience in scaling machine learning training to multi-node distributed GPU environments.
  • Strong understanding of Transformers, State Space Models (SSMs) and other common architectures, and of how to optimize them.
  • Skill in performance tuning and profiling for both software and hardware in machine learning contexts.
  • Ability to diagnose and resolve complex technical challenges related to GPU acceleration and distributed training.
  • Excellent communication skills and ability to work effectively within a multidisciplinary team.
  • Ability to manage multiple projects simultaneously and adapt to evolving priorities in a fast-paced environment.


Nice to Have

  • Experience with containerization technologies, such as Docker or Kubernetes.
  • Experience with cloud computing platforms, such as Azure, AWS or GCP.


Additional Information

This position is full-time and requires residency in either the Netherlands or Switzerland, a valid work permit, and proximity to our offices in Amsterdam or Zürich. A Certificate of Conduct will be necessary upon finalizing the employment contract due to the handling of sensitive data.


Our culture

At Kaiko we strive for an open, creative and non-hierarchical work atmosphere where we offer flexibility, for instance remote work, and direct impact in return for accountability and team spirit. We expect you to prioritize, manage, and execute your own goals with ownership and in alignment with those of the company. Sensitive data is at the core of our daily work, so attention to data privacy is a key skill for all our employees.

We give talented people a lot of room to explore new ideas and we reward exceptional talent with an attractive package and opportunities for personal development.  

Posted: Nov 11, 2024
Type: Full-time
Level: Entry
Location: Amsterdam Area
Company: kaiko.ai

Industries

Technology Information Internet

Categories

Engineering Information Technology
