Bagel Labs is an artificial intelligence research lab developing novel methods for distributed training of frontier diffusion models on commodity hardware. Our work enables training state-of-the-art generative models for image, video, and world modelling without centralized GPU superclusters, reducing training compute capex by up to 50%.
We ignore years of experience and pedigree. If you have high agency — meaning your default assumption is that you can control the outcome of whatever situation you are in — we want to hear from you. Every requirement below is flexible for a candidate with high enough agency and tolerance for ambiguity.
Role Description
You will build and run the systems that make decentralized diffusion training work in practice. Training pipelines, inference serving, GPU orchestration across commodity hardware — you own the engineering end-to-end.
Key Responsibilities
- Build and maintain distributed training pipelines across heterogeneous, commodity GPU hardware.
- Profile and optimize training throughput, memory usage, and fault tolerance. Write custom CUDA/Triton kernels when needed.
- Design and operate inference infrastructure: batching, routing, serving large generative models.
- Ship experiment tracking, CI/CD, and reproducibility tooling for the ML stack.
- Work directly with researchers to turn new algorithms into code that actually runs at scale.
Who You Might Be
- Strong in Python and PyTorch. Can read and write C++/CUDA when performance requires it.
- Experience with distributed training: FSDP, DeepSpeed, Megatron-LM, or custom tensor/pipeline/data parallelism.
- Systems thinker — you reason about networking, memory layouts, and failure modes upfront.
- Comfortable with Linux, Docker/Kubernetes, job schedulers, bare-metal and cloud GPU setups.
- Enough ML fundamentals (transformers, diffusion, optimization) to debug a training run end-to-end and hold your own with researchers.
What We Offer
- Top-of-market compensation.
- A deeply technical culture where bold, frontier ideas are debated, stress-tested, and built.
- Paid travel to top ML conferences.

