We are Bagel Labs - a distributed machine learning research lab working towards open-source superintelligence.
We ignore years of experience and pedigree. If you have high agency - meaning your default assumption is that you can control the outcome of whatever situation you are in - we want to hear from you. Every requirement below is flexible for a candidate with high enough agency and tolerance for ambiguity.
Role Overview
You will design and optimize a distributed diffusion model training and serving system. Your focus is on building scalable, fault-tolerant infrastructure that can serve open-source diffusion models across multiple nodes and regions, with efficient support for adaptation techniques.
Key Responsibilities
- Design and implement distributed diffusion model inference systems for image, video, and multimodal generation across multiple nodes and regions.
- Architect high-availability clusters for diffusion model serving with automatic failover, load balancing, and dynamic batching for variable-resolution outputs.
- Build monitoring and observability systems for distributed diffusion inference (denoising steps, memory usage, generation latency, CLIP score tracking).
- Integrate with open-source diffusion frameworks (Diffusers, ComfyUI, InvokeAI) and optimize for production-scale serving.
- Implement and optimize cutting-edge techniques: rectified flow models, consistency distillation, and progressive distillation for few-step generation.
- Design distributed systems for ControlNet, IP-Adapter, and multimodal conditioning at scale.
- Build infrastructure for efficient LoRA/LyCORIS adaptation serving with hot-swapping and memory-efficient merging.
- Optimize VAE decoding pipelines and implement tiled/windowed generation for ultra-high-resolution outputs.
- Document architectural decisions, review code, and publish technical deep-dives on blog.bagel.com.
Who You Might Be
You have a deep understanding of distributed systems and diffusion model architectures. You're excited about the rapid evolution from DDPM to flow matching and consistency models. You enjoy architecting scalable infrastructure that can handle the unique challenges of diffusion models - from variable compute requirements per timestep to efficient caching of intermediate states.
Desired Skills (Flexible)
- At least 5 years of experience with distributed systems and production ML serving.
- Hands-on experience with diffusion model frameworks (Diffusers, ComfyUI, or similar) in production environments.
- Deep understanding of diffusion model architectures (U-Net, DiT, rectified flows, consistency models).
- Experience with distributed GPU orchestration for high-memory workloads.
- Proven record of optimizing generation latency (classifier-free guidance, DDIM/DPM solvers, distillation techniques).
- Experience with attention optimization techniques (FlashAttention, xFormers, memory-efficient attention).
- Strong understanding of adaptation techniques (LoRA, LyCORIS, textual inversion, DreamBooth).
- Expertise in handling variable-resolution generation and dynamic batching strategies.
What We Offer
- Top-of-the-market compensation.
- A deeply technical culture where bold, frontier ideas are debated, stress-tested, and built.
- Full remote flexibility within North American time zones.
- Ownership of work that can set the direction for decentralized AI.
- Paid travel opportunities to the top ML conferences around the world.
Please apply here: https://jobs.bagel.com/jobs/Machine%20Learning%20Engineer