Senior HPC Cluster Engineer – Linux Performance at Massive GPU Scale
Location: Remote from anywhere in Europe
Salary: up to 160k + 25% bonus
We operate one of the largest GPU infrastructures in the world — 30,000+ GPUs and 10 InfiniBand fabrics across five global data centers. Our infrastructure doubles in size every year. We’re looking for engineers who love getting deep into Linux systems, pushing hardware and software to their limits, and making the world’s fastest AI and HPC workloads run even faster.
Why this role is exciting
You’ll join a small, senior team that works at the boundary between the hardware and the Linux OS, solving performance problems that affect tens of thousands of GPUs. This is hands-on, high-impact engineering where microsecond gains matter and every optimization is felt at global scale.
What you’ll do
- Profile and optimize Linux kernel subsystems (CPU scheduling, memory management, networking stack) for GPU clusters and InfiniBand fabrics
- Troubleshoot and resolve complex performance bottlenecks
- Integrate and validate new GPU hardware (KVM/QEMU, PCIe devices, Kubernetes)
- Improve monitoring, alerting, and automation for large-scale, distributed systems
- Occasionally assist customers in optimizing workloads
We’d love to hear from you if you have
- Solid Linux internals knowledge, ideally with kernel tuning or profiling experience (perf, ftrace, eBPF, sysprof, etc.)
- Experience reading/debugging C or C++ system-level code
- Scripting or development skills in Go, Python, or similar
- A background in low-level, complex environments such as HPC, large-scale clusters, or high-performance networking
Bonus points for
- GPU or HPC cluster experience
- InfiniBand or other high-performance interconnect knowledge
- Virtualization stacks (KVM/QEMU), Slurm, Kubernetes
This is for you if you love solving deep technical challenges, care about performance down to the microsecond, and want to work on infrastructure that pushes the limits of what’s possible.
Join Doghouse Recruitment and take your career to the next level!