We are seeking senior engineers with a passion for performance analysis and optimization to join our team in advancing groundbreaking technologies for deep learning compilers and automated kernel generation. At NVIDIA, you will collaborate across the full hardware/software stack, from GPU architecture to deep learning frameworks, to push the boundaries of AI performance. This role provides an outstanding opportunity to shape both hardware and software roadmaps at a company at the forefront of the AI revolution. You will work alongside world-class engineers to implement innovative deep learning models and optimize end-to-end performance for NVIDIA’s DL software and hardware ecosystem. You'll have the chance to work on powerful, enterprise-grade GPU clusters delivering hundreds of petaFLOPS, and gain access to unreleased hardware that is shaping the future of AI.
What You’ll Be Doing
- Profile, analyze, and optimize the performance of deep learning models and workloads on groundbreaking hardware and software platforms.
- Develop tooling for profiling and microbenchmarking DL workloads that run compiled models, uncovering optimization opportunities.
- Collaborate with teams across NVIDIA to provide performance insights and recommendations that improve the design and efficiency of DL frameworks and workloads.
- Own the development and implementation of standard methodologies for compiling, testing, and deploying high-performance deep learning models.
- Conduct performance benchmarking on enterprise-grade GPU clusters and pre-release hardware, driving improvements to NVIDIA’s DL software stack and hardware roadmap.
Qualifications
- 5+ years of experience in deep learning model implementation, software development, and performance optimization.
- BSc, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Mathematics, Physics, or a related technical field, or equivalent practical experience.
- Proficiency in Python, with extensive hands-on experience using at least one major deep learning framework (e.g., PyTorch, TensorFlow, JAX).
- Strong problem-solving and analytical skills, with a proven track record in debugging, performance tuning, and workload optimization.
- Experience with deep learning compilers (e.g., PyTorch’s torch.compile, XLA, or similar technologies).
- Experience running large-scale workloads on HPC clusters.
- Knowledge of and passion for DevOps/MLOps practices in the development of deep learning-based products.
- Solid understanding of Linux environments and containerization technologies such as Docker.
- Familiarity with GPU programming or parallel computing.
#deeplearning
JR1990202
Key Skills
Ranked by relevance
deep learning
ai
pytorch
parallel computing
containerization
tensorflow
python
docker
devops
linux
mlops
- Posted: Nov 29, 2024
- Type: Full-time
- Level: Mid-Senior
- Location: Poland
- Company: NVIDIA
- Industries: Computer Hardware Manufacturing, Software Development, Computers, Electronics Manufacturing
- Categories: Engineering