Software Engineer (C++, Python)

NVIDIA

United States · Full-time · Associate

About The Company

NVIDIA is a global leader in the technology industry, renowned for its innovative graphics processing units (GPUs) and advanced computing solutions. As a pioneer in AI, deep learning, and high-performance computing, NVIDIA continuously pushes the boundaries of technology to enable the future of digital experiences. The company's commitment to research and development has established it as a key player in various sectors including gaming, professional visualization, data centers, and autonomous vehicles. NVIDIA's culture emphasizes innovation, collaboration, and diversity, fostering an environment where talented professionals can thrive and contribute to groundbreaking projects that shape the future of technology.

About The Role

NVIDIA Dynamo is a high-throughput, low-latency inference framework designed for serving generative AI and reasoning models across multi-node distributed environments. Built with performance in mind using Rust, and designed for extensibility with Python, Dynamo orchestrates GPU shards, manages shared KV cache, and routes requests efficiently across heterogeneous clusters. As large language models (LLMs) continue to grow beyond the memory and compute capabilities of individual GPUs, this platform facilitates the scalable, resilient deployment of cutting-edge LLM workloads. We are seeking a Principal Systems Engineer to lead the vision and development of memory management strategies for large-scale LLM and storage systems, ensuring optimal performance, scalability, and integration across diverse hardware and software components.

Qualifications

Master's or PhD in Computer Science, Electrical Engineering, or a related field, or equivalent experience
15+ years of experience in building large-scale distributed systems, high-performance storage, or ML systems infrastructure
Proficiency in C/C++ and Python with a proven track record of delivering production-grade services
Deep understanding of memory hierarchies including GPU HBM, host DRAM, SSD, and remote/object storage
Experience designing systems that span multiple tiers for performance and cost efficiency
Hands-on experience with distributed caching or key-value systems optimized for low latency and high concurrency
Strong skills in networked I/O, RDMA, NVMe-oF, NVLink, and related technologies
Expertise in profiling and system optimization across CPU, GPU, memory, and network layers
Excellent communication skills and experience leading cross-functional teams and initiatives

Responsibilities

Design and evolve a unified memory layer that integrates GPU memory, pinned host memory, RDMA-accessible memory, SSD tiers, and remote storage to support large-scale LLM inference
Architect and implement deep integrations with leading LLM serving engines such as vLLM, SGLang, and TensorRT-LLM, focusing on KV-cache offload, reuse, and remote sharing
Co-design interfaces and protocols enabling disaggregated prefill, peer-to-peer KV-cache sharing, and multi-tier KV-cache storage for high-throughput, low-latency inference
Partner with GPU architecture, networking, and platform teams to leverage technologies such as GPUDirect, RDMA, and NVLink for low-latency cache access and sharing
Mentor senior and junior engineers, set technical direction for memory and storage subsystems, and represent the team in internal and external forums
Conduct performance profiling, system tuning, and validation to ensure optimal throughput and latency in distributed environments
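The multi-tier KV-cache storage described in the responsibilities above can be illustrated with a toy model. What follows is a minimal Python sketch, assuming a simple least-recently-used (LRU) policy. New KV blocks land in the fastest tier. When a tier overflows, its oldest block is demoted one tier down. A hit in a slower tier promotes the block back to the top. The class name `TieredKVCache`, the tier names, and the capacities are hypothetical. They are not part of Dynamo's API or any NVIDIA library, and a real system would move actual GPU/host/SSD buffers rather than Python objects.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy multi-tier cache for KV blocks (hypothetical, for illustration only)."""

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_blocks), fastest tier first.
        # A max_blocks of None means the tier is treated as unbounded.
        self.names = [name for name, _ in capacities]
        self.caps = [cap for _, cap in capacities]
        self.tiers = [OrderedDict() for _ in capacities]

    def put(self, key, block):
        # Drop any stale copy so each key lives in exactly one tier.
        for tier in self.tiers:
            tier.pop(key, None)
        # New blocks always land in the fastest tier.
        self._insert(0, key, block)

    def get(self, key):
        # Search tiers fastest to slowest; on a hit, promote the block to tier 0.
        for i, tier in enumerate(self.tiers):
            if key in tier:
                block = tier.pop(key)
                self._insert(0, key, block)
                return self.names[i], block
        return None, None

    def where(self, key):
        # Report which tier currently holds the key, or None if absent.
        for name, tier in zip(self.names, self.tiers):
            if key in tier:
                return name
        return None

    def _insert(self, level, key, block):
        tier = self.tiers[level]
        tier[key] = block
        tier.move_to_end(key)  # mark as most recently used
        cap = self.caps[level]
        if cap is not None and len(tier) > cap:
            # Demote the least recently used block one tier down.
            old_key, old_block = tier.popitem(last=False)
            if level + 1 < len(self.tiers):
                self._insert(level + 1, old_key, old_block)
            # If there is no lower tier, the block is evicted entirely.


cache = TieredKVCache([("gpu", 2), ("host", 2), ("ssd", None)])
for k in ["a", "b", "c", "d", "e"]:
    cache.put(k, f"kv-{k}")
print(cache.where("a"), cache.where("b"), cache.where("e"))
```

After five inserts into a two-block GPU tier and a two-block host tier, the oldest block `a` has cascaded down to the SSD tier. A later `cache.get("a")` reports that it was found in `ssd` and promotes it back to `gpu`, pushing other blocks down. Real serving engines may use different policies, such as prefix-aware or cost-based placement, instead of plain LRU.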

Benefits

Competitive salary package aligned with experience and location
Equity options and comprehensive health benefits
Opportunities for professional growth and development in a pioneering technology environment
Access to cutting-edge tools and resources for research and innovation
Inclusive and diverse workplace culture that values creativity and collaboration

Equal Opportunity

NVIDIA is committed to fostering a diverse and inclusive work environment. We are proud to be an equal opportunity employer and do not discriminate based on race, religion, color, national origin, gender, gender identity or expression, sexual orientation, age, marital status, veteran status, disability, or any other characteristic protected by law.

Key Skills

Ranked by relevance: storage, Python, AI, deep learning, Rust

Related Jobs

3 roles aligned with this opportunity

Data Analyst with Python · 2026-05-20 · Full-time · Mid-Senior · United States · Technology · Information Technology

DevOps Engineer · 2026-05-27 · Full-time · Associate · Argentina · Software Development · Engineering

Embedded Systems & FPGA Engineer (all genders) · 2026-05-28 · Full-time · Not Applicable · Austria · Technology · Engineering

Posted: Jan 15, 2026
Type: Full-time
Level: Associate
Location: United States
Company: NVIDIA

Industries

Technology, Information and Internet