Scouto AI
Machine Learning Engineer
United States | 13 days ago
Full-time | Engineering, Information Technology
We are building a distributed LLM inference network that combines idle GPU capacity from around the world into a single cohesive plane of compute for running large language models like DeepSeek and Llama 4. At any given moment, we have over 5,000 GPUs and hundreds of terabytes of VRAM connected to the network.

We are a small, well-funded team working on difficult, high-impact problems at the intersection of AI and distributed systems. We primarily work in-person from our office in downtown San Francisco.

Responsibilities

  • Design and implement optimization techniques to increase model throughput and reduce latency across our suite of models
  • Deploy and maintain large language models at scale in production environments
  • Deploy new models as they are released by frontier labs
  • Implement techniques like quantization, speculative decoding, and KV cache reuse
  • Contribute regularly to open source projects such as SGLang and vLLM
  • Dive deep into the underlying codebases of TensorRT, TensorRT-LLM, PyTorch, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues
  • Collaborate with the engineering team to bring new features and capabilities to our inference platform
  • Develop robust and scalable infrastructure for AI model serving
  • Create and maintain technical documentation for inference systems

Requirements

  • 3+ years of experience writing high-performance, production-quality code
  • Strong proficiency with Python and deep learning frameworks, particularly PyTorch
  • Demonstrated experience with LLM inference optimization techniques
  • Hands-on experience with SGLang and vLLM, with contributions to these projects strongly preferred
  • Familiarity with Docker and Kubernetes for containerized deployments
  • Experience with CUDA programming and GPU optimization
  • Strong understanding of distributed systems and scalability challenges
  • Proven track record of optimizing AI models for production environments

Nice to Have

  • Familiarity with TensorRT and TensorRT-LLM
  • Knowledge of vision models and multimodal AI systems
  • Experience implementing techniques like quantization and speculative decoding
  • Contributions to open source machine learning projects
  • Experience with large-scale distributed computing

Compensation

The base salary range for this role is $180,000 to $250,000. We also offer equity in a high-growth startup and comprehensive benefits, including:

  • Full healthcare coverage
  • Quarterly offsites
  • Flexible PTO

Skills: PyTorch, GPU optimization, deep learning frameworks, SGLang, vLLM, CUDA programming, machine learning, Python, LLM