AI Performance Engineer
We are looking for an AI Performance Engineer based in Latin America to work with the latest large AI models on deep learning performance optimization and benchmarking on modern GPU-based systems, with a strong focus on MLPerf Training and Inference workloads.
The primary models we work on include Llama 2, Llama 3, DeepSeek, and open-source GPT-style models (GPT-OSS).
This is a hands-on engineering role involving performance profiling, PyTorch optimization, large-scale distributed training, and building reproducible benchmarking environments, in close collaboration with other performance- and systems-focused engineers.
What You Will Do
- Optimize training and inference pipelines for large language models such as Llama 2, Llama 3, DeepSeek, and GPT-OSS
- Work on MLPerf Training and/or Inference benchmarks for LLM workloads
- Profile GPU workloads to identify compute, memory, and communication bottlenecks
- Improve scaling efficiency across multi-GPU and multi-node setups
- Tune distributed training strategies (DDP, FSDP, ZeRO, tensor/pipeline parallelism)
- Build and maintain reproducible benchmark environments (Docker / Singularity)
- Collaborate with engineers on performance, stability, and scalability improvements
- Document findings and contribute to benchmark submissions and internal reports
What We Expect (Required)
- 1-2 years of experience in AI engineering, deep learning, GPU, or HPC-related roles
- Strong Python skills and solid experience with PyTorch
- Hands-on experience with LLM training or inference (Llama, GPT-style models, or similar)
- Experience with distributed training (DDP, FSDP, ZeRO, DeepSpeed, or equivalent)
- Good understanding of GPU performance fundamentals (compute-bound vs. memory-bound workloads, profiling, optimization)
- Experience working in Linux-based environments
- Familiarity with container technologies (Docker or similar)
- Good spoken and written English
Nice to Have (Strong Plus)
- Experience working with MLPerf or other standardized benchmarking frameworks
- Exposure to LLM optimization techniques (activation checkpointing, KV-cache optimization, sequence parallelism)
- Experience with GPU profiling tools (torch.profiler, Nsight, or equivalent)
- Knowledge of GPU kernel optimization (CUDA, HIP, Triton, or similar)
- Experience working with job schedulers (Slurm or equivalent)
- Familiarity with quantization or mixed precision (FP16, BF16, FP8)
- Posted: Feb 13, 2026
- Type: Contract
- Level: Mid-Senior
- Location: Argentina
- Company: Confidential