Artificial Intelligence Engineer

AI Performance Engineer

We are looking for an AI Performance Engineer in Latin America to work on deep learning performance optimization and benchmarking of the latest large AI models on modern GPU-based systems, with a strong focus on MLPerf Training and Inference workloads.

The primary models we work on include Llama 2, Llama 3, DeepSeek, and open-source GPT-style models (GPT-OSS).

This is a hands-on engineering role involving performance profiling, PyTorch optimization, large-scale distributed training, and building reproducible benchmarking environments, in close collaboration with other performance- and systems-focused engineers.

What You Will Do

Optimize training and inference pipelines for large language models such as Llama 2, Llama 3, DeepSeek, and GPT-OSS
Work on MLPerf Training and/or Inference benchmarks for LLM workloads
Profile GPU workloads to identify compute, memory, and communication bottlenecks
Improve scaling efficiency across multi-GPU and multi-node setups
Tune distributed training strategies (DDP, FSDP, ZeRO, tensor/pipeline parallelism)
Build and maintain reproducible benchmark environments (Docker / Singularity)
Collaborate with engineers on performance, stability, and scalability improvements
Document findings and contribute to benchmark submissions and internal reports

What We Expect (Required)

1-2 years of experience in AI engineering, deep learning, GPU, or HPC-related roles
Strong Python skills and solid experience with PyTorch
Hands-on experience with LLM training or inference (Llama, GPT-style models, or similar)
Experience with distributed training (DDP, FSDP, ZeRO, DeepSpeed, or equivalent)
Good understanding of GPU performance fundamentals (compute-bound vs memory-bound workloads, profiling, optimization)
Experience working in Linux-based environments
Familiarity with container technologies (Docker or similar)
Good level of spoken and written English

Nice to Have (Strong Plus)

Experience working with MLPerf or other standardized benchmarking frameworks
Exposure to LLM optimization techniques (activation checkpointing, KV-cache optimization, sequence parallelism)
Experience with GPU profiling tools (torch.profiler, Nsight, or equivalent)
Knowledge of GPU kernel optimization (CUDA, HIP, Triton, or similar)
Experience working with job schedulers (Slurm or equivalent)
Familiarity with quantization or mixed precision (FP16, BF16, FP8)
