Prosigliere
AI Engineer (NLP)
Prosigliere · Argentina · 13 days ago
Full-time · Engineering

We're looking for a Senior ML/AI Engineer to own and evolve our LLM-powered user experience. You'll work directly with our technical co-founder to build, optimize, and monitor agent systems that parse workout descriptions, provide scaling recommendations, and enable conversational data retrieval, all with production-grade accuracy and speed.

This is a hands-on role focused on the ML/AI engineering side: prompt engineering, model optimization, agent orchestration, and continuous improvement based on real-world usage patterns.

What You’ll Do

Core Responsibilities

  • Own the workout parsing system: improve the accuracy of our fine-tuned model (currently Qwen-based) that converts natural language workout descriptions into structured schemas
  • Design and implement agent workflows for workout scaling recommendations and performance tracking
  • Build observability workflows using Langfuse to identify and systematically address model performance issues
  • Optimize agent response latency while maintaining accuracy across our tool-based reasoning system
  • Collaborate on agent architecture decisions, including potential migration to frameworks like DSPy
  • Ship production features: workout entry system, scaling recommendations, and score reporting

What We’re Looking For

Required

  • 5+ years of ML/AI engineering experience, including at least 2 years working with LLMs in production
  • Strong prompt engineering and model optimization skills
  • Experience building and deploying agent systems with tools/functions
  • Proven ability to use observability platforms to diagnose and improve model performance
  • Experience with model fine-tuning (any framework/approach)
  • Strong Python programming skills
  • Active CrossFit participant: candidates should understand standard movements and workout structures

Strongly Preferred

  • Experience with agent orchestration frameworks (DSPy, LlamaIndex, or similar)
  • Background in production ML operations and monitoring
  • Experience with Modal.com or similar serverless ML platforms
  • Track record of iteratively improving LLM systems based on user feedback and metrics
  • Experience fine-tuning open-source LLMs similar to Qwen

Success in First 6 Months

  • Ship workout entry system with improved parsing accuracy
  • Launch basic workout scaling recommendations
  • Implement user score reporting and retrieval
  • Establish robust monitoring workflows to catch and address model failures and poor user experiences
  • Contribute to agent architecture decisions as we scale
