Lead LLM Engineer

Licorne Society has been engaged by a fast-growing AI startup to help them find their Lead LLM Engineer.

What You Will Own

You will be responsible for one thing:

Make our AI outputs reliable, fast, and indispensable in real workflows.

Concretely

Design and evolve our LLM / agent architecture
Own output quality across key use cases (emails, document analysis, etc.)
Build evaluation systems (datasets, metrics, regression detection)
Drive fast iteration loops from production data
Improve retrieval, reasoning, and tool usage
Ensure production reliability (latency, failure modes, fallback)
Work directly with product + founders on what to build and why

What This Role Is Really About

Most teams fail because:

they don’t know what “good output” means
they don’t have evals
they iterate randomly
they overuse agents

Your job is to fix that.

You Will Turn

vague user problems
→ into structured AI systems
→ with measurable performance
→ that improve every week

What You Need To Be Excellent At

Shipping real LLM systems

You’ve built systems used in production (not demos)
You understand RAG, tools, agents, structured outputs
You can design full pipelines, not just prompts

Evaluation-driven development

You know how to define quality metrics
You build datasets from real usage
You run continuous evals to prevent regressions

Debugging complex failures

You can trace issues across retrieval, prompts, and model behavior
You don’t guess; you isolate and fix

Speed of iteration

You move from problem → improvement in hours or days, not weeks
You use logs, traces, and data, not intuition alone

Strong judgment

You know when to use an agent vs a pipeline, and when to add complexity vs simplify
You optimize for reliability and user value, not novelty

What We Don’t Care About

Number of years of experience
Whether you’ve used a specific framework
Fancy research credentials

If you can build, debug, and improve real systems, you’re a fit.

What Success Looks Like (First 90 Days)

Clear eval framework for core use cases
Measurable improvement in output quality
Faster iteration cycles across the team
Reduced hallucinations / failures
Stronger system architecture decisions

Stack (Context, Not Requirements)

Python (FastAPI)
Postgres
Google Cloud
LangGraph / LangChain (evolving)
PostHog (product analytics)
Langfuse (LLM traces)
LLM APIs (Azure OpenAI)
