ellamind
AI Engineer (all genders)
Germany
Full-time · Engineering, Information Technology
At ellamind, we build evaluation-first AI infrastructure. Our platform elluminate turns AI evaluation from ad-hoc “vibe checks” into rigorous, repeatable engineering to enable teams to test, measure, and improve LLM applications with confidence.

What you'll do

  • Build AI evaluation systems: Design and implement intelligent systems that automatically assess LLM outputs for quality, safety, and compliance with user-defined requirements across diverse use cases and industries.
  • Integrate cutting-edge AI technologies: Work with multiple LLM providers and AI platforms, ensuring seamless, reliable connections that handle real-world production scenarios at scale.
  • Develop automated testing frameworks: Create sophisticated workflows that enable teams to systematically evaluate their AI applications through batch testing, comparing model configurations, and tracking performance over time.
  • Optimize evaluation workflows: Design efficient systems that balance evaluation quality, cost, and speed, enabling customers to run comprehensive tests without breaking their budgets or timelines.
  • Build prompt optimization infrastructure: Develop systematic approaches to analyze prompt performance across large datasets, identify failure patterns, implement A/B testing frameworks, and create data-driven optimization pipelines that surface actionable insights.
  • Scale AI operations: Architect systems that handle high-volume evaluation workloads, managing concurrent processing, resource allocation, and ensuring consistent results across thousands of test cases.
  • Advance evaluation methodologies: Research and implement novel approaches to AI testing, quality measurement, and automated scoring that push the boundaries of what's possible in AI evaluation.
  • Drive technical innovation: Explore emerging AI capabilities, experiment with new models and techniques, and integrate breakthrough technologies that give our customers a competitive advantage.
  • Ensure production reliability: Build robust, enterprise-grade systems with proper monitoring, error handling, and quality assurance that customers can depend on for critical AI validation workflows.

You'll work across our Python-based platform, collaborating with fullstack engineers, product teams, and directly with customers to understand their evaluation challenges and deliver solutions that make rigorous AI testing accessible to teams of all sizes.

What we're looking for

Must-haves

  • Strong Python engineering skills: Experience building production AI systems with clean, maintainable code, comprehensive testing, and performance optimization at scale.
  • Hands-on LLM experience: Practical work with OpenAI, Anthropic, or similar APIs. You've built features that call LLMs, handle responses, implement retry logic, and solve real-world reliability and consistency challenges.
  • Software engineering fundamentals: Solid understanding of API design, data modeling, async processing, error handling, and building distributed systems that scale efficiently.
  • AI systems thinking: Experience designing evaluation methodologies, understanding model behavior and limitations, debugging inconsistent outputs, and implementing quality assurance for AI applications.
  • Enthusiasm for AI reliability: Genuine interest in testing, measuring, and improving AI systems - you care about building AI that works consistently and can be trusted in production.
  • On-site collaboration: ≥3 days/week in Berlin or Bremen. Travel to our Bremen HQ during onboarding.
  • Fluency in English: At least B2 level for team collaboration and technical discussions.
  • Valid EU work authorization.

Nice-to-haves

  • Experience with AI evaluation frameworks, LLM benchmarking, and automated testing methodologies for AI systems.
  • Background in LLM fine-tuning, RAG architectures, embedding models, or other advanced AI techniques.
  • Experience building developer tools, SDKs, or platforms for AI/ML teams.
  • Familiarity with experiment tracking platforms, versioning systems for prompts/models, or MLOps workflows.
  • Comfort with backend frameworks (Django, FastAPI) and databases (PostgreSQL); you can work across the stack when needed.
  • Experience with async workers, Docker/Kubernetes, and CI/CD workflows.
  • Understanding of AI safety, compliance requirements, or privacy-sensitive/on-prem deployments.
  • Experience working directly with clients or end-users to understand requirements, gather feedback, and translate technical solutions into business value.
  • German language skills.

What matters most

We prioritize demonstrated excellence in your projects and career. If you’re motivated to build and optimize AI solutions, we want to hear from you—even if you don’t meet every single criterion.

Diversity & inclusion

Different perspectives make us stronger. We welcome applicants from all backgrounds and encourage you to apply.

Why us?

  • Shape the future of AI development: You’ll have significant influence on our product and technology direction while building critical infrastructure that every serious AI team needs.
  • Technical excellence meets cutting-edge innovation: Work with state-of-the-art LLM technologies across multiple providers (local LLMs, OpenAI, Anthropic, and more) on complex challenges like evaluation algorithm design, large-scale batch processing, and intelligent quality assessment systems, all built on a clean Python-based architecture without legacy constraints holding you back.
  • Career-defining opportunity: You’ll be building essential AI evaluation infrastructure during a massive market transformation. As systematic testing becomes fundamental to AI development, you’ll be at the center of this shift, working on technology that’s becoming as critical as version control.
  • Ownership and impact: Get full end-to-end ownership of features, direct collaboration with AI researchers and fullstack engineers, and immediate feedback on how your code helps teams ship better AI products. Your engineering decisions directly shape how thousands of developers work.
  • Competitive package with upside: In addition to a competitive salary, we offer a VSOP (Virtual Stock Option Program) to give you a real stake in the company’s success as we grow this essential AI infrastructure.
  • Best-in-class development experience: Fast and streamlined access to all AI technologies that make your life (and development work) easier, plus the latest tools and platforms to maximize your productivity.
  • Work environment: Our Bremen office features stunning waterfront views, complimentary beverages, smoothies, and a boat. We’re opening our Berlin office at the end of 2025, giving you flexibility as we expand.
  • Grow with transformative technology: Build deep expertise in AI evaluation and LLM infrastructure alongside our expanding team, mastering the technologies that are reshaping software development while helping define industry standards.

About us

We are a cash-flow-positive Germany-based AI startup building elluminate—the enterprise platform that turns AI evaluation from ad-hoc experiments into rigorous, repeatable workflows. Teams use elluminate to design test suites, benchmark models, track regressions, and ship reliable AI with clear, measurable quality gates. We pair elluminate with custom large-language-model solutions and full on-prem deployment options. Our products have already earned the trust of renowned clients such as Deutsche Telekom, the German Federal Government, and leading health insurers like hkk.

Rooted in Bremen and collaborating with leading organizations, our team has a track record in advanced model and dataset development. We like owning problems end-to-end and shipping pragmatically. We contribute to the open-source community through initiatives like OpenEuroLLM and regularly publish models and tools to accelerate the broader ecosystem.

Compensation Range: €60K - €100K
