About Our Partner
We are partnering with an innovative technology company focused on advancing AI-powered software solutions through cutting-edge integration of Large Language Models and intelligent systems. They are building sophisticated AI infrastructure designed to bridge the gap between AI research and production-ready applications, with particular focus on scalable model deployment, MLOps pipelines, and enterprise-grade AI services. The company is seeking highly motivated professionals to help scale their AI platform from experimental prototypes to commercial deployment.
Position Overview
We are supporting our partner in their search for an AI Platform Engineer to join their Barcelona office. As the critical bridge between AI research and production software, you will play a key role in building and maintaining the AI infrastructure that enables researchers to deploy their work safely, reproducibly, and at scale. Your role will involve architecting model serving infrastructure, implementing MLOps pipelines, optimizing AI performance, and collaborating closely with AI developers and backend engineers to integrate cutting-edge AI capabilities into production systems.
Your Mission
AI Infrastructure and Model Deployment
- Build and maintain AI infrastructure including model serving, vector databases, and embedding pipelines
- Deploy and serve LLMs from multiple providers (OpenAI, Anthropic, HuggingFace, fine-tuned models)
- Implement vector database solutions (Pinecone, Chroma, Weaviate, FAISS) for efficient retrieval
- Optimize inference latency, costs, throughput, and reliability across AI services
- Design and implement caching, rate limiting, and retry strategies for production AI systems (a retry sketch follows this list)
- Enable AI developers to deploy their work reproducibly and safely at scale
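To give a flavour of this work, here is a minimal sketch of a retry strategy with exponential backoff and jitter around an LLM provider call. Everything in it is illustrative: call_model and TransientError are hypothetical stand-ins for a real provider client and the retryable errors (rate limits, timeouts) its SDK raises.

```python
import random
import time

class TransientError(Exception):
    """Stand-in for retryable provider errors (429s, timeouts)."""

def call_with_retries(call_model, prompt, max_attempts=4, base_delay=0.5):
    """Invoke call_model(prompt), retrying transient failures with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call_model(prompt)
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted; surface the error to the caller
            # Exponential backoff with jitter to avoid synchronized retries
            delay = base_delay * (2 ** (attempt - 1)) * (0.5 + random.random())
            time.sleep(delay)
```

In production a wrapper like this would typically sit behind a response cache and a rate limiter, per the responsibilities above.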
MLOps and Experiment Management
- Version models, prompts, datasets, and evaluation results systematically
- Implement experiment tracking using tools like Weights & Biases or MLflow (see the MLflow sketch after this list)
- Build CI/CD pipelines specifically for model deployment and testing
- Monitor model performance, drift, and system health in production
- Set up comprehensive logging and observability for AI services
- Define workflows from notebook/test repository → PR → staging → production
- Establish best practices for moving from research to production
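For illustration, a minimal sketch of experiment tracking with MLflow, one of the tools named above. The experiment name, parameters, and metric values are invented for the example.

```python
import mlflow

mlflow.set_experiment("prompt-evaluation")  # illustrative experiment name

with mlflow.start_run(run_name="baseline-prompt-v2"):
    # Version the exact configuration so the run is reproducible later
    mlflow.log_params({
        "model": "gpt-4o",        # hypothetical model choice
        "prompt_version": "v2",
        "temperature": 0.2,
    })
    # Record evaluation results alongside that configuration
    mlflow.log_metric("answer_accuracy", 0.91)
    mlflow.log_metric("mean_latency_ms", 840.0)
```

Runs logged this way make it straightforward to compare prompt and model variants before promoting one through the notebook → PR → staging → production workflow described above.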
API Development and Integration
- Design and implement robust APIs for AI inference using FastAPI (sketched after this list)
- Create endpoints for prompt testing, model selection, and evaluation
- Build APIs for prompt management and experimentation
- Integrate AI services seamlessly with backend application architecture
- Ensure API reliability, security, performance, and proper error handling
- Implement async programming patterns for efficient AI service delivery
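A minimal sketch of what such an endpoint could look like with FastAPI and async handlers. The route, schemas, and the generate stub are hypothetical; a real service would await an async provider client instead.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InferenceRequest(BaseModel):
    prompt: str
    model: str = "default"

class InferenceResponse(BaseModel):
    completion: str

async def generate(prompt: str, model: str) -> str:
    # Placeholder: swap in an actual async provider call here
    return f"[{model}] echo: {prompt}"

@app.post("/v1/infer", response_model=InferenceResponse)
async def infer(req: InferenceRequest) -> InferenceResponse:
    # The async handler keeps the event loop free while awaiting the model
    completion = await generate(req.prompt, req.model)
    return InferenceResponse(completion=completion)
```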
Collaboration and Enablement
- Work closely with AI developers (researchers) to productionize their experiments
- Collaborate with backend engineers to integrate AI capabilities into the product
- Define and document workflows for AI development and deployment
- Review code and mentor AI developers on software engineering best practices
- Document AI infrastructure, APIs, and operational procedures
- Enable research teams to move faster from idea to production
Performance and Reliability
- Optimize AI inference latency and cost efficiency
- Implement monitoring and alerting for AI service health (a latency-measurement sketch follows this list)
- Debug complex distributed AI systems and resolve production issues
- Ensure high availability and fault tolerance of AI services
- Conduct performance profiling and implement optimization strategies
- Balance trade-offs between latency, cost, throughput, and model quality
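As one example of the observability work, a sketch of a timing decorator that records per-call latency. In a real system the measurement would feed a metrics backend (Prometheus, Cloud Monitoring, etc.) rather than a log line; the logger name and label format are assumptions.

```python
import functools
import logging
import time

logger = logging.getLogger("ai_service")

def timed(model_name: str):
    """Decorator that logs the latency of each call for a given model."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                logger.info("model=%s latency_ms=%.1f", model_name, elapsed_ms)
        return wrapper
    return decorator
```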
Your Impact
- Enable AI researchers to deploy their innovations into production systems safely and efficiently
- Build the AI platform infrastructure that scales from experiments to enterprise deployment
- Reduce inference costs and latency while maintaining model quality
- Establish MLOps standards that support the company's long-term AI strategy
- Directly influence the reliability, performance, and scalability of AI-powered features
- Bridge the gap between cutting-edge research and practical production systems
Required Qualifications
- Proven experience (5+ years) in software engineering, preferably with a focus on AI/ML systems
- Strong programming skills in Python with experience in production environments
- Experience with LLMs and AI/ML in production: OpenAI API, HuggingFace, LangChain, or similar frameworks
- Understanding of vector databases (Pinecone, Chroma, Weaviate, FAISS) and similarity search
- Cloud infrastructure experience: GCP (Vertex AI preferred) or AWS (SageMaker)
- API development expertise: FastAPI, REST, async programming patterns
- CI/CD and DevOps skills: Docker, Terraform, GitHub Actions
- Monitoring and observability experience for distributed systems
- Problem-solving mindset: comfortable debugging complex distributed AI systems
- Hands-on experience operating AI deployments in enterprise environments
- Fluent oral and written communication in English (additional European languages are a plus)
Preferred Qualifications
- Experience fine-tuning or training machine learning models
- Familiarity with AI frameworks (LangChain, Pydantic AI, or similar)
- Knowledge of prompt engineering techniques and evaluation methodologies
- Experience with real-time inference and streaming responses
- Background in data engineering or ML engineering roles
- Understanding of RAG (Retrieval-Augmented Generation) architectures (see the retrieval sketch after this list)
- Experience with experiment tracking tools (MLflow, Weights & Biases)
- Contributions to open-source AI/ML projects
- Knowledge of Kubernetes for container orchestration
- Experience with model versioning and A/B testing frameworks
- Familiarity with cost optimization strategies for LLM deployments
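To make the RAG point concrete, here is a minimal sketch of the retrieval step using FAISS. The embeddings are random placeholders; a real pipeline would embed both documents and query with the same embedding model and map the returned ids back to source chunks for the LLM prompt.

```python
import numpy as np
import faiss

dim = 384  # illustrative embedding dimensionality
doc_embeddings = np.random.rand(1000, dim).astype("float32")

index = faiss.IndexFlatL2(dim)  # exact L2 search over document vectors
index.add(doc_embeddings)

query_embedding = np.random.rand(1, dim).astype("float32")
distances, doc_ids = index.search(query_embedding, 5)  # top-5 neighbours
# doc_ids map back to the source chunks that are placed in the LLM prompt
```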
Tech Stack
Languages
- Python (primary)
- Bash scripting
AI and LLM Frameworks
- OpenAI API, Anthropic, Gemini
- HuggingFace Transformers
- LangChain, Pydantic AI
Vector Databases
- Pinecone, Chroma, Weaviate, FAISS
Backend
- FastAPI, SQLAlchemy, Pydantic
Cloud and Infrastructure
- Google Cloud Platform (Vertex AI, Cloud Run, Cloud Functions)
- Terraform (Infrastructure as Code)
- Experiment Tracking: MLflow, Weights & Biases, or custom solutions
- CI/CD: GitHub Actions, Cloud Build
- Containers: Docker, Kubernetes (optional)
Who You Are
- Resilient and open to challenges, with a never-give-up attitude
- A collaborative team player who bridges technical and research teams effectively
- Self-motivated, detail-oriented, and systematic problem-solver
- Proactive and creative in solving issues under uncertainty with a "solutions, not problems" mindset
- Professional with excellent written and verbal communication skills
- Able to work independently and handle confidential information
- Passionate about making AI research accessible and production-ready
- Dedicated to establishing and maintaining sustainable MLOps processes and best practices
- Committed to continuous learning in the rapidly evolving AI landscape
- Comfortable operating at the intersection of research and engineering
What We Offer
- Highly Competitive Compensation: Top-of-market salary package that reflects your expertise and the value you bring
- Cutting-Edge Technology: Work with state-of-the-art AI technologies and the latest LLMs from leading providers
- Work-Life Balance: Flexible work arrangements with options for remote work
- Professional Growth: Opportunities to attend industry conferences, engage with the AI/ML community, and expand your technical expertise
- Impact-Driven Culture: Join a passionate team focused on solving challenging problems at the intersection of AI research and production engineering
- Technical Autonomy: Shape the AI platform architecture and have real influence on infrastructure decisions
- Learning Environment: Work alongside AI researchers and engineers pushing the boundaries of what's possible
At this company, you will be working on technology that brings cutting-edge AI research into real-world production applications. This is your opportunity to build the AI platform infrastructure that enables researchers to deploy innovative models at scale while maintaining enterprise-grade reliability and performance. You'll see your work directly enable groundbreaking AI capabilities, while collaborating with a talented team of AI developers, backend engineers, and product leaders. If you're ready to take your expertise in AI infrastructure and MLOps to the next level and want to be at the forefront of production AI systems, we want to hear from you!
We are dedicated to creating a diverse, inclusive, and authentic workplace. If this role excites you but your background doesn't perfectly match every qualification, we still encourage you to apply. You could be the perfect fit for this position or another opportunity within our growing team.