This is a hands-on technical leadership role for an engineer who wants to make a career-defining impact. You won't just be joining a team; you'll be setting the standard for how we build, deploy, and operate machine learning at scale.
Your Mission: Architecting the Future of AI Operations
As our first dedicated MLOps Architect, you will have the autonomy and resources to build our ML platform from scratch. You will be responsible for the entire lifecycle of our production AI systems, ensuring they are reliable, secure, and automated. You will:
- Architect and build the automated, scalable infrastructure that powers our entire suite of AI models—from Agentic AI and NLU to Voice Biometrics and ASR—ensuring they operate flawlessly and securely for millions of users
- Make key technical decisions, establishing the patterns, tools, and best practices that will guide our machine learning operations for years to come
- Collaborate closely with world-class researchers, data scientists, ML engineers, and cloud architects to translate cutting-edge research into robust, production-grade products
- Champion a culture of automation, governance, and performance across all our AI/ML initiatives
- Infrastructure as Code (IaC) Foundation: You will design and implement our entire MLOps infrastructure on AWS from the ground up using Terraform, establishing best practices for security, scalability, and cost-efficiency
- CI/CD for Machine Learning: You will build and own the end-to-end CI/CD pipelines using GitLab and Jenkins, automating everything from model training and validation to canary deployments and production rollbacks
- Containerization & Orchestration at Scale: You will lead the productization of our complex ML models, containerizing them with Docker and deploying them on a robust Kubernetes platform that you will help architect, build, and manage with Helm
- Proactive Observability: You will establish a culture of deep system insight by implementing and managing a comprehensive observability stack (e.g., Prometheus and Grafana), ensuring our models meet stringent performance, reliability, and security SLAs
We are looking for an experienced engineer with a builder's mindset and a passion for creating elegant, scalable systems. You have a proven track record of operating critical infrastructure at scale and thrive in an environment where you can take ownership and drive technical strategy.
Requirements
- 5+ years in a Senior DevOps, SRE, or MLOps role with a focus on production systems
- Deep expertise in architecting and managing Kubernetes clusters in a production environment
- Proven mastery of at least one major IaC tool (Terraform is strongly preferred)
- Strong proficiency in a language suited to systems and automation work (e.g., Python, Go)
- A track record of building and maintaining CI/CD pipelines for critical production services
- Direct experience deploying and managing ML models in production, such as Agentic AI, NLU, ASR, or TTS systems
- Experience with dedicated ML workflow orchestration tools (e.g., Kubeflow, Apache Airflow)
- Familiarity with ML experiment tracking and model registry tools (e.g., MLflow, SageMaker Model Registry)
- Experience deploying models on specialized hardware (e.g., GPUs, AWS Inferentia, AWS Trainium)
Benefits
- Fixed compensation
- Long-term employment with paid vacation days
- Support for professional growth (courses, training, etc.)
- The chance to work on successful, cutting-edge technology products that are making a global impact in the service industry
- Skilled colleagues who are fun to work with
- Apple gear
Join Omilia and take your career to the next level!