Responsibilities:
- Design and implement production-ready ML pipelines on the Cloudera Data Platform (CDP), integrating Cloudera Data Engineering, Data Science Workbench, and Machine Learning components.
- Develop and maintain RAG architectures, combining embedding models, vector databases, and LLM inference layers for enterprise-scale retrieval and reasoning.
- Integrate vector stores such as FAISS, Milvus, Pinecone, or ChromaDB into existing data pipelines and Cloudera ML workflows.
- Implement feature stores, model registries, and CI/CD pipelines for automated deployment and retraining using MLflow, Airflow, and Kubernetes.
- Optimize model inference latency and resource utilization in distributed environments (Spark, YARN, K8s).
- Develop REST/gRPC APIs and microservices to serve models and RAG endpoints.
- Monitor model drift, retraining triggers, and lineage using observability tools and metadata tracking systems.
- Collaborate with data platform engineers to ensure compliance with data governance, lineage, and access control standards in Cloudera CDP.
Requirements:
- Advanced proficiency in Python (FastAPI, Pandas, NumPy, Pydantic) with solid experience in Java and Scala for building data pipelines and distributed processing systems.
- Experienced across the full Machine Learning lifecycle using tools such as MLflow, DVC, Airflow, and Kubeflow for orchestration, tracking, and deployment.
- Skilled in managing containerized and distributed environments with Docker, Kubernetes, Spark, YARN, and the Cloudera ML stack (CDP).
- Expertise in designing and optimizing retrieval pipelines using FAISS, Pinecone, Milvus, and ChromaDB for vector search and embedding-based systems.
- Hands-on experience with LangChain, LlamaIndex, and custom RAG architectures, integrating LLMs (OpenAI, Anthropic, Hugging Face) into production environments.
- Strong background in CI/CD and GitOps workflows, leveraging ArgoCD, Jenkins, and GitHub Actions for automated ML deployment.
- Proficient in monitoring and observability using Prometheus, Grafana, and OpenTelemetry to ensure model and system performance.
- Deep understanding of version control and reproducibility with Git, MLflow Model Registry, and Cloudera ML tracking.
- Experience working in large-scale enterprise ML environments built on Cloudera CDP.
- Familiar with data governance, GDPR compliance, data masking, and access control policies.
- Exposure to semantic search, embedding optimization, and prompt orchestration for retrieval-augmented AI systems.
- Strong grasp of distributed systems and data-intensive workloads at petabyte scale.
- Certified in Cloudera Data Platform, AWS, or GCP Machine Learning.
Ready to apply?
Join Ardanis and take your career to the next level!
Application takes less than 5 minutes