Job Summary
We are seeking a skilled AI Engineer with at least 3 years of hands-on experience in designing, building, and deploying Large Language Model (LLM)-based solutions. The ideal candidate will own the end-to-end lifecycle of AI applications, from high-performance model inference and optimization to the development of advanced Agentic AI workflows using RAG and CAG patterns. This role requires close collaboration with product, data, and engineering teams to translate business requirements into scalable, reliable, and cost-efficient AI systems.
Mandatory Skills & Qualifications
- Bachelor’s degree in Information Technology, Computer Science, Finance, or a related field.
- At least 3 years of experience working with Large Language Models (LLMs) in production environments.
- Hands-on expertise with vLLM and model quantization techniques such as AWQ and GPTQ.
- Strong proficiency in Apache Airflow for scheduling and orchestrating complex data and AI pipelines.
- Experience with RAGFlow or similar deep-document Retrieval-Augmented Generation (RAG) frameworks.
- Practical experience with vector databases (e.g., FAISS, Milvus, Pinecone, Weaviate).
- Proven ability to design and implement multi-agent systems that leverage tools and external APIs to perform multi-step tasks.
- Advanced proficiency in Python, Docker, and Kubernetes.
- Experience using AI observability and monitoring tools to track latency, cost, throughput, and hallucination rates.
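The vector-database requirement above boils down to one core operation: similarity search over an embedding matrix. A minimal sketch in plain Python and NumPy, with toy 3-dimensional vectors standing in for a real encoder's embeddings (a production system would use FAISS, Milvus, Pinecone, or Weaviate for scale, but the retrieval logic is the same):

```python
import numpy as np

def cosine_top_k(query_vec, doc_matrix, k=2):
    """Return indices of the k documents most similar to the query.

    doc_matrix: (n_docs, dim) array of document embeddings.
    query_vec:  (dim,) array holding the query embedding.
    """
    doc_norms = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    q_norm = query_vec / np.linalg.norm(query_vec)
    scores = doc_norms @ q_norm          # cosine similarity per document
    return np.argsort(scores)[::-1][:k]  # indices of the top-k scores

# Toy "embeddings" standing in for a real encoder's output.
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
])
query = np.array([1.0, 0.05, 0.0])
top = cosine_top_k(query, docs, k=2)  # documents 0 and 2 are closest
```

The retrieved documents would then be stuffed into the LLM prompt as context, which is the "augmented generation" half of RAG.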
Key Responsibilities
- Configure, deploy, and optimize vLLM and other inference frameworks to ensure low-latency, high-throughput LLM serving.
- Design and implement RAG pipelines using vector databases and Cache-Augmented Generation (CAG) strategies to reduce redundant computation and improve response quality.
- Deploy and tune vLLM clusters to support scalable, production-grade API endpoints for multiple open-source LLMs.
- Design, implement, and maintain Apache Airflow DAGs and RAGFlow pipelines to automate the AI lifecycle, including data ingestion, indexing, evaluation, and prompt/version management.
- Develop, version-control, and continuously refine system prompts, applying techniques such as Chain-of-Thought (CoT) to improve reasoning accuracy and consistency.
- Implement CAG strategies to optimize KV cache reuse and minimize compute costs for long-context and multi-step AI tasks.
- Build and refine Agentic AI workflows, enabling autonomous task planning, tool usage, and API orchestration across different LLM backends.
- Monitor and analyze AI system performance using observability tools, ensuring reliability, cost efficiency, and controlled hallucination rates.
- Collaborate with cross-functional teams to align AI solutions with business objectives, security standards, and scalability requirements.
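The agentic-workflow responsibilities above reduce to a plan → act → observe loop: a model chooses a tool, the runtime executes it, and the observation is fed back until the task is done. A minimal sketch, with a stubbed planner standing in for a real LLM and hypothetical tool names (nothing here is a specific framework's API):

```python
def search_tool(query: str) -> str:
    # Stand-in for an external API call (e.g. a search service).
    return f"results for '{query}'"

def calc_tool(expression: str) -> str:
    # Stand-in for a sandboxed calculator tool.
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"search": search_tool, "calc": calc_tool}

def stub_planner(goal: str, history: list) -> dict:
    """Pretend LLM: emits one tool call, then finishes.
    A real agent would prompt an LLM to choose the next action."""
    if not history:
        return {"action": "calc", "input": "6 * 7"}
    return {"action": "finish", "input": history[-1]}

def run_agent(goal: str, planner=stub_planner, max_steps=5) -> str:
    history = []
    for _ in range(max_steps):
        step = planner(goal, history)
        if step["action"] == "finish":
            return step["input"]
        observation = TOOLS[step["action"]](step["input"])  # execute the chosen tool
        history.append(observation)
    return history[-1] if history else ""

answer = run_agent("what is 6 * 7?")
```

Swapping `stub_planner` for a function that prompts an LLM and parses its chosen action yields the real version; observability hooks for latency and cost attach naturally around the tool-execution call.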
Experience Level
- 3+ years of relevant experience in AI/ML engineering, with demonstrated production experience in LLM-based systems.
Ready to apply?
Join Elliott Moss Consulting and take your career to the next level!

