Elliott Moss Consulting
Artificial Intelligence Engineer
Elliott Moss Consulting, Singapore, 1 day ago
Contract, Information Technology

Job Summary

We are seeking a skilled AI Engineer with at least 3 years of hands-on experience in designing, building, and deploying Large Language Model (LLM)-based solutions. The ideal candidate will be responsible for the end-to-end lifecycle of AI applications, from high-performance model inference and optimization to the development of advanced Agentic AI workflows using Retrieval-Augmented Generation (RAG) and Cache-Augmented Generation (CAG) patterns. This role requires close collaboration with product, data, and engineering teams to translate business requirements into scalable, reliable, and cost-efficient AI systems.


Mandatory Skills & Qualifications

  • Bachelor’s degree in Information Technology, Computer Science, Finance, or a related field.
  • At least 3 years of experience working with Large Language Models (LLMs) in production environments.
  • Hands-on expertise with vLLM and model quantization techniques such as AWQ and GPTQ.
  • Strong proficiency in Apache Airflow for scheduling and orchestrating complex data and AI pipelines.
  • Experience with RAGFlow or similar deep-document Retrieval-Augmented Generation (RAG) frameworks.
  • Practical experience with vector databases (e.g., FAISS, Milvus, Pinecone, Weaviate).
  • Proven ability to design and implement multi-agent systems that leverage tools and external APIs to perform multi-step tasks.
  • Advanced proficiency in Python, Docker, and Kubernetes.
  • Experience using AI observability and monitoring tools to track latency, cost, throughput, and hallucination rates.


Key Responsibilities

  • Configure, deploy, and optimize vLLM and other inference frameworks to ensure low-latency, high-throughput LLM serving.
  • Design and implement RAG pipelines using vector databases and Cache-Augmented Generation (CAG) strategies to reduce redundant computation and improve response quality.
  • Deploy and tune vLLM clusters to support scalable, production-grade API endpoints for multiple open-source LLMs.
  • Design, implement, and maintain Apache Airflow DAGs and RAGFlow pipelines to automate the AI lifecycle, including data ingestion, indexing, evaluation, and prompt/version management.
  • Develop, version-control, and continuously refine system prompts, applying techniques such as Chain-of-Thought (CoT) to improve reasoning accuracy and consistency.
  • Implement CAG strategies to optimize KV cache reuse and minimize compute costs for long-context and multi-step AI tasks.
  • Build and refine Agentic AI workflows, enabling autonomous task planning, tool usage, and API orchestration across different LLM backends.
  • Monitor and analyze AI system performance using observability tools, ensuring reliability, cost efficiency, and controlled hallucination rates.
  • Collaborate with cross-functional teams to align AI solutions with business objectives, security standards, and scalability requirements.


Experience Level

  • 3+ years of relevant experience in AI/ML engineering, with demonstrated production experience in LLM-based systems.
