AI/ML Engineer
Job Summary:
We are looking for an AI/ML-focused Data Engineer with deep expertise in building intelligent data pipelines for unstructured content and experience integrating with modern machine learning ecosystems. The ideal candidate will have hands-on experience with PySpark and Python, a strong focus on document classification, cleansing, and quality metrics, and the ability to work with LLMs, vector databases, and Retrieval-Augmented Generation (RAG) frameworks. The successful candidate will play a critical role in bridging data engineering and machine learning, enabling the development of AI-first applications across the enterprise.
Key Responsibilities:
- Build robust, scalable data processing pipelines for unstructured documents (PDFs, emails, forms, etc.) using PySpark and Python.
- Implement document cleansing, classification, and enrichment techniques to prepare high-quality data for AI/ML applications.
- Develop and integrate data workflows that feed into LLM-based pipelines and support vector-based retrieval using RAG architectures.
- Engineer vector embeddings, document chunking, and metadata tagging for semantic search and question-answering systems.
- Collaborate closely with AI architects, AI/Data engineers, and platform teams to design end-to-end AI solutions.
- Communicate data readiness, pipeline quality, and model integration strategies clearly to both technical and non-technical stakeholders.
- Apply Agile methodologies and CI/CD best practices to deliver continuously evolving AI capabilities.
Required Skills:
- 5+ years of overall commercial experience, including 2+ years in a relevant role.
- Strong proficiency in PySpark and distributed data frameworks.
- Solid experience in core Python, including ML/AI libraries (e.g., Transformers, LangChain, Hugging Face, FAISS).
- Proven expertise in processing unstructured data and document intelligence (OCR, NLP, classification, tagging).
- Familiarity with vector databases (e.g., Redis) and embedding models for RAG pipelines.
- Understanding of the LLM lifecycle, including fine-tuning, inference, and prompt engineering.
- Experience working in agile environments, collaborating with cross-functional teams.
- Excellent communication skills with the ability to interface with both technical and business stakeholders.
About CLPS RiDiK
RiDiK is a global technology solutions provider and a subsidiary of CLPS Incorporation (NASDAQ: CLPS), delivering cutting-edge end-to-end services across banking, wealth management, and e-commerce. With deep expertise in AI, cloud, big data, and blockchain, we support clients across Asia, North America, and the Middle East in driving digital transformation and achieving sustainable growth. Operating from regional hubs in 10 countries and backed by a global delivery network, we combine local insight with technical excellence to deliver real, measurable impact. Join RiDiK and be part of an innovative, fast-growing team shaping the future of technology across industries.
Related Jobs
3 roles aligned with this opportunity
RPAS Techno Functional Consultant
2026-05-07
Artificial Intelligence Engineer
2026-04-01
Digital Business Analyst
2026-03-30
- Posted: Jul 18, 2025
- Type: Full-time
- Level: Mid-Senior
- Location: Dubai