We are hiring a Machine Learning Engineer for a technology client on a yearly renewable contract. The team is building cutting-edge tools and infrastructure to drive innovation and automation throughout the organisation. In this role, you will contribute to the creation of a new compute layer using Ray. You will also help drive the set-up of the infrastructure for Ray on Kubernetes and its integration with the existing data system.
What you’ll do:
- Deliver high-quality AI infrastructure solutions: You will work with the Machine Learning Platform team to design and develop the infrastructure that supports Ray for distributed data processing and model training. You will use GitOps to keep the system's cloud infrastructure reproducible across different Kubernetes clusters.
- Develop observability solutions for Ray: You will be responsible for developing monitoring and alerting and integrating them into the client’s monitoring stack, which is powered by Datadog, Prometheus and Grafana. You will also contribute to the creation of runbooks and DevOps guides.
- Support the data science community in adopting Ray: You will work with the product team to promote the use of Ray, and you will be responsible for supporting users in running their jobs on the Ray clusters.
What you’ll need:
- In-depth knowledge of MLOps, with a solid understanding of distributed computing for data processing. Knowledge of Ray is preferred, but experience with other frameworks, such as Dask, Modin, Beam, Horovod, and DeepSpeed, is also valued.
- Good knowledge of Python and ML ecosystems.
- Strong understanding of developing and deploying systems on Kubernetes.
- Previous experience with GitOps tools such as Argo CD is preferred. Good knowledge of Helm and Kustomize is also valued.
- A good DevOps background, with experience in Infrastructure as Code (IaC), preferably Terraform.
- At least 3 years of relevant Machine Learning Engineering experience.
If this role sounds like an ideal next move, please click the apply button and attach your latest resume. Alternatively, you can email your resume to [email protected]
We regret that only shortlisted candidates will be notified.
CEI No: R1659595 / EA No: 07C3147
- Posted: Dec 18, 2024
- Type: Contract
- Level: Associate
- Location: Singapore
- Company: Salt