Talentvis

Machine Learning Engineer

Talentvis
Singapore · Contract · Associate

Day-to-Day Responsibilities:

  • Building and enhancing the Ray-based compute layer for distributed data processing and model training on Kubernetes.
  • Working closely with the data science and engineering teams to set up the Ray infrastructure and integrate it into the existing system.
  • AI Infrastructure Development: Collaborate with the ML Platform team to design and implement a robust AI infrastructure using Ray, enabling scalable data processing and model training. Use GitOps practices to keep cloud infrastructure on Kubernetes reproducible.
  • Observability & Monitoring: Develop observability solutions for Ray, integrating monitoring and alerting capabilities using tools like Datadog, Prometheus, and Grafana. You’ll also contribute to creating operational guides and runbooks.
  • Support for Data Science Teams: Assist data scientists in adopting Ray for their workloads and ensure smooth integration with existing tools and systems.


Required Skills and Qualifications:

  • MLOps Expertise: Strong understanding of machine learning operations, particularly in distributed computing environments. Experience with frameworks like Ray, Dask, Modin, Beam, Horovod, or DeepSpeed is highly desirable.
  • Technical Skills: Proficiency in Python and the broader ML ecosystem.
  • Kubernetes Experience: Solid understanding of Kubernetes, with experience in deploying and managing systems. Familiarity with GitOps tools such as ArgoCD and configuration management tools like Helm and Kustomize is a plus.
  • DevOps & Infrastructure Skills: Background in DevOps practices and Infrastructure as Code (IaC), with knowledge of Terraform or similar tools.
  • Communication Skills: Strong written and verbal communication abilities, with a focus on collaboration and knowledge sharing.


Nice-to-Have:

  • Ray Knowledge: A genuine interest in Ray is critical. The hiring manager will treat a lack of interest in Ray as a red flag.
  • Cloud Providers: Experience with cloud platforms, particularly AWS, is preferred.
  • Incident Response & Security: Basic knowledge of incident response and security principles.

Posted: Dec 12, 2024
Type: Contract
Level: Associate
Location: Singapore
Company: Talentvis
Industries: Technology, Information and Media
Categories: Information Technology
