The AI Platform Engineer helps design, build, and maintain the on-premises AI infrastructure for our client, a market leader in the environment sector. The engineer plays a key role in enabling large-scale machine learning and generative AI workloads by developing robust, scalable, and secure platform solutions that support data scientists and ML engineers across the organisation.
The role requires deep Kubernetes and infrastructure automation experience to optimise performance, streamline deployments, and ensure reliability in an on-premises environment.
Responsibilities
- Design, deploy, and manage on-premises Kubernetes clusters for AI and ML workloads
- Develop and maintain infrastructure-as-code using tools like Terraform, Helm, or Ansible
- Build and optimise AI/ML pipelines and MLOps workflows for model training, deployment, and monitoring
- Collaborate with data science and engineering teams to deliver high-performance computing environments for large model training and inference
- Implement resource management, observability, and scaling strategies for GPU-based workloads
- Manage containerisation, networking, and storage solutions for AI workloads
- Ensure security, compliance, and reliability of the AI platform
- Automate operational processes and continuously improve platform efficiency
Requirements
- Proven experience managing Kubernetes clusters on-premises (not just cloud-managed solutions)
- Experience working in a consultancy capacity
- Strong background in container orchestration, CI/CD, and automation
- Proficiency in Linux administration, Docker, and networking fundamentals
- Hands-on experience with infrastructure-as-code (Terraform, Helm, Ansible, etc.)
- Experience supporting AI/ML workloads (e.g., TensorFlow, PyTorch, Hugging Face, Ray, Kubeflow)
- Familiarity with GPU scheduling and resource management on Kubernetes
- Knowledge of monitoring and logging tools (Prometheus, Grafana, ELK, etc.)
- Strong scripting skills (Python, Bash, or Go)
- Understanding of DevOps and MLOps best practices
Benefits & Growth Opportunities
- Competitive salary and performance bonuses
- Comprehensive health insurance
- Professional development and certification support
- Opportunity to work on cutting-edge AI projects
- International exposure and travel opportunities
- Flexible working arrangements
- Career advancement opportunities in a rapidly growing AI company
Join Deeplight AI and take your career to the next level!