Responsibilities:
- Design, provision, and maintain infrastructure for data platforms and ML workloads, including compute clusters, storage systems, and networking components.
- Implement automation and configuration management using tools such as Ansible, Terraform, or CloudFormation.
- Monitor, troubleshoot, and optimize infrastructure performance, ensuring high availability and disaster recovery.
- Collaborate with data engineers, ML engineers, and DevOps teams to support pipeline scalability and reproducibility.
- Manage containerized environments with Docker and Kubernetes, including orchestration of microservices and ML workloads.
- Implement security best practices, access controls, and compliance policies across cloud and on-premises environments.
- Build and maintain observability frameworks for system monitoring, logging, and alerting.
Requirements:
- Strong experience with Linux/Unix systems, networking, and cloud platforms (AWS, GCP, Azure).
- Hands-on experience with Kubernetes, Docker, Terraform, Ansible, or similar orchestration/configuration tools.
- Solid understanding of distributed systems, storage architectures, and high-availability clusters.
- Experience supporting data-intensive workloads and ML/AI pipelines.
- Familiarity with CI/CD pipelines, monitoring tools (Prometheus, Grafana), and incident management.
- Strong scripting skills (Python, Bash, or equivalent).
- Excellent problem-solving, troubleshooting, and collaboration skills.
Join Ardanis and take your career to the next level!