We are seeking a highly skilled Site Reliability Engineer (SRE) with experience in building and managing EKS (Elastic Kubernetes Service) environments. The ideal candidate will be responsible for designing, deploying, and maintaining reliable systems while supporting our DevOps practices. A background in observability tools such as ELK (Elasticsearch, Logstash, Kibana) and Grafana is highly preferred.
Key Responsibilities
- EKS Build and Run:
  - Design, implement, and manage EKS clusters to ensure high availability and scalability.
  - Automate provisioning, deployment, and scaling of EKS environments.
  - Monitor and maintain the health and performance of Kubernetes workloads in EKS.
- Site Reliability Engineering:
  - Enhance system reliability through the development of monitoring, automation, and fault-tolerant solutions.
  - Build tools and automation to streamline infrastructure management and operational tasks.
  - Respond to incidents, troubleshoot performance issues, and conduct root cause analysis.
- DevOps Collaboration:
  - Support CI/CD pipelines, including integrating EKS into the DevOps lifecycle.
  - Collaborate with development teams to deliver infrastructure as code (IaC) and automate deployments.
- Observability & Monitoring:
  - Implement and optimize observability solutions using tools like ELK Stack and Grafana.
  - Establish robust logging, monitoring, and alerting frameworks to improve system transparency and uptime.
Required Skills
- Kubernetes/EKS Expertise: Strong experience in deploying and managing Kubernetes clusters, specifically on AWS EKS.
- Cloud Platforms: Advanced knowledge of AWS services and infrastructure.
- DevOps Tools: Familiarity with DevOps practices and tools like Terraform, Ansible, Jenkins, or GitLab CI/CD.
- Observability: Hands-on experience with ELK Stack (Elasticsearch, Logstash, Kibana) and Grafana.
- Automation & Scripting: Proficiency in scripting languages (e.g., Python, Bash) and automation frameworks.
- System Administration: Solid understanding of Linux/Unix systems and networking.
Preferred Qualifications
- Background in building observability pipelines and frameworks.
- Experience with Prometheus, Loki, or other observability tools is a plus.
- AWS certification (e.g., AWS Certified Solutions Architect or AWS Certified DevOps Engineer) is an advantage.
Soft Skills
- Excellent problem-solving and troubleshooting skills.
- Strong communication and teamwork abilities.
- A proactive approach to learning and adopting new technologies.