Site Reliability Engineer (SRE) with EKS Expertise
Job Summary

We are seeking a highly skilled Site Reliability Engineer (SRE) with experience in building and managing EKS (Elastic Kubernetes Service) environments. The ideal candidate will be responsible for designing, deploying, and maintaining reliable systems while supporting our DevOps practices. A background in observability tools such as ELK (Elasticsearch, Logstash, Kibana) and Grafana is highly preferred.

Key Responsibilities

  • EKS Build and Run:
      • Design, implement, and manage EKS clusters to ensure high availability and scalability.
      • Automate provisioning, deployment, and scaling of EKS environments.
      • Monitor and maintain the health and performance of Kubernetes workloads in EKS.
  • Site Reliability Engineering:
      • Enhance system reliability through the development of monitoring, automation, and fault-tolerant solutions.
      • Build tools and automation to streamline infrastructure management and operational tasks.
      • Respond to incidents, troubleshoot performance issues, and conduct root cause analysis.
  • DevOps Collaboration:
      • Support CI/CD pipelines, including integrating EKS into the DevOps lifecycle.
      • Ensure seamless collaboration with development teams to deliver infrastructure as code (IaC) and automate deployments.
  • Observability & Monitoring:
      • Implement and optimize observability solutions using tools like ELK Stack and Grafana.
      • Establish robust logging, monitoring, and alerting frameworks to improve system transparency and uptime.

Required Skills & Experience

  • Kubernetes/EKS Expertise: Strong experience in deploying and managing Kubernetes clusters, specifically on AWS EKS.
  • Cloud Platforms: Advanced knowledge of AWS services and infrastructure.
  • DevOps Tools: Familiarity with DevOps practices and tools like Terraform, Ansible, Jenkins, or GitLab CI/CD.
  • Observability: Hands-on experience with ELK Stack (Elasticsearch, Logstash, Kibana) and Grafana.
  • Automation & Scripting: Proficiency in scripting languages (e.g., Python, Bash) and automation frameworks.
  • System Administration: Solid understanding of Linux/Unix systems and networking.

Preferred Qualifications

  • Background in building observability pipelines and frameworks.
  • Experience with Prometheus, Loki, or other observability tools is a plus.
  • AWS certification (e.g., AWS Certified Solutions Architect or AWS Certified DevOps Engineer) is an advantage.

Soft Skills

  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and teamwork abilities.
  • A proactive approach to learning and adopting new technologies.

Skills: kubernetes, building, gitlab ci/cd, eks, ci, grafana, aws, infrastructure, automation, elk stack, reliability, networking, jenkins, ansible, bash, terraform, unix, skills, linux, devops, python
Post Date: 2024-11-21
Job Type: -
Employment type: Contract
Category: Engineering, Information Technology
Level: Mid-Senior
Country: Lithuania
Industry: Technology, Information, Internet
GiGa-Ops Global Solutions