Astra Tech
Senior Site Reliability Engineer (SRE)
Location: United Arab Emirates
Other; Engineering, Information Technology

Job Responsibilities:

Automate routine operational tasks with shell scripts to make log analysis, batch management, and system optimization more efficient.

Maintain and optimize middleware components supporting infrastructure operations, ensuring stability and performance.

Administer and optimize Kubernetes clusters, ensuring scalability, security, and performance.

Maintain and optimize monitoring and alerting systems based on Prometheus, ensuring high availability of services.

Contribute to the development of CI/CD pipelines.

Manage cloud resources efficiently, implementing cost optimization strategies to reduce cloud expenditure.

Improve operational processes, develop automation tools, troubleshoot incidents, and enhance system stability and reliability.

Job Requirements:

Proficiency in shell scripting for automating operational workflows and system management tasks.

Experience in Python or Go, preferably for system automation, tooling, or backend services.

At least 2 years of hands-on Kubernetes administration experience, including expertise with the Container Storage Interface (CSI) and Container Network Interface (CNI), and experience managing production clusters of 20+ nodes.

Experience with Prometheus for monitoring and alerting in an enterprise environment.

Familiarity with CI/CD deployment processes, with knowledge of GitOps principles. Hands-on experience with GitOps is a plus.

Experience managing cloud platforms using Infrastructure as Code (IaC) tools such as Terraform or OpenTofu. Azure experience is a plus.

Strong problem-solving skills, a proactive approach to troubleshooting, and a commitment to improving operational efficiency and system reliability.

Bonus Points:

Experience managing large-scale distributed systems and microservice architectures.

Background in Site Reliability Engineering (SRE) best practices.
