Job Description
Requirements:
- 8+ years of experience in DevOps and SRE roles, with a focus on leading the implementation of SRE practices for enterprise applications.
- Strong experience with cloud platforms (AWS, GCP, Azure) and services such as EC2, S3, Lambda, and RDS.
- Hands-on experience with Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, and Ansible.
- Experience building and managing observability frameworks (monitoring, logging, alerting) to track system health and improve performance.
- Proven experience automating key processes such as deployments, testing, and incident response, using CI/CD tools such as Jenkins, Argo CD, or similar.
- Design, deploy, and manage observability tools and processes, including logging, monitoring, and alerting systems using Elastic Stack, Grafana, Prometheus, Dynatrace, and New Relic.
- Manage and optimize Kubernetes clusters, ensuring scalability, availability, and efficient container orchestration.
- Design and manage Helm charts for scalable and reusable Kubernetes deployments, ensuring streamlined application releases and maintenance.
- Hands-on experience with AWS managed databases and self-managed databases such as MySQL and Cassandra.
- Experience designing and implementing business continuity planning (BCP) and disaster recovery (DR) strategies to maintain availability.
- Build a proactive monitoring system that combines alerting with auto-healing to prevent service outages, and build customized dashboards.
- Participate in on-call support, handle escalations, conduct incident reviews, and write project documentation.
- Expertise in scripting languages like Python, Bash, or Go for automating workflows and infrastructure management.
- Proactively monitor and plan for future capacity needs, ensuring scalable and resilient architectures across AWS resources.
- Experience conducting fault injection testing and chaos engineering with tools such as the open-source Chaos Mesh and Litmus, and AWS Fault Injection Service.
Responsibilities & Authorities
Responsibilities:
- Architect and deploy scalable, highly available cloud infrastructure to support production workloads and applications.
- Ensure systems are fault-tolerant, performant, and can handle high-traffic and growing demands.
- Oversee and optimize application release processes within CI/CD pipelines, ensuring seamless, reliable updates.
- Lead incident response efforts, ensuring minimal service disruption and quick resolution of issues.
- Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure the reliability and performance of applications.
- Maintain clear, detailed documentation on infrastructure, processes, incidents, and operational procedures.
- Work closely with engineering, DevOps, and product teams to align on SRE best practices, promote knowledge sharing, and support application reliability needs.
- Drive continuous improvements in processes, tools, and systems to improve the reliability and performance of production services.
Common responsibilities:
- Comply with Avrioc’s information security and information service management policies, procedures, and standards.
- Maintain confidentiality and integrity of information and attend mandatory information security training.
- Report information security incidents through Avrioc’s established incident reporting channel.