Role Summary:
We are looking for a Site Reliability Engineer (SRE) to maintain the availability, scalability, and performance of critical services deployed across cloud and on-premises environments. This role combines software engineering and systems engineering to automate operations and improve reliability in CI/CD and production environments.
Key Responsibilities:
- Maintain uptime and performance of applications deployed across hybrid infrastructure
- Implement observability (logging, metrics, tracing) using Prometheus, Grafana, ELK, and Azure Monitor
- Troubleshoot production issues, participate in incident response, and perform root cause analysis
- Automate infrastructure, monitoring, and runbooks using IaC tools and scripting
- Implement and track SLOs, SLIs, and error budgets
- Build self-healing systems and resilient deployments
- Collaborate with developers, security teams, and cloud engineers to enforce reliability practices
Required Skills:
- Experience with Azure/AWS/GCP monitoring tools and on-prem observability stacks
- Strong Linux/Unix administration and scripting skills (Python, Bash)
- Hands-on experience with CI/CD pipelines, Kubernetes, and Helm
- Good understanding of load balancing, failover, and high-availability (HA) architecture
- Familiarity with incident management, postmortem writing, and runbook creation
Preferred Qualifications:
- Experience with Terraform, Ansible, or Pulumi
- Knowledge of service mesh (Istio, Linkerd) and API gateway configurations
- Certifications: SRE Foundation, Azure/AWS Cloud Practitioner, or Certified Kubernetes Administrator (CKA)
- Awareness of compliance standards (CIS, NIST, ISO 27001)