Experience: 5 years
Location: Bengaluru
Role Title
Azure Site Reliability Engineer (SRE)
Role Summary
We are hiring Azure SREs to engineer reliability at scale across mission-critical workloads in a regulated environment. You will design and operate highly available, secure, and cost‑efficient Azure platforms with a Terraform‑first approach, strong automation, and deep observability. The role includes on‑call, incident management, and continuous improvement to reduce toil and improve SLAs/SLOs.
Key Responsibilities
SRE Foundations
- Define SLIs/SLOs, manage error budgets, and gate releases based on reliability risk.
- Lead on‑call rotations, major incident response, and blameless postmortems with action tracking.
- Run game days, chaos/resilience drills, and drive toil reduction via automation.
Azure Platform & Governance
- Build CAF‑aligned Landing Zones (hub‑spoke/Virtual WAN), enforce Azure Policy as Code, tagging, and RBAC/PIM models.
- Engineer secure network topologies: Private Link/Endpoints, Azure Firewall/WAF, DDoS, ExpressRoute, Private DNS.
Infrastructure as Code & Automation
- Terraform (mandatory): design reusable modules, manage remote state & locking, implement policy checks (e.g., tfsec/Checkov/Conftest).
- Implement CI/CD with Azure DevOps/GitHub Actions; automate with PowerShell, Azure CLI, Python.
- Use Key Vault & workload identity for secretless pipelines; enforce PR reviews and plan/apply gates.
Kubernetes (AKS) Operations
- Operate AKS: upgrades (surge), node pool management, HPA/VPA, cluster autoscaler.
- Enforce Network Policies, Pod Security, admission control (OPA/Gatekeeper); secure secrets and images.
- GitOps (Flux/ArgoCD), hardened ACR, image provenance and supply chain controls.
Observability & AIOps
- Build full‑stack monitoring with Azure Monitor, Log Analytics, Application Insights, Prometheus/Grafana.
- Create KQL dashboards/alerts, enable synthetic monitoring, and correlate traces with OpenTelemetry.
- Reduce MTTR using automated runbooks (Functions/Logic Apps/Automation) and optimize log/metrics cost.
Resilience, DR & Backup
- Architect HA/DR using Azure Site Recovery (ASR) and region pairs; define & test RTO/RPO.
- Operate Azure Backup with immutability/soft delete; enable Key Vault purge protection.
- Conduct periodic failover/restore drills with evidence and remediation follow‑ups.
Security & Compliance
- Implement Zero Trust with Entra ID (RBAC, PIM, Conditional Access), Managed Identities, and least‑privilege.
- Enforce baselines with Defender for Cloud; integrate Microsoft Sentinel detections and SOAR playbooks.
- Support audits with change control, evidence, and segregation of duties.
Cost & Capacity (FinOps)
- Set budgets and alerts; drive rightsizing, reservations/savings plans, and storage tiering.
- Optimize observability/storage retention and data flows for cost efficiency.
Required Qualifications
- 6+ years of overall IT industry experience, including at least 5 years of hands-on expertise in Azure Site Reliability Engineering.
- Hands-on Terraform (mandatory): module design, state management, pipelines, policy/scanning, drift detection.
- Strong Azure infrastructure: compute, storage, networking (hub‑spoke/vWAN, Private Link, Firewall/WAF, DDoS, ExpressRoute).
- AKS operations and container security fundamentals.
- Observability: Azure Monitor, App Insights, KQL, Prometheus/Grafana; SLO dashboarding.
- DR/Backup expertise: ASR, Azure Backup, RTO/RPO planning and test execution.
- Automation proficiency: PowerShell, Azure CLI, Python; Azure Functions/Logic Apps/Automation Accounts.
- Identity & security: Entra ID, RBAC/PIM, Key Vault, Defender for Cloud.
- Certifications: AZ‑104 (mandatory).
Nice to Have
- Microsoft Sentinel (detections, hunting, SOAR runbooks).
- Chaos Studio, performance/load testing, progressive delivery (Blue/Green, Canary, feature flags).
- Data HA/DR across Azure SQL DB/MI, PostgreSQL Flexible Server.
- FinOps practices and cost optimization playbooks.
- Certifications: AZ‑305, AZ‑400, AZ‑700, AZ‑500.
Join PwC India and take your career to the next level!

