PwC India

Site Reliability Engineer

India · Full-time · Mid-Senior

Job Description – Azure Site Reliability Engineer (SRE)

Role Title: Azure Site Reliability Engineer (SRE)

Role Summary: We are hiring Azure SREs to engineer reliability at scale across mission-critical workloads in a regulated environment. You will design and operate highly available, secure, and cost-efficient Azure platforms with a Terraform-first approach, strong automation, and deep observability. The role includes on-call duty, incident management, and continuous improvement to reduce toil and meet SLAs/SLOs.

Key Responsibilities

SRE Foundations

· Define SLIs/SLOs, manage error budgets, and gate releases based on reliability risk.

· Lead on-call rotations, major incident response, and blameless postmortems with action tracking.

· Run game days, chaos/resilience drills, and drive toil reduction via automation.

Azure Platform & Governance

· Build CAF-aligned Landing Zones (hub-spoke/Virtual WAN), enforce Azure Policy as Code, tagging, and RBAC/PIM models.

· Engineer secure network topologies: Private Link/Endpoints, Azure Firewall/WAF, DDoS, ExpressRoute, Private DNS.

Infrastructure as Code & Automation

· Terraform (mandatory): design reusable modules, manage remote state & locking, implement policy checks (e.g., tfsec/Checkov/Conftest).

· Implement CI/CD with Azure DevOps/GitHub Actions; automate with PowerShell, Azure CLI, Python.

· Use Key Vault & workload identity for secretless pipelines; enforce PR reviews and plan/apply gates.

Kubernetes (AKS) Operations

· Operate AKS: upgrades (surge), node pool management, HPA/VPA, cluster autoscaler.

· Enforce Network Policies, Pod Security, admission control (OPA/Gatekeeper); secure secrets and images.

· Run GitOps (Flux/Argo CD) with a hardened ACR, image provenance, and supply chain controls.

Observability & AIOps

· Build full-stack monitoring with Azure Monitor, Log Analytics, Application Insights, Prometheus/Grafana.

· Create KQL dashboards/alerts, enable synthetic monitoring, and correlate traces with OpenTelemetry.

· Reduce MTTR using automated runbooks (Functions/Logic Apps/Automation) and optimize log/metrics cost.

Resilience, DR & Backup

· Architect HA/DR using Azure Site Recovery (ASR) and region pairs; define & test RTO/RPO.

· Operate Azure Backup with immutability/soft delete; enable Key Vault purge protection.

· Conduct periodic failover/restore drills with evidence and remediation follow-ups.

Security & Compliance

· Implement Zero Trust with Entra ID (RBAC, PIM, Conditional Access), Managed Identities, and least-privilege.

· Enforce baselines with Defender for Cloud; integrate Microsoft Sentinel detections and SOAR playbooks.

· Support audits with change control, evidence, and segregation of duties.

Cost & Capacity (FinOps)

· Set budgets and alerts; drive rightsizing, reservations/savings plans, and storage tiering.

· Optimize observability/storage retention and data flows for cost efficiency.


Required Qualifications

· 5+ years of overall IT industry experience, including at least 3 years of hands-on experience in Azure Site Reliability Engineering.

· Hands-on Terraform (mandatory): module design, state management, pipelines, policy/scanning, drift detection.

· Strong Azure infrastructure: compute, storage, networking (hub-spoke/vWAN, Private Link, Firewall/WAF, DDoS, ExpressRoute).

· AKS operations and container security fundamentals.

· Observability: Azure Monitor, App Insights, KQL, Prometheus/Grafana; SLO dashboarding.

· DR/Backup expertise: ASR, Azure Backup, RTO/RPO planning and test execution.

· Automation proficiency: PowerShell, Azure CLI, Python; Azure Functions/Logic Apps/Automation Accounts.

· Identity & security: Entra ID, RBAC/PIM, Key Vault, Defender for Cloud.

· Certifications: AZ-104 (mandatory).


Nice to Have

· Microsoft Sentinel (detections, hunting, SOAR runbooks).

· Chaos Studio, performance/load testing, progressive delivery (Blue/Green, Canary, feature flags).

· Data HA/DR across Azure SQL DB/MI, PostgreSQL Flexible Server.

· FinOps practices and cost optimization playbooks.

· Certifications: AZ-305, AZ-400, AZ-700, AZ-500.

Key Skills

Ranked by relevance

terraform, vault, powershell, storage, asr, incident response, postgresql, python, cloud, cicd, sql, wan

Posted: Mar 31, 2026
Type: Full-time
Level: Mid-Senior
Location: Greater Bengaluru Area
Company: PwC India
Industries: Business Consulting Services
Categories: Information Technology Consulting Analyst
