HCLTech

Site Reliability Engineer

Canada · Full-time · Mid-Senior

Join our SRE L2 squad supporting ~1000 AWS-hosted services. You’ll own operational reliability, rapid triage, and proactive maintenance across production and non-production environments, partnering closely with Cloud Engineering, SOC, and application teams.

Key Responsibilities

  • Deliver 24×7 monitoring, incident response, and problem management; drive MTTA/MTTR reduction and SLO/SLI adherence.
  • Perform preventive health checks; analyze ticket trends to implement continual service improvements and automation to reduce toil.
  • Run blameless postmortems and produce high-quality root cause analyses (RCAs); maintain SOPs/runbooks and reliability dashboards.
  • Configure/tune observability (Dynatrace, CloudWatch, ELK); enable self-healing workflows and workload optimizations.
  • Support change/service requests within agreed SLAs; collaborate during transitions and onboard new AWS services.

Core Skills & Tools

  • AWS: Lambda, ECS/Fargate/EC2, API Gateway, SNS/SQS, Kinesis, RDS; IAM/KMS foundations.
  • Observability & ITSM: Dynatrace, CloudWatch, ELK; ServiceNow for incidents/changes; SLI/SLO dashboards.
  • Toil Reduction: automation of recurring operational work and self-healing workflows.
  • Reliability Practices: Error budgets, capacity/performance benchmarking, automation/runbook execution, FinOps awareness.

Qualifications

  • 5+ years SRE/DevOps or L2 operations for cloud-native stacks; strong AWS production experience.
  • Proven incident/change/problem management in 24×7 environments; adept at RCA and postmortems.
  • Hands-on with observability tooling and operational automation; excellent collaboration and documentation skills.

Shift Coverage & Locations

The team works a follow-the-sun model with overlapping handoffs between Canada and India to ensure continuous support. Success is measured by uptime, MTTR/MTTD, change failure rate, error-budget consumption, SLO adherence, RCA quality, and continual service improvement (CSI) throughput.

Key Skills

AWS, cloud, ELK, incident response
Posted: Apr 10, 2026
Type: Full-time
Level: Mid-Senior
Location: Greater Toronto Area
Company: HCLTech

Industries

IT Services, IT Consulting

Categories

Information Technology
