Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision!
Site Reliability Engineer - Terraform, Backstage
Canada · Full-time · Mid-Senior
Years of Experience: 6-8
We are seeking a Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of platform services. The ideal candidate will bring strong expertise in SRE practices, observability, infrastructure automation, and developer platform enablement, with exposure to modern technologies including policy-as-code and emerging GenAI-driven systems.
Required Skills
Strong experience in SRE practices and reliability engineering
Hands-on expertise with:
Monitoring/logging platforms and distributed tracing
SLO/SLI frameworks and observability design
Experience in incident management and performance engineering
Strong understanding of DORA metrics and operational excellence
Proficiency in:
Terraform (Infrastructure as Code)
Policy as Code (OPA/Rego, Sentinel)
Experience with:
Developer platform tools (Backstage, service catalogs)
Golden paths and platform standardization
Key Responsibilities
Implement and manage SRE practices including:
Incident management, root cause analysis, and postmortems
Reliability engineering and performance optimization
Tracking and improving DORA metrics
Define and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
Build and manage monitoring, logging, and distributed tracing frameworks
Ensure platform reliability through proactive alerting, observability, and automation
Automate infrastructure and governance using:
Terraform (Infrastructure as Code)
Policy-as-Code tools (OPA/Rego, Sentinel)
Enhance developer experience and productivity by:
Designing self-service platform capabilities
Managing service catalogs and platform standards
Building reusable templates and golden paths
Work with tools like Backstage to enable internal developer platforms
Collaborate with engineering teams to improve system stability, deployment reliability, and operational efficiency
Support integration and reliability considerations for GenAI-based systems (RAG, prompt workflows, model evaluation)
Nice to Have
Exposure to GenAI platforms, RAG, and prompt engineering concepts
Experience in developer productivity measurement and platform engineering initiatives
Tools & Methodologies
Experience with Agile methodologies and supporting tools (Jira, Confluence)
Familiarity with DevOps and platform engineering practices
Soft Skills
Strong problem-solving and analytical skills
Ability to work in high-pressure production environments
Excellent communication and cross-team collaboration
Key Skills
Ranked by relevance
devops
jira
Related Jobs
3 roles aligned with this opportunity
Application Release Engineer (CI/CD & Azure DevOps)
2026-05-07
Full-time
Mid-Senior
Canada
IT Services
Information Technology
DevOps Engineer – Public Cloud & Kubernetes (GCP/AWS/Azure)
2026-05-07
Full-time
Associate
Canada
IT Services
Information Technology
AWS ML Engineer
2026-05-06
Full-time
Associate
Canada
IT Services
Information Technology
- Posted: May 13, 2026
- Type: Full-time
- Level: Mid-Senior
- Location: Toronto
- Industries: IT Services, IT Consulting
- Categories: Information Technology