Position Details
Job Title: Site Reliability Engineer
Location: Vancouver, Canada
Job Type: Full Time
Job Overview
At Intellect Design Arena, our Digital Experience Platform is redefining how people shop and bank. It delivers seamless, secure, and innovative experiences through a microservice-based SaaS platform on Azure Kubernetes Service (AKS). Certified for SOC2 Type 2 and ISO27001, we're all about pushing boundaries while keeping reliability strong. As our Site Reliability Engineer (SRE), you'll safeguard our platform's uptime, using Azure DevOps, Prometheus, Grafana, and Azure Monitor to deliver rock-solid performance. Reporting to our Platform Management Lead, you'll drive reliability in our GitOps-powered, high-compliance environment.
Responsibilities
- Champion Reliability: Define and monitor SLIs/SLOs to ensure our Digital Experience Platform delivers flawless retail and banking experiences, keeping users happy and businesses thriving across Canada.
- Master Incident Response: Lead incident response, perform root cause analysis (RCA), and implement fixes to keep our platform running 24/7, minimizing disruptions for retailers and bankers.
- Build Observability: Set up and optimize monitoring with Prometheus, Grafana, and Azure Monitor, configuring alerts and dashboards to catch issues before they impact users.
- Automate Reliability: Collaborate with CI/CD & Automation Engineers to integrate reliability checks into Azure DevOps pipelines, ensuring GitOps-driven deployments are stable and secure.
- Fortify Security & Compliance: Work with our overseas Cloud Security team to embed SOC2/ISO27001 controls, ensuring monitoring and incident processes meet compliance standards for secure digital experiences.
- Optimize Performance: Partner with Platform Engineering to tune AKS clusters and microservices, ensuring scalability and low latency for Canada-wide users.
- Tackle Technical Debt: Identify and prioritize technical debt in monitoring, automation, and infrastructure, keeping our platform as clean as a freshly provisioned AKS node.
- Collaborate with Visionaries: Team up with Network Engineering, Azure Cloud Engineering, and overseas Cloud Architecture to build a platform that redefines digital experiences.
- Document the Reliability Magic: Maintain runbooks, document incident RCAs, and create service desk integrations to keep our team aligned and compliance audits breezy.
Qualifications
- Diploma / degree in computer science
- 2+ years of experience in a Developer or System Administrator role
- Reliability Expertise: Proven experience defining SLIs/SLOs, managing incidents, and ensuring high availability in cloud-native environments like Azure AKS.
- Monitoring Mastery: Deep knowledge of Prometheus, Grafana, and Azure Monitor for building observability and alerting systems.
- Automation Skills: Familiarity with Azure DevOps and scripting (Python, Bash, PowerShell) to automate reliability tasks and integrate with GitOps workflows.
- Security Savvy: Experience embedding SOC2/ISO27001 compliance into monitoring and incident processes, ensuring secure digital experiences for retail and banking.
- Problem-Solving Superpowers: A knack for debugging complex issues, performing RCAs, and implementing fixes to keep systems humming.
- Team Player Energy: Strong collaboration skills to work with cross-functional teams, including overseas Cloud Security and Architecture, while reporting to our Platform Management Lead.
- Bonus Points: Experience with Argo CD or Argo Workflows, microservices, or high-compliance SaaS platforms. A GitHub repo with automation scripts or a passion for disrupting retail and banking is a huge win.
Posted: Aug 21, 2025
Type: Full-time
Level: Mid-Senior
Location: Vancouver
Company: Intellect Design Arena Ltd