Site Reliability Engineer (SRE)

Excelon Solutions, Canada, 15 hours ago

Contract, Design

Role:- Site Reliability Engineer (SRE) with advanced DevOps expertise

Location:- Toronto, ON – 4 Days Onsite

Type:- Contract

Skills

SRE, Kubernetes, Splunk, Bash, Docker, Terraform, GitLab

Additional Comments:-

Job Summary: We are seeking an experienced Site Reliability Engineer (SRE) with advanced DevOps expertise to help build, scale, and maintain our infrastructure and services.

You will play a critical role in ensuring high availability, performance, scalability, and security of our production systems, while enabling continuous deployment and rapid delivery of features to our customers.

Key Responsibilities:-

· Design, build, and maintain reliable, scalable, and secure cloud-based infrastructure (AWS, Azure, or GCP).

· Develop and improve observability using monitoring, alerting, logging, and tracing tools (e.g., Prometheus, Grafana, ELK, Datadog).

· Automate repetitive tasks and infrastructure using Infrastructure-as-Code (Terraform, CloudFormation, Pulumi).

· Create and maintain CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, ArgoCD, etc.) to support fast and safe delivery.

· Lead incident response, root cause analysis, and postmortems to ensure high uptime and rapid recovery.

· Optimize system performance, reliability, and cost-effectiveness through proactive monitoring and tuning.

· Collaborate with software engineering teams to define SLAs/SLOs and improve service reliability.

· Implement and maintain security best practices across environments (e.g., secrets management, IAM, firewalls).

· Maintain disaster recovery plans, backups, and high-availability strategies.

Qualifications (Required):-

· 8 years of experience as an SRE or DevOps Engineer, or in a similar role.

· Proficiency in scripting and automation (Bash, Python, Go, etc.).

· Strong experience with containerization and orchestration (Docker, Kubernetes, Helm).

· Solid understanding of Linux systems administration and networking fundamentals.

· Experience with cloud platforms (AWS, Azure, or GCP).

· Experience with IaC tools like Terraform or CloudFormation.

· Familiarity with GitOps and modern deployment practices.

· Hands-on experience with observability tools (e.g., Prometheus, Grafana, Datadog).

· Strong troubleshooting and incident response
