Reporting to the CIO, the SRE Manager will lead and mentor our Site Reliability Engineering team to ensure the continued stability, performance, and resilience of Banxa's production systems.
This strategic role is responsible for shaping and implementing the vision for operational excellence and proactive support. You will be a key leader in establishing mature SRE practices, fostering a culture of reliability, and driving the automation and continuous improvement of our platforms. Your day-to-day will involve managing a team of talented engineers, planning strategically, and collaborating with senior stakeholders across the business, engineering, and other technology teams to build resilient, scalable systems.
Key Responsibilities
- Lead and Develop SRE Strategy and Team Culture: Lead, mentor, and grow a high-performing Site Reliability Engineering (SRE) team, developing and executing a strategic roadmap to achieve business objectives for system reliability, performance, and operational excellence.
 - Own and Mature Incident Management and DORA Compliance: Own the entire incident lifecycle (detection, response, resolution, and root cause analysis) and ensure strict compliance with the EU Digital Operational Resilience Act (DORA), including mandated timely reporting (initial, intermediate, and final) of all major ICT-related incidents to competent authorities.
 - Ensure Operational Resilience and Readiness: Enhance operational readiness by designing and conducting regular tabletop exercises and failure simulations to rigorously test, validate, and improve Business Continuity Plans (BCP), Disaster Recovery (DR) strategies, and incident response playbooks.
 - Manage and Govern Third-Party Provider Risk: Establish and monitor KPIs for critical third-party providers, ensuring they meet standards for availability, resiliency, and DORA compliance. Maintain a dependency register with clear SLAs and audit rights.
 - Drive System Stability and Automation: Lead the architecture and operation of comprehensive monitoring and observability to ensure system security, availability, and performance. Guide the team in building and maintaining sustainable systems through automation, infrastructure-as-code, and continuous improvement.
 - Champion Stakeholder Collaboration and Governance: Serve as the primary SRE liaison, collaborating with development teams and business stakeholders. Participate in system design reviews to ensure new services are built for reliability and scalability, and champion the creation of a central knowledge base for organizational learning.
 
Qualifications
- A degree in computer science, software engineering or a similar field would be advantageous
 - AWS Developer, SysOps Administrator or DevOps Engineer certification
 - Certifications in incident management or ITIL are advantageous
 
Skills:
- Proven leadership and mentoring capabilities with experience managing a technical team.
 - Exceptional interpersonal and communication skills, with the ability to manage and influence technical and non-technical stakeholders.
 - Strong strategic thinking, critical analysis, and problem-solving abilities.
 - A proactive and forward-thinking approach to identifying systemic problems, performance bottlenecks, and areas for improvement.
 - An innovative and collaborative mindset, with a passion for driving continuous improvement through CI/CD and automation.
 
Experience:
- At least 7 years of experience in a technology role, with a minimum of 3 years in a leadership or management capacity within SRE, DevOps, or Platform Engineering.
 - Demonstrable experience managing application support and web application frameworks (e.g., Laravel).
 - Deep expertise with AWS services (specifically EC2, Containers, Redis, RDS, S3, SQS, CloudWatch).
 - Proven experience implementing and managing application monitoring and observability tools (e.g., Datadog, New Relic, OpenTelemetry).
 - Strong background in managing infrastructure as code (IaC) and CI/CD pipelines.
 - Experience working in an Agile environment and managing on-call rotations and incident response teams.
 - Prior experience with operational resilience frameworks like DORA is highly beneficial.
 
About Banxa Holding Inc
Banxa Holding Inc (“Banxa”) is a listed company on the TSX Venture Exchange with global operations. Banxa is one of the fastest-growing payments and compliance infrastructure providers for the digital asset industry. We enable the purchase of digital assets and currencies, such as Bitcoin or USDT, using traditional currencies like USD.
In 2021, Banxa was recognised by The Silicon Review as one of the “50 fastest-growing companies of the year”.

