Site Reliability Engineer (SRE) Manager

BANXA · Australia · Posted 20 hours ago

Full-time · Engineering

Reporting to the CIO, the SRE Manager will lead and mentor our Site Reliability Engineering team to ensure the continued stability, performance, and resilience of Banxa's production systems.

This strategic role is responsible for shaping and implementing the vision for operational excellence and proactive support. You will be a key leader in establishing mature SRE practices, fostering a culture of reliability, and driving the automation and continuous improvement of our platforms. Day to day, you will manage a team of talented engineers, carry out strategic planning, and collaborate with senior stakeholders across the business, engineering, and other technology teams to build resilient, scalable systems.

Key Responsibilities

Lead and Develop SRE Strategy and Team Culture: Lead, mentor, and grow a high-performing Site Reliability Engineering (SRE) team, developing and executing a strategic roadmap to achieve business objectives for system reliability, performance, and operational excellence.
Own and Mature Incident Management and DORA Compliance: Own the entire incident lifecycle (detection, response, resolution, and root cause analysis) and ensure strict compliance with the EU Digital Operational Resilience Act (DORA), including the mandated timely reporting (initial, intermediate, and final) of all major ICT-related incidents to the competent authorities.
Ensure Operational Resilience and Readiness: Enhance operational readiness by designing and conducting regular tabletop exercises and failure simulations to rigorously test, validate, and improve Business Continuity Plans (BCP), Disaster Recovery (DR) strategies, and incident response playbooks.
Manage and Govern Third-Party Provider Risk: Establish and monitor KPIs for critical third-party providers, ensuring they meet standards for availability, resiliency, and DORA compliance. Maintain a register of third-party dependencies that records clear SLAs and audit rights for each provider.
Drive System Stability and Automation: Lead the architecture and operation of comprehensive monitoring and observability to ensure system security, availability, and performance. Guide the team in building and maintaining sustainable systems through automation, infrastructure-as-code, and continuous improvement.
Champion Stakeholder Collaboration and Governance: Serve as the primary SRE liaison, collaborating with development teams and business stakeholders. Participate in system design reviews to ensure new services are built for reliability and scalability, and champion the creation of a central knowledge base for organizational learning.

Qualifications

A degree in computer science, software engineering or a similar field would be advantageous
AWS Developer, SysOps Administrator or DevOps Engineer certification
Certifications in incident management or ITIL are advantageous

Skills:

Proven leadership and mentoring capabilities with experience managing a technical team.
Exceptional interpersonal and communication skills, with the ability to manage and influence technical and non-technical stakeholders.
Strong strategic thinking, critical analysis, and problem-solving abilities.
A proactive and forward-thinking approach to identifying systemic problems, performance bottlenecks, and areas for improvement.
An innovative and collaborative mindset, with a passion for driving continuous improvement through CI/CD and automation.

Experience:

At least 7 years of experience in a technology role, including a minimum of 3 years in a leadership or management capacity within SRE, DevOps, or Platform Engineering.
Demonstrable experience managing application support and web application frameworks (e.g., Laravel).
Deep expertise with AWS services (specifically EC2, Containers, Redis, RDS, S3, SQS, CloudWatch).
Proven experience implementing and managing application monitoring and observability tools (e.g., Datadog, New Relic, OpenTelemetry).
Strong background in managing infrastructure as code (IaC) and CI/CD pipelines.
Experience working in an Agile environment and managing on-call rotations and incident response teams.
Prior experience with operational resilience frameworks such as DORA is highly beneficial.

About Banxa Holding Inc

Banxa Holding Inc (“Banxa”) is listed on the TSX Venture Exchange and has global operations. Banxa is one of the fastest-growing payments and compliance infrastructure providers for the digital asset industry. We enable customers to purchase digital assets and currencies, such as Bitcoin or USDT, using traditional currencies like USD.

In 2021, Banxa was recognised by The Silicon Review as one of the “50 fastest-growing companies of the year”.
