Morgan McKinley
Site Reliability Engineer
Morgan McKinley | Australia | 17 hours ago
Contract | Remote Friendly | Information Technology

Senior Site Reliability Engineer


Contract - 6 months


Sydney


Hybrid


Overview

We’re looking for an experienced Senior Site Reliability Engineer to join a high-impact digital engineering team supporting one of Australia’s most widely used customer-facing eCommerce applications.

This role is all about driving platform stability, performance, and scalability across a complex Azure and Kubernetes environment. You’ll take ownership of monitoring, performance optimisation, and automation initiatives that ensure the digital platforms run smoothly.

We are looking for a hands-on engineer who can hit the ground running, work autonomously, and help shape the platform strategy for the future.


Key Responsibilities

  • Maintain and improve the reliability, performance, and scalability of large-scale customer-facing applications.
  • Manage and optimise Azure Kubernetes Service (AKS) clusters, ensuring cost efficiency and right-sizing at scale.
  • Implement and refine monitoring, alerting, and observability using tools such as Dynatrace and Azure-native monitoring solutions.
  • Identify and reduce unnecessary logs and alerts to improve signal-to-noise ratio and platform insight.
  • Work closely with software engineering teams (primarily .NET and GraphQL stacks) to diagnose performance issues and improve application behaviour within the clusters.
  • Collaborate on platform automation — driving efficiency and consistency through Infrastructure as Code and CI/CD pipelines.
  • Contribute to defining and executing the platform strategy to ensure reliability, maintainability, and scalability across digital services.
  • Take ownership of incident response, post-mortem analysis, and ongoing performance tuning.
  • Support and optimise Microsoft SQL environments that underpin core application services.


Skills & Experience

  • 7+ years’ experience in Site Reliability Engineering, DevOps, or Platform Engineering roles.
  • Proven experience running and optimising Azure Kubernetes Service (AKS) clusters in production at scale.
  • Strong background in application performance tuning and monitoring/alerting frameworks (preferably Dynatrace).
  • Familiarity with .NET and GraphQL application architectures, and an ability to collaborate effectively with development teams to diagnose issues.
  • Strong Microsoft SQL Server (MSSQL) experience for performance monitoring and troubleshooting.
  • Deep understanding of observability, logging, metrics, and tracing best practices.
  • Hands-on experience with automation, scripting, and Infrastructure as Code (PowerShell, Terraform, ARM templates, etc.).
  • A proactive mindset focused on platform stability, cost optimisation, and continuous improvement.
  • Excellent communication skills and the ability to work independently with minimal guidance.
