Luminance
Site Reliability Engineer
Australia
Full-time · Other
The Role

Luminance's Site Reliability team combines strong problem solving, infrastructure tooling and wider DevOps practices to provide a reliable service for Luminance's unique software applications. The team plays a crucial role in incident response and issue resolution, swiftly addressing and resolving service interruptions to maintain the highest level of customer satisfaction. With a focus on automation, scalability, reliability and security, the team enables Luminance to deliver a performant, seamless experience for its users. The Site Reliability team is a small, dynamic group of creative engineers who work together to tackle some of Luminance's greatest challenges, with new problems and technology areas to dig into on a regular basis.

Roles and Responsibilities

System Monitoring: Implement, manage, and develop internal monitoring tools to ensure system health and quickly detect anomalies. Respond to and resolve incidents efficiently to maintain uptime.

Automation: Develop automation solutions for infrastructure management, issue resolution and deployment processes, streamlining operations and reducing manual work.

Infrastructure Management: Manage cloud infrastructure to ensure reliability and scalability, collaborating with teams to design robust solutions.

Incident Management: Conduct post-incident analysis to identify root causes, implement preventive measures, and enhance system resilience.

Security and Compliance: Uphold security best practices and compliance standards, working with security teams to address vulnerabilities proactively.

Collaboration and Communication: Partner with development and operations teams, fostering communication and promoting reliability best practices across the organization.

Requirements

  • Master's degree in Computer Science, Engineering or a related subject from a Group of Eight (Go8) university
  • Excellent problem-solving skills, including diagnosing issues within complex systems
  • Ability and desire to identify root causes of issues, and propose and implement structural improvements
  • Strong communication skills and the ability to perform under pressure in time-critical situations
  • Knowledge of the design and operation of web-based software applications built on technologies such as Node.js, PostgreSQL or Elasticsearch
  • Knowledge of modern infrastructure and operational tooling within cloud-based architectures, such as Linux, Python, AWS, Ansible and Prometheus
