Luminance's Site Reliability team combines strong problem solving, infrastructure tooling and wider DevOps practices to deliver Luminance's unique software applications as a service. The team plays a crucial role in incident response and issue resolution, swiftly resolving service interruptions to maintain the highest level of customer satisfaction. With a focus on automation, scalability, reliability and security, the team enables Luminance to deliver a performant, seamless experience for its users. The Site Reliability team is a small, dynamic group of creative engineers who work together to tackle some of Luminance's greatest challenges, with new problems and technology areas to dig into on a regular basis.
Roles And Responsibilities
System Monitoring: Implement, manage, and develop internal monitoring tools to ensure system health and quickly detect anomalies. Respond to and resolve incidents efficiently to maintain uptime.
Automation: Develop automation solutions for infrastructure management, issue resolution and deployment processes, streamlining operations and reducing manual work.
Infrastructure Management: Manage cloud infrastructure to ensure reliability and scalability, collaborating with teams to design robust solutions.
Incident Management: Conduct post-incident analysis to identify root causes, implement preventive measures, and enhance system resilience.
Security and Compliance: Maintain security best practices and compliance standards, working with security teams to address vulnerabilities proactively.
Collaboration and Communication: Partner with development and operations teams, fostering communication and promoting reliability best practices across the organization.
Requirements
- Master's degree in Computer Science, Engineering or a related subject from a Go8 university
- Excellent problem-solving skills, including diagnosing issues within complex systems
- Ability and desire to identify root causes of issues, and propose and implement structural improvements
- Strong communication skills and the ability to perform well in urgent situations
- Knowledge of the design and operation of web-based software applications, based on technologies such as Node.js, PostgreSQL or Elasticsearch
- Knowledge of modern infrastructure and operational tooling within cloud-based architectures, such as Linux, Python, AWS, Ansible and Prometheus