Trulyyy
Senior Site Reliability Engineer
Singapore | Full-time | Engineering

My client is seeking a highly motivated and skilled Site Reliability Engineer to join its team in Singapore.


The primary focus of the role is applying software engineering principles to build the tools and automation needed to ensure system reliability. The ideal candidate will bring a strong background in either Site Reliability Engineering or Software Engineering, along with a passion for driving operational excellence that directly impacts key business metrics.



Key Responsibilities

  1. Design and implement robust, real-time monitoring and alerting systems to ensure continuous service availability and rapid detection of issues.
  2. Develop and manage a centralized dashboard that aggregates disaster metadata, historical trends, and communication links, enabling senior management to quickly assess infrastructure disruptions.
  3. Drive the implementation and testing of comprehensive Disaster Recovery strategies to minimize downtime and ensure business continuity.
  4. Collaborate with development teams to optimize the performance and resilience of the client's Microservices architecture.
  5. Establish and maintain monitoring systems that improve performance visibility and debugging capabilities.
  6. Apply software engineering practices to automate operational tasks, reducing disaster recovery time and operational costs.


Qualifications

  1. Bachelor’s degree in Computer Science or a related technical field (preferred).
  2. 1+ years of experience in systems operations or site reliability engineering.
  3. Proven expertise in establishing and maintaining monitoring systems with Prometheus and Grafana.
  4. Demonstrated experience in real-time monitoring and implementing effective Disaster Recovery solutions.
  5. Experience working with and optimizing systems built on a Microservices architecture.
  6. Strong analytical and problem-solving skills, with a focus on enhancing performance visibility and debugging.
  7. Ability to translate complex operational data into clear, actionable insights for a centralized management dashboard.
  8. Proficiency in a programming language (e.g., Python, Go) for automation and tooling development.


Regrettably, only shortlisted candidates will be notified.


Please note that any data provided will be used for recruitment purposes only.

Business Registration No.: 202004228R | License No.: 20S0118 | EA Registration No.: R1986587
