LANDI Global
Site Reliability Engineer
LANDI Global · Singapore · 16 hours ago
Full-time · Engineering, Information Technology

Job Overview

As a Site Reliability Engineer at LANDI Global, you will be responsible for the operation, reliability, and performance of the company’s platform infrastructure. You will work closely with R&D and cross-functional teams to ensure high availability, scalability, and operational excellence across multiple environments.


This role combines hands-on technical execution with proactive monitoring, incident response, and continuous improvement of systems and processes. You will play an important role in maintaining platform stability while supporting ongoing growth and new client onboarding.


Key Responsibilities

Platform Reliability & Operations

  • Build, operate, and maintain platform infrastructure across multiple environments.
  • Ensure platform availability, reliability, and scalability in collaboration with R&D teams.
  • Provide operational support for production systems and participate in incident response and root cause analysis.
  • Participate in a 24/7 on-call / standby rotation to support critical platform operations.


Monitoring, Performance & Resilience

  • Implement and maintain monitoring, alerting, and observability solutions to ensure timely detection and resolution of issues.
  • Analyze system performance, reliability metrics, and logs to identify improvement opportunities.
  • Contribute to cost optimization and capacity planning initiatives.
  • Support and maintain Disaster Recovery (DR) and business continuity plans.


Automation & DevOps Practices

  • Contribute to automation, CI/CD pipelines, and deployment processes to improve efficiency and reduce operational risk.
  • Support automated testing and release processes to ensure stable and repeatable deployments.
  • Assist with change management and incident reporting processes.


Environment & Client Support

  • Support environment provisioning and deployments for new client onboarding and platform expansions.
  • Collaborate with internal teams to ensure smooth rollout of infrastructure and application changes.


Experience & Qualifications

  • At least 3 years of experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role.
  • Strong verbal and written communication skills in English.
  • Ability to work independently while collaborating effectively within a distributed team.


Preferred Technical Skills

Candidates should have hands-on experience in several of the following areas:

  • Cloud platforms (e.g. AWS, Azure)
  • Linux/Unix-based distributed systems
  • Programming or scripting languages (e.g. Python, Bash, Go)
  • Monitoring and observability tools (e.g. Prometheus, Grafana, Zabbix)
  • Configuration management tools (e.g. Ansible, Chef, Puppet)
  • SQL databases (e.g. PostgreSQL, MySQL)
  • Load balancing and reverse proxy technologies (e.g. Nginx)
  • CI/CD tools (e.g. Jenkins, GitLab)
  • Containerization and orchestration technologies (e.g. Docker, Kubernetes)
