Site Reliability Engineer

LANDI Global

Singapore · Full-time · Mid-Senior

Job Overview

As a Site Reliability Engineer at LANDI Global, you will be responsible for the operation, reliability, and performance of the company’s platform infrastructure. You will work closely with R&D and cross-functional teams to ensure high availability, scalability, and operational excellence across multiple environments.

This role combines hands-on technical execution with proactive monitoring, incident response, and continuous improvement of systems and processes. You will play an important role in maintaining platform stability while supporting ongoing growth and new client onboarding.

Key Responsibilities

Platform Reliability & Operations

Build, operate, and maintain platform infrastructure across multiple environments.
Ensure platform availability, reliability, and scalability in collaboration with R&D teams.
Provide operational support for production systems and participate in incident response and root cause analysis.
Participate in a 24/7 on-call / standby rotation to support critical platform operations.

Monitoring, Performance & Resilience

Implement and maintain monitoring, alerting, and observability solutions to ensure timely detection and resolution of issues.
Analyze system performance, reliability metrics, and logs to identify improvement opportunities.
Contribute to cost optimization and capacity planning initiatives.
Support and maintain Disaster Recovery (DR) and business continuity plans.

Automation & DevOps Practices

Contribute to automation, CI/CD pipelines, and deployment processes to improve efficiency and reduce operational risk.
Support automated testing and release processes to ensure stable and repeatable deployments.
Assist with change management and incident reporting processes.

Environment & Client Support

Support environment provisioning and deployments for new client onboarding and platform expansions.
Collaborate with internal teams to ensure smooth rollout of infrastructure and application changes.

Experience & Qualifications

At least 3 years of experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role.
Strong verbal and written communication skills in English.
Ability to work independently while collaborating effectively within a distributed team.

Preferred Technical Skills

Candidates should have hands-on experience in several of the following areas:

Cloud platforms (e.g. AWS, Azure)
Linux/Unix-based distributed systems
Programming or scripting languages (e.g. Python, Bash, Go)
Monitoring and observability tools (e.g. Prometheus, Grafana, Zabbix)
Configuration management tools (e.g. Ansible, Chef, Puppet)
SQL databases (e.g. PostgreSQL, MySQL)
Load balancing and reverse proxy technologies (e.g. Nginx)
CI/CD tools (e.g. Jenkins, GitLab)
Containerization and orchestration technologies (e.g. Docker, Kubernetes)

Key Skills

Ranked by relevance

Incident response, DevOps, high availability, PostgreSQL, Prometheus, Jenkins, Ansible, Grafana, Python, Docker, Bash, CI/CD, AWS
