Job Overview
As a Site Reliability Engineer at LANDI Global, you will be responsible for the operation, reliability, and performance of the company’s platform infrastructure. You will work closely with R&D and cross-functional teams to ensure high availability, scalability, and operational excellence across multiple environments.
This role combines hands-on technical execution with proactive monitoring, incident response, and continuous improvement of systems and processes. You will play an important role in maintaining platform stability while supporting ongoing growth and new client onboarding.
Key Responsibilities
Platform Reliability & Operations
- Build, operate, and maintain platform infrastructure across multiple environments.
- Ensure platform availability, reliability, and scalability in collaboration with R&D teams.
- Provide operational support for production systems and participate in incident response and root cause analysis.
- Participate in a 24/7 on-call / standby rotation to support critical platform operations.
Monitoring, Performance & Resilience
- Implement and maintain monitoring, alerting, and observability solutions to ensure timely detection and resolution of issues.
- Analyze system performance, reliability metrics, and logs to identify improvement opportunities.
- Contribute to cost optimization and capacity planning initiatives.
- Support and maintain Disaster Recovery (DR) and business continuity plans.
Automation & DevOps Practices
- Contribute to automation, CI/CD pipelines, and deployment processes to improve efficiency and reduce operational risk.
- Support automated testing and release processes to ensure stable and repeatable deployments.
- Assist with change management and incident reporting processes.
Environment & Client Support
- Support environment provisioning and deployments for new client onboarding and platform expansions.
- Collaborate with internal teams to ensure smooth rollout of infrastructure and application changes.
Experience & Qualifications
- At least 3 years of experience in a Site Reliability Engineer, DevOps Engineer, or similar role.
- Strong verbal and written communication skills in English.
- Ability to work independently while collaborating effectively within a distributed team.
Preferred Technical Skills
Candidates should have hands-on experience in several of the following areas:
- Cloud platforms (e.g. AWS, Azure)
- Linux/Unix-based distributed systems
- Programming or scripting languages (e.g. Python, Bash, Go)
- Monitoring and observability tools (e.g. Prometheus, Grafana, Zabbix)
- Configuration management tools (e.g. Ansible, Chef, Puppet)
- SQL databases (e.g. PostgreSQL, MySQL)
- Load balancing and reverse proxy technologies (e.g. Nginx)
- CI/CD tools (e.g. Jenkins, GitLab)
- Containerization and orchestration technologies (e.g. Docker, Kubernetes)

