Tardis Group

Site Reliability Engineer

Singapore · Full-time · Associate

About the Company

A rapidly growing technology firm operating at the forefront of artificial intelligence and advanced software solutions. The company fosters a fast-paced, collaborative, and innovation-driven culture, uniting talent across engineering, research, and product teams to create impactful solutions. This role offers the opportunity to work on exciting projects, leverage cutting-edge technologies, and make a real difference in the AI and mobile development space.


Key Responsibilities

Cluster Operations & Management

  • Manage and maintain container clusters (e.g., Kubernetes, Docker) and open-source component clusters (e.g., Kafka, Redis, Elasticsearch) across multiple environments and business units.
  • Monitor and optimize distributed systems to ensure high performance, scalability, and reliability.

Infrastructure Platform Development

  • Design, build, and improve infrastructure operations platforms.
  • Develop and maintain solutions for infrastructure management, CI/CD pipelines, monitoring and alerting systems, and centralized logging.
  • Lead platform standardization efforts and drive automation to streamline operations.

High Availability & Reliability

  • Ensure maximum uptime for production services through proactive monitoring, rapid incident response, and root cause analysis.
  • Continuously refine service architecture, deployment strategies, and operational processes for improved resilience.
  • Implement and maintain SLA/SLO frameworks, applying reliability engineering best practices.

Automation & Process Improvement

  • Develop automated systems for operations and maintenance to minimize manual intervention.
  • Create self-service tools and workflows to boost team productivity.
  • Define and enforce best practices for infrastructure-as-code, configuration management, and change control.


Required Qualifications

Experience & Education

  • Minimum 2 years of hands-on experience in Systems Operations, DevOps, or Site Reliability Engineering (SRE).
  • Bachelor’s degree in Computer Science, Engineering, or a related technical discipline preferred.

Cloud & Infrastructure

  • Familiarity with public cloud platforms (AWS, Azure, or GCP) is highly valued.
  • Strong understanding of large-scale internet architectures and distributed systems.
  • Proven experience with infrastructure monitoring, logging, and observability tools.

Technical Skills

  • Proficiency in scripting and automation (e.g., Shell, Python).
  • Strong knowledge of containerization technologies (Kubernetes, Docker).
  • Hands-on experience managing production-grade container clusters and maintaining CI/CD pipelines.
  • Familiarity with infrastructure components such as Nginx, MySQL, Redis, Kafka, and Elasticsearch.

Advanced Networking (Preferred)

  • Experience with Service Mesh architectures, Cilium CNI, and eBPF technologies.
  • Understanding of network security, load balancing, and traffic management.
  • Knowledge of cloud-native networking patterns and best practices.


If you’re ready to make an impact in a role that combines software development with cutting-edge AI, we encourage you to apply. Please note that only shortlisted candidates will be contacted.


CEI: 23S1921

Key Skills

Ranked by relevance

kubernetes, redis, kafka, cloud, cicd, ai, configuration management, artificial intelligence, incident response, network security, containerization, elasticsearch, docker, devops, mysql, nginx, aws, gcp

Posted: Aug 12, 2025
Type: Full-time
Level: Associate
Location: Singapore

Industries

Software Development

Categories

Information Technology
