The Platform Operations Engineer will be responsible for ensuring the stability, scalability, and reliability of the Cloud File Transfer (CFT) platform on AWS. This role involves operational ownership, incident management, cloud infrastructure optimisation, and continuous improvement across a mission-critical, high-throughput environment supporting multiple government agencies.
Key Responsibilities
- Lead cloud platform operations for the CFT platform, focusing on monitoring, performance optimisation, reliability, release operations, and continuous improvement within AWS environments.
- Manage and optimise AWS cloud infrastructure to ensure scalability, high availability, security, and cost efficiency.
- Establish and maintain operational processes including runbooks, dashboards, daily health checks, incident communication workflows, and operational reporting with actionable insights.
- Own L2 incident management, including troubleshooting, impact assessment, escalation handling, and timely resolution of issues within defined SLAs.
- Collaborate closely with engineering, security, and agency stakeholders to resolve platform incidents effectively.
- Drive change, release, and maintenance processes through risk assessments, mitigation planning, and execution of system upgrades and infrastructure improvements.
- Review test results to ensure releases meet operational, performance, and security requirements.
- Define and improve operational OKRs, SLAs, and platform reliability metrics.
- Support portal and backend enhancements, bug fixes, and operational tooling to improve system performance, reliability, and maintainability.
- Document and share operational best practices, incident learnings, and technical knowledge across the team and wider programme.
- Contribute to strengthening engineering standards and raising overall platform reliability through continuous improvement initiatives.
Experience
- Degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
- Minimum 2 years’ experience managing production workloads in public cloud environments (preferably AWS).
- Strong problem-solving skills across cloud infrastructure, distributed systems, and application layers.
- Experience in handling production incidents with ownership, urgency, and high attention to detail.
- Familiarity with defining and enforcing operational processes, procedures, and best practices.
- Experience maintaining secure, high-availability cloud environments with preventative operational controls.
- Understanding of change management, impact analysis, and service reliability improvements.
- Hands-on experience operating applications on AWS.
- Experience working on Singapore Government agency projects.
- Terraform – Infrastructure as Code (IaC) and cloud resource provisioning.
- GitLab – CI/CD pipelines, automation, and version control.
- AWS Services – Strong understanding of AWS components supporting production-grade workloads (e.g., EC2, S3, IAM, Lambda, CloudWatch, VPC, RDS).
- Posted: Feb 05, 2026
- Type: Full-time
- Level: Associate
- Location: Singapore
- Company: HCLTech