We are looking for a skilled Site Reliability Engineer (SRE) to drive the reliability, scalability, and performance of our AWS-based service desk platform. This role will own the end-to-end AWS cloud infrastructure and DevOps pipelines, focusing on automation, system resilience, and operational excellence. The ideal candidate will treat operations as a software problem, minimizing manual intervention and ensuring a seamless experience for both agents and customers.
Key Responsibilities
1. Amazon Connect & Service Desk Reliability
- Design, deploy, and maintain the Amazon Connect ecosystem, including Contact Flows, Lambda integrations, and Lex bots using Infrastructure as Code (Terraform/CloudFormation).
- Ensure high availability and performance of voice and chat channels with minimal latency and optimal audio quality.
- Manage integrations between Amazon Connect and ITSM tools such as ServiceNow, Jira Service Management, or Salesforce.
- Perform proactive capacity planning to handle peak traffic, including telephony quotas and concurrent workloads.
2. Cloud Infrastructure & Security
- Manage and optimize core AWS services including EC2, ECS/EKS, S3, Lambda, DynamoDB, and VPC networking.
- Implement security best practices, including IAM least-privilege access, encryption (KMS), and compliance with standards such as SOC 2, HIPAA, or PCI DSS.
- Monitor and optimize cloud costs through effective FinOps practices.
3. DevOps & CI/CD Engineering
- Build and maintain CI/CD pipelines using tools such as GitLab CI, GitHub Actions, Jenkins, or AWS CodePipeline.
- Automate deployments for infrastructure, Lambda functions, and conversational bots.
- Integrate automated testing to validate workflows, APIs, and contact flows prior to production release.
- Ensure consistency across environments (Sandbox, Staging, Production) through standardized deployment patterns.
4. Observability & Incident Management
- Develop monitoring dashboards and alerts using CloudWatch, X-Ray, and tools like Grafana, Datadog, or Splunk.
- Lead incident response and troubleshooting for production issues.
- Conduct root cause analysis and blameless post-mortems.
- Define and manage SLOs, SLIs, and error budgets to maintain system reliability.
Required Skills & Qualifications
Technical Skills
- Strong expertise in Amazon Connect (Contact Flows, contact trace records (CTRs), and Contact Control Panel (CCP) customization).
- Hands-on experience with AWS services including Lambda, DynamoDB, S3, IAM, and networking.
- Proficiency in Infrastructure as Code tools such as Terraform (preferred), CloudFormation, or AWS CDK.
- Experience building CI/CD pipelines using GitLab, GitHub Actions, Jenkins, or similar tools.
- Strong programming/scripting skills in Python or Node.js.
- Experience with observability and log-streaming tools such as CloudWatch, Kinesis, ELK Stack, or Splunk.
Experience
- 3+ years of experience in Site Reliability Engineering or DevOps roles.
- 2+ years of hands-on experience with Amazon Connect or similar CCaaS platforms.
- Experience supporting high-volume service desk or call center environments.
Education & Certifications
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- Preferred certifications:
  - AWS Certified DevOps Engineer – Professional
  - AWS Certified SysOps Administrator
- Posted: May 06, 2026
- Type: Contract
- Level: Associate
- Location: Singapore
- Company: Elliott Moss Consulting