Sedha Consulting
Technical Support Specialist
Sedha Consulting, Singapore, 4 days ago
Contract, Information Technology

About the Role

We are looking for a skilled and driven Technical Software/Support Engineer (Operations) to join our team. In this role, you will drive our operations and incident management initiatives, ensuring our systems remain robust, scalable, and resilient. You will work closely with cross-functional teams to identify operational gaps and implement solutions that enable seamless deployment, observability, and maintenance of our system.


Key Responsibilities


Incident Management & Response (60%)

● Lead/contribute to incident response efforts during critical system outages and performance degradations

● Develop and maintain incident response procedures, runbooks, and escalation protocols

● Conduct thorough post-incident reviews and drive implementation of preventive measures

● Coordinate cross-functional teams during high-severity incidents

● Build and maintain incident management tooling and automation

● Manage stakeholder expectations


System Operations & Reliability (20%)

● Design, implement, and maintain monitoring, alerting, and observability across our system

● Develop automation tools to reduce manual operational overhead

● Ensure system SLAs and SLOs are met consistently


Software Development (10%)

● Build internal tools, APIs, and platforms to improve operational efficiency

● Create dashboards and reporting systems for operational metrics


Collaboration & Process Improvement (10%)

● Partner with development teams to improve system reliability and operability

● Establish and refine operational processes and best practices

● Mentor team members on incident response and operational procedures

● Participate in the on-call rotation and provide operational leadership during incidents

● Drive continuous improvement initiatives based on operational data and feedback


Required Qualifications


Technical Skills

● 5+ years of software engineering experience with a focus on operations

● Proficiency in at least one programming language (Python, Java/Kotlin, TypeScript or similar)

● Experience with modern web application technologies/tools such as PostgreSQL, Kotlin, and AWS

● Knowledge of CI/CD pipelines and deployment automation

● Experience with AWS and container technologies (Docker, Kubernetes)

● Understanding of monitoring and observability tools (Prometheus, Grafana, ELK stack, or similar)

● Experience with APM tools (New Relic, Datadog, AppDynamics)

● Experience with infrastructure-as-code tools (Terraform, Ansible, CloudFormation)

● Background in DevOps or Site Reliability Engineering practices

● Experience with log aggregation and analysis tools

● Understanding of security operations and compliance requirements

● Ability to contribute to system architecture decisions with operational considerations in mind


Operational Experience

● Proven experience in incident management and response procedures

● Experience with on-call responsibilities and escalation processes

● Understanding of system reliability concepts (SLAs, SLOs)

● Knowledge of networking, security, and database administration concepts

● Experience with configuration management and deployment strategies


Soft Skills

● Excellent problem-solving and analytical thinking abilities

● Strong communication skills for technical and non-technical audiences

● Ability to work effectively under pressure during incident situations

● Collaborative mindset when working with cross-functional teams

● Detail-oriented approach to documentation and process improvement
