Site Reliability Engineer (Automation and DevOps)
Key Responsibilities
- Plan, manage, and oversee all aspects of a production environment
- Define strategies for application performance monitoring and optimisation in a production environment
- Respond to incidents
- Improve the platform based on feedback and measure the reduction in incidents over time
- Support deployment of code into multiple lower environments
- Support current processes with an emphasis on automating everything as soon as possible
- Design, develop and standardise a monitoring and alerting mechanism for the supported applications
- Take a holistic approach to problem-solving by connecting the dots across the various technology stacks that make up the platform during a production event, in order to optimise mean time to recover (MTTR)
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement
- Analyse ITSM activities on the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns
- Support services before they go live through activities such as system design consulting, capacity planning and launch reviews
- Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead DevOps automation and best practices
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health
- Scale systems sustainably through mechanisms such as automation, and evolve systems by pushing for changes that improve reliability and velocity
- Work with a global team spread across tech hubs in multiple geographies and time zones
- Share knowledge and explain processes and procedures to others
- Mentor junior team members
- Perform on-call duties on a rotational basis
- Occasional off-hours work required
Skills Required
- Linux
- Mainframe
- Shell scripting
- ITIL / ITSM
- Application troubleshooting
- SQL
- Any monitoring tool (Splunk / Dynatrace preferred)
- Jenkins - CI/CD
- Groovy scripting / YAML (basic)
- Git (basic) / Bitbucket (basic)
- Ansible / Chef
- Event framework architecture
Join Tiger Resourcing Group and take your career to the next level!

