1-year renewable contract
As a Site Reliability Engineer, you will fill a mission-critical role, ensuring that our systems are healthy, monitored, automated, fault-tolerant, and designed to scale.
You will work closely with engineering teams to continually improve our production services, enabling fast delivery of new products and reducing downtime.
Key Responsibilities:
- Drive the Site Reliability Engineering agenda to improve the availability, reliability, and performance of services
- Drive observability for our applications
- Drive the optimise-operate initiative, for example by reducing operational toil
- Work with application teams to set up SLIs, SLOs, and error budgets for their applications
- Work with the enterprise team to deploy SRE enablers and initiatives
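Setting up SLIs, SLOs, and error budgets comes down to simple arithmetic. Below is a minimal sketch, assuming a request-based availability SLI (good requests divided by total requests) and an error budget defined as 1 minus the SLO target. The names `evaluate_slo` and `SloReport` and the request counts are hypothetical and are not part of any NTT DATA tooling.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float              # measured availability: good / total requests
    error_budget: float     # allowed failure fraction: 1 - SLO target
    budget_consumed: float  # share of the error budget already used (1.0 = exhausted)


def evaluate_slo(good_requests: int, total_requests: int, slo_target: float) -> SloReport:
    """Compute a request-based availability SLI and error-budget usage.

    Assumption (illustrative only): SLI = good / total, and the error budget
    is 1 - slo_target, e.g. 0.001 for a 99.9% SLO.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    sli = good_requests / total_requests
    error_budget = 1.0 - slo_target
    failed_fraction = 1.0 - sli
    return SloReport(
        sli=sli,
        error_budget=error_budget,
        budget_consumed=failed_fraction / error_budget,
    )


# Hypothetical traffic numbers for one SLO window.
report = evaluate_slo(good_requests=999_400, total_requests=1_000_000, slo_target=0.999)
print(f"SLI={report.sli:.4%}, budget consumed={report.budget_consumed:.0%}")
```

With these illustrative numbers (99.94% of requests succeeding against a 99.9% SLO), about 60% of the error budget has been used, which is the kind of signal teams use to decide whether to keep shipping or to prioritise reliability work.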
Requirements:
- Have a good understanding of ITIL and SRE processes and practices
- Have good leadership skills in working with application teams and service providers to define infrastructure deployment plans, cutover/migration strategies, and test plans
- Able to formulate and establish infrastructure deployment standards
- Good people management, vendor management, and project management skills
- Agile and AWS certifications preferred
- Able to create Bash/Python scripts for infrastructure deployment
- Must be able to apply SRE and Chaos Engineering principles
- Understands key SRE concepts such as toil, SLI, SLO, error budgets, MTTD, MTTR, etc.
- Strong, committed, and reliable team player, able to take direction but also willing to contribute to discussions on design and strategy.
- Possess strong interpersonal and communication skills to work with and build good relationships with other technology teams through day-to-day support and project work
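MTTD (mean time to detect) and MTTR (mean time to recover), listed among the key SRE concepts above, are averages over incident timestamps. Below is a minimal sketch, assuming each incident records when it started, when it was detected, and when service was restored. Here MTTR is measured from incident start, though some teams measure it from detection instead. The names `Incident` and `mttd_mttr` and the sample incidents are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, NamedTuple, Tuple


class Incident(NamedTuple):
    started: datetime   # when the fault began (often reconstructed after the fact)
    detected: datetime  # when monitoring or a person noticed it
    resolved: datetime  # when service was restored


def mttd_mttr(incidents: List[Incident]) -> Tuple[timedelta, timedelta]:
    """Return (MTTD, MTTR) averaged over the given incidents.

    Assumption: MTTD = detected - started and MTTR = resolved - started.
    Adjust the MTTR start point if your team measures from detection.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    detect_secs = [(i.detected - i.started).total_seconds() for i in incidents]
    recover_secs = [(i.resolved - i.started).total_seconds() for i in incidents]
    return timedelta(seconds=mean(detect_secs)), timedelta(seconds=mean(recover_secs))


# Hypothetical incident history: detected after 4 and 6 minutes,
# recovered after 30 and 50 minutes.
t0 = datetime(2024, 1, 1, 12, 0)
history = [
    Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=30)),
    Incident(t0, t0 + timedelta(minutes=6), t0 + timedelta(minutes=50)),
]
mttd, mttr = mttd_mttr(history)
print(f"MTTD={mttd}, MTTR={mttr}")
```

For this sample, MTTD works out to 5 minutes and MTTR to 40 minutes. Reducing the first usually means better observability, and reducing the second usually means better automation and runbooks.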
Join NTT DATA, Inc. and take your career to the next level!