JOB DESCRIPTION
Qualifications
- The ideal candidate will have a strong background in production monitoring, a deep understanding of development and operations, and a proven track record in managing and scaling distributed systems in a public, private, or hybrid cloud environment for e-commerce / retail platforms.
- Understanding of SRE & DevOps principles, including monitoring, alerting, fault analysis, and other common reliability engineering concepts, with a keen eye for opportunities to eliminate toil through code and process improvements.
- Expertise in infrastructure as code (IaC), build automation, source control, and CI/CD tools (e.g., Terraform, CloudFormation, GitHub, Artifactory, Jenkins).
- Deep understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes).
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, Splunk, New Relic, Sumo Logic) and incident response processes.
- Proficient in modern Java, React, and NodeJS, and in scripting languages such as Python and Bash.
- High-level understanding of the different layers of the tech stack and how they come together to provide a service (e.g., network, compute, storage, operating systems such as Linux and Windows, supporting services, application layer).
Responsibilities
- Key measures of success will include platform stability, effective integration and delivery, instrumentation, release quality, technical debt (toil) reduction, development of automation, risk/security compliance, and sustained advancement of the SRE & DevOps practice.
- Design & implement scalable, automated, monitored, and well-documented systems to accelerate the development of the services running in the AWS cloud.
- Configure, tune, and fix multi-tiered systems to achieve optimal application performance, stability, and availability.
- Be part of an on-call rotation providing hands-on technical expertise during service-impacting events.
- Apply troubleshooting skills and debugging tools, and examine logs, telemetry, and other sources to verify assumptions and customer impact. Lead blameless postmortems to identify root causes and improve production resiliency.
Join Zensar Technologies and take your career to the next level!

