A Site Reliability Engineer (SRE) bridges the gap between software development and IT operations to ensure systems are reliable, scalable, and efficient. They apply software engineering principles to operational challenges, focusing on automation, monitoring, and performance optimization.
Key Objectives
- Maintain high availability and performance of production systems.
- Automate manual processes to improve efficiency and reduce human error.
- Monitor system health and proactively prevent incidents.
- Balance feature development speed with system reliability using SLIs, SLOs, and error budgets.
Core Responsibilities
- Run and monitor production environments, ensuring uptime and reliability.
- Build software and systems to manage infrastructure and applications.
- Partner with development teams for testing, release procedures, and capacity planning.
- Create sustainable systems through automation and continuous improvement.
- Respond to on-call incidents, troubleshoot issues, and implement fixes.
- Develop disaster recovery plans and ensure compliance with SLAs.
Required Skills & Qualifications
- Bachelor’s degree in Computer Science or a related field.
- Strong programming skills in Python, Java, Go, or similar languages.
- Experience with cloud platforms (AWS, GCP, Azure) and container orchestration (Kubernetes).
- Familiarity with CI/CD tools, monitoring systems (Prometheus, Grafana), and configuration management and infrastructure-as-code tools (Ansible, Terraform).
- Knowledge of distributed systems, networking, and storage technologies.
Preferred Attributes
- Problem-solving mindset with a focus on automation and scalability.
- Ability to work in cross-functional teams and communicate effectively.
- Experience with incident management and performance tuning.
Join N2S.Global and take your career to the next level!

