The role
We are seeking a skilled and motivated Site Reliability Engineer (SRE) to join our team. As an SRE, you will be responsible for ensuring our SaaS platform's reliability, scalability, and performance.
You will work closely with our development and operations teams to design, implement, and maintain the infrastructure and tools necessary to support our growing customer base.
Key Responsibilities
- Infrastructure Management: Design, build, and maintain the infrastructure that supports our SaaS platform, ensuring high availability and scalability.
- Monitoring and Alerting: Develop and implement monitoring and alerting systems to quickly detect and respond to incidents. Create dashboards to visualize system performance and identify potential issues.
- Incident Response: Lead incident response efforts, including root cause analysis, mitigation, and post-mortem reporting. Develop and maintain incident response playbooks.
- Automation: Automate repetitive tasks to improve efficiency and reduce human error. Implement infrastructure as code (IaC) using Terraform or a similar tool.
- Performance Optimization: Analyze system performance and identify areas for improvement. Work with development teams to optimize application performance and reliability.
- Security: Ensure the protection of our infrastructure and applications by implementing best practices and conducting regular security audits.
- Collaboration: Collaborate with development, operations, and product teams to design and implement reliable and scalable systems. Participate in architecture reviews and provide input on system design.
- Documentation: Create and maintain documentation for systems, processes, and procedures to ensure knowledge sharing and operational continuity.
Qualifications
- Education: Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
- Experience: 3+ years of experience in a Site Reliability Engineer or similar role, preferably in a SaaS environment.
Technical Skills
- Strong knowledge of infrastructure as code (IaC) tools such as Terraform.
- Proficiency in Google Cloud Platform (GCP).
- Experience with containerization and orchestration tools like Docker and Kubernetes.
- Experience with monitoring and logging tools such as Datadog or similar.
- Familiarity with CI/CD pipelines and tools like Google Cloud Build and GitHub Actions.
- Proficiency in scripting languages such as Python, Bash, or similar.
- Knowledge of Redis and MongoDB.
- Understanding of networking concepts and security best practices.
Soft Skills
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration skills.
- Ability to work independently and in a team environment.
- Attention to detail and a commitment to reliability and quality.

