8282 - Site Reliability Engineer Cloud, Infrastructure and ITOps

unosquare | Argentina | Posted 11 hours ago

Full-time | Engineering, Information Technology

Job Description

Our client is revolutionizing data management and analytics with an innovative cloud data platform purpose-built for petabyte-scale datasets. Their mission is to help organizations drastically reduce data costs while increasing data retention.

We are looking for a Site Reliability Engineer (SRE) to join the client's Services team. In this role, you will contribute to the reliability and scalability of the platform and help deliver solutions tailored to customers' unique needs. This is a highly technical, hands-on role that requires deep expertise in system reliability and automation.

Key Responsibilities

Infrastructure Reliability: Deploy, maintain, and ensure a highly reliable fleet of Kubernetes clusters and deployments across multiple cloud platforms.
Service Optimization: Design, implement, and maintain systems and processes to enhance the reliability, availability, and performance of our services.
CI/CD Management: Build and optimize CI/CD tools and processes to ensure efficient and reliable deployments.
Monitoring and Incident Response: Develop and manage monitoring, alerting, and incident response strategies to minimize downtime and enable rapid recovery.
Root Cause Analysis: Conduct comprehensive root cause analyses for system failures, implementing long-term preventive measures.
Automation and Efficiency: Automate repetitive tasks and optimize system performance to improve operational efficiency.
On-Call Support: Participate in an on-call rotation covering weekday business hours and a once-monthly weekend shift.

Collaboration and Customer Engagement

Cross-Functional Teamwork: Work closely with software engineering, infrastructure, and product teams to integrate reliability practices into every stage of the development lifecycle.
Reliability Advocacy: Champion SRE best practices and foster a culture of operational excellence across the organization.
Global Team Collaboration: Collaborate with a distributed team of engineers worldwide to provide round-the-clock support.
Customer Support: Interface with customers to address and resolve reported incidents, ensuring a seamless user experience.

Qualifications and Skills

SRE Expertise: Proven experience as a Site Reliability Engineer or in a similar role, with at least five years supporting complex distributed systems.
Observability Tools: Experience with monitoring and debugging tools like Prometheus, Vector, Grafana, Superset, or Kibana.
Cloud Platforms: Proficiency in at least one major cloud platform (AWS, GCP, Azure, or Linode).
Database Knowledge: Experience with SQL databases; familiarity with PostgreSQL is a plus but not required.
Programming Skills: Proficiency in programming languages such as Python, Go, or Rust.
Linux Expertise: Strong experience with Linux systems, including performance tuning and system-level troubleshooting.
Communication Skills: Excellent written and verbal communication skills, with the ability to convey technical concepts clearly to diverse audiences, including customers and cross-functional teams.
