Globant

Site Reliability Engineer

Australia · Full-time · Mid-Senior

At Globant, we are working to make the world a better place, one step at a time. We enhance business development and enterprise solutions to prepare them for a digital future. With a diverse and talented team present in more than 30 countries, we are strategic partners to leading global companies in their business process transformation.


We seek a Principal Site Reliability Engineer who shares our passion for innovation and change. This role is critical to helping our business partners evolve and adapt to consumers' personalized expectations in this new technological era.


Experience and Background

  • Experience designing and implementing software systems, preferably running at scale.
  • Experience leading a team is preferred.
  • Proven track record in managing complex, large-scale distributed systems across multiple layers: infrastructure, networks, applications, and data platforms.
  • Experience working in cloud-native environments (AWS, Azure, GCP) and hybrid cloud/on-prem setups.
  • Deep exposure to high-availability architectures, disaster recovery, and failover strategies.
  • Hands-on experience with CI/CD pipelines, Infrastructure as Code (IaC) tools, and automation frameworks.
  • Background in monitoring, observability, and performance optimization, using tools like Prometheus, Grafana, Datadog, New Relic, or Splunk.
  • Experience working with networking protocols (HTTP/S, TCP/IP, DNS, BGP, VPN) and a strong understanding of application-layer performance and security.
  • Exposure to AI/ML projects or MLOps pipelines is a plus (especially if aligned with data engineering or AI model reliability).
  • Prior experience in incident response, postmortem analysis, and continuous improvement cycles.
  • Familiarity with Agile, DevOps, and SRE principles, including error budgets and blameless postmortems.


Key Responsibilities

  • Take ownership of full-stack reliability, from infrastructure, networks, databases, and middleware to application performance.
  • Proactively identify, troubleshoot, and resolve complex system issues across multiple environments.
  • Design, implement, and maintain highly reliable and scalable architectures to meet critical business SLOs.
  • Develop and maintain robust monitoring and alerting systems to ensure fast detection and resolution of incidents.
  • Lead root cause analyses and post-incident reviews to improve system resilience.
  • Collaborate closely with development, QA, infrastructure, and product teams to embed reliability into the software lifecycle.
  • Automate repetitive tasks and operational workflows to improve system efficiency and team productivity.
  • Mentor and share knowledge with team members to foster a culture of continuous learning and curiosity.
  • Keep up to date with emerging technologies, propose improvements, and evaluate new tools or approaches that can enhance reliability, scalability, or performance.


Required Skills and Qualifications

  • Strong systems engineering background with a deep understanding of Linux/Unix internals and/or Windows systems.
  • Advanced knowledge of networking concepts: routing, load balancing, firewall management, DNS, VPN, SSL/TLS.
  • Proficiency in at least one programming/scripting language: Python, Go, Java, Bash, or similar.
  • Solid experience with cloud platforms (AWS, GCP, Azure), containers (Docker), and orchestration (Kubernetes).
  • Hands-on skills with monitoring/observability tools (Prometheus, Grafana, Datadog, New Relic, Splunk).
  • Familiarity with databases and storage systems (SQL, NoSQL, distributed storage).
  • Experience with Infrastructure as Code (IaC) tools.
  • Excellent problem-solving skills with a curious, investigative mindset and the ability to dig deep into unknown issues across layers.
  • Strong communication and collaboration skills, able to work effectively across cross-functional teams.
  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).


Nice-to-Have Skills

  • Security and compliance awareness: Understanding of cloud security best practices, penetration testing, vulnerability management, and regulatory frameworks (e.g., PCI DSS, ISO 27001).
  • Performance tuning expertise: Ability to fine-tune application and database performance under heavy load, including JVM tuning, query optimization, and caching strategies.
  • Serverless architectures: Familiarity with serverless frameworks (AWS Lambda, Google Cloud Functions, Azure Functions) and event-driven design patterns.
  • Multi-cloud or hybrid cloud experience: Working across multiple cloud providers or integrating cloud with on-premises environments.
  • Certifications: AWS/GCP/Azure certifications, Kubernetes certification (CKA/CKAD), or SRE-focused credentials.


Preferred Soft Skills

  • Excellent communication skills and expertise in partnering with stakeholders.
  • Highly curious, meticulous, and independent.
  • A solid and diverse engineering background.

Key Skills

Ranked by relevance

cloud, aws, prometheus, grafana, datadog, gcp, vpn, dns, infrastructure as code, penetration testing, incident response, cloud security, kubernetes, serverless, firewall, storage, pci dss, python, docker, devops, nosql, mlops, java, bash, cicd, sql, jvm, bgp, ai
Posted: May 30, 2025
Type: Full-time
Level: Mid-Senior
Location: Sydney
Company: Globant

Industries: IT Services, IT Consulting

Categories: Engineering, Information Technology
