N2S.Global
Site Reliability Engineer
N2S.Global | Australia | 1 day ago
Full-time | Information Technology

Overview

A Site Reliability Engineer ensures the reliability, scalability, and performance of systems and services. They bridge the gap between development and operations by applying software engineering principles to infrastructure and operations problems.

Key Responsibilities

  • System Reliability & Performance: Design, build, and maintain scalable and highly available systems. Monitor system health and performance using observability tools.
  • Incident Management: Respond to production incidents, perform root cause analysis, and implement preventive measures.
  • Automation: Develop scripts and tools to automate repetitive tasks and improve efficiency.
  • Capacity Planning: Forecast system demands and plan for scaling infrastructure.
  • Collaboration: Work closely with development teams to ensure reliability is built into applications.
  • Security & Compliance: Implement best practices for system security and compliance.

Required Skills

  • Strong knowledge of Linux/Unix systems and networking fundamentals.
  • Proficiency in programming/scripting languages (Python, Go, Bash).
  • Experience with cloud platforms (AWS, Azure, GCP).
  • Familiarity with CI/CD pipelines and DevOps practices.
  • Expertise in monitoring tools (Prometheus, Grafana, ELK stack).
  • Understanding of containerization and orchestration (Docker, Kubernetes).

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • 3+ years of experience in system administration, DevOps, or SRE roles.
  • Strong problem-solving and troubleshooting skills.

Preferred

  • Experience with Infrastructure as Code (Terraform, Ansible).
  • Knowledge of distributed systems and microservices architecture.
