N2S.Global
Site Reliability Engineer
N2S.Global, Australia (posted 10 hours ago)
Full-time, Information Technology

We are looking for a Site Reliability Engineer (SRE) to join our team and ensure the reliability, scalability, and performance of our software systems. This role bridges the gap between software development and IT operations, focusing on automation, monitoring, and incident response to maintain high system uptime and user satisfaction.

Key Responsibilities

  • Monitor system performance and availability using tools like Prometheus, Grafana, and the ELK stack.
  • Build and maintain scalable infrastructure using tools such as Terraform, Ansible, and Kubernetes.
  • Automate operational tasks and deployment pipelines (CI/CD).
  • Collaborate with development teams to improve system reliability and performance.
  • Participate in incident response, root cause analysis, and postmortem documentation.
  • Define and maintain service-level objectives (SLOs) and service-level indicators (SLIs).
  • Implement disaster recovery and business continuity plans.
  • Optimize system performance and resource utilization.

Required Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • 3+ years of experience in Site Reliability Engineering, DevOps, or Software Engineering.
  • Proficiency in programming languages such as Python, Go, Java, or Ruby.
  • Strong understanding of Linux systems and networking fundamentals.
  • Experience with cloud platforms (AWS, Azure, GCP).
  • Familiarity with containerization and orchestration (Docker, Kubernetes).
  • Knowledge of monitoring and alerting tools.
  • Excellent problem-solving and communication skills.

Preferred Qualifications

  • Experience with distributed systems and microservices architecture.
  • Certifications in cloud technologies (e.g., AWS Certified Solutions Architect).
  • Experience with security and compliance in production environments.
