Solas IT Recruitment
Site Reliability Engineer
Ireland, 12 days ago
Full-time, Information Technology

My client is seeking a Site Reliability Engineer (SRE) to join their platform engineering team in Limerick. In this role, you will design and build production systems and improve their reliability, performance, and scalability. You will apply software engineering principles to operations, focusing on automation, observability, incident response, and service resilience.


Key Responsibilities

  • Own and improve service reliability, performance, and uptime through SRE best practices.
  • Define and track SLIs, SLOs, and error budgets to guide engineering decisions.
  • Build automation to eliminate manual toil across deployments, operations, and incident response.
  • Design, implement, and scale highly available production systems across cloud and on-prem environments.
  • Troubleshoot complex issues across applications, infrastructure, networks, and load balancers.
  • Develop and enhance observability tooling (metrics, logs, tracing, alerting).
  • Improve CI/CD pipelines, release processes, and production readiness.
  • Collaborate with software engineering teams to ensure systems are resilient, secure, and scalable.


Skills & Experience Required

  • 5+ years in SRE, DevOps, Platform Engineering, or a similar reliability-focused role.
  • Strong background with cloud platforms (AWS, Azure, or GCP).
  • Proven experience with automation and scripting (Python, Bash, Go, PowerShell, etc.).
  • Solid understanding of Linux systems and distributed system fundamentals.
  • Strong experience with load balancers (NGINX, HAProxy, F5, or cloud-native load balancers).
  • Hands-on knowledge of CI/CD and deployment pipelines.
  • Experience with observability stacks (Prometheus, Grafana, ELK, Datadog, CloudWatch, etc.).
  • Strong networking fundamentals: DNS, TCP/IP, routing, firewalls.
  • Experience with containers and orchestration (Docker; Kubernetes is a plus).


Nice to Have

  • Experience with Infrastructure as Code (Terraform, CloudFormation, Ansible).
  • Familiarity with distributed tracing (OpenTelemetry, Jaeger).
  • Previous experience working at scale with high-availability requirements.
  • Software engineering experience in any major language.
