Engage People Recruitment
Site Reliability Engineer (AWS)
Ireland | 16 hours ago
Full-time | Remote Friendly | Information Technology

Job Title: Site Reliability Engineer (AWS)

Location: Dublin / Hybrid (2 days)

Type: Permanent


We’re looking for a mid-level Site Reliability Engineer who loves solving complex problems, automating everything possible, and keeping systems running smoothly. You’ll be hands-on building reliable, scalable, high-performing cloud environments, mainly in AWS, while helping our dev teams ship great products faster and with fewer headaches.

This is a hands-on role for someone who values automation, ownership, and collaboration over silos and manual fixes. You’ll also join our on-call rotation (don’t worry, it’s shared and well-supported).


What you’ll be doing

  • Build and manage robust, highly available AWS infrastructure using tools like Terraform or CloudFormation
  • Maintain and improve CI/CD pipelines (Azure DevOps) for automated deployments and testing
  • Work with Docker and AWS container orchestration (Kubernetes on EKS, or ECS) to run containerized workloads
  • Automate as much as possible — from monitoring and alerting to deployment workflows
  • Define and track reliability metrics (SLIs/SLOs/error budgets)
  • Dive into incidents, lead root cause analysis, and make sure they don’t happen again
  • Build out observability solutions (CloudWatch, Prometheus, Grafana, ELK, etc.)
  • Partner with development and security teams to improve app reliability and platform performance
  • Keep security tight — IAM roles, secrets management, and network boundaries are second nature


What makes you a great fit

  • Around 5–7 years in IT, with 3+ years in SRE, DevOps, or Cloud Engineering
  • Deep hands-on experience with AWS (EC2, VPCs, IAM, S3, RDS, CloudWatch, ALB/ELB, Route 53)
  • Solid experience building CI/CD pipelines with Azure DevOps
  • Comfortable managing Linux and/or Windows environments at scale
  • Strong background in Docker and Kubernetes — you know your way around clusters, scaling, and deployments
  • Skilled with Infrastructure as Code (Terraform, CloudFormation)
  • Confident scripting in Bash or Python for automating all the boring stuff
  • Experienced in monitoring, logging, and alerting — you believe in metrics, not guesswork
  • Solid understanding of core SRE practices: SLIs/SLOs, incident management, postmortems, capacity planning
  • Always exploring ways to make cloud systems faster, more resilient, and more cost-efficient


Who you are

  • Obsessed with uptime, reliability, and automation
  • Open communicator who thrives in cross-team collaboration
  • Ready to take full ownership of what you build and run
  • Always curious about the latest in SRE and cloud-native tech
