Site Reliability Engineer

Elwood Roberts - Your Trusted Recruitment Partner

Ireland · Contract · Mid-Senior

Job: Site Reliability Engineer

Location: North Dublin

Rate: €450-500 per day

Type: Contract (12 months+)

Working arrangement: Hybrid (1-2 days onsite per week)

We have an excellent opportunity for a combined Site Reliability Engineering (SRE) and Observability Engineering role. You will oversee complex software systems to ensure they are reliable and scalable, and make them transparent and measurable through advanced monitoring and data analysis. The role is based with a leading business in the aviation sector.

Responsibilities will include:

Work within the technology team to implement the SRE strategy and roadmap of best practices.
Monitor capacity, availability, and system health of production environments.
Improve reliability, quality, and time-to-market for all applications.
Build systems and introduce tooling to manage applications and infrastructure.
Design, implement, and maintain the observability platform (metrics, logging, tracing, and alerting) to support developers and operators.
Own and evolve tooling and infrastructure related to Prometheus, Grafana, OpenTelemetry, or similar observability tools.
Collaborate with SREs, developers, and infrastructure teams to define SLOs, SLAs, and SLIs and ensure proper instrumentation of services.
Improve system performance and reliability through monitoring insights and proactive detection of anomalies.
Develop automation and tooling to reduce manual effort and improve response times for incidents.
Enable development teams with self-service dashboards and alerting configurations.
Participate in on-call rotations and incident response processes to identify and address observability gaps.
Provide primary operational support for distributed software applications.
Work with the development team and the build process to ensure tools and automation are in place for reliable, high-quality code deployments.
Work with development teams to test and improve services.
Gather and analyse data from operating systems to troubleshoot and fine-tune performance.
Measure and optimize system performance.
Understand system metrics and use them to ensure the quality of the systems.
Contribute to platform management, capacity planning, design consulting, and the establishment of service level objectives (SLOs).
Help improve processes so that on-call requests are managed efficiently without compromising system reliability.
Optimize on-call incident management through automated tools and software.
Act as the team's central store of knowledge about its processes and systems, directing issues to the right person so that action is taken quickly and system downtime is reduced.
Use trend analysis to review potential challenges in processes, infrastructure, and operations.
Document all of this knowledge.
Provide regular updates on the health and performance of production systems.
Support a CI/CD framework through engagement with engineering teams.
Support delivery and operations where necessary.
Champion continuous improvement.
Use automation to create sustainable services.
Maintain continuity, capacity, and compliance.

We are looking for people with the below skill set:

5+ years’ hands-on experience with cloud technology as an infrastructure or application developer.
2-3 years’ experience as a Site Reliability Engineer.
Strong experience with observability tools such as Prometheus, Grafana, Loki, Tempo, Elastic Stack, OpenTelemetry, or commercial solutions like Datadog, New Relic, or Splunk.
Solid understanding of distributed systems, microservices, and cloud-native architectures.
Proficiency with infrastructure-as-code tools (e.g., Terraform, Ansible, Helm) and container orchestration systems (e.g., Kubernetes).
Experience with scripting and programming languages (e.g., Python, Go, Bash).
Communication Skills: Effectively shares information and asks clarifying questions.
Collaboration: Works well within a team, open to feedback, and engages cross-functionally.
Strategic Thinking: Thinks ahead and plans for long-term reliability, scalability, and impact.

Key Skills

Ranked by relevance

prometheus grafana cloud incident response microservices data analysis terraform ansible datadog python loki cicd

Posted: Jun 13, 2025
Type: Contract
Level: Mid-Senior
Location: Dublin
Company: Elwood Roberts - Your Trusted Recruitment Partner

Industries

Technology Information Media