Infoplus Technologies UK Limited

Senior Site Reliability Engineer

Ireland · Contract · Mid-Senior

Job Description:

Must-Have Primary Skill: Solution Architecture-Tools-ITSM (Remedy, ServiceNow), Monitoring and Management (CA, HP, BMC), Automation (IPSoft)


The SRE & Observability Architect is a senior-level technical role responsible for defining, designing, and leading the implementation of scalable, enterprise-grade observability and reliability frameworks across modern, distributed applications. The ideal candidate will have deep, hands-on expertise in both Splunk Observability Cloud (SignalFx, APM, Log Observer) and open-source telemetry stacks (OpenTelemetry, Prometheus, Grafana, Jaeger, etc.).

This architect must operate across strategy, design, implementation, and evangelism to build a reliable, proactive, and measurable ecosystem, ensuring engineering and operations teams have full visibility into the health, performance, and reliability of their services.

General Duties and Tasks

1. Architecture & Design

• Define end-to-end observability architecture covering metrics, logs, traces, and events (MELT)

• Create solution blueprints, reference architectures, and telemetry flow diagrams

• Architect telemetry pipelines using OpenTelemetry Collector, Splunk UF, Fluent Bit, etc.

• Establish observability data models, taxonomy, tagging standards, and metric frameworks

2. Tooling Strategy & Platform Oversight

• Lead integration and operationalization of Splunk Observability Cloud components:

    • SignalFx for infra metrics

    • Splunk APM and RUM for distributed tracing

    • Log Observer for central logging

• Architect and manage OSS observability stacks:

    • Prometheus, Grafana, Jaeger, Loki, Fluent Bit, OpenSearch

• Define tool governance standards to ensure consistency across services and environments

3. SRE Framework & Operational Readiness

• Define and institutionalize SRE principles including SLIs, SLOs, and error budgets

• Drive adoption of incident response, blameless postmortems, and runbook automation

• Enable alerting strategies and anomaly detection aligned with business impact

• Build SPoG (Single Pane of Glass) dashboards and reporting for stakeholders

4. Automation & CI/CD Integration

• Automate telemetry onboarding into CI/CD pipelines (e.g., Jenkins, ArgoCD, GitHub Actions)

• Implement observability-as-code using Terraform, Helm, or Ansible

• Integrate observability feedback loops into release pipelines and gating

5. Enablement & Evangelism

• Run workshops, discovery sessions, and assessments for teams adopting SRE & Observability

• Create and maintain best practices, templates, and internal documentation

• Coach teams on telemetry instrumentation, alert tuning, and reliability metrics

6. Governance & Data Strategy

• Define data retention, access control, and ingestion policies

• Monitor the cost of observability data, optimize ingest volumes, and reduce noise

• Align observability with enterprise security, compliance, and cloud-native standards

Key Skills

Ranked by relevance

Splunk, cloud, Prometheus, Grafana, CI/CD, incident response, Terraform, Jenkins, Loki
Posted: Aug 29, 2025
Type: Contract
Level: Mid-Senior
Location: Dublin

Industries

IT Services IT Consulting Information Services Software Development

Categories

Information Technology
