Head of Infrastructure / Site Reliability Engineering (SRE)
Location: Galway or Remote (Ireland/UK)
Reports to: Chief Technology Officer (CTO)
Team: Infrastructure, DevOps, and SRE (initially 3–5 FTEs, growing as function matures)
Overview
OneTouch Health is transforming how care is delivered and coordinated across residential, domiciliary, and community settings. With multiple products, acquisitions, and a rapidly expanding user base, we are entering a pivotal stage of scaling and platform unification.
We are seeking a Head of Infrastructure / SRE to lead our transition to a modern, resilient, and automated infrastructure foundation. This role will oversee the insourcing of infrastructure management and monitoring, the build-out of centralized DevOps practices, and the architecture evolution toward Infrastructure-as-Code, blue-green deployment, and modular application design.
This is a senior, hands-on leadership position responsible for reliability, performance, and security across all OneTouch products and environments.
Key Responsibilities
Infrastructure Strategy & Insourcing
- Lead the transition from outsourced hosting and network management to an in-house infrastructure function.
- Define and execute a roadmap for a scalable cloud-based environment (AWS focus) supporting multiple product lines.
- Establish and document infrastructure standards, including environment topology, IAM policies, and backup/disaster recovery.
- Implement Infrastructure-as-Code (IaC) practices across all environments using tools such as Terraform or Pulumi.
- Build internal monitoring, alerting, and observability capabilities to replace external dependencies.
DevOps Platform & Automation
- Design and roll out a centralized DevOps toolchain across the engineering organization (~40+ developers).
- Standardize CI/CD pipelines across PHP, Laravel, and frontend frameworks (Vue.js/React).
- Introduce blue-green and canary deployment models to enable safer releases and rapid rollback.
- Champion automation and self-service tooling to improve developer productivity and deployment velocity.
- Ensure consistent environment provisioning and dependency management across multiple products.
Reliability Engineering & Observability
- Establish a Site Reliability Engineering (SRE) discipline within OneTouch, embedding reliability metrics into engineering workflows.
- Define SLAs, SLOs, and SLIs across core services and ensure monitoring and alerting systems support them.
- Implement and maintain comprehensive observability using modern stacks (Prometheus, Grafana, Datadog, or ELK).
- Introduce structured incident management and post-mortem processes to drive learning and accountability.
Network, Security & Compliance
- Build a modern network security and monitoring capability, including WAFs, firewalls, and intrusion detection/prevention.
- Partner with the Security & Compliance team to maintain ISO 27001, NHS DSPT, and GDPR alignment.
- Lead implementation of secrets management, least-privilege IAM, and secure-by-design practices in infrastructure.
- Oversee vulnerability management, patching, and environment hardening across all systems.
- Support the transition to a unified SOC/NOC model for proactive monitoring of infrastructure and application health.
Leadership & Collaboration
- Build and lead a small but high-performing infrastructure and SRE team, including DevOps engineers and cloud specialists.
- Collaborate closely with Product Engineering, Security, and Data teams to embed reliability and automation across the SDLC.
- Provide clear reporting to the CTO and company leadership on infrastructure performance, availability, security posture, and cost efficiency.
- Foster a culture of automation, reliability, and continuous improvement across all engineering teams.
Candidate Profile
Essential Experience
- 8+ years’ experience in infrastructure, DevOps, or SRE roles, ideally in SaaS or healthcare technology.
- Proven experience leading infrastructure modernization or insourcing initiatives.
- Strong background in AWS, Linux systems, and network architecture.
- Expertise in Infrastructure-as-Code (Terraform, CloudFormation, or Pulumi).
- Practical experience with CI/CD pipelines and automated deployment of PHP applications.
- Familiarity with blue-green or canary deployment and zero-downtime upgrade strategies.
- Hands-on experience with observability tools (Prometheus, Grafana, Datadog, ELK, etc.).
- Understanding of security operations and compliance frameworks (ISO 27001, SOC 2, GDPR).
- Strong background in, and awareness of, cybersecurity best practices and standards for SaaS and cloud environments.
- Excellent communication, documentation, and leadership skills.
Desirable
- Experience in healthcare SaaS or other regulated industries.
- Exposure to multi-tenant or modular SaaS architectures.
- Familiarity with containerization (Docker, ECS, or Kubernetes).
- Understanding of cost optimization / FinOps principles.
- Knowledge of modern modular code architectures and service decomposition strategies.
- Interest in emerging agentic AI or automation tools for infrastructure reliability and observability.
Success Criteria (First 12–18 Months)
- Full internal ownership of infrastructure and monitoring achieved with stable operations.
- IaC implemented across all staging and production environments.
- Blue-green or canary deployment live across major product lines.
- Unified DevOps toolchain adopted by all product teams.
- 99.9% uptime maintained across critical services.
- Measurable improvements in release frequency, MTTR, and incident response.
- Reduced infrastructure spend through modernization and automation.
Compensation & Benefits
- Competitive base salary commensurate with experience
- Flexible working arrangements (hybrid/remote)
- Pension, health, and wellbeing benefits

