DevOps Team Lead

Runware

United Kingdom · Full-time · Mid-Senior

Runware's infrastructure is the foundation that enables our teams to deliver AI to the world. As DevOps Team Lead, you'll turn complex, hardware-driven systems into streamlined, developer-friendly platforms.

You'll define how we automate deployments, orchestrate GPUs at scale, and observe workloads in real time. You'll build systems that detect and recover from issues before users notice, and work closely with engineering and product teams to make shipping faster, safer, and more predictable.

You'll shape the foundation that lets teams move fast with confidence, building infrastructure that is dependable, observable, and designed to scale.

Is this role a fit for you?

You thrive at the intersection of infrastructure and innovation. You enjoy unravelling complex systems, tuning performance, and engineering reliability into everything you build. You lead through clarity and example, not process, and elevate the teams around you by simplifying the hard things.

You take pride in building systems that are resilient by design and empowering the engineers who depend on them. You understand that reliability is never accidental; it is built through intent, consistency, and a culture that values doing things right.

What This Role Will Entail

Provide technical and people leadership to a small DevOps team
Lead the design and operation of Runware's infrastructure and orchestration systems
Build automation and tooling to streamline model deployments, scaling, and hardware utilisation across distributed nodes
Drive observability, alerting, and reliability practices to detect and resolve issues quickly and proactively
Collaborate with engineers to optimise throughput, latency, and platform performance at every layer of the stack
Develop and maintain infrastructure as code and deployment automation to ensure consistency and reproducibility across environments
Establish and continuously evolve incident management, post-mortems, and reliability reviews as core engineering practices
Mentor and coach engineers to think operationally, designing systems that fail gracefully and scale predictably
Champion forward-looking improvements to our orchestration layer, hardware management, and overall infrastructure efficiency

Requirements

Have experience operating production systems on bare metal or hybrid environments such as HPC or GPU clusters, optimised for performance and low latency
Are comfortable writing automation and systems tooling in Python, Go, or similar languages
Understand container runtimes like Docker and containerd, and have built or worked with orchestration systems beyond Kubernetes
Are fluent in observability and debugging practices across distributed systems, using logs, metrics, traces, and profiling to drive insight and reliability
Care deeply about reliability, efficiency, and engineering quality, and know how to embed those values into team culture and everyday practice
Thrive in fast-moving, evolving environments where impact is measured by how much better systems and teams perform over time

Benefits

We're a remote-first collective, meeting in person twice a year to plan, brainstorm, celebrate wins, and enjoy some face-to-face time. We have core hours for cooperative working and calls, but outside of that your calendar is yours. Work the hours that let you perform at your peak while also building a healthy life.

Our release cycles are fast and intense, but they're followed by real downtime. After big pushes we expect the team to unplug, recharge, and come back ready and stronger than ever for the next leap.

Generous paid time off - vacation, sick days, public holidays
Meaningful stock options - share in the upside you create
Remote-first setup - work from home anywhere we can employ you
Flexible hours - own your schedule outside core collaboration blocks
Family leave - paid maternity, paternity, and caregiver time
Company retreats - twice-yearly gatherings in inspiring locations

Please note: We are unable to offer visa sponsorship in the UK at this time. Candidates must have an existing right to work in the UK.

Key Skills

Ranked by relevance

devops, infrastructure as code, deployment automation, python, docker, ai

Posted: Oct 30, 2025
Type: Full-time
Level: Mid-Senior
Location: United Kingdom
Company: Runware

Industries

IT Services, IT Consulting
