Founding Site Reliability Engineer (SRE)

Gizmo | United Kingdom | 19 hours ago

Full-time | Remote Friendly | Engineering

Gizmo is an AI startup on a mission to make learning so easy that anyone can learn anything. We're building Duolingo for anything - a platform that uses gamification and social mechanics to make learning fun.

With over 1 million monthly active users and $4M in annual recurring revenue, we're already one of the fastest-growing startups in the UK. Backed by leading investors, we recently raised $22M in Series A funding to accelerate our vision of helping 1 billion people learn.

Role Overview

You will be our founding SRE. Reporting to the CTO, you will own capacity, performance and reliability for Gizmo's full-stack platform as daily traffic climbs from hundreds of thousands to millions of users. You'll write code across the stack, but your charter is classic SRE: defend SLOs, eliminate toil, and raise the ceiling on scale before it becomes a hard limit.

Key Responsibilities

Define SLIs/SLOs for latency, availability and error rate; codify error budgets and partner with product teams on trade-offs.

Perform load testing, capacity modelling and up-front scalability design for PostgreSQL, OpenSearch, Redis, Hasura and Cloudflare Workers; produce data-driven scaling plans.

Extend metrics, structured logging and tracing; establish alert rules that page only on user-visible impact; build actionable runbooks.

Join the on-call rotation, lead blameless post-mortems, drive remediation work to closure and track MTTR/MTBF improvements.

Automate repetitive ops on Kubernetes and CI/CD; keep "toil"

Coach full-stack engineers on query optimisation, schema design and back-pressure techniques; document patterns and anti-patterns in an SRE playbook.

Requirements

Hands-on scale experience: you have run relational stores at 100k+ TPS or 1M+ concurrent users (e.g., multi-tenant PostgreSQL, sharded MySQL).
You have software engineering experience.
Strong backend fundamentals around concurrency, caching, indexing and distributed systems trade-offs.
Proven track record of setting SLOs, building dashboards (Prometheus/Grafana, OpenTelemetry, etc.) and tuning alerts.
Comfort with Kubernetes, IaC and cloud-native patterns; can debug from network to application layer.
Self-starter with a maker mindset. We're looking for ex-founders or individuals with start-up experience.
Start-up bias for action: you prioritise high-leverage fixes, ship iteratively and own outcomes end-to-end.
Collaborative and feedback-driven; you welcome post-mortem culture and continuous improvement.
Driven by impact - you prioritise work that moves the needle!

Nice-to-haves: experience with Hasura internals, Cloudflare Workers edge optimisation, or operating OpenSearch at scale.

Benefits

Highly competitive salary
You'll own a piece of what you're building - equity included
Hybrid working model with 4 days in our East London office, ideally located between Shoreditch High Street, Old Street, and Liverpool Street stations
The opportunity to become one of the earliest employees in one of the UK's fastest-growing startups
Private health insurance
