Upper

Senior Site Reliability Engineer - Freelance

Romania · Full-time · Mid-Senior

Our client is looking for a highly skilled Senior Site Reliability Engineer (SRE) to serve as a hands-on reliability expert across three SaaS product lines. You’ll lead Tier-3 incident response, drive root-cause analysis, automate resilient infrastructure, and coach product teams in observability and SLO best practices. This is a high-impact, cross-functional role for someone passionate about performance, reliability, and DevOps culture.


Responsibilities:

  • Tier-3 incident response & root-cause analysis for all customer-facing products (GT Motiv, Contra Expert, Innovation Group).
  • Deep application debugging across Java, .NET, Go, Python stacks; correlate logs, traces, metrics (Datadog APM/Logs/RUM).
  • Network-level troubleshooting (TCP/IP, TLS, DNS, load-balancers, service mesh) to eliminate latency and availability bottlenecks.
  • Reliability engineering & automation: define and track SLOs & error budgets; build self-healing, failover, autoscaling, and chaos-testing routines.
  • Observability platform ownership: create dashboards, alerting rules, and runbook automation; continuously close visibility gaps.
  • Post-incident improvement: facilitate blameless post-mortems, document findings, and drive architectural and process fixes.
  • Cross-functional coaching: embed with product squads to uplift logging, testing, and resilient design practices.


Requirements:

  • 8+ years in SRE / DevOps / production-engineering roles for high-availability SaaS.
  • Expert networking skills: packet-level analysis, transport protocols (TCP, TLS), HTTP & gRPC.
  • Cloud proficiency in AWS, Azure or GCP, with experience in hybrid or multi-cloud topologies.
  • Coding ability: strong in Java and/or C# (.NET) plus one scripting or automation language (Go, Python, Bash); able to debug unfamiliar codebases.
  • Observability & incident tooling: Datadog (preferred) or equivalent APM + log stack, plus PagerDuty/ServiceNow.
  • IaC & GitOps: Terraform, CI/CD, Argo CD.
  • 24×7 on-call readiness and proven ownership of SLO/SLA compliance.
  • Excellent written & spoken English (international stakeholder base).


Nice-to-have:

  • Experience running event-driven / streaming platforms (Kafka, RabbitMQ) and microservices architectures.
  • Prior work in SRE consulting / “reliability guild” supporting multiple product lines.


If interested, please apply here: https://app.upper.co/job/c029c993-5d9b-44fb-a968-b8c85c14c752?sourcerId=9a3ffee2-3acb-4f85-a458-9a7469aa02bf

Key Skills

incident response, datadog, python, devops, java, terraform, rabbitmq, kafka, cloud, bash, saas, cicd, aws, gcp, dns, c
Posted: Jun 06, 2025
Type: Full-time
Level: Mid-Senior
Location: Romania
Company: Upper
Industries: IT Services, IT Consulting
Categories: Engineering, Information Technology
