intive
Site Reliability Engineer
intive | Romania | 3 days ago
Full-time | Engineering, Information Technology
Join a diverse team of approx. 2000 professionals across four continents, driving innovation and growth within intive’s technology hubs. Work alongside industry experts trusted by leading brands like Audi, BMW, Deichmann, Meta, NewsCorp, Tandem, Paramount, Vorwerk, and Warner Bros. Discovery to create pioneering, sustainable digital experiences.

At intive, agile thinking and deep industry expertise come together across Automotive & Mobility, Commerce, Financial Services, Healthcare & Life Sciences, and Technology, Media & Communication. Be part of a team that’s shaping the future of digital innovation.

We’re looking for a Site Reliability Engineer to drive the stability, scalability, and security of our digital sports streaming platform.

This Is a Hands-on Leadership Role Where You’ll

  • Ensure the reliability of our AWS-based infrastructure
  • Strengthen observability, automation, and security
  • Support high-performance systems that power live and on-demand video streaming

The ideal candidate combines expertise in site reliability, automation, and security with a strong background in digital video streaming. You’ll work across teams to resolve incidents, build resilient systems, and enable continuous innovation. Please note that this role is aligned with the EST time zone.

What You Will Be Doing

  • Take ownership of platform reliability, performance, and security
  • Lead and mentor a small technical team while remaining a hands-on contributor
  • Build and maintain monitoring, logging, and alerting systems for visibility and rapid response
  • Define and enforce best practices in disaster recovery, redundancy, and failover strategies
  • Troubleshoot complex issues across infrastructure, APIs, video delivery, and playback
  • Lead incident response efforts and participate in on-call rotations during peak traffic (typically evenings EST)
  • Partner with Product and Engineering to guide architectural decisions around resilience, scalability, and security
  • Collaborate with Operations and Customer Care to resolve incidents and eliminate recurring issues
  • Oversee platform security practices, including IAM, secrets management, and AWS hardening
  • Research and adopt new tools and technologies to improve reliability
  • Track and optimize SLAs, SLOs, and KPIs for uptime, latency, playback quality, and security

You Are a Good Match If You Have

  • 7+ years in SRE, DevOps, or infrastructure roles
  • Proven experience running and scaling production systems in AWS (CloudFront, Lambda, S3, API Gateway, CloudWatch, etc.)
  • AWS certification (Solutions Architect, DevOps Engineer) or equivalent hands-on expertise
  • Strong background in observability (Datadog, CloudWatch, Conviva, etc.)
  • Skilled in scripting/automation (Python, Bash) and infrastructure-as-code (Terraform, CloudFormation)
  • Experience leading security initiatives (IAM, token management, service hardening)
  • Solid understanding of video streaming technologies (HLS/DASH, CDNs, DRM, SSAI, multi-platform delivery)
  • Experience improving CI/CD pipelines and supporting safe production releases
  • Strong problem-solving skills across application, network, and video delivery layers
  • Excellent communication and collaboration skills, including vendor management

Nice To Have

  • Leadership capacity
