Eneba
Site Reliability Engineer
Eneba, Lithuania (posted 7 hours ago)
Full-time | Remote Friendly | Engineering, Information Technology
About Eneba

At Eneba, we’re building an open, safe and sustainable marketplace for the gamers of today and tomorrow. Our marketplace supports 20m+ active users (and growing fast!) and provides a level of trust, safety and market accessibility that is second to none. We’re proud of what we’ve accomplished in such a short time and look forward to sharing this journey with you. Join us as we continue to scale, diversify our portfolio, and grow with the evolving community of gamers.

About Your Team

The Platform team builds, deploys, monitors, and is on call for the platform components and the underlying platform infrastructure. The team creates tools that let other teams work in the most stable, fast, and precise manner. Platform team members do not shy away from architecture-level assignments. They keep a finger on the pulse of the latest tech trends and know the most effective tools of the moment. Eneba’s users cannot see the Platform team’s impact directly, but they feel it in the speed, quality, and new features of the product.

We’re expanding the team with a dedicated Site Reliability Engineer who will take ownership of observability, reliability practices, and system visibility across a highly distributed environment.

As a Site Reliability Engineer, you will own and evolve the entire observability and reliability layer of our platform. You’ll improve our metrics, logs, and tracing ecosystem; guide teams in building reliable services; introduce SLOs and error budgets; implement production readiness processes; and support developers during incidents by helping identify failing components across distributed systems.

You will be the driving force behind making reliability and observability a first-class part of our platform and self-service workflows.

Responsibilities

  • Own and evolve our observability stack across metrics, logs, and tracing using Prometheus CRDs, Thanos, Alertmanager, Loki, Sentry, Grafana, and supporting AWS services.
  • Improve system reliability by designing, implementing, and maintaining SLIs, SLOs, and error budgets, ensuring our services meet reliability objectives.
  • Enhance system visibility, enabling teams to proactively detect issues, reduce MTTR, and improve incident response workflows.
  • Build internal self-service capabilities for metrics, alerts, dashboards, and instrumentation to empower development teams.
  • Tune and optimize the Thanos stack, improving query performance, cache effectiveness, retention policies, and cost efficiency.
  • Extend and maintain monitoring Helm charts, Prometheus rules, exporters, and dashboards-as-code.
  • Collaborate with Backend, DevOps, and Platform teams to ensure reliability and observability are built into services from the design phase.
  • Support incident investigations, help pinpoint root causes, correlate metrics/logs/traces, and contribute to blameless postmortems.
  • Maintain observability cost efficiency, reducing waste through retention strategy, metric cardinality tuning, and performance improvements.
  • Keep the monitoring stack healthy and up to date, ensuring reliability, security, and alignment with best practices.

Requirements

  • Hands-on experience with production observability systems, especially Prometheus, Alertmanager, Grafana, and log/trace platforms like Elasticsearch, Loki, Sentry, or their equivalents.
  • Experience with Thanos or large-scale metrics systems, including tuning, caching strategies, and long-term storage.
  • Strong understanding of SLIs, SLOs, error budgets, MTTR, reliability patterns, and incident response workflows.
  • Solid experience with Kubernetes in production and deep understanding of how to monitor it (exporters, node metrics, service mesh signals).
  • Proficiency with Infrastructure as Code (Terraform preferred) and automation best practices.
  • Experience with AWS monitoring, scaling, and distributed cloud resource observability.
  • Proficiency in scripting or programming (Go, Python, or Bash) to build automation and tooling.
  • Ability to reason about distributed systems failures, correlate signals, and guide teams through root-cause analysis.
  • Strong ownership mindset, excellent communication, and eagerness to collaborate with development teams.

Extra points

  • Experience designing, tuning, or operating Thanos at scale.
  • Experience building self-service observability tooling or dashboards-as-code frameworks.
  • Deep understanding of alert fatigue reduction, signal-to-noise optimization, and high-quality alerting patterns.
  • Experience implementing resilience testing, fault injection, or chaos engineering.
  • Familiarity with service meshes (Istio, Linkerd) or service-level reliability patterns (circuit breakers, retries, rate limiting).
  • Background operating multi-region or global-scale systems with complex telemetry needs.

What It’s Like To Work At Eneba

  • Opportunity to join our Employee Stock Options program.
  • Opportunity to help scale a unique product.
  • Various bonus systems: performance-based, referral, additional paid leave, personal learning budget.
  • Paid volunteering opportunities.
  • Work location of your choice: office, remote, opportunity to work and travel.
  • Personal and professional growth at an exponential rate supported by well-defined feedback and promotion processes.
  • Please attach CVs in English.
  • To find out how we handle your personal data, make sure to check out our Candidate Privacy Notice: https://www.eneba.com/candidate-privacy-notice

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

  • Salary ranges may vary. We’re seeking candidates with varied experience levels, from individual contributors to functional leaders in this space.
  • We’re an international team and our business language of choice is English. A good level of English is required; proficiency is preferred.
