At Eneba, we’re building an open, safe and sustainable marketplace for the gamers of today and tomorrow. Our marketplace supports roughly 20 million active users (and growing fast!) and provides a level of trust, safety and market accessibility that is second to none. We’re proud of what we’ve accomplished in such a short time and look forward to sharing this journey with you. Join us as we continue to scale, diversify our portfolio, and grow with the evolving community of gamers.
About Your Team
The Platform team builds, deploys, monitors, and is on call for the platform components and the underlying platform infrastructure. The team creates tools that let other teams work in the most stable, fast, and precise manner. Platform team members do not shy away from architecture-level assignments. They keep a finger on the pulse of the latest tech trends and know the most effective tools of the moment. Eneba’s users cannot see the Platform team’s work directly, but they feel it in the speed, quality, and new features of the product.
We’re expanding the team with a dedicated Site Reliability Engineer who will take ownership of observability, reliability practices, and system visibility across a highly distributed environment.
As a Site Reliability Engineer, you will own and evolve the entire observability and reliability layer of our platform. You’ll improve our metrics, logs, and tracing ecosystem; guide teams in building reliable services; introduce SLOs and error budgets; implement production readiness processes; and support developers during incidents by helping identify failing components across distributed systems.
You will be the driving force behind making reliability and observability a first-class part of our platform and self-service workflows.
Responsibilities
- Own and evolve our observability stack across metrics, logs, and tracing using Prometheus CRDs, Thanos, Alertmanager, Loki, Sentry, Grafana, and supporting AWS services.
- Improve system reliability by designing, implementing, and maintaining SLIs, SLOs, and error budgets, ensuring our services meet reliability objectives.
- Enhance system visibility, enabling teams to proactively detect issues, reduce MTTR, and improve incident response workflows.
- Build internal self-service capabilities for metrics, alerts, dashboards, and instrumentation to empower development teams.
- Tune and optimize the Thanos stack, improving query performance, cache effectiveness, retention policies, and cost efficiency.
- Extend and maintain monitoring Helm charts, Prometheus rules, exporters, and dashboards-as-code.
- Collaborate with Backend, DevOps, and Platform teams to ensure reliability and observability are built into services from the design phase.
- Support incident investigations, help pinpoint root causes, correlate metrics/logs/traces, and contribute to blameless postmortems.
- Maintain observability cost efficiency, reducing waste through retention strategy, metric cardinality tuning, and performance improvements.
- Keep the monitoring stack healthy and up to date, ensuring reliability, security, and alignment with best practices.
Requirements
- Hands-on experience with production observability systems, especially Prometheus, Alertmanager, Grafana, and log/trace platforms like Elasticsearch, Loki, Sentry, or their equivalents.
- Experience with Thanos or large-scale metrics systems, including tuning, caching strategies, and long-term storage.
- Strong understanding of SLIs, SLOs, error budgets, MTTR, reliability patterns, and incident response workflows.
- Solid experience with Kubernetes in production and deep understanding of how to monitor it (exporters, node metrics, service mesh signals).
- Proficiency with Infrastructure as Code (Terraform preferred) and automation best practices.
- Experience with AWS monitoring, scaling, and distributed cloud resource observability.
- Proficiency in scripting or programming (Go, Python, or Bash) to build automation and tooling.
- Ability to reason about distributed systems failures, correlate signals, and guide teams through root-cause analysis.
- Strong ownership mindset, excellent communication, and eagerness to collaborate with development teams.
Nice to Have
- Experience designing, tuning, or operating Thanos at scale.
- Experience building self-service observability tooling or dashboards-as-code frameworks.
- Deep understanding of alert fatigue reduction, signal-to-noise optimization, and high-quality alerting patterns.
- Experience implementing resilience testing, fault injection, or chaos engineering.
- Familiarity with service meshes (Istio, Linkerd) or service-level reliability patterns (circuit breakers, retries, rate limiting).
- Background operating multi-region or global-scale systems with complex telemetry needs.
What We Offer
- Opportunity to join our Employee Stock Options program.
- Opportunity to help scale a unique product.
- Various bonus systems: performance-based, referral, additional paid leave, personal learning budget.
- Paid volunteering opportunities.
- Work location of your choice: office, remote, opportunity to work and travel.
- Personal and professional growth at an exponential rate supported by well-defined feedback and promotion processes.
Additional Information
- Please attach your CV in English.
- To find out how we handle your personal data, see our Candidate Privacy Notice: https://www.eneba.com/candidate-privacy-notice
- Salary ranges may vary. We’re seeking candidates with varied experience levels, from individual contributors to functional leaders in this space.
- We’re an international team and our business language of choice is English. A good level of English is required; proficiency is preferred.