Founded in 2012, Playson is a leading iGaming supplier recognized worldwide. We provide our customers with a high-end, microservice-based platform as a service that aims to process billions of financial transactions per day. We run a cross-regional setup and work to drive latency as close to zero as possible. We invest heavily in delivering the best game experience and a smooth connection, regardless of the internet coverage and bandwidth available to game clients.
We are currently seeking an experienced Senior Site Reliability Engineer to join our dynamic Platform Tribe.
Key Responsibilities
- Manage day-to-day alerts, system checks, and issue escalation as necessary.
- Provide 24x7 on-call support for critical SaaS events.
- Document issues and remediation steps.
- Proactively create monitors within the EKS/K8s ecosystem.
- Deploy to EKS/K8s cluster using Terraform and Helm/Flux.
- Enhance infrastructure health by implementing checks and scripts to address known issues.
- Maintain and develop deployment code.
- Implement/integrate new technologies into our Cloud Infrastructure.
- Collaborate with other teams to provide top-notch support and assistance.
- Plan deployments and updates with the customer in mind, keeping impact to a minimum.
- Conduct root cause analysis (RCA) and take corrective actions to prevent issues from recurring.
- Assign alert-related actions to the appropriate team after investigation.
- Handle support requests for environment-specific actions.
Requirements
- Strong experience with issue processing (RCA, Postmortems).
- Proficiency in Kubernetes (deployment, scaling, troubleshooting).
- Familiarity with AWS, Terraform, Docker, CI/CD.
- Experience with monitoring tools such as Datadog, Prometheus, and Grafana, and with logging solutions such as the ELK Stack (Elasticsearch, Logstash, Kibana) or AWS CloudWatch.
- Strong understanding of networking concepts and protocols.
- Proficiency in at least one scripting language (e.g., Python, Node.js, Go).
- Experience with configuration management tools like FluxCD/ArgoCD.
- Proficiency in Git or other version control systems.
- Familiarity with incident response and management tools like PagerDuty, Opsgenie, or VictorOps.
- Ownership, proactiveness, persistence, and passion for maintaining a high-traffic online platform.
What We Offer
- Competitive Salary: With annual performance and salary reviews.
- Quarterly Bonuses: Benefit from a transparent and systematic quarterly bonus system.
- Flexible Schedule: We offer a flexible work schedule to accommodate your needs.
- Remote Work: Providing greater flexibility and comfort.
- Medical Insurance: Receive comprehensive medical insurance for you and a plus-one.
- Financial Support for Life Events: We provide financial support during special life events.
- Unlimited Paid Vacation: Enjoy unlimited paid vacation leave.
- Unlimited Paid Sick Leave: Take unlimited paid sick leave whenever necessary.
- Professional Development: Get reimbursed for professional development courses and training.

