Site Reliability Engineer

The Tribe’s application portfolio consists of modern cloud-driven Online Banking & Brokerage customer services, administration frontends used by internal employees, and further customer-oriented innovative solutions. We work in an agile environment, with a focus on customer centricity and outstanding user experience, and with high reusability and flexibility of technical solutions in mind. Our fundamental principle is that the highest standards of application availability, scalability, technology and security are a must. With our platform we want to be an enabler of high-quality cloud-based software solutions and processes.

Responsibilities

- Define Service Level Objectives (SLOs) and enable an end-to-end view of customer satisfaction, based on best practices for setting up Service Level Indicators (SLIs), to create effective strategies for maintaining and improving system performance and availability.

- Collaborate with Business Functional Analysts and Solution Architects to identify solution design changes that improve the resilience of technical solutions early on.

- Consult and guide the squad on the prioritization of reliability improvements and actively deliver them as part of the sprint.

- Hands-on implementation of reliability and resilience patterns such as auto-scaling, circuit breakers, bulkheads, rate limiters and retry mechanisms.

- Actively work on service request fulfilment and on incident and problem management to identify and reduce toil and the MTTR with engineering best practices.

- Align on and contribute to state-of-the-art SRE best practices, e.g. Distributed Tracing, OpenTelemetry and Chaos Engineering, with the SRE chapter function.

- Be a knowledge and skill multiplier for your profession by being a Lead of the Site Reliability Engineer population.

- Increase the seniority of the overall Site Reliability Engineer chapter by establishing events and procedures, and foster a culture of high standards.

- Lead the people in your engineering profession and help them become better each day.

Skills

- Expert knowledge and hands-on experience with applications hosted on cloud platforms such as Google Cloud Platform, as well as with Docker / Kubernetes in combination with Google Kubernetes Engine (GKE), Terraform or similar technology.

- Experience in resilient software development in Python/Java and the usage of modern CI/CD pipelines, e.g. GitHub, GitHub Actions, Bitbucket, Helm.

- Strong experience in the setup of observability, monitoring and self-healing solutions, for instance with New Relic, Splunk, Google Cloud Operations, Lightstep and Ansible.

- Very good knowledge of security standards (e.g. TLS, OAuth2, KMS, Vault, Admission Controllers, Let's Encrypt), microservice architectures and experience with API Management with Apigee or WSO2.

- Proactive attitude and collaborative team-player mindset paired with self-confidence.

- Keeping your cool and your eye for detail even in stressful situations where time matters.

- Having a creative approach towards solving technical problems.

- Excellent communication skills in English.

Location: Bucharest

Working model: hybrid
