Site Reliability Engineer

The Expertise We’re Looking For

Bachelor’s degree or higher in a technology-related field (e.g., Engineering, Computer Science) required; master’s degree a plus.
8+ years of hands-on experience deploying and/or supporting highly distributed multi-tiered systems at scale.
Hands-on experience with Public Cloud, preferably AWS or Azure.
Hands-on experience with container orchestration on EKS, AKS, or Rancher Kubernetes Service.
Experience operating and implementing distributed & highly concurrent service-based architectures, including microservices, containerized services, and/or serverless architectures.
Thought leadership and an ability to handle production incidents.

The Skills You Bring

Hands-on Kubernetes skills and knowledge.
Programming track record in a compiled, object-oriented language such as C# or Java, plus experience with a scripting or interpreted language such as JavaScript/TypeScript or Python.
Proven experience maintaining the scalability and resiliency of complex environments.
Demonstrated ability to use modern monitoring tools (Datadog, Prometheus, Splunk, …).
Experience with instrumentation, with systems skills in building and operating monitoring, logging, and alerting services for distributed systems at scale.
Understand, implement, and be accountable for the Production Services/SRE capabilities across Digital Security. This includes direct knowledge of the capabilities, their usage and value, gaps, and challenges.
Provide technical and operational leadership and serve as an escalation point of contact during major incidents or for issues that are not resolved in the expected timeframes. Take hands-on responsibility for actively leading production bridges during major incidents, working across the team and the Enterprise Infrastructure organization.
Own the execution and quality control of the Fidelity Brokerage business-unit-specific post-mortem reviews for the team, including deep technical root cause analysis (RCA) and observability and automation reviews, and act as the connection across Enterprise Infrastructure domains in the region.

The Value You Deliver

Help define and execute a comprehensive reliability and observability strategy, ensuring that Fidelity’s systems are always available when our customers need them.
Bring together technical, procedural, and financial data to reduce toil and increase efficiency.
Execute plans for technical standardization and process refinement within the engineering organization, especially for Site Reliability Engineers.
Troubleshoot stack-wide engineering issues related to hardware, software, network, applications and cloud service providers.
Coach peer SREs and development teams on how to build highly available systems.
