Fulcrum Digital Inc
DevOps Engineer
Fulcrum Digital Inc | Ireland | 16 days ago
Full-time | Remote Friendly | Information Technology

SRE DevOps

Dublin/Hybrid

Permanent


The Role

  • Plan, manage, and oversee all aspects of a Production Environment
  • Define strategies for application performance monitoring and optimization in the production environment.
  • Respond to incidents, improve the platform based on feedback, and measure the reduction in incidents over time.
  • Support deployment of code into multiple lower environments, and support current processes with an emphasis on automating everything as soon as possible.
  • Design, develop and standardize monitoring and alerting mechanisms for the supported applications.
  • Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recover.
  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.
  • Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
  • Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
  • Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead in DevOps automation and best practices.
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
  • Scale systems sustainably through mechanisms like automation and evolving systems by pushing for changes that improve reliability and velocity.
  • Work with a global team spread across tech hubs in multiple geographies and time zones.
  • Share knowledge and explain processes and procedures to others.
  • Mentor junior team members.
  • Perform on-call duties on a rotational basis.
  • Occasional off-hours work is required.


Requirements

Key skills: Must have

Jenkins

Chef

Bash

Splunk

Dynatrace

Linux

Bitbucket

Problem Management

ITIL

Remedy


Good to have

Python

AWS (migrating to AWS)


Key Responsibilities

What You’ll Do:

•Demonstrate and innovate SRE practices by collaborating with stakeholders to implement important SRE principles and objectives and create new practices where applicable.

•Partner with product and platform teams to define and track service level objectives (SLOs) and indicators (SLIs).

•Monitor and manage system reliability performance, ensuring systems meet SLOs.

•Communicate reliability concerns and their potential impact with key stakeholders.

•Promote the prioritization of reliability throughout the software development life cycle.

•Design, code, test, and deliver solutions to automate manual operations.

•Participate in on-call rotations, provide support for SRE systems, and lead or participate in post-mortem incident analysis.

•Engage in system design, capacity planning, and architecture discussions to ensure operational requirements are met.

•Share lessons learned and best practices regarding reliability and performance with stakeholders and team members.

•Assist in training and mentoring junior SREs to ensure best practices are followed and scaled within the organization.

•Pursue continuous improvement opportunities to stay up to date on SRE methods and trends and participate in organizational learning initiatives.

•Support governance and ensure compliance with policies by collaborating with security, compliance, and other teams.

•Respond promptly to requests for assistance from technical customers, providing engineering support and best-practice guidance.

•Adhere to standard operating procedures and suggest improvements to them; advocate for automation and workflow optimization.

Team Specific Skills

It is not expected that any single candidate would have expertise across all these areas, but a Biz Ops engineer will spend time throughout their career with various aspects of the role:

Operational Resiliency Architect:

•Support application health, performance, and capacity.

•Assist in system design consulting, capacity planning, and launch reviews.

•Collaborate with development and product teams to establish monitoring and alerting strategies.

DevOps/Automation:

•Engage in development, automation, and business process improvement.

•Support CI/CD pipelines and promote software into higher environments.

•Increase automation and tooling to reduce manual intervention.

ITSM Practices:

•Analyze ITSM activities and provide feedback to development teams on operational gaps or resiliency concerns.

•Perform root cause analysis of incidents and work with development teams to resolve issues.
