Intuition IT – Intuitive Technology Recruitment
Site Reliability Engineer
Ireland | Full-time | Information Technology

Role: Site Reliability Engineer

Location: Dublin, Ireland

Employment Type: Permanent



The role of business operations is to be the production readiness steward for the platform. This is accomplished by closely partnering with developers to design, build, implement, and support technology services. A business operations engineer will ensure operational criteria like system availability, capacity, performance, monitoring, self-healing, and deployment automation are implemented throughout the delivery process. Business Operations plays a key role in leading the DevOps transformation at Client through our tooling and by being an advocate for change and standards throughout the development, quality, release, and product organizations.


We accomplish this transformation by supporting daily operations with a sharp focus on triage and then root cause analysis, grounded in an understanding of the business impact of our products. The goal of every Biz Ops team is to shift left, engaging earlier in the development process, and to proactively manage production and change activities to maximize customer experience and increase the overall value of supported applications. Biz Ops teams also focus on risk management, tying all our activities together with an overarching responsibility for compliance and risk mitigation across all our environments. Biz Ops also streamlines and standardizes traditional application-specific support activities and centralizes points of interaction for internal and external partners by communicating effectively with all key stakeholders.


Operational Readiness Architect:

  • Serve as the primary contact responsible for the overall application health, performance, and capacity.
  • Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
  • Partner with the development and product team of a new application to establish the right monitoring and alerting strategy and create the framework to achieve zero downtime during deployment.

Site Reliability Engineering:

  • Serve as the primary contact responsible for ensuring application scalability, performance, and resilience.
  • Practice sustainable incident response and blameless post-mortems while taking a holistic approach to problem solving and optimizing time to recover.
  • Automate data-driven alerts to proactively escalate issues. Work with development teams to establish SLOs and improve reliability.

DevOps/Automation:

  • Tackle complex development, automation, and business process problems. Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation, and refinement.
  • Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead Client in DevOps automation and best practices.
  • Increase automation and tooling to reduce toil and manual intervention.

Role Qualifications:

The ideal candidate will have experience in many of these areas:

  • BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
  • Coding and/or scripting exposure.
  • Appetite for change and pushing the boundaries of what can be done with automation. Be curious about new technology, infrastructure, and practices to scale our architecture and prepare for future growth.
  • Experience with algorithms, data structures, scripting, pipeline management, and software design.
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • Interest in designing, analyzing, and troubleshooting large-scale distributed systems.
  • Willingness and ability to learn, to take on challenging opportunities, and to work as a member of a diverse, matrix-based, geographically distributed project team.
  • Ability to balance doing things right with fixing things quickly. Flexible and pragmatic, while working towards improving the long-term health of the system.
  • Comfortable collaborating with cross-functional teams to ensure that expected system behavior is understood and that monitoring exists to detect anomalies.

Kafka knowledge is a MUST:

  • 3-5 years of experience working with Apache Kafka in a production environment.
  • Strong knowledge of Kafka architecture, including brokers, topics, partitions, and replicas.
  • Experience with Kafka security, including SSL, SASL, and ACLs.
  • Proficiency in configuring, deploying, and managing Kafka clusters in cloud and on-premises environments.
  • Experience with Kafka stream processing using tools like Kafka Streams, KSQL, or Apache Flink.
  • Solid understanding of distributed systems, data streaming, and messaging patterns.
  • Proficiency in Java, Scala, or Python for Kafka-related development tasks.
  • Familiarity with DevOps practices, including CI/CD pipelines, monitoring, and logging.
  • Experience with tools like ZooKeeper, Schema Registry, and Kafka Connect.
  • Strong problem-solving skills and the ability to troubleshoot complex issues in a distributed environment.
  • Excellent communication and collaboration skills to work effectively with cross-functional teams and stakeholders.
