Role: Site Reliability Engineer (Kafka)
Location: Dublin, Ireland (Hybrid)
Job type: Permanent
Requirements
Responsibilities:
Design, develop, and maintain automation scripts, tools, and integrations using languages such as Python, Go, Java, or Bash.
Write clean, maintainable code, debug effectively, interact with APIs, use version control (Git), and apply unit testing.
Administer Linux/Unix systems, including process management, file systems, permissions, kernel tuning, shell scripting, server configuration, updates, and security hardening.
Build and manage cloud infrastructure on AWS, GCP, or Azure, leveraging IaC tools like Terraform or CloudFormation.
Architect and operate scalable and highly available systems using Kubernetes, ECS, or other container orchestration tools.
Configure and troubleshoot network protocols and services (TCP/IP, HTTP, DNS, VPNs, firewalls, load balancing) with diagnostic tools (e.g., Wireshark, traceroute).
Implement observability practices using tools such as Splunk, Dynatrace, Prometheus, Grafana, Datadog, and Jaeger/Zipkin. Define SLIs/SLOs and build dashboards that give actionable insight into system health.
Develop and maintain CI/CD pipelines with Jenkins, GitLab CI, or GitHub Actions to automate build, test, and deployment processes, including rollback strategies.
Diagnose and resolve production issues through logs, metrics, and debugging tools. Participate in incident management, perform root cause analysis (RCA), and contribute to blameless postmortems.
Implement security best practices: secrets management (Vault), zero-trust architectures, vulnerability management, and compliance standards (SOC 2, GDPR).
Manage and operate Apache Kafka (must-have skill): configure topics, manage partitions, ensure high availability, monitor metrics (e.g., consumer lag, throughput), and troubleshoot issues like message loss or latency (see the monitoring sketch after this list).
Work with Axon Framework (must-have skill): design and maintain event-driven systems using CQRS/ES (Command Query Responsibility Segregation / Event Sourcing) patterns, integrate with Kafka for event streaming, and ensure scalability and resilience of distributed applications.
Manage and operate other messaging/streaming platforms such as NATS or MQ as needed.
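To illustrate the consumer-lag monitoring mentioned above, here is a minimal sketch in Python using the confluent-kafka client. The broker address, consumer group, and topic name are placeholders for illustration only, not details from this posting.

# Minimal consumer-lag check with confluent-kafka.
# Assumptions: a broker at localhost:9092, a consumer group "example-group",
# and a topic "example-topic" -- all hypothetical placeholders.
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "group.id": "example-group",            # placeholder consumer group
    "enable.auto.commit": False,
})

topic = "example-topic"  # placeholder topic
meta = consumer.list_topics(topic, timeout=10)
partitions = [TopicPartition(topic, p) for p in meta.topics[topic].partitions]

# Compare the group's committed offsets with the broker-side end offsets.
total_lag = 0
for tp in consumer.committed(partitions, timeout=10):
    low, high = consumer.get_watermark_offsets(tp, timeout=10)
    current = tp.offset if tp.offset >= 0 else low  # no commit yet -> assume earliest
    lag = high - current
    total_lag += lag
    print(f"partition {tp.partition}: committed={tp.offset}, end={high}, lag={lag}")

print(f"total lag: {total_lag}")
consumer.close()

In practice a check like this would feed a Prometheus or Datadog metric rather than print to stdout, so lag can be alerted on against the SLOs described above.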
Qualifications:
BS in Computer Science or a related technical field (e.g., Physics, Mathematics) OR equivalent practical experience.
4–5 years of hands-on experience in software development, systems administration, and cloud infrastructure management.
Proven expertise in Apache Kafka or Axon Framework (must-have).
Ready to apply?
Join Falcon Smart IT (FalconSmartIT) and take your career to the next level!