Site Reliability Engineer

HN Services Romania

Romania · Full-time · Mid-Senior

With 41 years of experience in the international market and 18 years in Romania, HN Services means:

Digital transformation and IT professionals;

Diversity of technical roles;

Wide IT market exposure in different domains of activity;

A dedicated internal training center for software development.

💡 We, at HN Services Romania, are expanding our team in Bucharest, and we are looking for new, experienced professionals ready to deliver high-quality solutions. The goal is to ensure that the Software Development services provided to the business are stable, resilient and performant.

What you will do day to day:

Approach operations challenges from a software engineering perspective, applying coding, automation, and engineering principles.
Collaborate with development teams and other stakeholders to identify potential risks.
Analyze identified risks and evaluate their potential impact and likelihood of occurrence.
Implement mitigation strategies for operational risks based on the risk assessment.
Continuously monitor and review the effectiveness of these strategies.
Study historical performance trends using metrics visualized in charts and graphs.
Monitor log files to manage infrastructure at scale.
Minimize mean time to recovery (MTTR) by resolving incidents quickly, reducing downtime.

Responsibilities:

Monitoring system performance, identifying bottlenecks, and optimizing pipelines.
Implementing comprehensive service metrics to track and report on system reliability, performance, and efficiency.
Developing and maintaining CI/CD pipelines, enhancing the consistency and speed of software deployment.
Automating routine tasks and creating tools to improve team efficiency and system robustness.
Collaborating with development teams to integrate operational considerations into the software development life cycle.
Managing incident response protocols, including on-call rotations for junior engineers and strategic planning for senior personnel.
Conducting post-incident reviews to prevent recurrence and refine the system reliability framework.
Contributing to disaster recovery plans and ensuring robust backup systems are in place.
Partnering with development teams to improve services through rigorous testing and release procedures.
Participating in system design consulting, platform management, and capacity planning.
Creating sustainable systems and services through automation and uplifts.
Balancing feature development speed and reliability with well-defined service-level objectives (SLOs).
Working on-call shifts and proactively preventing incidents.
Running our infrastructure with Ansible, Terraform, GitLab CI/CD, and Kubernetes.

Must have skills:

Experience with Linux, UNIX, and Windows
Database administration and maintenance: Oracle, Cassandra, PostgreSQL, AWS database setups, caching databases
Familiarity with Git, Jira, Jenkins, and Ansible
Strong knowledge of DevOps and CI/CD pipelines (GitHub, Terraform)
Knowledge of monitoring solutions: Grafana, Prometheus, Dynatrace
Hands-on AWS implementation experience across a broad range of AWS services.
AWS development experience, including containerization (Docker, Amazon EKS), Lambda, EC2, S3, Amazon DocumentDB, and PostgreSQL
Experience with core AWS platform architecture, including AWS Organizations, account design, VPCs, subnets, and segmentation strategies.
Comfortable working with cloud-native infrastructure, such as AWS Lambda, Google App Engine, and Azure Cloud Services.
Backup and disaster recovery approach and design
Environment and application automation
Proficiency in programming languages such as Python, Go, or Java
Familiarity with encryption, logging, and privacy/security protocols (e.g., TLS 1.2, ELK stack)
Good knowledge of REST/SOAP/JSON web service API implementation.

Formal requirements:

Bachelor's degree in Computer Science, Information Technology, or a related field.
Relevant industry certifications, such as the Site Reliability Engineering (SRE) Foundation certification.
Strong understanding of cloud-based applications and infrastructure, including AWS, Azure, or Google Cloud.
Experience with IT operations best practices such as ITIL, COBIT, or DevOps.
Experience with IT service management tools such as ServiceNow or Remedy.
Familiarity with banking customer acquisition applications is preferred.

Key Skills

Ranked by relevance

aws cloud cicd incident response containerization postgresql prometheus terraform cassandra jenkins ansible grafana python docker devops oracle gitlab linux unix itil jira git elk eks s3


Posted: Jan 14, 2025
Type: Full-time
Level: Mid-Senior
Location: Bucharest
Company: HN Services Romania

Industries

IT Services, IT Consulting
