HN Services Romania

Site Reliability Engineer

Romania · Full-time · Mid-Senior

With 41 years of experience in the international market and 18 years in Romania, HN Services means:

Digital transformation and IT professionals;

Diversity of technical roles;

Wide IT market exposure in different domains of activity;

A dedicated internal training center for software development.


💡 We at HN Services Romania are expanding our team in Bucharest and are looking for experienced professionals ready to deliver high-quality solutions. The goal of this role is to ensure that the software development services provided to the business are stable, resilient, and performant.


What you will do day to day:


  • Approach operations challenges from a software engineering perspective, applying coding, automation, and engineering principles.
  • Collaborate with development teams and other stakeholders to identify potential risks.
  • Analyze identified risks and evaluate their potential impact and likelihood of occurrence.
  • Based on the risk assessment, implement strategies to mitigate operational risks.
  • Continuously monitor and review the effectiveness of these mitigation strategies.
  • Study historical performance trends using metrics presented in charts and graphs.
  • Monitor log files to manage infrastructure at scale.
  • Reduce downtime by minimizing MTTR: as an SRE, you improve this metric by resolving incidents quickly.

Responsibilities:

  • Monitoring system performance, identifying bottlenecks, and executing pipeline optimization.
  • Implementing comprehensive service metrics to track and report on system reliability, performance, and efficiency.
  • Developing and maintaining CI/CD pipelines, enhancing the consistency and speed of software deployment.
  • Automating routine tasks and creating tools to improve team efficiency and system robustness.
  • Collaborating with development teams to integrate operational considerations into the software development life cycle.
  • Managing incident response protocols, including on-call rotations for junior engineers and strategic planning for senior personnel.
  • Conducting post-incident reviews to prevent recurrence and refine the system reliability framework.
  • Contributing to disaster recovery plans and ensuring robust backup systems are in place.
  • Partnering with development teams to improve services through rigorous testing and release procedures.
  • Participating in system design consulting, platform management, and capacity planning.
  • Creating sustainable systems and services through automation and uplifts.
  • Balancing feature development speed and reliability with well-defined service-level objectives.
  • Working on-call shifts, with the aim of preventing incidents before they happen.
  • Running our infrastructure with Ansible, Terraform, GitLab CI/CD, and Kubernetes.

Must-have skills:

  • Experience using Linux, UNIX, and Windows.
  • Database administration and maintenance: Oracle, Cassandra, PostgreSQL, AWS database setups, caching databases.
  • Familiarity with Git, Jira, Jenkins, and Ansible.
  • Strong knowledge of DevOps and CI/CD pipelines (GitHub, Terraform).
  • Knowledge of monitoring solutions: Grafana, Prometheus, Dynatrace.
  • Hands-on AWS implementation experience across a broad range of AWS services.
  • AWS development experience (containerization with Docker, Amazon EKS, Lambda, EC2, S3, Amazon DocumentDB, PostgreSQL).
  • Experience with core AWS platform architecture, including Organizations, account design, VPCs, subnets, and segmentation strategies.
  • Comfortable working with cloud-native infrastructure, such as AWS Lambda, Google App Engine, and Azure Cloud Services.
  • Backup and disaster recovery approach and design.
  • Environment and application automation.
  • Proficiency in programming languages such as Python, Go, or Java.
  • Familiarity with encryption, logging, and privacy/security protocols and tooling (e.g., TLS 1.2, ELK stack).
  • Good knowledge of REST and SOAP web service API implementation, including JSON.

Formal requirements:


  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • Relevant industry certifications, such as the Site Reliability Engineering (SRE) Foundation certification.
  • Strong understanding of cloud-based applications and infrastructure, including AWS, Azure, or Google Cloud.
  • Experience with IT operations best practices such as ITIL, COBIT, or DevOps.
  • Experience with IT service management tools such as ServiceNow or Remedy.
  • Familiarity with banking customer acquisition applications is preferred.

Key Skills

Ranked by relevance

aws cloud cicd incident response containerization postgresql prometheus terraform cassandra jenkins ansible grafana python docker devops oracle gitlab linux unix itil jira git elk eks s3
Posted: Jan 14, 2025
Type: Full-time
Level: Mid-Senior
Location: Bucharest

Industries

IT Services, IT Consulting

Categories

Business Development
