Human Direct
Site Reliability Engineer (SRE)
Romania, posted 22 hours ago
Full-time, Remote Friendly
Engineering, Information Technology

Role Summary

This role combines proactive engineering projects, such as enhancing automation and scaling Kubernetes, with a strong focus on operational excellence. You’ll contribute to both the day-to-day stability and the long-term reliability of production systems.

It’s an exciting opportunity to make a real impact: our client is in the process of formally adopting SRE principles, and you’ll be a key player in defining and implementing these practices. The role is well-suited for a proactive problem-solver who is passionate about building resilient systems and eager to stay ahead in the fast-evolving cloud landscape.



Responsibilities

  • Reliability & Availability: Design, implement, and test High Availability, Backup, and Disaster Recovery strategies.
  • Monitoring & Observability: Build a comprehensive monitoring and alerting strategy using Azure Monitor, Application Insights, and related tools.
  • SRE Practice: Help establish and implement SRE best practices, define SLOs/SLIs, and drive data-informed decisions.
  • Kubernetes Management: Deploy, manage, and scale applications on Azure Kubernetes Service (AKS).
  • Infrastructure & Automation: Build and maintain Azure infrastructure using IaC (Bicep, Azure DevOps) and enhance CI/CD pipelines.
  • Cloud Governance: Implement best practices for security, cost optimization, and compliance.
  • Operational Support: Participate in an on-call rotation and help drive a blameless post-mortem culture.
  • Collaboration: Work closely with developers to ensure services are reliable, scalable, and secure from the start.



Qualifications

  • 3+ years of experience as a Cloud Engineer, DevOps Engineer, or SRE.
  • Hands-on experience with Microsoft Azure (App Service, VMs, AKS, networking).
  • Infrastructure as Code expertise, especially Bicep.
  • Experience with monitoring and alerting (Azure Monitor, Application Insights, Log Analytics, Zabbix).
  • Strong troubleshooting, root cause analysis, and telemetry analysis skills.
  • Experience with CI/CD concepts and tools, especially Azure DevOps.
  • Proactive, problem-solving mindset with a passion for automation.


Nice-to-Have Skills

  • Hands-on experience with containerization and orchestration on AKS.
  • Curiosity to learn about VoIP technologies (SIP, Asterisk).
  • Familiarity with Azure AI services (OpenAI, Cognitive Services, AI Foundry).
  • Prior exposure to SRE frameworks (SLOs/SLIs, error budgets).
  • Experience with databases such as Azure SQL, Azure Cosmos DB, MySQL, or PostgreSQL.
