Site Reliability Engineer (SRE)

Human Direct | Romania | 22 hours ago

Full-time | Remote Friendly | Engineering, Information Technology

Role Summary

This is a hybrid role that balances proactive engineering projects—such as enhancing automation and scaling Kubernetes—with a strong focus on operational excellence. You’ll contribute to both the day-to-day stability and the long-term reliability of production systems.

It’s an exciting opportunity to make a real impact: our client is in the process of formally adopting SRE principles, and you’ll be a key player in defining and implementing these practices. The role is well-suited for a proactive problem-solver who is passionate about building resilient systems and eager to stay ahead in the fast-evolving cloud landscape.

Responsibilities

Reliability & Availability: Design, implement, and test High Availability, Backup, and Disaster Recovery strategies.
Monitoring & Observability: Build a comprehensive monitoring and alerting strategy using Azure Monitor, Application Insights, and related tools.
SRE Practice: Help establish and implement SRE best practices, define SLOs/SLIs, and drive data-informed decisions.
Kubernetes Management: Deploy, manage, and scale applications on Azure Kubernetes Service (AKS).
Infrastructure & Automation: Build and maintain Azure infrastructure using IaC (Bicep) and Azure DevOps, and enhance CI/CD pipelines.
Cloud Governance: Implement best practices for security, cost optimization, and compliance.
Operational Support: Participate in an on-call rotation and help drive a blameless post-mortem culture.
Collaboration: Work closely with developers to ensure services are reliable, scalable, and secure from the start.

Qualifications

3+ years of experience as a Cloud Engineer, DevOps Engineer, or SRE.
Hands-on experience with Microsoft Azure (App Service, VMs, AKS, networking).
Infrastructure as Code expertise, especially Bicep.
Experience with monitoring and alerting (Azure Monitor, Application Insights, Log Analytics, Zabbix).
Strong troubleshooting, root cause analysis, and telemetry analysis skills.
Experience with CI/CD concepts and tools, especially Azure DevOps.
Proactive, problem-solving mindset with a passion for automation.

Nice-to-Have Skills

Hands-on experience with AKS containerization and orchestration.
Curiosity to learn about VoIP technologies (SIP, Asterisk).
Familiarity with Azure AI services (OpenAI, Cognitive Services, AI Foundry).
Prior exposure to SRE frameworks (SLOs/SLIs, error budgets).
Experience with databases such as Azure SQL, Azure Cosmos DB, MySQL, and PostgreSQL.
