Renaissance InfoSystems
Site Reliability Engineer (SRE)
Renaissance InfoSystems | Australia | 1 day ago
Full-time | Information Technology

About Us: Renaissance InfoSystems is a technology and digital recruitment agency, connecting contract and permanent professionals with clients across Asia-Pacific. We aim to differentiate ourselves through our responsiveness and through the understanding that comes from being an IT recruitment agency rooted in the IT industry. Our recruiters balance sophisticated and simple interpersonal techniques to maintain a strong candidate network.

Learn more: http://www.reninfo.com.au


Role: We are seeking a Site Reliability Engineer (SRE) to join our growing team. As an SRE, you will play a key role in ensuring the reliability, scalability, and performance of our systems and services. You will be responsible for building and maintaining automation pipelines, implementing observability and monitoring solutions (using tools like Dynatrace), enhancing security practices, and ensuring the overall availability and resilience of our infrastructure.

You’ll work closely with development and operations teams to improve and maintain our infrastructure, troubleshoot issues, and automate processes to streamline service delivery.

Key Responsibilities:

  • DevOps Automation & CI/CD:
  • Develop and maintain CI/CD pipelines to automate deployments, tests, and rollbacks.
  • Integrate automation for provisioning infrastructure and managing configurations.
  • Implement Infrastructure-as-Code (IaC) practices using tools like Terraform, Ansible, or CloudFormation.
  • Observability & Monitoring:
  • Implement and maintain observability solutions using tools like Dynatrace, Prometheus, Grafana, or other monitoring platforms.
  • Ensure real-time monitoring, logging, and alerting for critical systems.
  • Build dashboards and create actionable insights for both technical and business stakeholders.
  • Continuously optimize system performance based on data from observability tools.
  • High Availability & Resilience:
  • Ensure that applications and infrastructure are highly available, scalable, and fault tolerant.
  • Design and implement disaster recovery and failover strategies.
  • Perform capacity planning, load testing, and fault injection to ensure the resilience of services.
  • Security & Compliance:
  • Implement security best practices to safeguard infrastructure, data, and applications.
  • Perform vulnerability assessments and security patching, and manage identity and access control.
  • Work closely with security teams to ensure compliance with internal policies and external regulations.
  • Monitor for and respond to security incidents and vulnerabilities across the stack.
  • Incident Management & Troubleshooting:
  • Participate in on-call rotations and respond to critical incidents with a focus on quick restoration of services.
  • Collaborate with cross-functional teams to identify the root cause of complex issues and implement long-term solutions.
  • Drive post-mortem analysis and ensure continuous improvement in incident response processes.
  • Collaboration & Documentation:
  • Work closely with development, QA, and product teams to ensure systems are designed for reliability, observability, and scalability.
  • Document processes, architecture, runbooks, and troubleshooting steps for internal and external teams.
  • Provide guidance and training to teams on DevOps best practices and reliability engineering principles.

Required Qualifications:

  • Experience:
  • 3+ years of experience in Site Reliability Engineering, DevOps, or related roles, with a focus on high-availability systems.
  • Hands-on experience with Dynatrace or similar APM/monitoring tools.
  • Strong experience with CI/CD pipeline tools like Jenkins, GitLab CI, CircleCI, or similar.
  • Proficiency in Infrastructure-as-Code (IaC) using tools such as Terraform, CloudFormation, or Ansible.
  • Expertise in containerization technologies such as Docker and Kubernetes.
  • Technical Skills:
  • Solid understanding of cloud platforms (AWS, Azure, GCP) and their services (EC2, Lambda, Kubernetes, etc.).
  • Strong knowledge of Linux/Unix systems administration and scripting (Python, Bash, Go, etc.).
  • Experience with distributed systems, microservices architecture, and container orchestration.
  • Familiarity with security tools and practices such as encryption, access control, and vulnerability management.
  • Soft Skills:
  • Excellent problem-solving skills and ability to troubleshoot complex issues in production environments.
  • Strong collaboration and communication skills with both technical and non-technical teams.
  • Ability to prioritize and manage tasks in a fast-paced environment.



Regards,

Reshu Seth

Recruitment Consultant

Renaissance InfoSystems

M: +61 478 487 026

E: [email protected]

W: http://www.reninfo.com.au
