Site Reliability Engineer

Tech Mahindra

Australia · Full-time · Associate

About the job

Tech Mahindra represents the connected world, offering innovative and customer-centric information technology experiences, enabling Enterprises, Associates and Society to Rise™. We are a USD 5.1 billion company with 126,200+ professionals across 90 countries, helping 1058 global customers including Fortune 500 companies. We are focused on leveraging next-generation technologies including 5G, Blockchain, Cybersecurity, Artificial Intelligence, and more, to enable end-to-end digital transformation for global customers. Tech Mahindra is one of the fastest-growing brands and amongst the top 15 IT service providers globally.

Why Tech Mahindra?

You will work within a successful, established and trusted organisation, where the opportunities are limited only by your aspirations. Do you want to travel the world? Do you want to keep learning new skills and developing professionally and personally? We offer flexible work, an excellent salary, healthcare, a dedicated training/certification platform and unrivalled recognition throughout the business.

Role:

We are seeking a Senior Observability Engineer with expertise in configuring and optimizing monitoring tools such as Dynatrace, Elasticsearch, and Nagios XI. In this role, you will play a crucial part in ensuring system reliability and aligning observability practices with Site Reliability Engineering (SRE) standards.

Key Responsibilities:

• Observability and Monitoring Strategy: Design and implement end-to-end observability solutions that align with SRE standards. Provide comprehensive monitoring and alerting to track system health, performance, and reliability.

• Tool Configuration and Management: Configure, deploy, and manage Dynatrace, Elasticsearch, and Nagios XI to monitor critical applications, infrastructure, and network components, supporting real-time visibility into service performance.

• SRE Standard Implementation: Collaborate with engineering and operations teams to develop and implement observability practices that meet SRE standards, such as setting SLAs, SLOs, and error budgets.

• Performance Optimization: Work with development and infrastructure teams to identify performance bottlenecks and optimize applications, with a focus on meeting SRE metrics for system reliability and availability.

• Incident Management and Root Cause Analysis: Develop alerting and escalation processes based on SRE best practices. Lead or support incident response and perform root cause analysis to continuously improve reliability.

• Data Analysis and Dashboarding: Set up and maintain dashboards, log management, and metric visualizations in Dynatrace, Elasticsearch, and Nagios XI. Provide insights into performance trends and system health in alignment with SRE goals.

• Documentation and Mentorship: Create clear documentation of observability practices, configuration details, and troubleshooting guidelines. Mentor junior team members and promote an SRE-driven observability mindset.

Requirements:

• Experience: 5+ years in observability, monitoring, or related engineering roles, with a focus on SRE standards and at least 3 years working with Dynatrace, Elasticsearch, and Nagios XI.

• Technical Skills:

    • Expert experience with Dynatrace for application performance monitoring and troubleshooting.

    • Proficiency with Elasticsearch for log analysis, data indexing, and search optimization.

    • Strong experience configuring and managing Nagios XI for infrastructure monitoring.

    • Scripting skills (e.g., Python, Bash) to automate monitoring, data collection, and reporting.

• SRE Knowledge:

    • Strong understanding of SRE principles and best practices, including SLAs, SLOs, error budgets, and incident response.

    • Familiarity with tools and practices for observability in distributed systems, microservices, and cloud-based infrastructure.

• Preferred Additional Skills:

    • Experience with cloud platforms (AWS, Azure, GCP) and monitoring of cloud-native environments.

    • Familiarity with additional observability tools (e.g., Prometheus, Grafana) and infrastructure as code (e.g., Terraform, Ansible).

    • Knowledge of ITSM/incident management tools (e.g., ServiceNow, PagerDuty).

• Soft Skills:

    • Strong analytical and troubleshooting skills.

    • Effective communicator who can convey technical information to stakeholders.

    • Collaborative mindset with a proactive approach to system reliability.

For further information please contact Gayathri Ganapathy at [email protected]

Posted: Nov 18, 2024
Type: Full-time
Level: Associate
Location: Sydney
Company: Tech Mahindra

Industries

IT Services IT Consulting
