Tata Technologies

Senior DevOps / SRE Engineer

Sweden · Full-time · Mid-Senior

At Tata Technologies we make product development dreams a reality by designing, engineering and validating the products of tomorrow for the world’s leading manufacturers. Due to our continued growth, we are now recruiting a Senior DevOps / SRE Engineer to strengthen our team in Gothenburg.


Scope of role

We are seeking a Senior DevOps / SRE Engineer to serve as a key technical owner for cloud infrastructure, observability, reliability engineering, and cloud cost optimization across AWS and GCP.


This role carries clear accountability and measurable outcomes in the following areas:

1. End-to-end observability (design → implementation → continuous improvement)

2. Systematic cloud cost optimization across AWS & GCP (FinOps)

3. Production reliability governance and risk reduction

4. Root cause analysis (RCA) and systemic improvement of major incidents


You will be expected not only to design but also to deliver, operate, and be assessed against concrete results.


Responsibilities

1) End-to-End Observability

What you will own:

Independently design and implement a comprehensive end-to-end observability system covering:

• Infrastructure (AWS/GCP, Kubernetes, network, storage)

• Platform (message queues, databases, caches, API gateways)

• Application layer (microservices, critical business flows)

• Business layer (key business metrics)


You will be expected to produce:

1. Unified Observability Architecture Document

• Overall architecture diagram (Metrics + Logs + Traces)

• Data flow diagram (collection → processing → storage → visualization)

• Tooling selection and justification (e.g., Prometheus, Datadog, OpenTelemetry)


2. Standardized Observability Data Model

• Unified metrics naming conventions

• Standardized tracing model (Trace ID, Span, sampling strategy)

• Structured logging standard (JSON schema)
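
To illustrate what a structured logging standard with trace correlation might look like, here is a minimal Python sketch. The field names (timestamp, level, service, message, trace_id, span_id, attributes) and the helper functions are hypothetical choices for this example, not an existing schema; the ID lengths follow the W3C Trace Context convention of 32 hex characters for a trace ID and 16 for a span ID.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical required fields; a real standard would pin these in a JSON schema.
REQUIRED_FIELDS = ("timestamp", "level", "service", "message", "trace_id", "span_id")


def make_log_record(level, service, message, trace_id, span_id, **attrs):
    """Build one structured log line as a JSON string."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "service": service,
        "message": message,
        "trace_id": trace_id,   # links the log line to a distributed trace
        "span_id": span_id,     # identifies the operation within that trace
        "attributes": attrs,    # free-form, service-specific context
    }
    return json.dumps(record, sort_keys=True)


def validate_log_line(line):
    """Return the required fields missing from a JSON log line (empty list = valid)."""
    record = json.loads(line)
    return [field for field in REQUIRED_FIELDS if field not in record]


if __name__ == "__main__":
    trace_id = uuid.uuid4().hex        # 32 hex characters
    span_id = uuid.uuid4().hex[:16]    # 16 hex characters
    line = make_log_record("ERROR", "checkout-api", "payment timeout",
                           trace_id, span_id, order_id="A123")
    print(line)
    print(validate_log_line(line))     # prints []
```

Because every line carries the same trace_id as the corresponding spans, a log backend can jump from a trace to its logs and back; a validator like this could run in CI or at ingestion to keep services on the agreed schema.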


3. Operational Dashboards

• Infrastructure health dashboard

• Platform services health dashboard

• Business API health and KPI dashboard


4. Alerting System

• Defined P0/P1/P2 alert levels

• Alert noise reduction strategy

• Automated alert routing by team/service


5. SLI / SLO / SLA Framework

• At least 5 critical business SLOs defined and tracked

• Clear error budget policy
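
The error budget behind these SLOs is simple arithmetic: for a request-based availability SLO, the budget is the number of requests allowed to fail, (1 - SLO target) × total requests. A 99.9% target over 1,000,000 requests therefore allows 1,000 failed requests. The Python sketch below computes budget consumption and a burn rate (observed error rate divided by the allowed error rate); the function names and the idea of gating releases on the remaining budget are illustrative assumptions, not a prescribed policy.

```python
def error_budget_report(slo_target, total_requests, failed_requests):
    """Summarise error-budget consumption for a request-based availability SLO."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction such as 0.999")
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures <= 0:
        raise ValueError("total_requests must be positive")
    consumed = failed_requests / allowed_failures
    return {
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,                   # 1.0 means the whole budget is spent
        "budget_remaining": max(0.0, 1.0 - consumed),
    }


def burn_rate(slo_target, window_requests, window_failures):
    """Observed error rate divided by the allowed error rate (1.0 = exactly on pace)."""
    observed = window_failures / window_requests
    return observed / (1 - slo_target)


def releases_allowed(report):
    """Hypothetical policy hook: freeze risky releases once the budget is exhausted."""
    return report["budget_remaining"] > 0


if __name__ == "__main__":
    report = error_budget_report(0.999, 1_000_000, 250)
    print(round(report["allowed_failures"]), round(report["budget_remaining"], 2))  # 1000 0.75
    print(round(burn_rate(0.999, 10_000, 50), 1))  # 5.0
```

A burn rate of 5.0 means the service is failing five times faster than the SLO allows; sustained over the SLO window, that would exhaust the budget in one fifth of the time, which is why burn-rate alerts are often paired with an error budget policy.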


2) Cloud Cost Optimization – FinOps (Core Requirement)

What you will own:

Lead systematic cost optimization across AWS and GCP without compromising performance, reliability, or user experience.


You will implement:

1. Unified Cost Visibility System

• Combined AWS + GCP cost dashboards

• Cost breakdown by: Team/Product/Service/Environment (Dev/Test/Stage/Prod)
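
A breakdown like this usually depends on consistent resource tags or labels across both clouds. As a minimal Python sketch, assuming billing line items have already been normalised from each provider's export (for example the AWS Cost and Usage Report and the Google Cloud billing export to BigQuery) into a common shape, per-dimension totals can be computed as follows. The record layout, tag keys and the "untagged" bucket are hypothetical.

```python
from collections import defaultdict

UNTAGGED = "untagged"  # bucket for spend with no owner tag

# Hypothetical normalised line items from both clouds.
ITEMS = [
    {"cloud": "aws", "cost": 120.0, "tags": {"team": "payments", "env": "prod"}},
    {"cloud": "gcp", "cost": 80.0, "tags": {"team": "payments", "env": "dev"}},
    {"cloud": "aws", "cost": 40.0, "tags": {"team": "search", "env": "prod"}},
    {"cloud": "gcp", "cost": 15.0, "tags": {"env": "test"}},  # missing team tag
]


def breakdown(line_items, dimension):
    """Sum cost per value of one tag dimension, largest spend first."""
    totals = defaultdict(float)
    for item in line_items:
        key = item["tags"].get(dimension, UNTAGGED)
        totals[key] += item["cost"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    print(breakdown(ITEMS, "team"))  # {'payments': 200.0, 'search': 40.0, 'untagged': 15.0}
    print(breakdown(ITEMS, "env"))   # {'prod': 160.0, 'dev': 80.0, 'test': 15.0}
```

Keeping an explicit untagged bucket makes unowned spend visible in the dashboards, which supports the mandatory resource ownership described under the shift-left mechanisms.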


2. Actionable Cost Optimization Plan

• Compute (EKS/GKE, EC2/Compute Engine, Serverless)

• Storage (S3/GCS tiering, lifecycle policies)

• Databases (RDS/Cloud SQL sizing, connection pooling, caching)

• Network costs (egress, cross-region traffic)


3. Cost Shift-Left Mechanisms

• Cost checks integrated into CI/CD

• Mandatory resource ownership and budget limits

• Quarterly cost reviews
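
A cost check in CI/CD typically compares the estimated monthly cost of a proposed infrastructure change with the owning team's budget. The sketch below assumes the estimate is produced by an earlier pipeline step (for instance a cost-estimation tool run against the Terraform plan); the function name, the 10% increase threshold and the input values are hypothetical.

```python
def cost_gate(current_monthly, proposed_monthly, budget_monthly, max_increase_pct=10.0):
    """Decide whether a proposed change may merge, based on estimated monthly cost.

    Returns (ok, reason). All amounts are assumed to be in the same currency.
    """
    if proposed_monthly > budget_monthly:
        return False, f"proposed {proposed_monthly:.2f}/month exceeds budget {budget_monthly:.2f}"
    if current_monthly > 0:
        increase_pct = (proposed_monthly - current_monthly) / current_monthly * 100
        if increase_pct > max_increase_pct:
            return False, (f"cost increase of {increase_pct:.1f}% exceeds "
                           f"the {max_increase_pct:.1f}% limit")
    return True, "within budget"


if __name__ == "__main__":
    # In a real pipeline, a wrapper would exit non-zero when ok is False to fail the job.
    for current, proposed in [(1000.0, 1050.0), (1000.0, 1150.0), (1000.0, 2500.0)]:
        ok, reason = cost_gate(current, proposed, budget_monthly=2000.0)
        print(ok, reason)
```

Checking both an absolute budget and a relative increase catches large single changes early, while the quarterly reviews handle gradual drift.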


3) Production Reliability & Incident Governance

What you will own:

Move from reactive “firefighting” to systematic reliability engineering.


Required Deliverables:

1. Incident Management Framework

• Standard P0/P1 incident response process

• RCA template and follow-up tracking mechanism


2. Reliability Governance Framework

• Error budget policy

• Standardized canary/gradual rollout process

• Automated rollback mechanisms
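
A canary rollout with automated rollback generally comes down to comparing the canary's health signals against the stable baseline and acting on the result. The following Python sketch shows one simple decision rule; the thresholds (1.5x error rate, 1.2x p99 latency, a 500-request minimum) and all names are illustrative assumptions, and production setups usually add statistical tests and several analysis windows.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request metrics for one deployment over one analysis window."""
    requests: int
    errors: int
    p99_latency_ms: float

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(baseline, canary, max_error_ratio=1.5,
                   max_latency_ratio=1.2, min_requests=500):
    """Return 'promote', 'wait' or 'rollback' for a canary compared with the baseline."""
    if canary.requests < min_requests:
        return "wait"  # not enough traffic yet to judge
    # Floor the baseline rate so a zero-error baseline does not make any canary error infinite.
    baseline_err = max(baseline.error_rate, 1e-4)
    if canary.error_rate / baseline_err > max_error_ratio:
        return "rollback"
    if canary.p99_latency_ms / baseline.p99_latency_ms > max_latency_ratio:
        return "rollback"
    return "promote"


if __name__ == "__main__":
    baseline = WindowStats(requests=100_000, errors=100, p99_latency_ms=250.0)
    print(canary_verdict(baseline, WindowStats(5_000, 5, 260.0)))   # promote
    print(canary_verdict(baseline, WindowStats(5_000, 50, 255.0)))  # rollback
```

Returning "wait" instead of promoting on thin traffic keeps the rollout gradual; a deployment controller would then shift more traffic, hold, or trigger the automated rollback based on the verdict.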


3. Risk Register

• Identified systemic risks and technical debt

• Prioritized remediation roadmap


4) Kubernetes & Multi-Cloud Platform Optimization

What you will deliver:

• Optimize EKS/GKE cluster architecture

• Improve stability (reduce OOMs, node instability, network issues)

• Improve resource utilization


Knowledge/Experience

Experience

• 5+ years of DevOps / SRE / Cloud Platform experience

• At least 3 years in a Staff/Principal or Tech Lead role

• Experience operating large-scale distributed systems in production


Cloud Expertise

• Deep expertise in both AWS and GCP

• Ability to design cross-cloud architectures

• Strong experience with Terraform / Pulumi / CDK


Observability Expertise

• Proven experience designing and implementing observability from scratch

• Deep hands-on experience with Prometheus/Grafana/Loki/Elastic/Kibana


Kubernetes

• Deep understanding of Kubernetes internals (Scheduler, Controllers, etcd, CNI, CRI)

• Experience managing large-scale production clusters


Programming

• Proficiency in Java, Python, or Go


Strong Plus

• Google SRE background or deep SRE practice

• Experience with Chaos Engineering

• Proven FinOps success cases

• Knowledge of eBPF and performance profiling

• Open-source contributions

• Experience designing multi-cloud disaster recovery (Active-Active or Active-Passive)


If you are passionate about bringing innovation to the projects you work on, then we would love to hear from you.


Tata Technologies: Engineering a better world.


Tata Technologies would like to thank all applicants for their interest; each application will be reviewed against the set criteria for the role. Please note that only candidates under consideration will be contacted. If you do not hear from us within 10 working days following the closing date, unfortunately your application has not been successful. We will, however, retain your details for any suitable future opportunities.

Key Skills

Ranked by relevance

cloud, aws, gcp, kubernetes, storage, devops, incident response, message queues, microservices, serverless, prometheus, terraform, datadog, java, cicd, sql, sla

Posted: Feb 17, 2026
Type: Full-time
Level: Mid-Senior
Location: Gothenburg

Industries

Motor Vehicle Manufacturing, Industrial Machinery Manufacturing

Categories

Information Technology
