Flipkart
Site Reliability Engineer
Flipkart, India, 10 hours ago
Full-time, Information Technology

Hiring Site Reliability Engineers


Experience : 2.5+ years [excluding internships]

Location : Bangalore


Apply Here : https://lnkd.in/gH7xUGUH


The engineer will work in the Reliability and Productivity Engineering team and will be responsible for building industry-standard, large-scale platforms for use across Flipkart. These platforms should significantly improve system reliability and make engineering practices more efficient.


About the Role


As a Software Development Engineer II (SRE II), you will be the primary reliability owner for the Batch Processing Ecosystem of the Flipkart Data Platform (FDP). This is a hands-on role requiring deep technical ownership of the infrastructure that runs our most intensive data transformation and analytics jobs.

Your core responsibility is to ensure maximum uptime, optimal performance, and automated operations across the complete data stack. You will architect, implement, and automate the lifecycle management of our foundational components, including:

  • Hadoop Infrastructure setup, tuning, and ongoing reliability.
  • Management and optimization of large Dataproc Clusters in our public cloud environment.
  • Designing and maintaining robust Kafka pipelines for fault-tolerant data ingestion.
  • Defining and enforcing best practices for high-performance table and storage formats (e.g., Iceberg, Delta Lake, Parquet, ORC, Hive).

You will translate application reliability requirements into infrastructure code, eliminate operational toil through automation, and drive the adoption of new technologies to improve data latency and throughput.


About the team


About Flipkart’s Reliability & Productivity Charter

The Functional SRE team for the Flipkart Data Platform (FDP) acts as the specialized reliability partner ensuring the data foundation of Flipkart operates with industry-leading availability and performance. The FDP team manages a petabyte-scale, mission-critical data ecosystem that powers everything from core business intelligence and financial reporting to cutting-edge machine learning models.

We are directly embedded with the FDP Engineering team, applying core SRE principles to solve unique challenges inherent in massive data pipelines. Our focus is the resilience and scalability of the data lifecycle: ingestion, batch processing, storage, and access. Join us to engineer for stability across complex distributed systems like Hadoop, Dataproc (on GCP), and high-throughput Kafka clusters, guaranteeing data integrity for millions of daily transactions.
