Data Engineer - PySpark

Virtusa

United Arab Emirates · Full-time · Associate

About the Role

We are seeking a highly skilled Data Engineer with deep expertise in PySpark and the Cloudera Data Platform (CDP) to join our data engineering team. As a Data Engineer, you will be responsible for designing, developing, and maintaining scalable data pipelines that ensure high data quality and availability across the organization. This role requires a strong background in big data ecosystems, cloud-native tools, and advanced data processing techniques.

The ideal candidate has hands-on experience with data ingestion, transformation, and optimization on the Cloudera Data Platform, along with a proven track record of implementing data engineering best practices. You will work closely with other data engineers to build solutions that drive impactful business insights.

Responsibilities

Data Pipeline Development: Design, develop, and maintain highly scalable and optimized ETL pipelines using PySpark on the Cloudera Data Platform, ensuring data integrity and accuracy.
Data Ingestion: Implement and manage data ingestion processes from a variety of sources (e.g., relational databases, APIs, file systems) to the data lake or data warehouse on CDP.
Data Transformation and Processing: Use PySpark to process, cleanse, and transform large datasets into meaningful formats that support analytical needs and business requirements.
Performance Optimization: Conduct performance tuning of PySpark code and Cloudera components, optimizing resource utilization and reducing the runtime of ETL processes.
Data Quality and Validation: Implement data quality checks, monitoring, and validation routines to ensure data accuracy and reliability throughout the pipeline.
Automation and Orchestration: Automate data workflows using tools like Apache Oozie, Airflow, or similar orchestration tools within the Cloudera ecosystem.

Education and Experience

Bachelor's or Master's degree in Computer Science, Data Engineering, Information Systems, or a related field.
3+ years of experience as a Data Engineer, with a strong focus on PySpark and the Cloudera Data Platform.

Technical Skills

PySpark: Advanced proficiency in PySpark, including working with RDDs, DataFrames, and optimization techniques.
Cloudera Data Platform: Strong experience with Cloudera Data Platform (CDP) components, including Cloudera Manager, Hive, Impala, HDFS, and HBase.
Data Warehousing: Knowledge of data warehousing concepts, ETL best practices, and experience with SQL-based tools (e.g., Hive, Impala).
Big Data Technologies: Familiarity with Hadoop, Kafka, and other distributed computing tools.
Orchestration and Scheduling: Experience with Apache Oozie, Airflow, or similar orchestration frameworks.
Scripting and Automation: Strong scripting skills in Linux.

Key Skills

Ranked by relevance

c cloud spark ai ha etl ui data warehousing big data apache scala esp distributed computing data warehouse hadoop kafka linux hbase sql oop nat ats ux kf

Related Jobs

3 roles aligned with this opportunity

DotNet Developer with Advent (wealth management)

2026-07-04

Full-time

Associate

United Arab Emirates

IT Services

Engineering

Senior Nodejs Engineer

2026-07-09

Full-time

Not Applicable

Argentina

IT Services

Information Technology

Member of Technical Staff – AI Inference platform, features

2026-07-09

Full-time

Associate

Switzerland

IT Services

Engineering

🇦🇪

Country Guide

United Arab Emirates

Tax-friendly regional tech hub

Posted: Nov 28, 2024
Type: Full-time
Level: Associate
Location: Dubai
Company: Virtusa

Industries

IT Services IT Consulting
