Data Engineer (5600 USD/Month) [Remote]

About The Position

Our Client

Our client leverages advanced AI to revolutionize patient recruitment in clinical trials. They use a modern, LLM-powered data stack to develop cutting-edge AI solutions, including tools that help patients find clinical trials they are eligible for.

Why our client is hiring for this position/mission

As a Data Engineer, you will design, develop, and maintain robust data pipelines that support our client's AI-driven clinical trial solutions. You will work with modern data engineering tools such as dbt, dlt, Apache Airflow, Docker, and Kubernetes to keep data workflows scalable and reliable. You will also integrate data pipelines with machine learning models and web application backends. Ideally, you have built a data product and understand the resilience and flexibility required of pipelines that sit behind an evolving product, as opposed to pipelines used only for analytics. You are excited about the possibilities that LLMs bring and are interested in multi-agent LLM systems.

Team

You will join a team of about five people and report to the Manager.

Responsibilities:

Design, develop, and maintain robust data pipelines
Ensure data pipelines are scalable, reliable, and efficient
Monitor and optimize the performance of data workflows
Work with dlt and dbt for data ingestion and transformation
Use Apache Airflow or Cloud Composer for orchestrating workflows
Implement containerization with Docker and orchestration with Kubernetes
Manage code via GitHub and deploy solutions on Google Cloud Platform (GCP)
Implement Continuous Integration/Continuous Deployment (CI/CD) practices
Utilize Infrastructure-as-Code (IaC) tools to manage and provision infrastructure
Collaborate with cross-functional teams including data scientists, software engineers, and clinical experts
Integrate data pipelines with machine learning models, LLMs, and NLP frameworks
Propose and implement improvements to existing data pipelines and infrastructure

Requirements

3+ years of production experience in data engineering roles
Demonstrated competence deploying Python data infrastructure on GCP, AWS, or Azure
Experience with Apache Airflow or Cloud Composer for workflow orchestration
Proficiency with Docker
Experience utilizing CI/CD and Infrastructure-as-Code (IaC) tools in a production environment
SQL expertise
Strong understanding of data engineering architecture and data modeling principles
Experience working in a production team that uses GitHub for version control
Desire to learn, grow, and sprint with our early-stage startup toward our ambitious goals!

Preferred:

Hands-on experience with dlt and dbt for data ingestion and transformation
Experience or strong interest in multi-agent LLM systems
Experience with machine learning and natural language processing
Experience with production data engineering and application environments in GCP
Comfort with AI-powered software development workflows

Timezone

Working in any US time zone

Recruitment Process

Screener (15-30 min, sync)
Culture fit / technical scenario questions (1 hour, sync)
Take-home test (async)
Take-home test review (30 min, sync or async)
CEO final interview (15 min, sync)

Please note that the process may change: steps may be added or skipped depending on the candidate's performance or the team's availability.

Benefits

Vacation / Work Flexibility

Remote Work: Fully remote, worldwide

7589-3-23102024

Related Jobs

Full-Stack Engineer (4400 USD/Month) [Remote]

Full-Stack Engineer

Data Engineer (SSR) – Python / SQL / Airflow / Cloud (Hybrid)