Research Scientist – Large Language Models

NuMind (YC S22) | France | 18 days ago

Full-time | Remote Friendly

Join our research team to solve information extraction! 🙂

*Recent PhD required*

*You need to be an ML, NLP, and LLM expert*

We are looking for a Research Scientist fresh out of a PhD to create LLMs and VLMs, such as NuExtract and NuMarkdown, that power the https://nuextract.ai/ platform.

Your job will involve creating datasets, training LLMs, running experiments and ablation studies, and so on. See the list of typical topics below.

You will join a team of brilliant ML scientists supervised by our CEO (https://www.linkedin.com/in/etiennebcp/).

We are a 3-year-old AI startup with 12 employees, located at Station F in Paris. We went through Y Combinator.

We have a hybrid work model: you should be able to work from our office regularly (at least once a week).

Requirements

You should have recently completed a PhD or postdoc.
You should have an ML/NLP/LLM background.
You should be self-driven, creative, and passionate about ML/NLP/LLMs.
You should have both a researcher and a hacker/builder mindset.
You should enjoy working in a startup environment (fast pace, frequent changes of direction).

Responsibilities

Training task-specific LLMs
Running experiments/ablation studies
Creating datasets
Developing software related to LLMs
Staying up to date with relevant LLM & NLP research

Typical R&D topics we are working on (non-exhaustive list):

1. Extraction Confidence

Users of NuExtract.ai want to be able to quickly verify the validity of extracted values in the JSON output. To do so, they need to know which values NuExtract is confident about and which it is not.

We want to figure out how to obtain an uncertainty score for each value extracted by NuExtract. This is not trivial because of the multiplicity of correct answers and the correlations between answers.

2. Extraction Localization

Users of NuExtract.ai want to be able to quickly verify the validity of extracted values. To do so, they need to know where in the document the information comes from (or is deduced from).

We want to figure out how to do this.

3. Long Document Extraction

LLMs have a limited context length, which limits the size of the documents they can process. We want to figure out how NuExtract could extract information from documents much longer than its context length.

4. Reasoning for Structured Extraction

We want to train NuExtract to reason about its extractions via a private chain of thought.

5. Extraction Agent

We want to give a reasoning version of NuExtract the ability to use tools (e.g., zooming in on the document or performing a web search) to improve extraction quality.

6. Structured Extraction Benchmark

There is no public benchmark for structured extraction. We want to create such a benchmark and make it public.

Links:

Platform: https://nuextract.ai/
Blog posts: https://numind.ai/blog
Hugging Face: https://huggingface.co/numind
GitHub: https://github.com/numindai
Discord: https://discord.com/invite/3tsEtJNCDe
NuNER paper: https://arxiv.org/abs/2402.15343
