FactSet Research Systems is an American financial data and software company that provides a wide universe of financial data and services, supported by innovative financial and statistical data collection.
“We will no longer need to read documents except for fun.” Today, research analysts in the financial domain have to read long documents to extract data from them. This is a long and tedious process. GenAI can make data extraction easier. Can we build a tool that makes document understanding and data extraction easy?
- Assistant, can you extract the value of this “concept” from the document for me? Where did you find this information?
Project Overview:
The internship project involves assisting a team of AI/ML engineers in building a document intelligence tool. You will be involved in a few research topics to prove that the approach can meet performance and cost-efficiency targets.
The project will combine prompt engineering, LLM selection, and RAG. At each stage, we want to be able to verify that we are not losing performance and that we are saving costs.
The basic task of the tool is: “I want to extract ‘this concept’ from a document.”
The solution involves: retrieving the right chunks from the document; building a dynamic prompt; cost optimization; research studies to test hypotheses, etc.
The challenge is: how can we build a solution that can scale? How can we be very competitive cost-wise? How can we guarantee extraction quality?
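To make the retrieval and dynamic-prompt steps above concrete, here is a minimal sketch in Python. It is not the team's implementation: the keyword-overlap scoring stands in for whatever retriever the tool actually uses, and every name in it (`split_into_chunks`, `retrieve_chunks`, `build_prompt`, the prompt wording) is a hypothetical chosen for illustration.

```python
import re
from collections import Counter


def split_into_chunks(text, max_words=50):
    """Split a document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_chunk(concept, chunk):
    """Count how often the concept's tokens occur in the chunk (toy relevance score)."""
    concept_tokens = set(tokenize(concept))
    chunk_counts = Counter(tokenize(chunk))
    return sum(chunk_counts[token] for token in concept_tokens)


def retrieve_chunks(concept, chunks, top_k=2):
    """Return the top_k chunks with the highest keyword overlap with the concept."""
    ranked = sorted(chunks, key=lambda chunk: score_chunk(concept, chunk), reverse=True)
    return ranked[:top_k]


def build_prompt(concept, chunks):
    """Assemble a prompt from the concept and the retrieved chunks only."""
    context = "\n\n".join(f"[chunk {i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        f'Extract the value of "{concept}" from the excerpts below.\n'
        "Say which chunk number the value came from.\n\n"
        f"{context}\n\nAnswer:"
    )


if __name__ == "__main__":
    # Made-up document text, used only to exercise the functions.
    document = "The company was founded long ago. Total revenue for the year was 10 units."
    selected = retrieve_chunks("total revenue", split_into_chunks(document, max_words=7), top_k=1)
    print(build_prompt("total revenue", selected))
```

Sending only the top-ranked chunks to the model, rather than the whole document, is what links retrieval to cost: the prompt grows with `top_k`, not with document length.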
Document intelligence tool description:
We have built a first version of the document intelligence tool. The next steps are to optimize it, run research studies, and keep adding features to it.
At the current stage, we will be developing new versions of the tool. Each version will include a research phase, and we will want to prove that each new version is better than the old one. We need someone to help us with the research and with ways to automatically verify that new versions are better than previous ones.
Responsibilities:
Standardize ML/AI datasets:
- To evaluate an AI system, we need to produce validation datasets
- Validation datasets should have a standard format
- Validation datasets should be stored in a pre-defined location
- Standardize the I/O of datasets
- Analyze different evaluation metrics for text generation, such as “exact match”, “Levenshtein score”, and “BERTScore”
- Define the role of the “LLM as a judge”
- For instance, show that using RAG enhances system performance
- Build a script that automatically checks that system performance has not fallen below given thresholds
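As an illustration of the evaluation tasks listed above, the sketch below computes two of the named metrics (exact match and a normalized Levenshtein score) over a small validation set and reports which metrics fall below a threshold. It is a rough outline under assumptions: the record field `expected`, the function names, and the threshold values are hypothetical, and BERTScore and LLM-as-a-judge are left out because they need external models.

```python
def exact_match(prediction, reference):
    """Return 1.0 if the strings match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def levenshtein_distance(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_score(prediction, reference):
    """Similarity in [0, 1]: 1 minus the edit distance divided by the longer length."""
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(prediction, reference) / longest


def evaluate(examples, predictions):
    """Average each metric over a validation set (hypothetical 'expected' field)."""
    pairs = list(zip(predictions, examples))
    return {
        "exact_match": sum(exact_match(p, e["expected"]) for p, e in pairs) / len(pairs),
        "levenshtein_score": sum(levenshtein_score(p, e["expected"]) for p, e in pairs) / len(pairs),
    }


def check_thresholds(metrics, thresholds):
    """Return the names of metrics that fall below their minimum threshold."""
    return [name for name, minimum in thresholds.items() if metrics.get(name, 0.0) < minimum]


if __name__ == "__main__":
    # Made-up validation records and predictions, used only to exercise the functions.
    examples = [{"concept": "city", "expected": "Paris"}, {"concept": "year", "expected": "2024"}]
    predictions = ["paris", "2023"]
    metrics = evaluate(examples, predictions)
    failing = check_thresholds(metrics, {"exact_match": 0.9, "levenshtein_score": 0.5})
    print(metrics, failing)
```

A script like this could run after each new version of the tool and exit with an error when `check_thresholds` returns a non-empty list, which is one way to automate the comparison between versions.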
Requirements:
- Current student or recent graduate in Computer Science, Information Technology, or a related field.
- Proficiency in Python.
- Ability to work with Jupyter notebooks.
- Knowledge of AI/ML.
- Good problem-solving skills and an eye for detail.
- Ability to work collaboratively in a team environment.
What we offer:
- Hands-on experience with an innovative GenAI use case.
- Mentorship and guidance from experienced developers.
- Exposure to real-world projects.
- Opportunity to develop a comprehensive understanding of AI projects.
- Involvement in AI/ML community events.
- FactSet looks to foster a globally inclusive culture. From leadership commitment to employee-led resource groups, FactSet treats diversity, equity, and inclusion as a priority. Read more about our priorities here: https://www.factset.com/company/diversity-equity-and-inclusion
- FactSet believes giving back to our communities is part of our culture. From volunteer opportunities to working with non-profit partners, you can read more about our commitments here: https://www.factset.com/company/corporate-responsibility
- Participation in company profits
- No-cost or low-cost medical, dental, and vision care
- Full and free access to the LinkedIn Learning catalog
- Reimbursement for eligible expenses related to AWS certification and financial certifications (CFA, CIPM, CAIA, FRM)
- Employee referral bonuses
- Flexible office work / teleworking
- And more!
- Posted: Dec 16, 2024
- Type: Full-time
- Level: Internship
- Location: Paris
- Company: FactSet
- Industries: IT Services, IT Consulting, Software Development, Financial Services
- Categories: Engineering, Information Technology