Paris, France
Engineering Internship - AI Developer

FactSet Research Systems is an American financial data and software company that provides a wide universe of financial data and services, supported by innovative financial and statistical data collection.

“We will no longer need to read documents except for fun.” Today, research analysts in the financial domain have to read long documents to extract data from them. This is a long and tedious process. With GenAI, data extraction can be made easier. Can we build a tool that makes document understanding and data extraction easy?

“Assistant, can you extract the value of this ‘concept’ from the document for me? Where did you find this information?”

This is possible today. But how can we tune an AI system to perform such a task as well as possible? How can we make it cheap? How can we guarantee quality? This is where we need you.


Project Overview: 

The internship project involves assisting a team of AI/ML engineers in building a document intelligence tool. You will be involved in a few research topics to prove that the approach can meet performance and cost-efficiency targets.

The project will combine prompt engineering, LLM selection, and RAG (retrieval-augmented generation). We want to be able to verify at each stage that we are not losing performance and that we are saving costs.

The basic task of the tool is: “I want to extract ‘this concept’ from a document.”

The solution involves: retrieving the right chunks from the document; building a dynamic prompt; cost optimization; research studies to test hypotheses, etc.
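As an illustration of the first two steps above (retrieving chunks and building a dynamic prompt), here is a minimal Python sketch. It is not the team's implementation: the function names, the word-overlap ranking (a simple stand-in for embedding-based retrieval), and the prompt wording are all assumptions made for the example.

```python
import re


def split_into_chunks(text: str, max_words: int = 50) -> list[str]:
    """Split a document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_chunks(chunks: list[str], concept: str, top_k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the concept (hypothetical stand-in for vector search)."""
    query = tokenize(concept)
    ranked = sorted(chunks, key=lambda chunk: len(query & tokenize(chunk)), reverse=True)
    return ranked[:top_k]


def build_prompt(concept: str, chunks: list[str]) -> str:
    """Assemble a prompt that contains only the retrieved chunks, numbered for citation."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        f'Extract the value of "{concept}" from the excerpts below.\n'
        "Cite the number of the excerpt where you found it.\n\n"
        f"{context}"
    )
```

Sending only the top-ranked chunks to the LLM, rather than the whole document, is one way a design like this could reduce cost; whether it preserves extraction quality is exactly the kind of hypothesis the research studies would need to test.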

The challenge is: how can we build a solution that can scale? How can we be very competitive cost-wise? How can we guarantee extraction quality?  


Document intelligence tool description: 

We have built a first version of the document intelligence tool. The next steps are to optimize it, perform different research studies and keep adding functionalities to it.  

At the current stage, we are developing new versions of the tool. Each version will include a research phase, and we will want to prove that a new version is better than the old one. We need someone to help us with the research and with ways to automatically verify that new versions are better than previous ones.


Responsibilities: 

Standardize ML/AI datasets: 

In order to evaluate an AI system, we need to produce validation datasets  

Validation datasets should have a standard format

Validation datasets should be stored in a pre-defined location  

Standardize dataset I/O
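To make the dataset items above concrete, here is a minimal Python sketch of one possible standard format with matching read and write helpers. The field names (`document_id`, `concept`, `expected_value`) and the choice of JSON Lines are assumptions for illustration, not the format the team uses.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ValidationExample:
    """One labeled extraction case (hypothetical schema)."""
    document_id: str
    concept: str
    expected_value: str


def save_dataset(examples: list[ValidationExample], path: Path) -> None:
    """Write one JSON object per line so files are easy to diff and append to."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for example in examples:
            f.write(json.dumps(asdict(example), ensure_ascii=False) + "\n")


def load_dataset(path: Path) -> list[ValidationExample]:
    """Read the JSON Lines file back into typed examples, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [ValidationExample(**json.loads(line)) for line in f if line.strip()]
```

Routing every dataset through a single pair of functions like these keeps the format and the storage location decided in one place.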
 

Aggregate evaluation metrics: 

Analyze different evaluation metrics for text generation, such as “exact match”, “Levenshtein score”, and “BERTScore”

Define the role of an “LLM as a judge”
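As a rough illustration of two of the metrics named above, here is a pure-Python sketch of exact match and a Levenshtein-based score, plus a helper that averages them over a dataset. Normalizing the edit distance by the longer string's length is an assumption (other normalizations exist), and BERTScore is left out because it requires a pretrained model.

```python
def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the strings match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def levenshtein_score(prediction: str, reference: str) -> float:
    """Similarity in [0, 1]: 1.0 for identical strings (assumed normalization)."""
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(prediction, reference) / longest


def aggregate(predictions: list[str], references: list[str]) -> dict[str, float]:
    """Average each metric over all prediction/reference pairs."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and of equal length")
    pairs = list(zip(predictions, references))
    return {
        "exact_match": sum(exact_match(p, r) for p, r in pairs) / len(pairs),
        "levenshtein_score": sum(levenshtein_score(p, r) for p, r in pairs) / len(pairs),
    }
```

Exact match is strict and cheap, while the Levenshtein score gives partial credit for near misses such as formatting differences; comparing how the two behave on the same outputs is part of the analysis described above.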
 

Run a few experiments to test different hypotheses:

For instance, show that using RAG improves system performance


Automate non-regression tests:

Build a script that automatically checks that system performance has not fallen below given thresholds
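A threshold check of the kind described above could look like the following Python sketch. The metric names and threshold values in the usage note are hypothetical, and in practice the scores would come from running the tool on a validation dataset.

```python
def find_regressions(metrics: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return one message per metric that is missing or below its minimum threshold."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.3f} is below threshold {minimum:.3f}")
    return failures


def run_check(metrics: dict[str, float], thresholds: dict[str, float]) -> int:
    """Print any regressions and return a process exit code (0 = pass, 1 = fail)."""
    failures = find_regressions(metrics, thresholds)
    for failure in failures:
        print(f"REGRESSION {failure}")
    return 1 if failures else 0
```

For example, `run_check({"exact_match": 0.70}, {"exact_match": 0.80})` prints one regression line and returns 1. Passing that return value to `sys.exit` would let a CI pipeline fail the build when a new version falls below a threshold.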

 

Qualifications: 

Current student or recent graduate in Computer Science, Information Technology, or a related field. 

Proficiency in Python. 

Ability to work with Jupyter notebooks.

Knowledge of AI/ML.

Good problem-solving skills and an eye for detail. 

Ability to work collaboratively in a team environment. 

 

What We Offer: 

Hands-on experience with an innovative GenAI use case.

Mentorship and guidance from experienced developers. 

Exposure to real-world projects. 

Opportunity to develop a comprehensive understanding of AI projects. 

Involvement in various AI/ML community events.

Why Life is Better as a FactSetter:

FactSet looks to foster a globally inclusive culture. From leadership commitment to employee-led resource groups, FactSet makes diversity, equity, and inclusion a priority. Read more about our priorities here: https://www.factset.com/company/diversity-equity-and-inclusion

FactSet believes that giving back to our communities is part of our culture. From volunteer opportunities to working with non-profit partners, you can read more about our commitments here: https://www.factset.com/company/corporate-responsibility

Company profit sharing

No or low-cost medical, dental and vision care

Full and free access to LinkedIn Learning catalog

Reimbursement for eligible expenses related to AWS certification and financial certifications (CFA, CIPM, CAIA, FRM)

Employee referral bonuses

Flexible office work / teleworking

And more!

At FactSet, we celebrate diversity of thought, experience, and perspective. We are committed to disrupting bias and to a transparent hiring process. All qualified applicants will be considered for employment regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or veteran status. FactSet participates in E-Verify.
