AI Information Extraction - Ask a Data Sheet

This project uses AI to extract information from PDF files and make it searchable. To keep answers factual, the system uses Retrieval-Augmented Generation (RAG): the extracted information is represented as a JSON structure that serves as the factual context for user queries.
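
As a rough illustration of how a JSON structure can ground a query, the sketch below picks the fields of an extracted record that share a word with the question and places them in a prompt. It is a minimal sketch, not the project's actual code: the function names `select_relevant` and `build_prompt`, the keyword matching, and the sample data sheet values are all assumptions made for this example.

```python
import json


def select_relevant(data: dict, question: str) -> dict:
    """Keep only top-level fields whose key shares a word with the question."""
    words = {w.strip("?.,").lower() for w in question.split()}
    hits = {k: v for k, v in data.items() if words & set(k.lower().split("_"))}
    return hits or data  # fall back to the full record when nothing matches


def build_prompt(data: dict, question: str) -> str:
    """Embed the retrieved JSON as the factual context for the model."""
    context = json.dumps(select_relevant(data, question), indent=2)
    return (
        "Answer using only the JSON context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical extraction result, invented for illustration only.
extracted = {"supply_voltage": "3.3 V", "max_current": "500 mA", "package": "QFN-32"}
prompt = build_prompt(extracted, "What is the supply voltage?")
print(prompt)
```

A production RAG pipeline would more likely rank fields by embedding similarity than by keyword overlap. The keyword match is used here only to keep the example free of external dependencies.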

Prerequisites and Installing

Clone the Git repository.
Install Python 3.12 or later.
Install the required packages: pip install -r requirements.txt
Set your OpenAI API key in the .env file.
Run the script: python demo.py
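
For the API key step, a .env file is plain text with one KEY=value pair per line. The sketch below shows one way such a file could be read using only the standard library. The variable name OPENAI_API_KEY is the one the official OpenAI Python client reads by default, but whether demo.py expects exactly that name is an assumption. The helper `parse_env` and the placeholder key are hypothetical.

```python
import os


def parse_env(text: str) -> dict:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Example .env content; the key shown is a placeholder, not a real key.
sample = '# local secrets, keep out of version control\nOPENAI_API_KEY="sk-your-key-here"\n'
config = parse_env(sample)

# Export the key for this process without overwriting an existing value.
os.environ.setdefault("OPENAI_API_KEY", config["OPENAI_API_KEY"])
```

In practice, projects often use the python-dotenv package (`load_dotenv()`) instead of a hand-written parser. Either way, the .env file should stay out of version control.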

Usage

Upload a PDF file.
The AI will extract the data from the PDF and generate a JSON file.
The AI will also generate a JSON schema file.
You can then interact with the document using natural language queries.
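
To make the relationship between the generated JSON file and its JSON schema concrete, the sketch below derives a minimal schema from an already-parsed JSON value. It illustrates the idea only: how the project itself produces the schema is not described here, and the function `infer_schema` and the sample record are hypothetical.

```python
import json


def infer_schema(value):
    """Derive a minimal JSON Schema fragment from a parsed JSON value."""
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
        }
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Assume all items share the shape of the first one.
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if value is None:
        return {"type": "null"}
    return {"type": "string"}


# Hypothetical extraction result, invented for illustration only.
extracted = json.loads(
    '{"part": "ABC-123", "pins": 32, "ratings": [{"name": "Vcc", "max": 3.6}]}'
)
schema = infer_schema(extracted)
print(json.dumps(schema, indent=2))
```

A schema like this tells the query step which fields exist and what types they hold, which can help the model phrase answers against the extracted data instead of guessing at its structure.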

Architecture

Notes

This project uses the OpenAI GPT-4 model for information extraction and query processing. Using the API therefore incurs token-based charges.

Authors

Christoffer Björkskog - Initial work - melonkernel
Christian Möller - chrmolnovia
Lamin Jatta - Lamboyjat

See also the list of people who participated in this project.

unitycoder/pdf-data-extractor
