Skip to content

unitycoder/pdf-data-extractor

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AI Information Extraction - Ask a Data Sheet

This project uses AI to extract information from PDF files and make it searchable. For factual retrieval of information, the system leverages Retrieval Augmented Augmentation (RAG). In doing so, the retrieved information is represented in a JSON structure that forms the factual context for user queries.

Prerequisites and Installing

  1. Clone the git repo
  2. Install at least python 3.12.##
  3. Install the required packages: pip install -r requirements.txt
  4. Set your OpenAI API key in the .env file.
  5. Run the script: python demo.py

Usage

  1. Upload a PDF file.
  2. The AI will extract the data from the PDF and generate a JSON file.
  3. The AI will also generate a JSON schema file.
  4. You can then interact with the document using natural language queries.

Architecture

screenshot

Notes

This project uses the OpenAI GPT-4 model for information extraction and query processing. Hence, usage of API require token payment (Charges).

Authors

See also the list of people who participated in this project.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 98.5%
  • Python 1.5%