Converting Indian state budget PDFs into structured CSVs using LLMs with automated and manual quality checks.

xKDR/LLM_table2db

LLM_table2db

A collection of original Indian state budget PDFs, together with the outputs of the LLM runs and the prompts used to produce them.

The goal is to transcribe these PDFs accurately into a text format (currently CSV) and to translate text written in Indic scripts into English.

Layout of repository

  • DATA: The source PDF files of the state budgets
  • OUT: The parsed CSVs. The directory tree here mirrors the tree under DATA exactly, and file names are kept consistent with their source PDFs
  • PROMPTS: The language model prompts
  • SRC: Source code, in particular extraction_pipeline.py
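
Because OUT mirrors DATA, the expected CSV location for any source PDF can be derived from its path. The sketch below illustrates that idea and is not part of the repository's code: the helper names `expected_output_path` and `missing_outputs` are hypothetical, and it assumes each CSV keeps the PDF's file name with only the extension changed to `.csv`, which the layout description suggests but does not state outright.

```python
from pathlib import Path


def expected_output_path(pdf_path, data_root=Path("DATA"), out_root=Path("OUT")):
    """Map a PDF under data_root to the CSV path expected under out_root.

    Assumption: the CSV has the same relative path and file stem as the PDF,
    with the extension replaced by ".csv".
    """
    relative = Path(pdf_path).relative_to(data_root)
    return Path(out_root) / relative.with_suffix(".csv")


def missing_outputs(data_root=Path("DATA"), out_root=Path("OUT")):
    """List source PDFs that have no CSV at the expected mirrored location."""
    data_root, out_root = Path(data_root), Path(out_root)
    return sorted(
        pdf
        for pdf in data_root.rglob("*.pdf")
        if not expected_output_path(pdf, data_root, out_root).exists()
    )


if __name__ == "__main__":
    # Print every PDF that has not been parsed into a CSV yet.
    for pdf in missing_outputs():
        print(pdf)
```

Run from the repository root, this would list the PDFs that still lack a parsed CSV, provided the naming assumption above holds.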
