GitHub - xKDR/LLM_table2db: Converting Indian state budget PDFs into structured CSVs using LLMs with automated and manual quality checks.

LLM_table2db

A collection of original state budget PDFs, along with the outputs of the LLM runs and the prompts used to produce them.

The goal is to transcribe these PDFs accurately into a text format (currently CSV) and to translate text in Indic scripts into English.

DATA: Location of all the source PDF files, i.e. the state budgets
OUT: Location of the parsed CSVs. The output tree mirrors the tree in the DATA directory exactly, and file naming is consistent between the two
PROMPTS: Language model prompts
SRC: Source code, in particular extraction_pipeline.py
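
Because the OUT tree mirrors the DATA tree, the output location for a given PDF can be derived from its input path. The following is a minimal sketch of that mapping, not code taken from extraction_pipeline.py. The function names `out_path_for` and `missing_outputs`, the file name `example.pdf`, and the assumption that each CSV keeps the PDF's file name with a `.csv` extension are all hypothetical.

```python
from pathlib import Path


def out_path_for(pdf_path, data_root="DATA", out_root="OUT"):
    """Map a PDF under data_root to the mirrored CSV path under out_root.

    Assumption: the CSV has the same relative path and file stem as the PDF,
    with only the extension changed to .csv.
    Raises ValueError if pdf_path is not located under data_root.
    """
    relative = Path(pdf_path).relative_to(data_root)
    return Path(out_root) / relative.with_suffix(".csv")


def missing_outputs(data_root="DATA", out_root="OUT"):
    """List PDFs under data_root that have no mirrored CSV under out_root.

    Note: the "*.pdf" pattern is case-sensitive on case-sensitive file systems.
    """
    return sorted(
        pdf
        for pdf in Path(data_root).rglob("*.pdf")
        if not out_path_for(pdf, data_root, out_root).exists()
    )


if __name__ == "__main__":
    # Hypothetical file name, used only to illustrate the mapping.
    print(out_path_for("DATA/All_States/KA_2020-21/example.pdf"))
```

A check like `missing_outputs()` could be run from the repository root to see which source PDFs have not yet been parsed, provided the naming assumption above holds.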

Repository contents:
DATA/All_States/KA_2020-21
PROMPTS/15_sr_ka_exp
SRC/15_sr_ka_exp
LICENSE
README.MD