yebarryallen/LLMeta

LLMeta

License: MIT

LLMeta is a Python package for conducting systematic reviews with large language models. It combines Retrieval-Augmented Generation (RAG) with Hypothetical Document Embeddings (HyDE).

Installation

Install LLMeta with pip:

pip install LLMeta

Table of Contents

LLMeta Python Package Documentation

Setup

setup

This function creates the necessary directories for the LLMeta project. It initializes folders such as markdown_file, pdf_folder, raw_output, temp, and vectorstore to organize project files.
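
The directory layout described above could be created along these lines. This is a minimal sketch, not LLMeta's actual implementation: the helper name `create_project_dirs` and its `root` parameter are hypothetical, and only the five folder names come from the description.

```python
from pathlib import Path

# Folder names taken from the description of `setup` above.
PROJECT_DIRS = ["markdown_file", "pdf_folder", "raw_output", "temp", "vectorstore"]


def create_project_dirs(root="."):
    """Create the project folders under `root` (hypothetical helper, not LLMeta's API)."""
    created = []
    for name in PROJECT_DIRS:
        path = Path(root) / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Because existing folders are left alone, calling the helper twice on the same root does not raise an error.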

Markdown File Processing

split_markdown_file

This function reads a markdown file, splits it into sections on a delimiter (---\n), and returns the sections as a list. It appends the beginning of the following section to each section so that context carries across the split.
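
The splitting behaviour can be sketched as follows. This is an illustration under stated assumptions rather than the package's code: it works on a string instead of a file, the name `split_with_overlap` is hypothetical, and the `overlap_chars` parameter is an assumption, since the description does not say how much of the next section is appended.

```python
def split_with_overlap(text, delimiter="---\n", overlap_chars=200):
    """Split `text` on `delimiter` and append the start of the next section to each one.

    Hypothetical helper; `overlap_chars` is an assumed parameter.
    """
    # Drop sections that are empty or whitespace-only.
    sections = [s for s in text.split(delimiter) if s.strip()]
    result = []
    for i, section in enumerate(sections):
        if i + 1 < len(sections):
            # Carry over the beginning of the following section for context.
            section = section + sections[i + 1][:overlap_chars]
        result.append(section)
    return result
```

The last section has no successor, so it is returned unchanged.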

save_vectorstore

This function saves text chunks into a vectorstore using the FAISS library for efficient retrieval. It uses OpenAI embeddings to convert text chunks into vectors.

markdown_to_vectorstore

This function processes all markdown files in a specified directory, splits them into sections, and saves these sections into a vectorstore for later retrieval.
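
The directory walk might look roughly like this. The sketch covers only the file discovery and splitting steps; the embedding and FAISS storage steps are left out because they need an API key and external libraries. The name `collect_markdown_chunks` and the `*.md` file pattern are assumptions.

```python
from pathlib import Path


def collect_markdown_chunks(directory, delimiter="---\n"):
    """Map each .md file name in `directory` to its list of non-empty sections.

    Hypothetical helper; the real function also writes the sections to a vectorstore.
    """
    chunks = {}
    # sorted() gives a stable processing order across platforms.
    for md_path in sorted(Path(directory).glob("*.md")):
        text = md_path.read_text(encoding="utf-8")
        chunks[md_path.name] = [s for s in text.split(delimiter) if s.strip()]
    return chunks
```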

Text Generation

generate_hypothetical_text

This function generates a hypothetical text based on given variables, their definitions, and values, within the context of a provided title and abstract. It uses OpenAI's language model to create plausible paragraphs.
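
As an illustration of the HyDE step, the sketch below assembles the kind of prompt that could be sent to a language model. The prompt wording, the function name `build_hyde_prompt`, and its parameters are all assumptions; the actual model call is omitted.

```python
def build_hyde_prompt(title, abstract, variable, definition, value):
    """Assemble a prompt asking a model to write a passage reporting `value`.

    Hypothetical helper; the wording is illustrative, not LLMeta's prompt.
    """
    return (
        f"Title: {title}\n"
        f"Abstract: {abstract}\n\n"
        f"Variable: {variable}\n"
        f"Definition: {definition}\n"
        f"Value: {value}\n\n"
        "Write a short paragraph, in the style of this paper, that reports "
        "the variable with the value given above."
    )
```

The generated paragraph is hypothetical by design: it is used as a search query in the retrieval step and is never treated as evidence.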

get_relate_doc

This function retrieves relevant documents from a vectorstore based on a hypothetical text. It uses FAISS for document retrieval and provides the most relevant passages from the original documents.
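
The retrieval idea can be shown with a toy stand-in. The sketch below swaps OpenAI embeddings and FAISS for word-count vectors and a brute-force cosine-similarity ranking, so it runs without any external service. The names `embed`, `cosine`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(hypothetical_text, passages, k=2):
    """Return the `k` passages most similar to the hypothetical text."""
    query = embed(hypothetical_text)
    ranked = sorted(passages, key=lambda p: cosine(query, embed(p)), reverse=True)
    return ranked[:k]
```

A real vector index replaces the brute-force sort with an approximate or exact nearest-neighbour search, but the input and output have the same shape: a query text goes in and the closest original passages come out.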

generate_judgement

This function generates a judgement or extraction from the retrieved relevant documents. It evaluates whether the text mentions a specific variable and whether the text matches the variable's values, and outputs the result as JSON.
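
Since the judgement comes back as JSON, the calling code needs to parse it defensively. The sketch below shows one way to do that. The keys `mentioned` and `value` and the function name `parse_judgement` are assumptions; the source does not specify the JSON schema.

```python
import json


def parse_judgement(raw_reply):
    """Parse a JSON judgement; fall back to a 'not mentioned' record on bad input.

    Hypothetical helper; the `mentioned` and `value` keys are assumed.
    """
    fallback = {"mentioned": False, "value": None}
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        # Model replies are not guaranteed to be valid JSON.
        return fallback
    if not isinstance(data, dict):
        return fallback
    return {"mentioned": bool(data.get("mentioned", False)), "value": data.get("value")}
```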

Paper Processing

process_row

This function processes a single row of data, generating hypothetical texts, retrieving relevant documents, and generating judgements. It saves the results, including extracted values and probability scores, to text files.

process_papers

This function processes multiple papers concurrently, using process_row for each paper. It reads variable lists and merged data, then applies the processing to each paper's data.
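
Concurrent processing of papers could be arranged as in the sketch below. It assumes a thread pool, which suits work dominated by network calls; the source does not say which concurrency mechanism LLMeta uses. `process_row_stub` is a placeholder for the per-paper work, and both function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_row_stub(row):
    """Placeholder for the per-paper work (hypothetical; not LLMeta's process_row)."""
    return {"paper_id": row["paper_id"], "n_variables": len(row["variables"])}


def process_all(rows, max_workers=4):
    """Run the per-row step over all rows in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `rows`, not completion order.
        return list(pool.map(process_row_stub, rows))
```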

Output Formatting

formated_output

This function formats the extracted data and probabilities into a structured CSV file. It reads answer, source, and probability files, processes them, and merges the results with the original metadata and variable definitions.
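
The final merge into a CSV file might be sketched like this. The column names, the in-memory dictionaries used as input, and the function name `write_results_csv` are all assumptions; the real function reads its answers, sources, and probabilities from files.

```python
import csv


def write_results_csv(path, metadata, answers, probabilities):
    """Write one CSV row per (paper, variable) pair, joined with paper metadata.

    Hypothetical helper: `metadata` maps paper_id to title, while `answers` and
    `probabilities` map (paper_id, variable) to a value and a score.
    """
    fieldnames = ["paper_id", "title", "variable", "value", "probability"]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for (paper_id, variable), value in sorted(answers.items()):
            writer.writerow({
                "paper_id": paper_id,
                "title": metadata.get(paper_id, ""),
                "variable": variable,
                "value": value,
                # Missing probabilities are left blank rather than guessed.
                "probability": probabilities.get((paper_id, variable), ""),
            })
```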

Citation
