LLMeta is a Python package for conducting systematic reviews with large language models, combining Retrieval-Augmented Generation (RAG) with Hypothetical Document Embeddings (HyDE).
To install LLMeta, you can use pip:
    pip install LLMeta

This function creates the necessary directories for the LLMeta project. It initializes folders such as markdown_file, pdf_folder, raw_output, temp, and vectorstore to organize project files.
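As a rough illustration of the directory setup, the sketch below creates the five folders named above. The function name `create_project_dirs` and the `project_root` parameter are hypothetical and are not taken from LLMeta itself.

```python
from pathlib import Path

# Folder names come from the description above; everything else is assumed.
PROJECT_DIRS = ["markdown_file", "pdf_folder", "raw_output", "temp", "vectorstore"]


def create_project_dirs(project_root="."):
    """Create the project folders under project_root and return their paths."""
    root = Path(project_root)
    created = []
    for name in PROJECT_DIRS:
        path = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling the function a second time on the same root leaves existing folders and their contents untouched.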
This function reads a markdown file, splits it into sections based on a delimiter (---\n), and returns a list of these sections. It appends part of the next section's beginning to each section to ensure context continuity.
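The splitting step can be sketched in a few lines. This version works on a string that has already been read from the file. The function name and the `overlap_chars` parameter (how much of the next section to append) are assumptions, since the actual overlap size used by LLMeta is not stated here.

```python
def split_markdown_sections(text, delimiter="---\n", overlap_chars=200):
    """Split text on the delimiter and append the start of the next section."""
    # Drop sections that are empty or contain only whitespace.
    sections = [s for s in text.split(delimiter) if s.strip()]
    result = []
    for i, section in enumerate(sections):
        if i + 1 < len(sections):
            # Carry over the beginning of the following section for context.
            section = section + sections[i + 1][:overlap_chars]
        result.append(section)
    return result
```

The last section has no successor, so it is returned unchanged.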
This function saves text chunks into a vectorstore using the FAISS library for efficient retrieval. It uses OpenAI embeddings to convert text chunks into vectors.
This function processes all markdown files in a specified directory, splits them into sections, and saves these sections into a vectorstore for later retrieval.
This function generates a hypothetical text based on given variables, their definitions, and values, within the context of a provided title and abstract. It uses OpenAI's language model to create plausible paragraphs.
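To make the HyDE step more concrete, here is a minimal sketch of how a prompt for the hypothetical paragraph might be assembled. The prompt wording, the function name `build_hyde_prompt`, and its parameters are all assumptions. The template LLMeta actually sends to the model is not shown in this document, and the call to the language model is left out.

```python
def build_hyde_prompt(title, abstract, variable, definition, value):
    """Assemble a prompt asking for a plausible paragraph about one variable."""
    # Hypothetical template: the real wording used by the package may differ.
    return (
        f"Title: {title}\n"
        f"Abstract: {abstract}\n\n"
        "Write a plausible paragraph from this paper reporting that the "
        f"variable '{variable}' (defined as: {definition}) "
        f"takes the value '{value}'."
    )
```

The returned string would then be passed to the language model, and the generated paragraph used as the retrieval query.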
This function retrieves relevant documents from a vectorstore based on a hypothetical text. It uses FAISS for document retrieval and provides the most relevant passages from the original documents.
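The idea behind retrieval can be shown without FAISS or OpenAI embeddings. The sketch below swaps in a toy bag-of-words "embedding" and plain cosine similarity, so it is a stand-in for the technique rather than the package's implementation. All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(hypothetical_text, passages, k=2):
    """Return the k passages most similar to the hypothetical text."""
    query = embed(hypothetical_text)
    ranked = sorted(passages, key=lambda p: cosine(query, embed(p)), reverse=True)
    return ranked[:k]
```

In a real pipeline the word counts would be replaced by dense embedding vectors and the sort by a vector index lookup, but the ranking logic is the same.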
This function generates a judgement or extraction based on the retrieved relevant documents. It evaluates if the text mentions a specific variable and matches its values, outputting the result in a JSON format.
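Because the judgement is returned as JSON, the caller has to parse the model's reply. The sketch below shows one cautious way to do so, assuming the reply contains a single JSON object that may be wrapped in extra text. The function name and the example keys (`mentioned`, `value`) are hypothetical, as the actual schema is not given here.

```python
import json


def parse_judgement(raw_output):
    """Extract and parse the JSON object in a model reply, or return None."""
    start = raw_output.find("{")
    end = raw_output.rfind("}")
    if start == -1 or end <= start:
        return None  # no JSON object found
    try:
        return json.loads(raw_output[start:end + 1])
    except json.JSONDecodeError:
        return None  # reply looked like JSON but was malformed
```

Returning `None` instead of raising lets the calling code record a missing value and continue with the next variable.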
This function processes a single row of data, generating hypothetical texts, retrieving relevant documents, and generating judgements. It saves the results, including extracted values and probability scores, to text files.
This function processes multiple papers concurrently, using process_row for each paper. It reads variable lists and merged data, then applies the processing to each paper's data.
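Concurrent processing of papers could look roughly like the sketch below. It assumes a thread pool, which suits work dominated by API calls. Whether LLMeta uses threads or processes is not stated here. The `process_row` shown is only a placeholder with a hypothetical signature, and `process_papers` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def process_row(row):
    """Placeholder for the per-paper pipeline; it only echoes the paper id."""
    return {"paper_id": row["paper_id"], "status": "done"}


def process_papers(rows, max_workers=4):
    """Apply process_row to every paper concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input rows.
        return list(pool.map(process_row, rows))
```

Any exception raised inside `process_row` surfaces when the results are collected, so a failing paper stops the run unless the error is handled inside `process_row`.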
This function formats the extracted data and probabilities into a structured CSV file. It reads answer, source, and probability files, processes them, and merges the results with the original metadata and variable definitions.
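The merge-and-export step can be sketched with the standard library alone. The column names (`paper_id`, `answer`, `probability`) and both function names are assumptions chosen for illustration. The sketch also takes the answers and probabilities as dictionaries rather than reading them from the text files.

```python
import csv


def merge_results(metadata, answers, probabilities):
    """Join answers and probabilities onto metadata rows by paper_id."""
    merged = []
    for meta in metadata:
        paper_id = meta["paper_id"]
        row = dict(meta)  # copy so the original metadata is not modified
        row["answer"] = answers.get(paper_id, "")
        row["probability"] = probabilities.get(paper_id, "")
        merged.append(row)
    return merged


def write_results_csv(rows, fieldnames, handle):
    """Write the merged rows to an open text handle as CSV with a header."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```

Papers with no extracted answer keep their metadata and get empty `answer` and `probability` cells, so no row is lost in the merge.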
