LLM Knowledge Graph Builder

Introduction

This document provides comprehensive documentation for the Neo4j llm-graph-builder Project, a Python web application built with the FastAPI framework. It covers various aspects of the project, including its features, architecture, usage, development, deployment, limitations and known issues.

Features

Upload unstructured data from multiple sources to generate structuted Neo4j knowledge graph.
Extraction of nodes and relations from multiple LLMs(OpenAI GPT-3.5, OpenAI GPT-4, Gemini 1.0-Pro and Diffbot).
View complete graph or only a particular element of graph(ex: Only chunks, only entities, document and entities, etc.)
Generate embedding of chunks created from unstructured content.
Generate k-nearest neighbors graph for similar chunks.
Chat with graph data using chat bot.

Local Setup and Execution

Run Docker Compose to build and start all components:

docker-compose up --build

Alternatively, run specific directories separately:

For frontend

cd frontend
yarn
yarn run dev

For backend

cd backend
python -m venv envName
source envName/bin/activate
pip install -r requirements.txt
uvicorn score:app --reload

Set up environment variables

OPENAI_API_KEY = ""
DIFFBOT_API_KEY = ""
NEO4J_URI = ""
NEO4J_USERNAME = ""
NEO4J_PASSWORD = ""
NEO4J_DATABASE = ""
AWS_ACCESS_KEY_ID =  ""
AWS_SECRET_ACCESS_KEY = ""
EMBEDDING_MODEL = ""
IS_EMBEDDING = "TRUE"
KNN_MIN_SCORE = ""
LANGCHAIN_API_KEY = ""
LANGCHAIN_PROJECT = ""
LANGCHAIN_TRACING_V2 = ""
LANGCHAIN_ENDPOINT = ""
NUMBER_OF_CHUNKS_TO_COMBINE = ""

Architecture

Development

Backend

backend_docs.adoc

Frontend

frontend_docs.adoc

Deployment and Monitoring

The application is deployed on Google Cloud Platform.

To deploy frontend

gcloud run deploy
source location current directory > Frontend
region : 32 [us-central 1]
Allow unauthenticated request : Yes

To deploy backend

gcloud run deploy --set-env-vars "OPENAI_API_KEY = " --set-env-vars "DIFFBOT_API_KEY = " --set-env-vars "NEO4J_URI = " --set-env-vars "NEO4J_PASSWORD = " --set-env-vars "NEO4J_USERNAME = "
source location current directory > Backend
region : 32 [us-central 1]
Allow unauthenticated request : Yes

Langserve is used with FAST API to deploy Langchain runnables and chains as a REST API.
Langsmith is used to monitor and evaluate the application

Developement url

Production url

Appendix

Limitations

Only pdf file uploaded from device or uploaded from s3 bucket or gcs bucket can be processed.
GCS buckets present under 1051503595507@cloudbuild.gserviceaccount.com service account can only be accessed.
Only 1st page of Wikipedia content is processed to generate graphDocument.

Known issues

InactiveRpcError error with Gemini 1.0 Pro - grpc_status:13, grpc_message:"Internal error encountered."
ResourceExhausted error with Gemini 1.5 Pro - 429 Quota exceeded for aiplatform.googleapis.com/generate_content_requests_per_minute_per_project_per_base_model with base model: gemini-1.5-pro
Gemini response validation errors even after making safety_settings parameters to BLOCK_NONE.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Files

project_docs.adoc

project_docs.adoc

LLM Knowledge Graph Builder

Introduction

Features

Local Setup and Execution

Architecture

Development

Backend

Frontend

Deployment and Monitoring

Appendix

Limitations

Known issues

Files

project_docs.adoc

Latest commit

History

project_docs.adoc

File metadata and controls

LLM Knowledge Graph Builder

Introduction

Features

Local Setup and Execution

Architecture

Development

Backend

Frontend

Deployment and Monitoring

Appendix

Limitations

Known issues