LLM Knowledge Graph Builder

Introduction

This document describes the Neo4j llm-graph-builder project, a Python web application built with the FastAPI framework. It covers the project's features, architecture, usage, development, deployment, limitations, and known issues.

Features

  • Upload unstructured data from multiple sources to generate a structured Neo4j knowledge graph.

  • Extract nodes and relations using multiple LLMs (OpenAI GPT-3.5, OpenAI GPT-4, Gemini 1.0-Pro, and Diffbot).

  • View the complete graph or only particular elements of it (e.g., only chunks, only entities, documents and entities).

  • Generate embeddings of the chunks created from unstructured content.

  • Generate a k-nearest neighbors graph of similar chunks.

  • Chat with the graph data using a chatbot.
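
To illustrate the k-nearest neighbors feature, here is a minimal sketch, not the project's implementation. It links each chunk to its most similar chunks by cosine similarity and drops pairs below a minimum score, which is presumably the role of the KNN_MIN_SCORE setting. The function names, the toy embeddings, and the threshold value are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_edges(embeddings, k=2, min_score=0.8):
    """Return (source, target, score) edges linking each chunk to its
    top-k most similar chunks whose score is at least min_score."""
    edges = []
    for src, src_vec in embeddings.items():
        scored = [
            (cosine_similarity(src_vec, dst_vec), dst)
            for dst, dst_vec in embeddings.items()
            if dst != src
        ]
        scored.sort(reverse=True)  # highest similarity first
        for score, dst in scored[:k]:
            if score >= min_score:
                edges.append((src, dst, round(score, 4)))
    return edges


# Toy 3-dimensional embeddings (hypothetical); real ones come from an embedding model.
chunks = {
    "chunk-1": [1.0, 0.0, 0.0],
    "chunk-2": [0.9, 0.1, 0.0],
    "chunk-3": [0.0, 0.0, 1.0],
}

edges = knn_edges(chunks, k=1, min_score=0.8)
```

In this toy data, chunk-1 and chunk-2 point in nearly the same direction and link to each other, while chunk-3 is orthogonal to both and gets no edge. In a graph database, each returned tuple would typically become a similarity relationship between two chunk nodes.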

Local Setup and Execution

Run Docker Compose to build and start all components:

docker-compose up --build

Alternatively, run the frontend and backend separately from their own directories:

  • For frontend

cd frontend
yarn
yarn run dev
  • For backend

cd backend
python -m venv envName
source envName/bin/activate
pip install -r requirements.txt
uvicorn score:app --reload

Set up environment variables

OPENAI_API_KEY = ""
DIFFBOT_API_KEY = ""
NEO4J_URI = ""
NEO4J_USERNAME = ""
NEO4J_PASSWORD = ""
NEO4J_DATABASE = ""
AWS_ACCESS_KEY_ID = ""
AWS_SECRET_ACCESS_KEY = ""
EMBEDDING_MODEL = ""
IS_EMBEDDING = "TRUE"
KNN_MIN_SCORE = ""
LANGCHAIN_API_KEY = ""
LANGCHAIN_PROJECT = ""
LANGCHAIN_TRACING_V2 = ""
LANGCHAIN_ENDPOINT = ""
NUMBER_OF_CHUNKS_TO_COMBINE = ""
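
The sketch below shows one way settings like these might be read and validated at startup using only the standard library. The `load_settings` helper, the choice of which variables are required, and the fallback database name are assumptions, not the project's actual code.

```python
import os

# Assumed to be required for connecting to Neo4j; adjust to the deployment.
REQUIRED_VARS = ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"]


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ) and fail
    early if a required variable is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "neo4j_uri": env["NEO4J_URI"],
        "neo4j_username": env["NEO4J_USERNAME"],
        "neo4j_password": env["NEO4J_PASSWORD"],
        # Assumed fallback when NEO4J_DATABASE is unset or empty.
        "neo4j_database": env.get("NEO4J_DATABASE") or "neo4j",
        # IS_EMBEDDING is a string flag; treat "TRUE" (any case) as enabled.
        "is_embedding": env.get("IS_EMBEDDING", "").upper() == "TRUE",
    }


# Example with placeholder values instead of real credentials.
settings = load_settings({
    "NEO4J_URI": "neo4j://localhost:7687",
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_PASSWORD": "example-password",
    "IS_EMBEDDING": "TRUE",
})
```

Failing at startup with a list of the missing names tends to be easier to debug than a connection error raised later from inside a request handler.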

Architecture

Architecture diagram

Development

Deployment and Monitoring

  • The application is deployed on Google Cloud Platform.

    To deploy the frontend:
    gcloud run deploy
    Source location: current directory > Frontend
    Region: 32 [us-central1]
    Allow unauthenticated requests: Yes
    To deploy the backend:
    gcloud run deploy --set-env-vars "OPENAI_API_KEY=" --set-env-vars "DIFFBOT_API_KEY=" --set-env-vars "NEO4J_URI=" --set-env-vars "NEO4J_PASSWORD=" --set-env-vars "NEO4J_USERNAME="
    Source location: current directory > Backend
    Region: 32 [us-central1]
    Allow unauthenticated requests: Yes
  • LangServe is used with FastAPI to deploy LangChain runnables and chains as a REST API.

  • LangSmith is used to monitor and evaluate the application.
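
Typing the backend's environment variables into the deploy command by hand is error-prone. As a sketch, the hypothetical helper below assembles the `gcloud run deploy` arguments from a dictionary, joining the variables into the comma-separated KEY=VALUE list that `--set-env-vars` accepts. The helper name and the placeholder values are assumptions. It only builds and prints the command and does not run gcloud.

```python
import shlex


def build_deploy_command(env_vars):
    """Build a `gcloud run deploy` argument list with one --set-env-vars flag."""
    for key, value in env_vars.items():
        # A comma inside a value would be read as a separator by gcloud.
        if "," in value:
            raise ValueError(f"Value for {key} contains a comma")
    pairs = ",".join(f"{key}={value}" for key, value in env_vars.items())
    return ["gcloud", "run", "deploy", "--set-env-vars", pairs]


command = build_deploy_command({
    "OPENAI_API_KEY": "placeholder-openai-key",
    "NEO4J_URI": "neo4j://example-host:7687",
})

# Quote the arguments so the command can be pasted into a shell.
print(shlex.join(command))
```

Building the command as a list rather than a formatted string keeps quoting explicit, and the same list could be passed to `subprocess.run` if the deployment were scripted.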

Development URL

Production URL

Appendix

Limitations

  • Only PDF files uploaded from a local device, an S3 bucket, or a GCS bucket can be processed.

  • Only GCS buckets available to the 1051503595507@cloudbuild.gserviceaccount.com service account can be accessed.

  • Only the first page of Wikipedia content is processed to generate the graphDocument.

Known issues

  • InactiveRpcError error with Gemini 1.0 Pro - grpc_status:13, grpc_message:"Internal error encountered."

  • ResourceExhausted error with Gemini 1.5 Pro - 429 Quota exceeded for aiplatform.googleapis.com/generate_content_requests_per_minute_per_project_per_base_model with base model: gemini-1.5-pro

  • Gemini response validation errors occur even after setting the safety_settings parameters to BLOCK_NONE.
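
Quota errors such as the ResourceExhausted 429 listed above are often transient, so one common mitigation is to retry with exponential backoff. The sketch below is a generic pattern, not something the project is known to do. `QuotaExceededError` and `flaky_call` are hypothetical stand-ins for the Gemini client's exception and request.

```python
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for a 429 / ResourceExhausted error."""


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func, retrying on QuotaExceededError with delays of
    base_delay, 2*base_delay, 4*base_delay, ... between attempts."""
    for attempt in range(max_attempts):
        try:
            return func()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            sleep(base_delay * (2 ** attempt))


# Simulated request that fails twice before succeeding.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise QuotaExceededError("429 Quota exceeded")
    return "ok"


# Record the delays instead of sleeping so the example runs instantly.
delays = []
result = call_with_backoff(flaky_call, base_delay=0.5, sleep=delays.append)
```

Passing the sleep function as a parameter keeps the retry logic testable without real waiting. In practice the base delay would be chosen with the per-minute quota window in mind.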