Added initial Knowledge Graph support #1801

Open

wants to merge 12 commits into base: main
Conversation

@jaluma (Collaborator) commented Mar 27, 2024

Knowledge Graph

This PR introduces knowledge graph capabilities.

What is a knowledge graph?

A knowledge graph is a collection of nodes and edges, where nodes represent entities or concepts and edges represent the relationships between them, such as facts, properties, or categories.
It can be used to query or infer factual information about different entities or concepts, based on their node and edge attributes.
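As a toy illustration (not code from this PR), the same idea in a few lines of Python: each fact is a subject-predicate-object triple, i.e. two nodes joined by a labelled edge.

# Illustrative only, not part of this PR: a knowledge-graph fact is a
# subject-predicate-object triple, i.e. two nodes joined by a labelled edge.
triples = [
    ("Neo4j", "is_a", "graph database"),
    ("Neo4j", "supports", "Cypher"),
]

# Naive query: everything the graph "knows" about Neo4j.
facts_about_neo4j = [(p, o) for s, p, o in triples if s == "Neo4j"]
print(facts_about_neo4j)  # [('is_a', 'graph database'), ('supports', 'Cypher')]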

Changes Made

  1. Knowledge Graph Support:

    • Added support for integrating a knowledge graph into the project. This feature allows for the combination of the knowledge graph with the vector store to leverage different contextual sources.
  2. Neo4j Graph Store Provider:

    • Integrated a Neo4j Graph Store provider. A graph database like Neo4j is instrumental in managing complex relationships between data entities. By representing data as nodes and relationships, it enables efficient querying and traversal of interconnected data, making it an ideal choice for implementing a knowledge graph. Additionally, it offers powerful querying capabilities such as pattern matching, making it easier to extract insights from interconnected data.
    • During development, some issues related to lower-casing and string formatting were encountered; they have been addressed in this PR.
  3. RDF File Support (Turtle Syntax):

    • Implemented support for ingesting RDF files in Turtle syntax into the graph. RDF files represent data in a graph-like structure using subject-predicate-object triples. This allows us to incorporate structured data into the knowledge graph, facilitating richer data representation and enabling advanced querying and analysis (a minimal parsing sketch is shown after this list).
    • The main reason for implementing RDF in the project is to allow processing any kind of linked data on the web locally, following the principles of the project.
    • To generate a Wikidata RDF file, a sample Jupyter notebook has been provided: here.
  4. Router Retriever Support (Ensemble retrievers):

    • Added support for router retrievers, allowing the simultaneous use of multiple sources with a score-ranking mechanism. This improves the project's ability to retrieve information from diverse sources and prioritize the most relevant results.
    • In this version the feature is limited to a single source; it would be nice to make this configurable or define a better selection strategy :).
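As referenced in point 3, a minimal sketch of what Turtle ingestion boils down to (illustrative only; it uses rdflib directly and is not the reader added in this PR):

from rdflib import Graph

# A tiny Turtle document: sample subject-predicate-object triples,
# not data from the PR itself.
turtle_data = """
@prefix ex: <http://example.org/> .
ex:Neo4j ex:isA ex:GraphDatabase ;
         ex:supports ex:Cypher .
"""

g = Graph()
g.parse(data=turtle_data, format="turtle")

# Each parsed triple can then be added to the knowledge graph.
for subject, predicate, obj in g:
    print(subject, predicate, obj)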

TODO

  • Ingest files into the Knowledge Graph using ParallelizedIngestComponent, BatchIngestComponent and PipelineIngestComponent
  • Refactor code to support VectorIndex and KnowledgeGraphIndex
  • Add more graph providers, such as Nebula.
  • Allow specific extensions only when a provider is enabled, e.g. RDF can be used when any GraphStore provider is enabled.
  • Refactor methods to better distinguish between vector and graph components.

How to activate it?

To enable the knowledge graph, set the graphstore.database property in the settings.yaml file to neo4j. You will also need to install the graph-stores-neo4j extra.

graphstore:
  database: neo4j
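The extra can be installed with Poetry (assuming the project's usual extras mechanism; the extra name is taken from the description above):

poetry install --extras graph-stores-neo4j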

To configure the Neo4j connection, set the neo4j object in settings.yaml.

neo4j:
  url: neo4j://localhost:7687
  username: neo4j
  password: password
  database: neo4j
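To sanity-check these connection values outside the application, a short snippet using the official neo4j Python driver (illustrative only, not part of this PR):

from neo4j import GraphDatabase

# Values mirror the settings.yaml block above.
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
driver.verify_connectivity()  # raises if the server is unreachable or auth fails
driver.close()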

Run a local Neo4j using Docker

To run Neo4j using Docker, you can use the following command:

docker run \
    --restart always \
    --publish=7474:7474 --publish=7687:7687 \
    --env NEO4J_AUTH=neo4j/password \
    -e NEO4J_apoc_export_file_enabled=true \
    -e NEO4J_apoc_import_file_enabled=true \
    -e NEO4J_apoc_import_file_use__neo4j__config=true \
    -e NEO4JLABS_PLUGINS='["apoc"]' \
    -v $PWD/data:/data -v $PWD/plugins:/plugins \
    neo4j:5.18.0

@@ -494,24 +558,28 @@ def get_ingestion_component(
embed_model=embed_model,
transformations=transformations,
count_workers=settings.embedding.count_workers,
llm=kwargs.get("llm"),
Collaborator:
this feels error prone, can't you use the type directly?

@@ -48,7 +52,10 @@ def _try_loading_included_file_formats() -> dict[str, type[BaseReader]]:
".mbox": MboxReader,
".ipynb": IPYNBReader,
}
return default_file_reader_cls
optional_file_reader_cls: dict[str, type[BaseReader]] = {
Collaborator:
I think you can move it back with the default readers, you are importing it unconditionally anyway

Comment on lines +73 to +75
graph_store=graph_store_component.graph_store
if graph_store_component and graph_store_component.graph_store
else None,
Collaborator:
you can simplify this with just graph_store_component.graph_store, the component can't be None. The dependency injector would fail before that.

retrievers = [
r for r in [vector_index_retriever, graph_knowledge_retrevier] if r
]
retriever = RouterRetriever.from_defaults(
Collaborator:
past experience with llama-index makes me not trust these from_defaults, can you check the implementation to make sure it's doing sane things only (for example, some defaults try to call OpenAI if you omit one of the parameters)

@@ -389,10 +412,12 @@ class Settings(BaseModel):
ollama: OllamaSettings
azopenai: AzureOpenAISettings
vectorstore: VectorstoreSettings
graphstore: GraphStoreSettings | None = None
Collaborator:
use a non-nullable type here and add a enabled property instead, makes it easier to configure through env vars that way
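A rough sketch of what this suggestion could look like (hypothetical field names, not code from this PR):

from pydantic import BaseModel, Field


class GraphStoreSettings(BaseModel):
    # Non-nullable settings object; the feature is gated by an explicit flag,
    # which plays nicer with environment-variable overrides.
    enabled: bool = Field(False, description="Turn knowledge graph support on or off")
    database: str = Field("neo4j", description="Graph store provider to use")


class Settings(BaseModel):
    graphstore: GraphStoreSettings = Field(default_factory=GraphStoreSettings)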

@@ -0,0 +1,92 @@
# mypy: ignore-errors
Collaborator:
this is a bit dangerous, what types were giving trouble?

"""Read RDF files.

This module is used to read RDF files.
It was created by llama-hub but it has not been ported
Collaborator:
So, it was ported to llama-index 0.1.0 with fixes, right? This sentence is a little bit confusing...

spsach commented Jun 5, 2024

Is the Knowledge Graph functionality working? Has anyone tried it?
