# Property Graph Construction with Predefined Schemas
In this notebook, we walk through using Neo4j, Ollama, and Hugging Face to build a property graph.

Specifically, we will be using the `SchemaLLMPathExtractor`, which lets us specify an exact schema: the possible entity types, the possible relation types, and which relations each entity type may take part in.

This is useful when you have a specific graph you want to build and want to limit what the LLM predicts.

In [None]:
%pip install llama-index
%pip install llama-index-llms-ollama
%pip install llama-index-embeddings-huggingface
%pip install llama-index-graph-stores-neo4j

## Load Data

First, let's download some sample data to play with.

In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

In [None]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()


## Graph Construction

To construct our graph, we are going to use the `SchemaLLMPathExtractor`.

Given a schema for the graph, it extracts only entities and relations that follow that schema, rather than letting the LLM choose entity and relation types freely.

In [None]:
import nest_asyncio

nest_asyncio.apply()

In [None]:
from typing import Literal
from llama_index.llms.ollama import Ollama
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

# best practice to use upper-case
entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]

# define which entities can have which relations
validation_schema = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}

kg_extractor = SchemaLLMPathExtractor(
    llm=Ollama(model="llama3", json_mode=True, request_timeout=3600),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    # if false, allows for values outside of the schema
    # useful for using the schema as a suggestion
    strict=True,
)
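To make the role of the schema concrete, here is a minimal pure-Python sketch of the idea behind strict validation. It is an illustration only, not the library's implementation: the helper names `check_schema` and `is_allowed` and the candidate triples are hypothetical, and the sketch checks only the subject's entity type, whereas the real extractor may apply further checks (for example on the object's type).

```python
from typing import Literal, get_args

entities = Literal["PERSON", "PLACE", "ORGANIZATION"]
relations = Literal["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"]

validation_schema = {
    "PERSON": ["HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"],
    "PLACE": ["HAS", "PART_OF", "WORKED_AT"],
    "ORGANIZATION": ["HAS", "PART_OF", "WORKED_WITH"],
}


def check_schema(entity_types, relation_types, schema):
    """Return a list of inconsistencies in the schema (empty if none)."""
    allowed_entities = set(get_args(entity_types))
    allowed_relations = set(get_args(relation_types))
    problems = []
    for entity, rels in schema.items():
        if entity not in allowed_entities:
            problems.append(f"unknown entity type: {entity}")
        for rel in rels:
            if rel not in allowed_relations:
                problems.append(f"unknown relation for {entity}: {rel}")
    return problems


def is_allowed(subject_type, relation, schema):
    """Simplified check: is this relation listed for the subject's type?"""
    return relation in schema.get(subject_type, [])


# Hypothetical candidate triples: (subject, subject_type, relation, object)
candidates = [
    ("Paul Graham", "PERSON", "WORKED_AT", "Interleaf"),
    ("Cambridge", "PLACE", "WORKED_ON", "Viaweb"),  # not listed for PLACE
    ("Viaweb", "ORGANIZATION", "FOUNDED_BY", "Paul Graham"),  # not in schema
]

schema_problems = check_schema(entities, relations, validation_schema)
kept = [t for t in candidates if is_allowed(t[1], t[2], validation_schema)]

print(schema_problems)
print(kept)
```

Running a consistency check like `check_schema` before extraction can catch typos in the schema early, which matters when each extraction run takes several minutes.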

To launch Neo4j locally, first ensure you have Docker installed. Then, you can launch the database with the following command:

```bash
docker run \
    -p 7474:7474 -p 7687:7687 \
    -v $PWD/data:/data -v $PWD/plugins:/plugins \
    --name neo4j-apoc \
    -e NEO4J_apoc_export_file_enabled=true \
    -e NEO4J_apoc_import_file_enabled=true \
    -e NEO4J_apoc_import_file_use__neo4j__config=true \
    -e NEO4JLABS_PLUGINS=\[\"apoc\"\] \
    neo4j:latest
```

From here, you can open the db at [http://localhost:7474/](http://localhost:7474/). On this page, you will be asked to sign in. Use the default username/password of `neo4j` and `neo4j`.

Once you log in for the first time, you will be asked to change the password.

After this, you are ready to create your first property graph!
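
Before connecting from Python, it can help to confirm that the Bolt port published by the container is reachable. The following is a small sketch using only the standard library. The helper name `port_open` is hypothetical, and it assumes the default `localhost:7687` mapping from the Docker command above.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# 7687 is the Bolt port published by the docker command above.
if port_open("localhost", 7687):
    print("Neo4j Bolt port is reachable")
else:
    print("Nothing is listening on localhost:7687 yet")
```

A successful TCP connection only shows that something is listening on the port. It does not verify credentials or that the database has finished starting.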

In [None]:
from llama_index.graph_stores.neo4j import Neo4jPGStore

graph_store = Neo4jPGStore(
    username="neo4j",
    password="<password>",
    url="bolt://localhost:7687",
)
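
Rather than hard-coding the password in the notebook, one option is to read the connection settings from environment variables. This is a sketch under stated assumptions: the helper `neo4j_settings` and the variable names `NEO4J_USERNAME`, `NEO4J_PASSWORD`, and `NEO4J_URL` are hypothetical conventions, not something the library requires.

```python
import os


def neo4j_settings(env=None):
    """Build keyword arguments for the graph store from environment variables."""
    env = os.environ if env is None else env
    password = env.get("NEO4J_PASSWORD")
    if not password:
        raise RuntimeError("Set NEO4J_PASSWORD before connecting to Neo4j.")
    return {
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": password,
        "url": env.get("NEO4J_URL", "bolt://localhost:7687"),
    }


# Demonstration with an explicit mapping; in practice, call neo4j_settings()
# with no arguments so that it reads os.environ.
settings = neo4j_settings({"NEO4J_PASSWORD": "example-password"})
print(sorted(settings))

# The result could then be unpacked into the store shown above:
# graph_store = Neo4jPGStore(**neo4j_settings())
```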

**NOTE:** Extraction with a local model will be slower than with API-based models. Local models (such as those served by Ollama) are typically limited to sequential processing. Expect this to take about 10 minutes on an M2 Max.

In [None]:
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[kg_extractor],
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    property_graph_store=graph_store,
)

If we inspect the graph created, we can see that it only includes the relations and entity types that we defined!

![local graph](./local_kg.png)

For information on all `kg_extractors`, see [the documentation](../../module_guides/indexing/lpg_index_guide.md#construction).

## Querying

Now that our graph is created, we can query it. 

As is the theme with this notebook, we will be using a lower-level API and constructing all our retrievers ourselves!

In [None]:
from llama_index.core.indices.property_graph import (
    LLMSynonymRetriever,
    VectorContextRetriever,
)


llm_synonym = LLMSynonymRetriever(
    index.property_graph_store,
    llm=Ollama(model="llama3", request_timeout=3600),
    include_text=False,
)
vector_context = VectorContextRetriever(
    index.property_graph_store,
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    include_text=False,
)

In [None]:
retriever = index.as_retriever(
    sub_retrievers=[
        llm_synonym,
        vector_context,
    ]
)

In [None]:
nodes = retriever.retrieve("What happened at Interleaf?")

for node in nodes:
    print(node.text)

Paul Graham -> WORKED_AT -> Interleaf
Paul Graham -> WORKED_AT -> Yahoo
Paul Graham -> WORKED_AT -> Cambridge
Tom Cheatham -> WORKED_AT -> Cambridge
Kevin Hale -> WORKED_AT -> Viaweb
Paul Graham -> WORKED_AT -> Viaweb
Paul Graham -> WORKED_ON -> Viaweb
Paul Graham -> PART_OF -> Viaweb
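
Each retrieved line has the form `subject -> RELATION -> object`, so the output can be tallied by relation type to confirm that only relations from the schema appear. The sketch below hard-codes the lines printed above so that it runs on its own. In the notebook you would build the list from `node.text` instead. The helper name `relation_of` is hypothetical.

```python
from collections import Counter

# Lines copied from the retriever output above; in the notebook this could be
# [node.text for node in nodes] instead.
retrieved = [
    "Paul Graham -> WORKED_AT -> Interleaf",
    "Paul Graham -> WORKED_AT -> Yahoo",
    "Paul Graham -> WORKED_AT -> Cambridge",
    "Tom Cheatham -> WORKED_AT -> Cambridge",
    "Kevin Hale -> WORKED_AT -> Viaweb",
    "Paul Graham -> WORKED_AT -> Viaweb",
    "Paul Graham -> WORKED_ON -> Viaweb",
    "Paul Graham -> PART_OF -> Viaweb",
]

# Relation types declared in the schema earlier in this notebook.
allowed = {"HAS", "PART_OF", "WORKED_ON", "WORKED_WITH", "WORKED_AT"}


def relation_of(line):
    """Extract the middle field of a 'subject -> RELATION -> object' line."""
    _subject, relation, _obj = (part.strip() for part in line.split("->"))
    return relation


counts = Counter(relation_of(line) for line in retrieved)
unexpected = set(counts) - allowed

print(counts)
print(unexpected)
```

An empty `unexpected` set indicates that every retrieved relation belongs to the schema.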


We can also create a query engine with similar syntax.

In [None]:
query_engine = index.as_query_engine(
    sub_retrievers=[
        llm_synonym,
        vector_context,
    ],
    llm=Ollama(model="llama3", request_timeout=3600),
)

response = query_engine.query("What happened at Interleaf?")

print(str(response))

Paul Graham worked at Interleaf.


For more info on all retrievers, see the [complete guide](../../module_guides/indexing/lpg_index_guide.md#retrieval-and-querying).