# NebulaGraph Property Graph Index

NebulaGraph is an open-source distributed graph database built for super large-scale graphs with millisecond latency.

If you already have an existing graph, skip to the last section, "Loading from an existing Graph".

In [None]:
%pip install llama-index llama-index-graph-stores-nebula jupyter-nebulagraph

## Docker Setup

To launch NebulaGraph locally, first make sure Docker is installed. Then launch the database with the following commands:

```bash
mkdir nebula-docker-compose
cd nebula-docker-compose
curl --output docker-compose.yaml https://raw.githubusercontent.com/vesoft-inc/nebula-docker-compose/master/docker-compose-lite.yaml
docker compose up 
```

After this, you are ready to create your first property graph!
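
Before connecting, it can help to confirm that the NebulaGraph query service is accepting TCP connections on port 9669, the port the `%ngql` connection cell in this notebook uses. The stdlib-only sketch below polls that port. `wait_for_port` is a hypothetical helper, not part of NebulaGraph or LlamaIndex, and a successful TCP connect only shows that the port is open, not that the cluster has finished starting.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Hypothetical helper: poll until a TCP connect to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. connection refused) if nothing is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 9669)` should return `True` once the Docker containers are listening.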

> Other options/details for deploying NebulaGraph can be found in the [docs](https://docs.nebula-graph.io/):
>
> - [ad-hoc cluster in Google Colab](https://docs.nebula-graph.io/master/4.deployment-and-installation/2.compile-and-install-nebula-graph/8.deploy-nebula-graph-with-lite/).
> - [Docker Desktop Extension](https://docs.nebula-graph.io/master/2.quick-start/1.quick-start-workflow/).


In [1]:
# load NebulaGraph Jupyter extension to enable %ngql magic
%load_ext ngql
# connect to NebulaGraph service
%ngql --address 127.0.0.1 --port 9669 --user root --password nebula
# create a graph space (think of it as a database instance) named llamaindex_nebula_property_graph
%ngql CREATE SPACE IF NOT EXISTS llamaindex_nebula_property_graph(vid_type=FIXED_STRING(256));

In [4]:
# switch to the graph space, similar to "USE database" in MySQL
# CREATE SPACE runs asynchronously, so the space may not be ready yet: wait a few seconds and retry if this fails
%ngql USE llamaindex_nebula_property_graph;
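
Because `CREATE SPACE` completes asynchronously, the first `USE` can fail. If you send nGQL from plain Python instead of the `%ngql` magic, the "wait, then retry" advice above can be wrapped in a small loop. This is a sketch: `retry` is a hypothetical helper, and in the usage note `execute` stands for whatever function you use to run an nGQL statement, assumed to raise an exception on failure. It is not a specific client API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 5, delay: float = 2.0) -> T:
    """Hypothetical helper: call action() until it succeeds, sleeping `delay` seconds between failures."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # the action is assumed to raise when the space is not ready
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

Usage would look like `retry(lambda: execute("USE llamaindex_nebula_property_graph;"))`, with `execute` being your own (hypothetical here) statement runner.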

## Env Setup

Only a little environment setup is needed to get started.

In [4]:
import os

os.environ["OPENAI_API_KEY"] = "sk-proj-..."
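
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. A minimal alternative, assuming you are happy to type the key interactively, prompts only when the variable is not already set. `ensure_env` is a hypothetical helper. `getpass` comes from the Python standard library and reads input without echoing it.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_env(name: str, prompt: Callable[[str], str] = getpass) -> str:
    """Hypothetical helper: return os.environ[name], prompting for it only if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        value = prompt(f"{name}: ")
        os.environ[name] = value
    return value
```

In the notebook this would be called as `ensure_env("OPENAI_API_KEY")` instead of assigning the key literally.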

In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

In [5]:
import nest_asyncio

nest_asyncio.apply()
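
`nest_asyncio.apply()` is there because a Jupyter kernel already runs an asyncio event loop, and LlamaIndex may drive some steps (such as path extraction) through its own asyncio calls. The stdlib-only sketch below does not use `nest_asyncio`. It reproduces the error that a nested `asyncio.run()` raises inside a running loop, which is the situation the patch works around. `fetch_answer` and `notebook_cell` are illustrative names.

```python
import asyncio


async def fetch_answer() -> int:
    await asyncio.sleep(0)
    return 42


async def notebook_cell() -> str:
    # Code here runs inside an active event loop, much like code in a Jupyter cell.
    coro = fetch_answer()
    try:
        asyncio.run(coro)  # rejected: asyncio.run() refuses to start inside a running loop
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a "never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(notebook_cell())
print(message)
```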

In [6]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

We use `gpt-4o` as the LLM and the local embedding model `intfloat/multilingual-e5-large`. You can swap in other models by editing the following lines:

In [None]:
%pip install llama-index-embeddings-huggingface

In [None]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# LLM used for triplet extraction and for answering queries
Settings.llm = OpenAI(model="gpt-4o", temperature=0.3)
# local embedding model; swap it with the commented line to use OpenAI embeddings instead
Settings.embed_model = HuggingFaceEmbedding(model_name="intfloat/multilingual-e5-large")
# Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

## Index Construction

First, prepare the property graph store:

In [8]:
from llama_index.graph_stores.nebula import NebulaPropertyGraphStore

# overwrite=True clears any existing data in the space before indexing
graph_store = NebulaPropertyGraphStore(space="llamaindex_nebula_property_graph", overwrite=True)

Next, prepare the vector store:

In [9]:
from llama_index.core.vector_stores.simple import SimpleVectorStore
vec_store = SimpleVectorStore()
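
Conceptually, a vector store keeps one embedding per node and answers a query by ranking stored vectors by their similarity to the query embedding. The toy below sketches that idea with cosine similarity over plain Python lists. It is not `SimpleVectorStore`'s actual implementation, and the 3-dimensional vectors are made up for illustration (real embeddings from `intfloat/multilingual-e5-large` have many more dimensions).

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k node ids whose embeddings are most similar to the query."""
    scored = [(node_id, cosine(query, vec)) for node_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up embeddings keyed by entity name
toy_store = {
    "Interleaf": [0.9, 0.1, 0.0],
    "Viaweb": [0.7, 0.6, 0.1],
    "Yahoo": [0.1, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], toy_store, k=2))
```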

Finally, build the index!

In [10]:
from llama_index.core.indices.property_graph import PropertyGraphIndex

# extract triplets with Settings.llm, store them in NebulaGraph, and embed graph nodes into vec_store
index = PropertyGraphIndex.from_documents(
    documents,
    property_graph_store=graph_store,
    vector_store=vec_store,
    show_progress=True,
)

# persist the in-memory vector store so the index can be reloaded later
index.storage_context.vector_store.persist("./data/nebula_vec_store.json")

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Extracting paths from text: 100%|█████████████████████████████████████████████████| 22/22 [03:10<00:00,  8.67s/it]
Extracting implicit paths: 100%|████████████████████████████████████████████████| 22/22 [00:00<00:00, 9564.13it/s]
Generating embeddings: 100%|████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  2.90it/s]
Generating embeddings: 100%|██████████████████████████████████████████████████████| 44/44 [00:06<00:00,  7.09it/s]
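
The "Extracting paths from text" step asks the LLM for (subject, relation, object) triplets in each chunk. The sketch below parses LLM output of that shape. The `(subject, relation, object)` line format and the `max_triplets` cap are assumptions for illustration, not necessarily the exact prompt or parser LlamaIndex uses. Normalizing each element with `str.capitalize()` mirrors what the retrieval output later in this notebook appears to show (for example, `LlamaIndex` comes back as `Llamaindex`).

```python
import re
from typing import List, Tuple

Triplet = Tuple[str, str, str]

# Assumed LLM output format: one "(subject, relation, object)" group per triplet.
TRIPLET_PATTERN = re.compile(r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")


def parse_triplets(llm_output: str, max_triplets: int = 10) -> List[Triplet]:
    """Extract up to max_triplets triplets, normalizing each element with str.capitalize()."""
    triplets: List[Triplet] = []
    for subj, rel, obj in TRIPLET_PATTERN.findall(llm_output):
        triplets.append((subj.capitalize(), rel.capitalize(), obj.capitalize()))
        if len(triplets) >= max_triplets:
            break
    return triplets


def render(triplet: Triplet) -> str:
    """Format a triplet the way the retriever output in this notebook prints paths."""
    return " -> ".join(triplet)


sample = "(LlamaIndex, is, great)\n(Viaweb, was bought by, Yahoo)"
for t in parse_triplets(sample):
    print(render(t))
```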


Now that the graph has been created, we can explore it with [jupyter-nebulagraph](https://github.com/wey-gu/jupyter_nebulagraph).

In [11]:
%ngql SHOW TAGS

| Name |
|---|
| Chunk__ |
| Entity__ |
| Node__ |
| Props__ |


In [12]:
%ngql SHOW EDGES

| Name |
|---|
| Relation__ |
| __meta__node_label__ |
| __meta__rel_label__ |


In [None]:
%ngql MATCH p=(v:Entity__)-[r]->(t:Entity__) RETURN v.Entity__.name AS src, r.label AS relation, t.Entity__.name AS dest LIMIT 15;
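
The query above matches pairs of `Entity__` vertices joined by an edge and returns the edge's `label` property as the relation. As a rough mental model only (the real store also uses the `Chunk__`, `Node__` and `Props__` tags and the metadata edge types listed above), the pattern behaves like a filter over (source, relation, destination) edges whose endpoints are both entities. The vertices are taken from paths shown later in this notebook; the chunk vertex and its `MENTIONS` edge are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    label: str  # plays the role of r.label in the nGQL query


# Toy vertex store: vertex id -> tag name
vertex_tags: Dict[str, str] = {
    "Interleaf": "Entity__",
    "Scripting language": "Entity__",
    "Viaweb": "Entity__",
    "Code editor": "Entity__",
    "chunk-1": "Chunk__",  # hypothetical chunk vertex
}

edges: List[Edge] = [
    Edge("Interleaf", "Scripting language", "Added"),
    Edge("Viaweb", "Code editor", "Had"),
    Edge("chunk-1", "Interleaf", "MENTIONS"),  # hypothetical chunk-to-entity edge, filtered out below
]


def match_entity_edges(limit: int = 15) -> List[Tuple[str, str, str]]:
    """Rough analogue of MATCH (v:Entity__)-[r]->(t:Entity__) RETURN src, relation, dest LIMIT n."""
    rows = [
        (e.src, e.label, e.dst)
        for e in edges
        if vertex_tags.get(e.src) == "Entity__" and vertex_tags.get(e.dst) == "Entity__"
    ]
    return rows[:limit]


for row in match_entity_edges():
    print(row)
```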

In [None]:
%ngql MATCH p=(v:Entity__)-[r]->(t:Entity__) RETURN p LIMIT 10;

In [None]:
# render the result of the previous query as an interactive graph
%ng_draw

## Querying and Retrieval

In [16]:
retriever = index.as_retriever(
    include_text=False,  # include source text in returned nodes, default True
)

nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")

for node in nodes:
    print(node.text)

Interleaf -> Added -> Scripting language
Interleaf -> Made -> Software for creating documents
Interleaf -> Had done -> Something pretty bold
Viaweb -> Had -> Code editor
Viaweb -> Called -> Company
Viaweb -> Started -> I
Viaweb -> Started -> We
Viaweb -> Seemed -> Lame
Company called interleaf -> Got job -> I
Viaweb stock -> Was -> Valuable
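
Roughly speaking, the retriever finds graph nodes relevant to the question and returns paths that touch them. LlamaIndex's property graph retrievers can use LLM-generated keywords/synonyms and vector similarity for this. The sketch below replaces both with plain case-insensitive keyword matching, so it illustrates the idea rather than reproducing the library's behavior. The triplets are copied from the output above, and `extract_keywords` is a hypothetical stand-in for the LLM step.

```python
import re
from typing import Iterable, List, Set, Tuple

Triplet = Tuple[str, str, str]

# A few paths copied from the retriever output above
graph: List[Triplet] = [
    ("Interleaf", "Added", "Scripting language"),
    ("Interleaf", "Made", "Software for creating documents"),
    ("Viaweb", "Had", "Code editor"),
    ("Company called interleaf", "Got job", "I"),
    ("Viaweb stock", "Was", "Valuable"),
]

STOPWORDS = {"what", "happened", "at", "and", "the", "a", "of"}


def extract_keywords(question: str) -> Set[str]:
    """Hypothetical stand-in for LLM keyword/synonym generation: lowercase words minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", question.lower()) if w not in STOPWORDS}


def retrieve(question: str, triplets: Iterable[Triplet]) -> List[str]:
    """Return 'src -> rel -> dst' strings whose source or target mentions a keyword."""
    keywords = extract_keywords(question)
    hits = []
    for src, rel, dst in triplets:
        words = set(re.findall(r"[a-z0-9]+", f"{src} {dst}".lower()))
        if keywords & words:
            hits.append(f"{src} -> {rel} -> {dst}")
    return hits


for line in retrieve("What happened at Interleaf and Viaweb?", graph):
    print(line)
```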


In [17]:
query_engine = index.as_query_engine(include_text=True)

response = query_engine.query("What happened at Interleaf and Viaweb?")

print(str(response))

At Interleaf, the company made software for creating documents and took a bold step by adding a scripting language inspired by Emacs, which was a dialect of Lisp. The company sought a Lisp hacker to write in this language. Despite the innovative approach, the scripting language was a thin layer over a predominantly C-based software, which the employee found challenging due to a lack of interest in learning C. The job at Interleaf provided significant financial benefits and several lessons about the tech industry, including the importance of being the entry-level option in the market.

At Viaweb, the company had a code editor for users to define their own page styles, which involved editing Lisp expressions. Viaweb was eventually bought by Yahoo in 1998, which was a relief for the founders as it marked the end of their financial struggles. The Viaweb stock was valuable, but the founders experienced significant stress and near-death experiences during the company's life. After the acquis
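
With `include_text=True`, the source chunk text is passed to the LLM along with the retrieved paths. That is why the answer above contains details (such as the Emacs-inspired scripting language) that the bare triplets do not. The sketch below shows the general shape of such a context string. `build_context` and its layout are hypothetical, not LlamaIndex's actual prompt format, and the chunk text is a made-up excerpt.

```python
from typing import List, Optional, Tuple

Triplet = Tuple[str, str, str]


def build_context(triplets: List[Triplet], source_text: Optional[str], include_text: bool) -> str:
    """Hypothetical helper: join paths and, if include_text is set, append the source chunk."""
    lines = [" -> ".join(t) for t in triplets]
    if include_text and source_text:
        lines.append("")
        lines.append("Source text:")
        lines.append(source_text)
    return "\n".join(lines)


paths = [("Interleaf", "Added", "Scripting language")]
chunk = "Interleaf had added a scripting language inspired by Emacs."  # made-up excerpt
print(build_context(paths, chunk, include_text=True))
```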

## Loading from an existing Graph

If you already have a graph in NebulaGraph, you can connect to it and use it directly!

In [18]:
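from llama_index.core.indices.property_graph import PropertyGraphIndex
from llama_index.core.vector_stores.simple import SimpleVectorStore
from llama_index.graph_stores.nebula import NebulaPropertyGraphStore

# connect to the existing space; overwrite defaults to False, so existing data is kept
graph_store = NebulaPropertyGraphStore(space="llamaindex_nebula_property_graph")
# reload the vector store persisted during index construction
vec_store = SimpleVectorStore.from_persist_path("./data/nebula_vec_store.json")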

index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    vector_store=vec_store,
)
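
Reloading works because the vector store was persisted to `./data/nebula_vec_store.json` when the index was built, while the graph itself already lives in NebulaGraph. The round trip below sketches that persist-then-reload pattern with a plain dict of embeddings and the `json` module. The file layout and the `persist`/`load` helpers are assumptions for illustration; this is not the format `SimpleVectorStore` writes.

```python
import json
import os
import tempfile
from typing import Dict, List

Embeddings = Dict[str, List[float]]


def persist(store: Embeddings, path: str) -> None:
    """Hypothetical helper: write node-id -> embedding pairs to a JSON file (toy format)."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"embeddings": store}, f)


def load(path: str) -> Embeddings:
    """Hypothetical helper: read the toy JSON file back into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["embeddings"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data", "toy_vec_store.json")
    original = {"Interleaf": [0.9, 0.1], "Viaweb": [0.7, 0.6]}
    persist(original, path)
    restored = load(path)
    print(restored == original)
```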

From here, we can still insert more documents!

In [19]:
from llama_index.core import Document

document = Document(text="LlamaIndex is great!")

index.insert(document)
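
Inserting into an existing index adds new paths alongside those already stored. Property graph stores in LlamaIndex expose upsert-style methods (`upsert_nodes`, `upsert_relations`), so re-inserting a known node or relation is meant to update it rather than duplicate it. The toy below sketches that upsert idea with dicts keyed by id. `ToyGraph` is hypothetical and illustrates the concept, not how NebulaPropertyGraphStore implements it.

```python
from typing import Dict, Tuple

Triplet = Tuple[str, str, str]


class ToyGraph:
    """Hypothetical upsert-based store: nodes keyed by name, relations keyed by (src, label, dst)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, dict] = {}
        self.relations: Dict[Triplet, dict] = {}

    def upsert_triplet(self, src: str, label: str, dst: str, **props: object) -> None:
        # Existing nodes/relations are reused and their properties updated; new ones are added.
        for name in (src, dst):
            self.nodes.setdefault(name, {})
        self.relations.setdefault((src, label, dst), {}).update(props)


graph = ToyGraph()
graph.upsert_triplet("Llamaindex", "Is", "Great")
graph.upsert_triplet("Llamaindex", "Is", "Great", source="second insert")
print(len(graph.nodes), len(graph.relations))
```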

In [20]:
nodes = index.as_retriever(include_text=False).retrieve("LlamaIndex")

print(nodes[0].text)

Llamaindex -> Is -> Great
