<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/query_engine/knowledge_graph_query_engine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Knowledge Graph Query Engine

Creating a Knowledge Graph usually involves specialized and complex tasks. However, by combining LlamaIndex with an LLM, the KnowledgeGraphIndex, and the GraphStore, we can build a reasonably effective Knowledge Graph from any data source supported by [Llama Hub](https://llamahub.ai/).

Furthermore, querying a Knowledge Graph often requires knowledge of the storage system's query language, such as Cypher. With the help of an LLM and the LlamaIndex KnowledgeGraphQueryEngine, the same queries can be expressed in natural language.

In this demonstration, we will guide you through the steps to:

- Extract and set up a Knowledge Graph using LlamaIndex
- Query a Knowledge Graph using Cypher
- Query a Knowledge Graph using natural language

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

### Install packages

In [None]:
# %pip install llama-index-graph-stores-nebula
# %pip install llama-index-llms-openai

In [None]:
# !pip install llama-index python-dotenv

In [None]:
# %pip install -q ipython-ngql nebula3-python

### NebulaGraph console

- Create the `llamaindex` space

- TextNode structure


    text=entry["content"],

    metadata={
                    "law_index": entry["index"],
                    "subtitle": entry["subtitle"],
                    "document_title": entry["metadata"]["document_title"],
                    "created_date": entry["metadata"]["date"],
                    "revise_info": entry["metadata"]["revise_info"],
                    "source": entry["metadata"]["source"],
                    "title_doc": entry["metadata"]["title"]["doc"],
                    "title_chapter": entry["metadata"]["title"]["chapter"],
                    "title_section": entry["metadata"]["title"]["section"],
                    "title_subsection": entry["metadata"]["title"]["subsection"],
                }

In [2]:
# Run in a terminal
# cd /mnt/c/Users/Shic/legal_graph/nebula-console
# ./nebula-console -addr=192.168.176.1 -port 9669 -u root -p 001101

# If the `llamaindex` space does not exist yet, create it with the following commands from NebulaGraph's console:
# CREATE SPACE llamaindex(vid_type=FIXED_STRING(2048), partition_num=1, replica_factor=1);
# USE llamaindex;
# CREATE TAG law(law_index string, name string, document_title string, created_date string, revise_info string, source string, title_doc string, title_chapter string, title_section string, title_subsection string);
# CREATE EDGE REFERS_TO(relation string);
# CREATE TAG INDEX law_index ON law(law_index(256));
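
The schema statements above can also be generated from a list of property names, which keeps the tag definition and the document metadata keys in one place. The sketch below only builds the nGQL strings. The helper names `build_create_tag` and `build_schema_statements` are hypothetical, and running the statements against NebulaGraph is left out.

```python
# Property names of the `law` tag, copied from the CREATE TAG statement above.
LAW_PROPS = [
    "law_index", "name", "document_title", "created_date", "revise_info",
    "source", "title_doc", "title_chapter", "title_section", "title_subsection",
]


def build_create_tag(tag: str, props: list[str]) -> str:
    """Return a CREATE TAG statement in which every property is a string."""
    columns = ", ".join(f"{prop} string" for prop in props)
    return f"CREATE TAG {tag}({columns});"


def build_schema_statements(space: str = "llamaindex") -> list[str]:
    """Return the schema statements in the order they should be executed."""
    return [
        f"CREATE SPACE {space}(vid_type=FIXED_STRING(2048), partition_num=1, replica_factor=1);",
        f"USE {space};",
        build_create_tag("law", LAW_PROPS),
        "CREATE EDGE REFERS_TO(relation string);",
        "CREATE TAG INDEX law_index ON law(law_index(256));",
    ]


for statement in build_schema_statements():
    print(statement)
```

The printed statements can be pasted into the NebulaGraph console as they are.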

> https://docs.nebula-graph.io/3.0.0/3.ngql-guide/9.space-statements/1.create-space/

> https://docs.nebula-graph.io/3.0.0/3.ngql-guide/10.tag-statements/1.create-tag/

### OpenAI

In [3]:
# For OpenAI

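import os
from dotenv import load_dotenv

load_dotenv(verbose=True)

# load_dotenv() already exports the key; fail early if it is still missing
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")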

import logging
import sys

logging.basicConfig(
    stream=sys.stdout, level=logging.INFO
)  # logging.DEBUG for more verbose output


# define LLM
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

Settings.llm = OpenAI(temperature=0, model="gpt-4o")
Settings.chunk_size = 512

## Prepare for NebulaGraph

Before creating the Knowledge Graph, let's make sure we have a running NebulaGraph instance with the data schema defined.

In [2]:
# %%bash
# sudo curl -fsSL nebula-up.siwei.io/install.sh | bash

│ 🌏 You can access it from browser:     │
│      http://127.0.0.1:7001             │
│      http://<other_interface>:7001

In [4]:
import os

# NEBULA_USER, NEBULA_PASSWORD, and NEBULA_ADDRESS are expected in the
# environment (loaded from .env above); fail early if any is missing
for var in ("NEBULA_USER", "NEBULA_PASSWORD", "NEBULA_ADDRESS"):
    if not os.getenv(var):
        raise RuntimeError(f"{var} is not set")

space_name = "llamaindex"
# edge type, relation property, and tag must match the schema created above
edge_types, rel_prop_names = ["REFERS_TO"], ["relation"]
tags = ["law"]

Prepare a StorageContext with NebulaGraphStore as the graph_store:

In [5]:
from llama_index.core import StorageContext
from llama_index.graph_stores.nebula import NebulaGraphStore  # integration with Neo4j also looks possible

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

In [6]:
storage_context

StorageContext(docstore=<llama_index.core.storage.docstore.simple_docstore.SimpleDocumentStore object at 0x7fdda2454850>, index_store=<llama_index.core.storage.index_store.simple_index_store.SimpleIndexStore object at 0x7fdda22e5890>, vector_stores={'default': SimpleVectorStore(stores_text=False, is_embedding_query=True, data=SimpleVectorStoreData(embedding_dict={}, text_id_to_ref_doc_id={}, metadata_dict={})), 'image': SimpleVectorStore(stores_text=False, is_embedding_query=True, data=SimpleVectorStoreData(embedding_dict={}, text_id_to_ref_doc_id={}, metadata_dict={}))}, graph_store=<llama_index.graph_stores.nebula.nebula_graph_store.NebulaGraphStore object at 0x7fdda3602390>, property_graph_store=None)

## (Optional) Build the Knowledge Graph with LlamaIndex

With LlamaIndex and the LLM defined above, we can build a Knowledge Graph from the given documents.

If a Knowledge Graph already exists in the NebulaGraphStore, this step can be skipped.

### Step 1, load legal provisions from a JSON file

In [None]:
%pip install -q llama-index-readers-json
%pip install -q llama-index-readers-hwp

In [7]:
import json
from llama_index.core import Document

# Load a JSON file and convert it into a list of Documents
def load_json_as_documents(input_file):
    # Open the file and parse the JSON data
    with open(input_file, 'r', encoding='utf-8') as f:
        json_data = json.load(f)

    # # flatten the JSON to match the schema
    # flattened_json = [flatten_json(item) for item in json_data]

    documents = []

    # Convert each element of the JSON list into a Document and append it
    for entry in json_data:
        # Convert the element into a Document object
        doc = Document(
            text=entry["content"],
            metadata={
                "law_index": entry["index"],
                "name": entry["subtitle"],
                "document_title": entry["metadata"]["document_title"],
                "created_date": entry["metadata"]["date"],
                "revise_info": entry["metadata"]["revise_info"],
                "source": entry["metadata"]["source"],
                "title_doc": entry["metadata"]["title"]["doc"],
                "title_chapter": entry["metadata"]["title"]["chapter"],
                "title_section": entry["metadata"]["title"]["section"],
                "title_subsection": entry["metadata"]["title"]["subsection"],
            },
            metadata_seperator="::",
            metadata_template="{key}=>{value}",
            text_template="Metadata: {metadata_str}\n-----\nContent: {content}",
        )
        documents.append(doc)
    
    return documents

# Path to the JSON file
input_file = "/mnt/c/Users/Shic/legal_graph/results/1-2/DCM_1-2_nebula_test.json"

# Load the data with the loader function
documents = load_json_as_documents(input_file)


In [8]:
len(documents)

70

In [9]:
list(documents[0].metadata.keys())

['law_index',
 'name',
 'document_title',
 'created_date',
 'revise_info',
 'source',
 'title_doc',
 'title_chapter',
 'title_section',
 'title_subsection']

For testing, the data range was reduced from Articles 1 through 49 to Articles 30 through 49 (documents: 188 -> 70).
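
The metadata keys appear to mirror the properties of the `law` tag, so it can help to compare the two before building the index. A minimal sketch, assuming the property set is copied by hand from the `CREATE TAG` statement; `schema_mismatch` is a hypothetical helper, and the key list below stands in for `documents[0].metadata.keys()`.

```python
# Properties of the `law` tag, copied from the CREATE TAG statement.
LAW_TAG_PROPS = {
    "law_index", "name", "document_title", "created_date", "revise_info",
    "source", "title_doc", "title_chapter", "title_section", "title_subsection",
}


def schema_mismatch(metadata_keys, tag_props):
    """Return (keys missing from the tag, tag properties missing from the keys)."""
    keys = set(metadata_keys)
    return sorted(keys - tag_props), sorted(tag_props - keys)


# Stand-in for list(documents[0].metadata.keys()) from the cell above.
sample_metadata_keys = [
    "law_index", "name", "document_title", "created_date", "revise_info",
    "source", "title_doc", "title_chapter", "title_section", "title_subsection",
]

extra, missing = schema_mismatch(sample_metadata_keys, LAW_TAG_PROPS)
print("Keys not in the tag:", extra)
print("Tag properties without a key:", missing)
```

Two empty lists mean the documents and the tag agree on property names.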

In [10]:
from llama_index.core.schema import MetadataMode
print(
    "The LLM sees this: \n",
    documents[1].get_content(metadata_mode=MetadataMode.LLM),
)

The LLM sees this: 
 Metadata: law_index=>제30조제2항::name=>재무건전성 유지::document_title=>자본시장과 금융투자업에 관한 법률 ( 약칭: 자본시장법 )
::created_date=>시행 2024. 8. 14.::revise_info=>법률 제20305호, 2024. 2. 13., 일부개정
::source=>국가법령정보센터::title_doc=>제2편 금융투자업::title_chapter=>제3장 건전경영 유지::title_section=>제1절 경영건전성 감독::title_subsection=>None
-----
Content: 제1항의 영업용순자본과 총위험액의 산정에 관한 구체적인 기준 및 방법은 금융위원회가 정하여 고시한다.<개정 2008. 2. 29.>


In [11]:
print(
    "The Embedding model sees this: \n",
    documents[1].get_content(metadata_mode=MetadataMode.EMBED),
)

The Embedding model sees this: 
 Metadata: law_index=>제30조제2항::name=>재무건전성 유지::document_title=>자본시장과 금융투자업에 관한 법률 ( 약칭: 자본시장법 )
::created_date=>시행 2024. 8. 14.::revise_info=>법률 제20305호, 2024. 2. 13., 일부개정
::source=>국가법령정보센터::title_doc=>제2편 금융투자업::title_chapter=>제3장 건전경영 유지::title_section=>제1절 경영건전성 감독::title_subsection=>None
-----
Content: 제1항의 영업용순자본과 총위험액의 산정에 관한 구체적인 기준 및 방법은 금융위원회가 정하여 고시한다.<개정 2008. 2. 29.>
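
Both outputs follow from the three formatting arguments passed to `Document`. The sketch below reproduces the same layout in plain Python so that the role of each template is visible. It is an approximation for illustration, not LlamaIndex's own implementation, and `render_content` is a hypothetical name.

```python
def render_content(
    text,
    metadata,
    separator="::",
    metadata_template="{key}=>{value}",
    text_template="Metadata: {metadata_str}\n-----\nContent: {content}",
):
    """Join the metadata pairs with the separator, then fill the text template."""
    metadata_str = separator.join(
        metadata_template.format(key=key, value=str(value))
        for key, value in metadata.items()
    )
    return text_template.format(metadata_str=metadata_str, content=text)


example = render_content(
    "Example provision text.",
    {"law_index": "제30조제2항", "title_subsection": None},
)
print(example)
```

A missing subsection is rendered as the literal string `None`, which matches the `title_subsection=>None` seen in the outputs above.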


### Step 2, Generate a KnowledgeGraphIndex with NebulaGraph as graph_store

Next, we create a KnowledgeGraphIndex to enable graph-based RAG; see [here](https://gpt-index.readthedocs.io/en/latest/examples/index_structs/knowledge_graph/KnowledgeGraphIndex_vs_VectorStoreIndex_vs_CustomIndex_combined.html) for details. As a side benefit, the resulting Knowledge Graph is up and running for other purposes, too.

> https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/usage_documents/

In [12]:
# from llama_index.core import Prompt

triplet_prompt = """
다음과 같은 법률 데이터베이스가 구축되어 있습니다.

데이터베이스 생성 구문:
CREATE TAG law(law_index string, subtitle string, document_title string, created_date string, revise_info string, source string, title_doc string, title_chapter string, title_section string, title_subsection string);
CREATE EDGE REFERS_TO();
CREATE TAG INDEX law_index ON law(law_index(256));

데이터베이스에 노드와 엣지를 삽입하기 위해서, 주어진 텍스트에서 가능한 모든 사실과 법 조항 간의 관계를 추출하세요.
- 각 사실은 (주어, 관계, 객체)의 형태로 나타냅니다.
- 특히, 법 조항 간의 참조, 의무, 금지 등의 관계를 중점적으로 추출하세요.

예시:
 Metadata: law_index=>제44조제2항::subtitle=>이해상충의 관리::document_title=>자본시장과 금융투자업에 관한 법률 ( 약칭: 자본시장법 )
::created_date=>시행 2024. 8. 14.::revise_info=>법률 제20305호, 2024. 2. 13., 일부개정
::source=>국가법령정보센터::title_doc=>제2편 금융투자업::title_chapter=>제4장 영업행위 규칙::title_section=>제1절 공통 영업행위 규칙::title_subsection=>제1관 신의성실의무 등
-----
Content: 금융투자업자는 제1항에 따라 이해상충이 발생할 가능성을 파악ㆍ평가한 결과 이해상충이 발생할 가능성이 있다고 인정되는 경우에는 그 사실을 미리 해당 투자자에게 알려야 하며, 그 이해상충이 발생할 가능성을 내부통제기준이 정하는 방법 및 절차에 따라 투자자 보호에 문제가 없는 수준으로 낮춘 후 매매, 그 밖의 거래를 하여야 한다.

삼중항:
(제44조제2항, 참조한다, 제44조제1항)
(금융투자업자, 파악ㆍ평가한다, 이해상충 발생 가능성)

텍스트: {text}
삼중항:
"""

In [None]:
!pip install llama-index-prompts llama-index-llms

In [13]:
from llama_index.core import PromptTemplate  # updated import

triplet_template = PromptTemplate(template=triplet_prompt)  # wrap the prompt string
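
The prompt asks the model to answer with one `(subject, relation, object)` tuple per line. A rough sketch of how such a response could be parsed is shown below. It is a simplified stand-in rather than the parser `KnowledgeGraphIndex` uses internally, and `parse_triplets` is a hypothetical helper.

```python
import re

# A line counts as a triplet candidate when it is wrapped in parentheses.
TRIPLET_PATTERN = re.compile(r"^\((.+)\)$")


def parse_triplets(response: str, max_triplets: int = 5):
    """Extract up to max_triplets (subject, relation, object) tuples."""
    triplets = []
    for line in response.splitlines():
        match = TRIPLET_PATTERN.match(line.strip())
        if not match:
            continue
        parts = [part.strip() for part in match.group(1).split(",")]
        if len(parts) != 3 or not all(parts):
            continue  # skip malformed lines
        triplets.append(tuple(parts))
        if len(triplets) >= max_triplets:
            break
    return triplets


# Sample response in the format of the example inside the prompt.
sample_response = """
(제44조제2항, 참조한다, 제44조제1항)
(금융투자업자, 파악ㆍ평가한다, 이해상충 발생 가능성)
not a triplet
"""
print(parse_triplets(sample_response))
```

Splitting on commas is a deliberate simplification: a subject or object that itself contains a comma would be rejected as malformed.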

In [15]:
from llama_index.core import KnowledgeGraphIndex

kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    # the triplet extraction prompt can be set via kg_triplet_extract_template
    kg_triplet_extract_template=triplet_template,
    storage_context=storage_context,
    max_triplets_per_chunk=5,  # adjusted limit
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    # include_embeddings=True,
    show_progress=True
)

Parsing nodes:   0%|          | 0/70 [00:00<?, ?it/s]

Processing nodes:   0%|          | 0/78 [00:00<?, ?it/s]

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

Now we have a Knowledge Graph on the NebulaGraph cluster, in the space named `llamaindex`, built from the legal provisions loaded above. Let's explore it a little.

In [None]:
# install related packages, password is nebula by default
%pip install ipython-ngql networkx pyvis jupyter_nebulagraph


In [16]:
%load_ext ngql
%ngql --address 192.168.176.1 --port 9669 --user root --password 001101

[OK] Connection Pool Created
INFO:nebula3.logger:Get connection to ('192.168.176.1', 9669)


Unnamed: 0,Name
0,llamaindex


In [24]:
# Query some random Relationships with Cypher
%ngql USE llamaindex;
%ngql MATCH ()-[e]->() RETURN e LIMIT 20;

INFO:nebula3.logger:Get connection to ('192.168.176.1', 9669)
INFO:nebula3.logger:Get connection to ('192.168.176.1', 9669)


Unnamed: 0,e
0,"(""거액의 금융사고"")-[:REFERS_TO@7638308093824355506{relation: ""영향을 미친다""}]->(""금융투자업자의 경영상황"")"
1,"(""겸영업무 보고내용"")-[:REFERS_TO@-7640577887932975066{relation: ""초래한다""}]->(""투자자 보호에 지장"")"
2,"(""겸영업무 보고내용"")-[:REFERS_TO@5387447905849479152{relation: ""저해한다""}]->(""금융시장의 안정성"")"
3,"(""겸영업무 보고내용"")-[:REFERS_TO@5387447905849479152{relation: ""저해한다""}]->(""금융투자업자의 경영건전성"")"
4,"(""경영건전성기준"")-[:REFERS_TO@-2645493958118358966{relation: ""포함한다""}]->(""대통령령으로 정하는 사항"")"
5,"(""경영건전성기준"")-[:REFERS_TO@-2645493958118358966{relation: ""포함한다""}]->(""유동성"")"
6,"(""경영건전성기준"")-[:REFERS_TO@-2645493958118358966{relation: ""포함한다""}]->(""자기자본비율"")"
7,"(""경영건전성기준"")-[:REFERS_TO@-2645493958118358966{relation: ""포함한다""}]->(""자산의 건전성"")"
8,"(""공시서류"")-[:REFERS_TO@-6904075763027840850{relation: ""발췌한다""}]->(""업무보고서 중 중요사항"")"
9,"(""공시서류"")-[:REFERS_TO@7795243296179001118{relation: ""이용한다""}]->(""인터넷 홈페이지"")"


In [25]:
%ngql MATCH ()-[e]->() RETURN DISTINCT e;

INFO:nebula3.logger:Get connection to ('192.168.176.1', 9669)


Unnamed: 0,e
0,"(""해외현지법인"")-[:REFERS_TO@-3648029259485943483{relation: ""대상이다""}]->(""신용공여"")"
1,"(""투자자문업자가 아닌 자"")-[:REFERS_TO@4253434467682292654{relation: ""사용하여서는 아니 된다""}]->(""이와 같은 의미를 가지는 외국어문자"")"
2,"(""투자자문업자가 아닌 자"")-[:REFERS_TO@4253434467682292654{relation: ""사용하여서는 아니 된다""}]->(""투자자문 이라는 문자"")"
3,"(""집합투자업자"")-[:REFERS_TO@8341684571847756684{relation: ""사용할 수 있다""}]->(""신탁 이라는 문자"")"
4,"(""제45조제3항"")-[:REFERS_TO@-7330902890750983237{relation: ""참조한다""}]->(""제45조제1항"")"
...,...
324,"(""자금이체업무"")-[:REFERS_TO@-5524706812882331863{relation: ""수행한다""}]->(""투자자예탁금"")"
325,"(""제42조제2항"")-[:REFERS_TO@-7330902890750983237{relation: ""참조한다""}]->(""제42조제1항"")"
326,"(""제45조제4항"")-[:REFERS_TO@-7330902890750983237{relation: ""참조한다""}]->(""제45조제1항"")"
327,"(""제45조제4항"")-[:REFERS_TO@-7330902890750983237{relation: ""참조한다""}]->(""제45조제2항"")"


In [29]:
%ngql MATCH ()-[e]->() RETURN DISTINCT e.relation;


INFO:nebula3.logger:Get connection to ('192.168.176.1', 9669)


Unnamed: 0,e.relation
0,대상이다
1,사용하여서는 아니 된다
2,참조한다
3,사용할 수 있다
4,수행한다
5,정한다
6,관련된다
7,따른다
8,영향을 미친다
9,제출하여야 한다
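
The edge strings returned by the `MATCH` queries above carry the subject, the `relation` property, and the object in a fixed layout, so they can be broken apart for quick analysis. The sketch below parses a few rows copied from the output and counts the relations. The regular expression is an assumption based on the rows shown here, and `parse_edge` is a hypothetical helper.

```python
import re
from collections import Counter

# Layout assumed from the output: ("src")-[:TYPE@rank{relation: "rel"}]->("dst")
EDGE_PATTERN = re.compile(
    r'\("(?P<src>[^"]*)"\)-\[:(?P<type>\w+)@-?\d+'
    r'\{relation: "(?P<relation>[^"]*)"\}\]->\("(?P<dst>[^"]*)"\)'
)


def parse_edge(edge: str):
    """Return (subject, relation, object), or None if the layout differs."""
    match = EDGE_PATTERN.fullmatch(edge)
    if match is None:
        return None
    return match.group("src"), match.group("relation"), match.group("dst")


# Rows copied from the query output, with the CSV quote doubling removed.
edges = [
    '("제45조제3항")-[:REFERS_TO@-7330902890750983237{relation: "참조한다"}]->("제45조제1항")',
    '("제42조제2항")-[:REFERS_TO@-7330902890750983237{relation: "참조한다"}]->("제42조제1항")',
    '("자금이체업무")-[:REFERS_TO@-5524706812882331863{relation: "수행한다"}]->("투자자예탁금")',
]

triplets = [t for t in (parse_edge(edge) for edge in edges) if t is not None]
relation_counts = Counter(relation for _, relation, _ in triplets)
print(relation_counts.most_common())
```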


In [18]:
%ngql help

        Supported Configurations:
        ------------------------
        
        > How to config ngql_result_style in "raw", "pandas"
        %config IPythonNGQL.ngql_result_style="raw"
        %config IPythonNGQL.ngql_result_style="pandas"

        > How to config ngql_verbose in True, False
        %config IPythonNGQL.ngql_verbose=True

        > How to config max_connection_pool_size
        %config IPythonNGQL.max_connection_pool_size=10

        Quick Start:
        -----------

        > Connect to Neubla Graph
        %ngql --address 127.0.0.1 --port 9669 --user user --password password

        > Use Space
        %ngql USE basketballplayer

        > Query
        %ngql SHOW TAGS;

        > Multile Queries
        %%ngql
        SHOW TAGS;
        SHOW HOSTS;

        Reload ngql Magic
        %reload_ext ngql

        > Variables in query, we are using Jinja2 here
        name = "nba"
        %ngql USE "{{ name }}"

        > Query and draw the grap

In [21]:
%ngql USE llamaindex
%ng_draw

INFO:nebula3.logger:Get connection to ('192.168.176.1', 9669)


<class 'pyvis.network.Network'> |N|=25 |E|=20

## Asking the Knowledge Graph

Finally, let's see how to query the Knowledge Graph in natural language.

Here, we will leverage the `KnowledgeGraphQueryEngine`, with `NebulaGraphStore` as the `storage_context.graph_store`.

In [11]:
from llama_index.llms.openai import OpenAI
llm = OpenAI(temperature=0, model="gpt-4o")

In [None]:
from llama_index.core.query_engine import KnowledgeGraphQueryEngine

from llama_index.core import StorageContext
from llama_index.graph_stores.nebula import NebulaGraphStore

query_engine = KnowledgeGraphQueryEngine(
    storage_context=storage_context,
    llm=llm,
    verbose=True,
)

In [None]:
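from IPython.display import Markdown, display

# query() runs the generated graph query and synthesizes an answer
response = query_engine.query(
    "증권신고서와 투자설명서의 부실기재에 대한 과징금이 부과되는 경우 부과대상과 부과금액은?",
)
display(Markdown(f"<b>{response}</b>"))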

In [None]:
graph_query = query_engine.generate_query(
    "Tell me about Peter Quill?",
)

graph_query = graph_query.replace("WHERE", "\n  WHERE").replace(
    "RETURN", "\nRETURN"
)

display(
    Markdown(
        f"""
```cypher
{graph_query}
```
"""
    )
)

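In the upstream LlamaIndex example, which uses Wikipedia data on "Guardians of the Galaxy Vol. 3" and the default `entity`/`relationship` schema, the engine generates a graph query like this: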

```cypher
MATCH (p:`entity`)-[:relationship]->(e:`entity`)
  WHERE p.`entity`.`name` == 'Peter Quill'
RETURN e.`entity`.`name`;
```
The engine then synthesizes the answer from the query result:

```json
{'e2.entity.name': ['grandfather', 'alternate version of Gamora', 'Guardians of the Galaxy']}
```

Of course, we can still query the graph directly, too. Because it shows the query it generates, this query engine also works well as a learning aid for the graph query language :).

In [None]:
%%ngql
MATCH (p:`entity`)-[e:relationship]->(m:`entity`)
  WHERE p.`entity`.`name` == 'Peter Quill'
RETURN p.`entity`.`name`, e.relationship, m.`entity`.`name`;

Then change the query to return whole vertices and edges so that the result can be rendered:

In [None]:
%%ngql
MATCH (p:`entity`)-[e:relationship]->(m:`entity`)
  WHERE p.`entity`.`name` == 'Peter Quill'
RETURN p, e, m;

In [None]:
%ng_draw

The rendered graph makes the results of this knowledge-fetching query easy to read at a glance.
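
The queries above target the default `entity`/`relationship` schema. For the `law` tag and `REFERS_TO` edge used in this notebook, an equivalent lookup might look like the sketch below, which only builds the query string. It assumes that each extracted subject is stored as the vertex ID, as the edge output earlier suggests; `build_refers_to_query` and `escape_ngql_string` are hypothetical helpers.

```python
def escape_ngql_string(value: str) -> str:
    """Escape backslashes and double quotes for an nGQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_refers_to_query(subject: str, limit: int = 20) -> str:
    """Build a MATCH query listing the outgoing REFERS_TO edges of a vertex."""
    vid = escape_ngql_string(subject)
    return (
        "MATCH (s)-[e:REFERS_TO]->(o)\n"
        f'  WHERE id(s) == "{vid}"\n'
        f"RETURN id(s), e.relation, id(o) LIMIT {int(limit)};"
    )


# Assumption: the article label is the vertex ID, as in the edge rows above.
print(build_refers_to_query("제45조제4항"))
```

The resulting string can be pasted into a `%%ngql` cell after `USE llamaindex;`.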