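# Llama Pack - Neo4j Query Engine
This Llama Pack creates a Neo4j knowledge graph query engine and runs its `query` function. The pack lets you choose among several types of query engine for a Neo4j knowledge graph: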

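* Knowledge graph vector-based entity retrieval (the default when no query engine type is given)
* Knowledge graph keyword-based entity retrieval
* Knowledge graph hybrid entity retrieval
* Raw vector index retrieval
* Custom combo query engine (vector similarity + knowledge graph entity retrieval)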
* KnowledgeGraphQueryEngine
* KnowledgeGraphRAGRetriever

In this notebook, we load the Wikipedia page on the Paleolithic diet into a Neo4j knowledge graph and run queries against it.

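## Install the required libraries

- `llama_index`: a framework for connecting large language models with external data
- `neo4j`: stores the ingested data and supports relationship queries over entities and properties
- `llama_hub`: a community-driven hub of prepackaged modules that you can use to kickstart LLM applications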


In [1]:
%pip install llama_index llama_hub neo4j

Note: you may need to restart the kernel to use updated packages.


### Configure logging output

In [2]:
import os, logging, sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
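
With the root logger at `DEBUG`, the HTTP and database client libraries print every request, and those dumps can include request headers. A minimal sketch, assuming you want to keep `DEBUG` for your own code while quieting the noisiest libraries; the logger names are the ones that appear as prefixes in this notebook's log output:

```python
import logging
import sys

# Keep verbose output for the notebook's own messages.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

# Logger names taken from the prefixes seen in the DEBUG output
# (e.g. "DEBUG:neo4j:", "DEBUG:urllib3.connectionpool:").
NOISY_LOGGERS = ["urllib3", "neo4j", "httpcore", "openai"]

for name in NOISY_LOGGERS:
    # Child loggers such as "urllib3.connectionpool" inherit this level.
    logging.getLogger(name).setLevel(logging.WARNING)
```

Raising the level on the parent logger is enough, because child loggers without an explicit level fall back to their parent's.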

## Load the data

Load the Wikipedia page on the Paleolithic diet.

In [3]:
from llama_index import download_loader

WikipediaReader = download_loader("WikipediaReader")
loader = WikipediaReader()
documents = loader.load_data(pages=['Paleolithic diet'], auto_suggest=False)
print(f'Loaded {len(documents)} documents')

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): llamahub.ai:443
DEBUG:urllib3.connectionpool:https://llamahub.ai:443 "POST /api/analytics/downloads HTTP/1.1" 200 63
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 127.0.0.1:7890
DEBUG:urllib3.connectionpool:http://127.0.0.1:7890 "GET http://en.wikipedia.org/w/api.php?prop=info%7Cpageprops&inprop=url&ppprop=disambiguation&redirects=&titles=Paleolithic+diet&format=json&action=query HTTP/1.1" 301 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): en.wikipedia.org:443
DEBUG:urllib3.connectionpool:https://en.wikipedia.org:443 "GET /w/api.php?prop=info%7Cpageprops&inprop=url&ppprop=disambiguation&redirects=&titles=Paleolithic+diet&format=json&action=query HTTP/1.1" 200 477
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 127.0.0.1:7890
DEBUG:urllib3.connectionpool:http://127.0.0.1:7890 "GET http://en.wikipedia.org/w/api.php?prop=extracts%7Crevisions&explaintext=&rvprop=ids&titles=Pal

## Download and initialize the pack

### Configuration for OpenAI

In [4]:
# from llama_index.llama_pack import download_llama_pack

# os.environ['OPENAI_API_KEY'] = "<your-openai-api-key>"  # never commit a real key

# # download and install dependencies
# Neo4jQueryEnginePack = download_llama_pack(
#   "Neo4jQueryEnginePack", "./neo4j_pack"
# )

### Configuration for Azure OpenAI (the default for this project)

In [5]:
from neo4j_pack.base import Neo4jQueryEnginePack
# Configure the corresponding Azure account in neo4j_pack.base

## Neo4j installation and configuration

### Install the Neo4j service
```shell
docker run -d -p 7474:7474 -p 7687:7687 --name neo4j-apoc -e NEO4J_apoc_export_file_enabled=true -e NEO4J_apoc_import_file_enabled=true -e NEO4J_apoc_import_file_use__neo4j__config=true -e NEO4J_AUTH=neo4j/pleaseletmein -e NEO4J_PLUGINS=\[\"apoc\"\] neo4j:latest
```
- username: neo4j
- password: pleaseletmein
- url: bolt://127.0.0.1:7687
- database: neo4j

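### Configure the Neo4j connection
The Neo4j credentials are stored in `credentials.json` in the project root; the next cell loads the JSON and extracts the credential details.
Open [credentials.json](./credentials.json) and fill in the account details you specified when installing the Neo4j service.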
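
A minimal sketch of creating that file from Python, assuming the four keys read by the loading cell (`username`, `password`, `url`, `database`) and the values from the `docker run` command above; adjust them if your setup differs:

```python
import json
from pathlib import Path

# Values mirror the Docker setup in this notebook; change them for your own server.
credentials = {
    "username": "neo4j",
    "password": "pleaseletmein",
    "url": "bolt://127.0.0.1:7687",
    "database": "neo4j",
}

# Write the file next to the notebook so open('credentials.json') can find it.
path = Path("credentials.json")
path.write_text(json.dumps(credentials, indent=2), encoding="utf-8")

# Read it back to confirm every key the notebook expects is present.
loaded = json.loads(path.read_text(encoding="utf-8"))
missing = {"username", "password", "url", "database"} - loaded.keys()
print("missing keys:", sorted(missing))
```

Keeping the credentials in a separate file means the notebook itself never contains the password.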


In [6]:
import json

# get Neo4j credentials (assume it's stored in credentials.json)
with open('credentials.json') as f:
  neo4j_connection_params = json.load(f)
  username = neo4j_connection_params['username']
  password = neo4j_connection_params['password']
  url = neo4j_connection_params['url']
  database = neo4j_connection_params['database']

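Below is how to construct a `Neo4jQueryEnginePack`. You can pass a `query_engine_type` taken from `Neo4jQueryEngineType` when building the pack. The code cell further down builds a knowledge graph (KG) keyword query engine. If `query_engine_type` is not set, the pack defaults to KG vector-based entity retrieval.

`Neo4jQueryEngineType` is an enum of the available query engine types, shown here. You can pass any of them when constructing `Neo4jQueryEnginePack`.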


```python
from enum import Enum


class Neo4jQueryEngineType(str, Enum):
    """Neo4j query engine type"""

    KG_KEYWORD = "keyword"
    KG_HYBRID = "hybrid"
    RAW_VECTOR = "vector"
    RAW_VECTOR_KG_COMBO = "vector_kg"
    KG_QE = "KnowledgeGraphQueryEngine"
    KG_RAG_RETRIEVER = "KnowledgeGraphRAGRetriever"
```
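
Because the enum mixes in `str`, each member can be looked up from its string value, which is handy when the engine type comes from a config file or a command-line argument. A small sketch, using a local copy of the enum so it runs on its own; `parse_engine_type` is a hypothetical helper, not part of the pack:

```python
from enum import Enum
from typing import Optional


class Neo4jQueryEngineType(str, Enum):
    """Local copy of the enum shown above, for illustration only."""

    KG_KEYWORD = "keyword"
    KG_HYBRID = "hybrid"
    RAW_VECTOR = "vector"
    RAW_VECTOR_KG_COMBO = "vector_kg"
    KG_QE = "KnowledgeGraphQueryEngine"
    KG_RAG_RETRIEVER = "KnowledgeGraphRAGRetriever"


def parse_engine_type(name: Optional[str]) -> Optional[Neo4jQueryEngineType]:
    """Map a string such as "hybrid" to an enum member.

    Returns None when no name is given, so the caller can omit
    `query_engine_type` and let the pack pick its default engine.
    """
    if name is None:
        return None
    try:
        # Enum lookup by value: Neo4jQueryEngineType("hybrid") -> KG_HYBRID
        return Neo4jQueryEngineType(name)
    except ValueError:
        valid = ", ".join(member.value for member in Neo4jQueryEngineType)
        raise ValueError(
            f"Unknown engine type {name!r}; expected one of: {valid}"
        ) from None


print(parse_engine_type("hybrid") is Neo4jQueryEngineType.KG_HYBRID)
```

Failing with the list of valid values makes a typo in a config file easy to spot before any connection to Neo4j is attempted.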

In [7]:
from llama_hub.llama_packs.neo4j_query_engine.base import Neo4jQueryEngineType


# create the pack
neo4j_pack = Neo4jQueryEnginePack(
  username = username,
  password = password,
  url = url,
  database = database,
  docs = documents,
  query_engine_type = Neo4jQueryEngineType.KG_KEYWORD,
)

DEBUG:neo4j:[#0000]  _: <POOL> created, direct address IPv4Address(('127.0.0.1', 7687))
DEBUG:neo4j:[#0000]  _: <POOL> acquire direct connection, access_mode='READ', database=None


DEBUG:neo4j:[#0000]  _: <POOL> trying to hand out new connection
DEBUG:neo4j:[#0000]  _: <RESOLVE> in: 127.0.0.1:7687
DEBUG:neo4j:[#0000]  _: <RESOLVE> dns resolver out: 127.0.0.1:7687
DEBUG:neo4j:[#0000]  C: <OPEN> 127.0.0.1:7687
DEBUG:neo4j:[#D907]  C: <MAGIC> 0x6060B017
DEBUG:neo4j:[#D907]  C: <HANDSHAKE> 0x00040405 0x00020404 0x00000104 0x00000003
DEBUG:neo4j:[#D907]  S: <HANDSHAKE> 0x00000405
DEBUG:neo4j:[#D907]  C: HELLO {'user_agent': 'neo4j-python/5.15.0 Python/3.11.6-final-0 (darwin)', 'bolt_agent': {'product': 'neo4j-python/5.15.0', 'platform': 'Darwin 23.2.0; arm64', 'language': 'Python/3.11.6-final-0', 'language_details': 'CPython; 3.11.6-final-0 (main, Oct  3 2023 10:37:07) [Clang 15.0.7 ]'}}
DEBUG:neo4j:[#D907]  _: <CONNECTION> client state: CONNECTED > AUTHENTICATION
DEBUG:neo4j:[#D907]  C: LOGON {'scheme': 'basic', 'principal': 'neo4j', 'credentials': '*******'}
DEBUG:neo4j:[#D907]  _: <CONNECTION> client state: AUTHENTICATION > READY
DEBUG:neo4j:[#D907]  S: SUCCESS {'s

## Run Pack

In [8]:
from IPython.display import Markdown, display

response = neo4j_pack.run("Tell me about the benefits of paleo diet.")
display(Markdown(f"<b>{response}</b>"))

DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/deployments/gpt-35-turbo-16k/chat/completions', 'files': None, 'json_data': {'messages': [{'role': <MessageRole.USER: 'user'>, 'content': "A question is provided below. Given the question, extract up to 10 keywords from the text. Focus on extracting the keywords that we can use to best lookup answers to the question. Avoid stopwords.\n---------------------\nTell me about the benefits of paleo diet.\n---------------------\nProvide keywords in the following comma-separated format: 'KEYWORDS: <keywords>'\n"}], 'model': 'gpt-35-turbo-16k', 'stream': False, 'temperature': 0.1}, 'headers': {'api-key': '<redacted>'}}
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'POST']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_res

<b>The benefits of the paleo diet include a re-imagining of what Paleolithic people ate, a diet that is 65% plant-based, and the forbidding of consumption of all dairy products. Additionally, the paleo diet has always been varied, which can contribute to its overall health benefits.</b>

Next, let's try the KG hybrid query engine; see the code below. You can try any other query engine in the same way by substituting another member of the `Neo4jQueryEngineType` enum.

In [9]:
neo4j_pack = Neo4jQueryEnginePack(
  username = username,
  password = password,
  url = url,
  database = database,
  docs = documents,
  query_engine_type = Neo4jQueryEngineType.KG_HYBRID
)

response = neo4j_pack.run("Tell me about the benefits of paleo diet.")
display(Markdown(f"<b>{response}</b>"))

DEBUG:neo4j:[#0000]  _: <POOL> created, direct address IPv4Address(('127.0.0.1', 7687))
DEBUG:neo4j:[#0000]  _: <POOL> acquire direct connection, access_mode='READ', database=None
DEBUG:neo4j:[#0000]  _: <POOL> trying to hand out new connection
DEBUG:neo4j:[#0000]  _: <RESOLVE> in: 127.0.0.1:7687
DEBUG:neo4j:[#0000]  _: <RESOLVE> dns resolver out: 127.0.0.1:7687
DEBUG:neo4j:[#0000]  C: <OPEN> 127.0.0.1:7687
DEBUG:neo4j:[#DA8F]  C: <MAGIC> 0x6060B017
DEBUG:neo4j:[#DA8F]  C: <HANDSHAKE> 0x00040405 0x00020404 0x00000104 0x00000003
DEBUG:neo4j:[#DA8F]  S: <HANDSHAKE> 0x00000405
DEBUG:neo4j:[#DA8F]  C: HELLO {'user_agent': 'neo4j-python/5.15.0 Python/3.11.6-final-0 (darwin)', 'bolt_agent': {'product': 'neo4j-python/5.15.0', 'platform': 'Darwin 23.2.0; arm64', 'language': 'Python/3.11.6-final-0', 'language_details': 'CPython; 3.11.6-final-0 (main, Oct  3 2023 10:37:07) [Clang 15.0.7 ]'}}
DEBUG:neo4j:[#DA8F]  _: <CONNECTION> client state: CONNECTED > AUTHENTICATION
DEBUG:neo4j:[#DA8F]  C: LOG

<b>The paleo diet is believed to have several benefits. Advocates of the diet argue that it is nutritionally closer to the diet of our Paleolithic ancestors, which they believe is the best fit for our genetic makeup. They claim that following a paleo diet can help prevent or reduce the risk of chronic diseases and degenerative conditions that are common in modern Western populations. The diet emphasizes the consumption of whole, unprocessed foods such as vegetables, fruits, nuts, seeds, lean meats, and fish, while avoiding processed foods, grains, legumes, and dairy products. Some studies suggest that the paleo diet may help with weight loss, as it promotes the consumption of satiating foods and reduces overall caloric intake. However, it is important to note that the health benefits of the paleo diet are still a topic of debate, and more research is needed to fully understand its long-term effects on health.</b>
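
The two runs above follow the same pattern: build a pack for one engine type, then call `run`. A hedged sketch of wrapping that pattern in a loop to compare engines; `run_with_each_engine` and `make_pack` are hypothetical names, and the function only assumes that the object returned by `make_pack` has a `run(question)` method, as `Neo4jQueryEnginePack` does in this notebook:

```python
from typing import Any, Callable, Dict, Iterable


def run_with_each_engine(
    make_pack: Callable[[Any], Any],
    engine_types: Iterable[Any],
    question: str,
) -> Dict[str, str]:
    """Build one pack per engine type and collect the answers as strings."""
    answers: Dict[str, str] = {}
    for engine_type in engine_types:
        # Use the enum's string value as the key when there is one.
        key = getattr(engine_type, "value", str(engine_type))
        try:
            pack = make_pack(engine_type)
            answers[key] = str(pack.run(question))
        except Exception as exc:
            # Record the failure and keep going with the remaining engines.
            answers[key] = f"ERROR: {exc}"
    return answers


# Usage in this notebook (needs the running Neo4j service and LLM credentials):
# answers = run_with_each_engine(
#     lambda t: Neo4jQueryEnginePack(
#         username=username, password=password, url=url,
#         database=database, docs=documents, query_engine_type=t,
#     ),
#     [Neo4jQueryEngineType.KG_KEYWORD, Neo4jQueryEngineType.KG_HYBRID],
#     "Tell me about the benefits of paleo diet.",
# )
```

Each iteration constructs a fresh pack with the same `docs`, so expect the loop to take about as long as running the cells one by one.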

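## Comparison of knowledge graph query strategies

The table below details the 7 query engines and their pros and cons, based on experiments with NebulaGraph and LlamaIndex, as described in the blog post [7 Query Strategies for Navigating Knowledge Graphs with LlamaIndex](https://www.toutiao.com/article/7317169514767008271/).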

![Comparison of knowledge graph query strategies](https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/ffa837ee967d46f9baaf28784b7ec0c0~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1704356130&x-signature=GXA885tOjv39mCKu1oftcWCcWtA%3D)