# Azure Cosmos DB for Apache Gremlin

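>[Azure Cosmos DB for Apache Gremlin](https://learn.microsoft.com/en-us/azure/cosmos-db/gremlin/introduction) is a graph database service that can store massive graphs with billions of vertices and edges. You can query the graphs with millisecond latency and evolve the graph structure easily.
>
>[Gremlin](https://en.wikipedia.org/wiki/Gremlin_(query_language)) is a graph traversal language and virtual machine developed by `Apache TinkerPop` of the `Apache Software Foundation`.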

This notebook shows how to use an LLM to provide a natural language interface to a graph database that can be queried with the `Gremlin` query language.

## Setting up

Install the required library:

In [None]:
!pip3 install gremlinpython

You will need an Azure Cosmos DB graph database instance. One option is to create a [free Cosmos DB graph database instance](https://learn.microsoft.com/en-us/azure/cosmos-db/free-tier) in Azure.

When you create your Cosmos DB account and graph, use `/type` as the partition key.

In [None]:
cosmosdb_name = "mycosmosdb"
cosmosdb_db_id = "graphtesting"
cosmosdb_db_graph_id = "mygraph"
cosmosdb_access_Key = "longstring=="
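As an alternative to hardcoding the access key in a notebook cell, the settings can be read from environment variables. The following is a minimal sketch; the helper `load_cosmos_settings` and the environment variable names (`COSMOSDB_NAME`, `COSMOSDB_DB_ID`, `COSMOSDB_GRAPH_ID`, `COSMOSDB_ACCESS_KEY`) are hypothetical and not part of any library.

```python
import os
from typing import Dict, Mapping


def load_cosmos_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read Cosmos DB settings from a mapping of environment variables.

    The variable names used here are assumptions made for this sketch.
    """
    return {
        "cosmosdb_name": env.get("COSMOSDB_NAME", "mycosmosdb"),
        "cosmosdb_db_id": env.get("COSMOSDB_DB_ID", "graphtesting"),
        "cosmosdb_db_graph_id": env.get("COSMOSDB_GRAPH_ID", "mygraph"),
        # No default for the key: raise KeyError instead of using a placeholder.
        "cosmosdb_access_Key": env["COSMOSDB_ACCESS_KEY"],
    }


# Demonstration with an explicit mapping instead of the real environment.
settings = load_cosmos_settings({"COSMOSDB_ACCESS_KEY": "longstring=="})
print(settings["cosmosdb_name"])
```

In a real session you would call `load_cosmos_settings()` with no arguments and assign the returned values to the variables used in the rest of this notebook.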

In [None]:
import nest_asyncio
from langchain_community.chains.graph_qa.gremlin import GremlinQAChain
from langchain_community.graphs import GremlinGraph
from langchain_community.graphs.graph_document import GraphDocument, Node, Relationship
from langchain_core.documents import Document
from langchain_openai import AzureChatOpenAI

In [None]:
graph = GremlinGraph(
    url=f"wss://{cosmosdb_name}.gremlin.cosmos.azure.com:443/",
    username=f"/dbs/{cosmosdb_db_id}/colls/{cosmosdb_db_graph_id}",
    password=cosmosdb_access_Key,
)

## Seeding the database

Assuming your database is empty, you can populate it using GraphDocuments.

For Gremlin, always add a property called 'label' to each node.
If no label is set, `Node.type` is used as the label.
For Cosmos DB, using natural ids makes sense, because they are visible in the graph explorer.
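The label rule can be illustrated with a small standalone sketch. The function `effective_label` is hypothetical and only restates the rule described here with plain Python values; it is not the library's implementation.

```python
from typing import Any, Dict


def effective_label(node_type: str, properties: Dict[str, Any]) -> str:
    """Return the explicit 'label' property if present, else the node type."""
    label = properties.get("label")
    return str(label) if label else node_type


# An explicit label wins over the node type.
print(effective_label("Node", {"label": "movie", "title": "The Matrix"}))  # movie
# Without a label, the node type is used.
print(effective_label("actor", {"name": "Keanu Reeves"}))  # actor
```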

In [None]:
source_doc = Document(
    page_content="Matrix is a movie where Keanu Reeves, Laurence Fishburne and Carrie-Anne Moss acted."
)
movie = Node(id="The Matrix", properties={"label": "movie", "title": "The Matrix"})
actor1 = Node(id="Keanu Reeves", properties={"label": "actor", "name": "Keanu Reeves"})
actor2 = Node(
    id="Laurence Fishburne", properties={"label": "actor", "name": "Laurence Fishburne"}
)
actor3 = Node(
    id="Carrie-Anne Moss", properties={"label": "actor", "name": "Carrie-Anne Moss"}
)
rel1 = Relationship(
    id=5, type="ActedIn", source=actor1, target=movie, properties={"label": "ActedIn"}
)
rel2 = Relationship(
    id=6, type="ActedIn", source=actor2, target=movie, properties={"label": "ActedIn"}
)
rel3 = Relationship(
    id=7, type="ActedIn", source=actor3, target=movie, properties={"label": "ActedIn"}
)
rel4 = Relationship(
    id=8,
    type="Starring",
    source=movie,
    target=actor1,
    properties={"label": "Starring"},
)
rel5 = Relationship(
    id=9,
    type="Starring",
    source=movie,
    target=actor2,
    properties={"label": "Starring"},
)
rel6 = Relationship(
    id=10,
    type="Starring",
    source=movie,
    target=actor3,
    properties={"label": "Starring"},
)
graph_doc = GraphDocument(
    nodes=[movie, actor1, actor2, actor3],
    relationships=[rel1, rel2, rel3, rel4, rel5, rel6],
    source=source_doc,
)

In [None]:
# The underlying gremlinpython driver has a problem when running in a notebook.
# Applying nest_asyncio is a workaround for that problem.
nest_asyncio.apply()

# Add the document to the CosmosDB graph.
graph.add_graph_documents([graph_doc])
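A likely reason the `nest_asyncio` workaround is needed (an assumption; the comment above only says the driver has a problem in notebooks) is that a notebook kernel already runs an asyncio event loop, and standard asyncio refuses to run a second loop call inside it. The sketch below reproduces that restriction with the standard library only. It is meant to be run as a plain Python script, not inside a notebook cell, and the function names are made up for the illustration.

```python
import asyncio


async def fetch_answer() -> int:
    """Stand-in for any coroutine a driver might want to run."""
    return 42


async def simulate_notebook_cell() -> str:
    """Try to block on a coroutine while an event loop is already running."""
    loop = asyncio.get_running_loop()
    coro = fetch_answer()
    try:
        # Without nest_asyncio, asyncio rejects re-entering a running loop.
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        coro.close()  # Avoid a "coroutine was never awaited" warning.
        return f"RuntimeError: {exc}"
    return "no error"


result = asyncio.run(simulate_notebook_cell())
print(result)
```

`nest_asyncio.apply()` patches asyncio so that such nested calls are allowed.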

## Refresh graph schema information
If the schema of the database changes (after updates), you can refresh the schema information.

In [None]:
graph.refresh_schema()

In [None]:
print(graph.schema)

## Querying the graph

We can now use the Gremlin QA chain to ask questions of the graph.

In [None]:
chain = GremlinQAChain.from_llm(
    AzureChatOpenAI(
        temperature=0,
        azure_deployment="gpt-4-turbo",
    ),
    graph=graph,
    verbose=True,
)

In [None]:
chain.invoke("Who played in The Matrix?")

In [None]:
chain.invoke("How many people played in The Matrix?")
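Because the chain's answers come from LLM-generated Gremlin queries, it can help to know what the correct answers are. The sketch below is a plain in-memory check, not a Gremlin query: it mirrors the three `ActedIn` relationships seeded earlier as tuples and answers both questions by filtering them. The helper `actors_in` is hypothetical.

```python
from typing import List, Tuple

# (source actor, edge label, target movie), mirroring rel1..rel3 above.
acted_in: List[Tuple[str, str, str]] = [
    ("Keanu Reeves", "ActedIn", "The Matrix"),
    ("Laurence Fishburne", "ActedIn", "The Matrix"),
    ("Carrie-Anne Moss", "ActedIn", "The Matrix"),
]


def actors_in(movie_title: str, edges: List[Tuple[str, str, str]]) -> List[str]:
    """Return the actors with an 'ActedIn' edge pointing to the given movie."""
    return [
        source
        for source, label, target in edges
        if label == "ActedIn" and target == movie_title
    ]


cast = actors_in("The Matrix", acted_in)
print(cast)       # ['Keanu Reeves', 'Laurence Fishburne', 'Carrie-Anne Moss']
print(len(cast))  # 3
```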