# How to construct knowledge graphs

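In this guide we'll go over the basic ways of constructing a knowledge graph from unstructured text. The constructed graph can then be used as a knowledge base in a RAG application. At a high level, constructing a knowledge graph from text involves two steps: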

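1. Extract structured information from text: a model is used to extract structured graph information from the text.
2. Store it in a graph database: the extracted graph information is stored in a graph database, where it can support downstream RAG applications.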
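
To make "structured graph information" concrete, the sketch below models a tiny graph document with plain TypeScript types. The type names (`SketchNode`, `SketchRelationship`, `SketchGraphDocument`) and the `toTriples` helper are hypothetical and only illustrate the general shape of nodes and relationships; they are not the library's own classes.

```typescript
// Hypothetical, simplified types for illustration only.
type SketchNode = { id: string; type: string };
type SketchRelationship = {
  source: SketchNode;
  target: SketchNode;
  type: string;
};
type SketchGraphDocument = {
  nodes: SketchNode[];
  relationships: SketchRelationship[];
};

const marie: SketchNode = { id: "Marie Curie", type: "Person" };
const pierre: SketchNode = { id: "Pierre Curie", type: "Person" };

const sketchDocument: SketchGraphDocument = {
  nodes: [marie, pierre],
  relationships: [{ source: marie, target: pierre, type: "SPOUSE" }],
};

// Render each relationship as a (source)-[TYPE]->(target) triple.
function toTriples(doc: SketchGraphDocument): string[] {
  return doc.relationships.map(
    (rel) => `(${rel.source.id})-[${rel.type}]->(${rel.target.id})`
  );
}

console.log(toTriples(sketchDocument));
```

Step 1 produces objects of roughly this shape, and step 2 persists them.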

## Setup
#### Install dependencies

```{=mdx}
import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx";
import Npm2Yarn from "@theme/Npm2Yarn";

<IntegrationInstallTooltip></IntegrationInstallTooltip>

<Npm2Yarn>
  langchain @langchain/community @langchain/openai @langchain/core neo4j-driver zod
</Npm2Yarn>
```

#### Set environment variables

We'll use OpenAI in this example:

```env
OPENAI_API_KEY=your-api-key

# Optional, use LangSmith for best-in-class observability
LANGSMITH_API_KEY=your-api-key
LANGSMITH_TRACING=true

# Reduce tracing latency if you are not in a serverless environment
# LANGCHAIN_CALLBACKS_BACKGROUND=true
```

Next, we need to define Neo4j credentials.
Follow [these installation steps](https://neo4j.com/docs/operations-manual/current/installation/) to set up a Neo4j database.

```env
NEO4J_URI="bolt://localhost:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="password"
```

The example below creates a connection to the Neo4j database.

```typescript
import "neo4j-driver";
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";

// Read the credentials defined in the environment variables above.
const url = process.env.NEO4J_URI;
const username = process.env.NEO4J_USERNAME;
const password = process.env.NEO4J_PASSWORD;

// Open the connection to the database.
const graph = await Neo4jGraph.initialize({ url, username, password });
```
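
A misspelled or missing variable name only surfaces when the connection is attempted, so it can help to validate the credentials first. The following is a minimal sketch, assuming the three variable names from the `env` block above; `readNeo4jConfig` and `Neo4jConfig` are hypothetical names, not part of any library. The function takes the environment as a parameter so it can be tried without a real environment.

```typescript
// Hypothetical helper: collect the Neo4j settings and fail fast if any is missing.
type Neo4jConfig = { url: string; username: string; password: string };

function readNeo4jConfig(
  env: Record<string, string | undefined>
): Neo4jConfig {
  const required = ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"] as const;
  // Treat unset and empty values as missing.
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    url: env.NEO4J_URI as string,
    username: env.NEO4J_USERNAME as string,
    password: env.NEO4J_PASSWORD as string,
  };
}

// Example with an in-memory object; a real script would pass process.env.
const sampleConfig = readNeo4jConfig({
  NEO4J_URI: "bolt://localhost:7687",
  NEO4J_USERNAME: "neo4j",
  NEO4J_PASSWORD: "password",
});
console.log(sampleConfig.url);
```

The returned object has the `url`, `username`, and `password` fields that the connection call expects.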

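## LLM Graph Transformer
Extracting graph data from text turns unstructured information into a structured format, which makes it easier to analyze and to navigate complex relationships and patterns. The LLMGraphTransformer converts text documents into structured graph documents by using an LLM to parse and categorize entities and their relationships. The choice of LLM significantly influences the output, since it determines the accuracy and nuance of the extracted graph data.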

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { LLMGraphTransformer } from "@langchain/community/experimental/graph_transformers/llm";

// A temperature of 0 keeps the extraction as stable as the model allows.
const model = new ChatOpenAI({
    temperature: 0,
    model: "gpt-4o-mini",
});

const llmGraphTransformer = new LLMGraphTransformer({
    llm: model
});
```


Now we can pass in example text and examine the results.

```typescript
import { Document } from "@langchain/core/documents";

let text = `
Marie Curie, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
`;

// Each input document yields one graph document with nodes and relationships.
const result = await llmGraphTransformer.convertToGraphDocuments([
    new Document({ pageContent: text }),
]);

console.log(`Nodes: ${result[0].nodes.length}`);
console.log(`Relationships:${result[0].relationships.length}`);
```

```text
Nodes: 8
Relationships:7
```


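Note that the graph construction process is non-deterministic because it relies on an LLM, so you may get slightly different results on each run.
The following image shows the structure of the generated knowledge graph.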

![graph_construction1.png](../../static/img/graph_construction1.png)
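
Because totals can shift between runs, a per-type breakdown is often easier to compare than raw counts. Here is a minimal sketch, assuming each node or relationship exposes a string `type` field; `countByType` is a hypothetical helper and the sample data is a stand-in, not actual model output.

```typescript
// Hypothetical helper: tally items by their `type` field.
function countByType(items: { type: string }[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const item of items) {
    counts[item.type] = (counts[item.type] ?? 0) + 1;
  }
  return counts;
}

// Stand-in data; with a real run you might pass result[0].nodes instead.
const sampleNodes = [
  { id: "Marie Curie", type: "Person" },
  { id: "Pierre Curie", type: "Person" },
  { id: "University Of Paris", type: "Organization" },
];

console.log(countByType(sampleNodes)); // { Person: 2, Organization: 1 }
```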

You can also restrict extraction to specific node and relationship types to suit your requirements.

```typescript
// Restrict extraction to the listed node and relationship types.
const llmGraphTransformerFiltered = new LLMGraphTransformer({
    llm: model,
    allowedNodes: ["PERSON", "COUNTRY", "ORGANIZATION"],
    allowedRelationships: ["NATIONALITY", "LOCATED_IN", "WORKED_AT", "SPOUSE"],
    strictMode: false
});

const result_filtered = await llmGraphTransformerFiltered.convertToGraphDocuments([
    new Document({ pageContent: text }),
]);

console.log(`Nodes: ${result_filtered[0].nodes.length}`);
console.log(`Relationships:${result_filtered[0].relationships.length}`);
```

```text
Nodes: 6
Relationships:4
```


To better understand the generated graph, we can visualize it again.

![graph_construction2.png](../../static/img/graph_construction2.png)
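
To build intuition for what constraining the types means, the sketch below filters a graph after extraction, keeping only allowed node types and the relationships that connect them. This illustrates the general idea and makes no claim about the library's internals; `TypedNode`, `TypedRelationship`, and `filterGraph` are hypothetical names, and the case-insensitive comparison is an assumption made because model output may not match the casing of the allowed lists.

```typescript
// Hypothetical types and helper for illustration only.
type TypedNode = { id: string; type: string };
type TypedRelationship = { source: TypedNode; target: TypedNode; type: string };

function filterGraph(
  nodes: TypedNode[],
  relationships: TypedRelationship[],
  allowedNodes: string[],
  allowedRelationships: string[]
): { nodes: TypedNode[]; relationships: TypedRelationship[] } {
  // Compare types case-insensitively (an assumption of this sketch).
  const nodeTypes = new Set(allowedNodes.map((t) => t.toLowerCase()));
  const relTypes = new Set(allowedRelationships.map((t) => t.toLowerCase()));
  const isAllowedNode = (n: TypedNode) => nodeTypes.has(n.type.toLowerCase());

  return {
    nodes: nodes.filter(isAllowedNode),
    // Keep a relationship only if its type and both endpoints are allowed.
    relationships: relationships.filter(
      (r) =>
        relTypes.has(r.type.toLowerCase()) &&
        isAllowedNode(r.source) &&
        isAllowedNode(r.target)
    ),
  };
}

const curie: TypedNode = { id: "Marie Curie", type: "Person" };
const poland: TypedNode = { id: "Poland", type: "Country" };
const radioactivity: TypedNode = { id: "Radioactivity", type: "Concept" };

const filteredSketch = filterGraph(
  [curie, poland, radioactivity],
  [
    { source: curie, target: poland, type: "NATIONALITY" },
    { source: curie, target: radioactivity, type: "RESEARCHED" },
  ],
  ["PERSON", "COUNTRY"],
  ["NATIONALITY"]
);

console.log(filteredSketch.nodes.length, filteredSketch.relationships.length); // 2 1
```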

## Storing to graph database
The generated graph documents can be stored in a graph database using the `addGraphDocuments` method.

```typescript
await graph.addGraphDocuments(result_filtered);
```
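
To build intuition for what storing a graph document involves, the sketch below turns nodes and relationships into parameterized Cypher `MERGE` statements. It is an illustration under stated assumptions, not a description of how `addGraphDocuments` is implemented; `quoteIdentifier`, `nodeToCypher`, and `relationshipToCypher` are hypothetical helpers, and the `id` property used as the merge key is an assumption.

```typescript
// Hypothetical helpers for illustration only.
type CypherStatement = { query: string; params: Record<string, string> };
type StoredNode = { id: string; type: string };
type StoredRelationship = { source: StoredNode; target: StoredNode; type: string };

// Labels and relationship types cannot be passed as Cypher parameters,
// so they are wrapped in backticks, with embedded backticks doubled.
function quoteIdentifier(name: string): string {
  return "`" + name.replace(/`/g, "``") + "`";
}

// MERGE creates the node only if no node with this label and id exists.
function nodeToCypher(node: StoredNode): CypherStatement {
  return {
    query: `MERGE (n:${quoteIdentifier(node.type)} {id: $id})`,
    params: { id: node.id },
  };
}

// Match both endpoints by id, then MERGE the relationship between them.
function relationshipToCypher(rel: StoredRelationship): CypherStatement {
  return {
    query:
      "MATCH (a {id: $source}), (b {id: $target}) " +
      `MERGE (a)-[:${quoteIdentifier(rel.type)}]->(b)`,
    params: { source: rel.source.id, target: rel.target.id },
  };
}

const storedMarie: StoredNode = { id: "Marie Curie", type: "Person" };
const storedPierre: StoredNode = { id: "Pierre Curie", type: "Person" };

console.log(nodeToCypher(storedMarie).query);
console.log(
  relationshipToCypher({ source: storedMarie, target: storedPierre, type: "SPOUSE" }).query
);
```

Using `MERGE` rather than `CREATE` keeps repeated imports of the same document from duplicating nodes, which matters when extraction is re-run on overlapping text.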