# How to recursively split text by characters

:::info Prerequisites

This guide assumes familiarity with the following concepts:

- [Text splitters](/docs/concepts/text_splitters)

:::

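This text splitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough. The default list is `["\n\n", "\n", " ", ""]`. The effect is to keep paragraphs (then sentences, then words) together for as long as possible, since those are generally the most strongly semantically related pieces of text.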

1. How the text is split: by a list of characters.
2. How the chunk size is measured: by number of characters.
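
To make the "try each separator in order" idea concrete, here is a minimal, dependency-free sketch. The function `recursiveSplit` is a hypothetical helper written for illustration; it is not the library's implementation, it ignores `chunkOverlap`, and its output may differ from `RecursiveCharacterTextSplitter` on other inputs.

```typescript
// Hypothetical helper for illustration only (not the library's code).
// Splits on the first separator, recurses into pieces that are still too
// long using the remaining separators, and greedily merges small pieces.
function recursiveSplit(
  text: string,
  chunkSize: number,
  separators: string[] = ["\n\n", "\n", " ", ""],
): string[] {
  if (separators.length === 0) {
    return text.trim() === "" ? [] : [text.trim()];
  }
  const [sep = "", ...rest] = separators;
  // An empty separator means "split between every character".
  const pieces = sep === "" ? Array.from(text) : text.split(sep);

  const chunks: string[] = [];
  let current = "";
  const flush = (): void => {
    if (current.trim() !== "") chunks.push(current.trim());
    current = "";
  };

  for (const piece of pieces) {
    if (piece.length > chunkSize && rest.length > 0) {
      // Still too long: emit what we have and retry with finer separators.
      flush();
      chunks.push(...recursiveSplit(piece, chunkSize, rest));
      continue;
    }
    // Chunk size is measured in characters (string length).
    const candidate = current === "" ? piece : current + sep + piece;
    if (candidate.length <= chunkSize) {
      current = candidate;
    } else {
      flush();
      current = piece;
    }
  }
  flush();
  return chunks;
}

const demoChunks = recursiveSplit("Hi.\n\nI'm Harrison.\n\nBye!", 10);
console.log(demoChunks); // [ "Hi.", "I'm", "Harrison.", "Bye!" ]
```

In this sketch, `"I'm Harrison."` is 13 characters, so it survives the `"\n\n"` and `"\n"` passes unsplit and is only broken apart at the `" "` level.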

Below we show example usage.

To obtain the string content directly, use `.splitText`.

To create LangChain [Document](https://api.js.langchain.com/classes/langchain_core.documents.Document.html) objects (e.g., for use in downstream tasks), use `.createDocuments`.

In [5]:
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

const text = `Hi.\n\nI'm Harrison.\n\nHow? Are? You?\nOkay then f f f f.
This is a weird text to write, but gotta test the splittingggg some how.\n\n
Bye!\n\n-H.`;
const splitter = new RecursiveCharacterTextSplitter({
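  chunkSize: 10, // maximum number of characters per chunk
  chunkOverlap: 1, // maximum number of characters shared by adjacent chunks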
});

const output = await splitter.createDocuments([text]);

console.log(output.slice(0, 3));

[
  Document {
    pageContent: "Hi.",
    metadata: { loc: { lines: { from: 1, to: 1 } } }
  },
  Document {
    pageContent: "I'm",
    metadata: { loc: { lines: { from: 3, to: 3 } } }
  },
  Document {
    pageContent: "Harrison.",
    metadata: { loc: { lines: { from: 3, to: 3 } } }
  }
]


Note that in the example above we split a raw text string and get back a list of documents. We can also split documents directly.

In [6]:
import { Document } from "@langchain/core/documents";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

const text = `Hi.\n\nI'm Harrison.\n\nHow? Are? You?\nOkay then f f f f.
This is a weird text to write, but gotta test the splittingggg some how.\n\n
Bye!\n\n-H.`;
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 10,
  chunkOverlap: 1,
});

const docOutput = await splitter.splitDocuments([
  new Document({ pageContent: text }),
]);

console.log(docOutput.slice(0, 3));

[
  Document {
    pageContent: "Hi.",
    metadata: { loc: { lines: { from: 1, to: 1 } } }
  },
  Document {
    pageContent: "I'm",
    metadata: { loc: { lines: { from: 3, to: 3 } } }
  },
  Document {
    pageContent: "Harrison.",
    metadata: { loc: { lines: { from: 3, to: 3 } } }
  }
]


You can customize the `RecursiveCharacterTextSplitter` by passing a `separators` parameter, for example:

In [7]:
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { Document } from "@langchain/core/documents";

const text = `Some other considerations include:

- Do you deploy your backend and frontend together, or separately?
- Do you deploy your backend co-located with your database, or separately?

**Production Support:** As you move your LangChains into production, we'd love to offer more hands-on support.
Fill out [this form](https://airtable.com/appwQzlErAS2qiP0L/shrGtGaVBVAz7NcV2) to share more about what you're building, and our team will get in touch.

## Deployment Options

See below for a list of deployment options for your LangChain app. If you don't see your preferred option, please get in touch and we can add it to this list.`;

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 50,
  chunkOverlap: 1,
  // Custom separators, tried in order instead of the default list.
  separators: ["|", "##", ">", "-"],
});

const docOutput = await splitter.splitDocuments([
  new Document({ pageContent: text }),
]);

console.log(docOutput.slice(0, 3));

[
  Document {
    pageContent: "Some other considerations include:",
    metadata: { loc: { lines: { from: 1, to: 1 } } }
  },
  Document {
    pageContent: "- Do you deploy your backend and frontend together",
    metadata: { loc: { lines: { from: 3, to: 3 } } }
  },
  Document {
    pageContent: "r, or separately?",
    metadata: { loc: { lines: { from: 3, to: 3 } } }
  }
]
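
The third chunk above begins with `r`, the last character of the second chunk. One plausible reading, offered as an assumption rather than a description of the library's internals, is that once no listed separator fits within `chunkSize`, the text is cut by character count and `chunkOverlap: 1` carries one character over into the next chunk. The hypothetical helper `sliceWithOverlap` below sketches that windowing in plain TypeScript.

```typescript
// Hypothetical helper for illustration only (not the library's code).
// Cuts text into fixed-size windows where each window starts
// `chunkOverlap` characters before the previous one ended.
function sliceWithOverlap(
  text: string,
  chunkSize: number,
  chunkOverlap: number,
): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap;
  const windows: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    windows.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return windows;
}

const listItem =
  "- Do you deploy your backend and frontend together, or separately?";
console.log(sliceWithOverlap(listItem, 50, 1));
// [
//   "- Do you deploy your backend and frontend together",
//   "r, or separately?"
// ]
```

The first window is exactly 50 characters, and the second starts one character earlier than the cut point, which reproduces the two chunks shown in the output.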


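## Next steps
You've now learned a method for splitting text by character.

Next, check out [specific techniques for splitting on code](/docs/how_to/code_splitter) or the [full tutorial on retrieval-augmented generation](/docs/tutorials/rag).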