# How to split by character

:::info Prerequisites

This guide assumes familiarity with the following concepts:

- [Text splitters](/docs/concepts/text_splitters)

:::

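This is the simplest way to split text. It splits on a given character sequence, which defaults to `"\n\n"`. Chunk length is measured in number of characters.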

1. How the text is split: on a single separator (one character sequence).
2. How the chunk size is measured: by number of characters.
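
The two points above can be illustrated with a small standalone sketch. This is a simplified illustration under stated assumptions, not the library's actual implementation: `splitByCharacter` is a hypothetical helper that splits on one separator, then greedily merges the pieces back together until adding another piece would exceed `chunkSize` characters, carrying over at most `chunkOverlap` characters of trailing pieces into the next chunk.

```typescript
// Simplified illustration only; `splitByCharacter` is a hypothetical helper,
// not part of @langchain/textsplitters.
function splitByCharacter(
    text: string,
    separator: string,
    chunkSize: number,
    chunkOverlap: number = 0,
): string[] {
    // Step 1: split on the single separator and drop empty pieces.
    const pieces = text.split(separator).filter((piece) => piece.length > 0);

    // Length, in characters, of the given pieces once joined by the separator.
    const joinedLength = (parts: string[]): number =>
        parts.join(separator).length;

    const chunks: string[] = [];
    let current: string[] = [];

    // Step 2: greedily merge pieces into chunks of at most `chunkSize` characters.
    for (const piece of pieces) {
        if (current.length > 0 && joinedLength([...current, piece]) > chunkSize) {
            chunks.push(current.join(separator));
            // Drop pieces from the front until the carried-over tail fits the
            // overlap budget and leaves room for the next piece.
            while (
                current.length > 0 &&
                (joinedLength(current) > chunkOverlap ||
                    joinedLength([...current, piece]) > chunkSize)
            ) {
                current.shift();
            }
        }
        current.push(piece);
    }
    if (current.length > 0) {
        chunks.push(current.join(separator));
    }
    return chunks;
}

// Toy input: four 4-character paragraphs separated by blank lines.
const sampleText = "aaaa\n\nbbbb\n\ncccc\n\ndddd";
const sampleChunks = splitByCharacter(sampleText, "\n\n", 10, 4);
console.log(sampleChunks);
```

With this toy input the sketch produces three chunks, each sharing one paragraph with its neighbor. A single piece longer than `chunkSize` is kept whole in this sketch, because splitting happens only at the separator.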

To obtain the string content directly, use `.splitText()`.

To create LangChain [Document](https://api.js.langchain.com/classes/langchain_core.documents.Document.html) objects (for example, for use in downstream tasks), use `.createDocuments()`.

```typescript
import { CharacterTextSplitter } from "@langchain/textsplitters";
import * as fs from "node:fs";

// Load an example document (readFileSync is synchronous, so no await is needed)
const rawData = fs.readFileSync("../../../../examples/state_of_the_union.txt");
const stateOfTheUnion = rawData.toString();

// Split on blank lines, measuring chunk size and overlap in characters
const textSplitter = new CharacterTextSplitter({
    separator: "\n\n",
    chunkSize: 1000,
    chunkOverlap: 200,
});
const texts = await textSplitter.createDocuments([stateOfTheUnion]);
console.log(texts[0]);
```

```text
Document {
  pageContent: "Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and th"... 839 more characters,
  metadata: { loc: { lines: { from: 1, to: 17 } } }
}
```


You can also propagate the metadata associated with each document to the output chunks:

```typescript
// One metadata object per input document, matched by position
const metadatas = [{ document: 1 }, { document: 2 }];

const documents = await textSplitter.createDocuments(
    [stateOfTheUnion, stateOfTheUnion], metadatas
);

console.log(documents[0]);
```

```text
Document {
  pageContent: "Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and th"... 839 more characters,
  metadata: { document: 1, loc: { lines: { from: 1, to: 17 } } }
}
```
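
To make the idea of metadata propagation concrete, here is a minimal standalone sketch. It is an illustration under stated assumptions, not how the library is implemented: `ChunkWithMetadata` and `attachMetadata` are hypothetical names, and the sketch omits the `loc` line information that appears in the output above. It pairs the chunks of the i-th input document with the i-th metadata object.

```typescript
// Hypothetical stand-in for a document: chunk text plus a copy of its source metadata.
interface ChunkWithMetadata {
    pageContent: string;
    metadata: Record<string, unknown>;
}

// Hypothetical helper: attach the i-th metadata object to every chunk
// that came from the i-th input document.
function attachMetadata(
    chunksPerDocument: string[][],
    metadatas: Record<string, unknown>[],
): ChunkWithMetadata[] {
    const result: ChunkWithMetadata[] = [];
    for (let i = 0; i < chunksPerDocument.length; i++) {
        for (const chunk of chunksPerDocument[i]) {
            result.push({
                pageContent: chunk,
                // Copy the metadata so chunks do not share one mutable object.
                metadata: { ...(metadatas[i] ?? {}) },
            });
        }
    }
    return result;
}

// Toy input: two documents that were already split into chunks.
const taggedChunks = attachMetadata(
    [["first chunk", "second chunk"], ["third chunk"]],
    [{ document: 1 }, { document: 2 }],
);
console.log(taggedChunks);
```

Every chunk from the first toy document carries `{ document: 1 }`, and the single chunk from the second carries `{ document: 2 }`.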


To obtain the string content directly, use `.splitText()`:

```typescript
const chunks = await textSplitter.splitText(stateOfTheUnion);

chunks[0];
```

```text
"Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and th"... 839 more characters
```

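## Next steps

You've now learned one method for splitting text by character.

Next, check out a [more advanced way of splitting by character](/docs/how_to/recursive_text_splitter), or the [full tutorial on retrieval-augmented generation](/docs/tutorials/rag).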