# Writer Text Splitter

This notebook provides a quick overview for getting started with Writer's [text splitter](/docs/concepts/text_splitters/).

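Writer's [context-aware splitting endpoint](https://dev.writer.com/api-guides/tools#context-aware-text-splitting) provides intelligent text splitting for long documents (up to 4000 words). Unlike simple character-based splitting, it preserves semantic meaning and context between chunks, which makes it well suited to processing long-form content while maintaining coherence. The `langchain-writer` package exposes this endpoint as a LangChain text splitter.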

## Overview

### Integration details
| Class | Package | Local | Serializable | JS support | Package downloads | Package latest version |
|:-----------------------------------------------------------------------------------------------------------------------------------------|:-----------------| :---: | :---: |:----------:|:------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------:|
| [WriterTextSplitter](https://github.com/writer/langchain-writer/blob/main/langchain_writer/text_splitter.py#L11) | [langchain-writer](https://pypi.org/project/langchain-writer/) |      ❌       |                                       ❌                                       | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-writer?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-writer?style=flat-square&label=%20) |

## Setup

`WriterTextSplitter` is available in the `langchain-writer` package:

In [None]:
%pip install --quiet -U langchain-writer

### Credentials

Sign up for [Writer AI Studio](https://app.writer.com/aistudio/signup?utm_campaign=devrel) to generate an API key (you can follow this [Quickstart](https://dev.writer.com/api-guides/quickstart)). Then set the `WRITER_API_KEY` environment variable:

In [None]:
import getpass
import os

if not os.getenv("WRITER_API_KEY"):
    os.environ["WRITER_API_KEY"] = getpass.getpass("Enter your Writer API key: ")

Setting up [LangSmith](https://smith.langchain.com/) is also helpful (but not required) for tracing and observability. To enable it, set the `LANGSMITH_TRACING` and `LANGSMITH_API_KEY` environment variables:

In [None]:
# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass()

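### Instantiation

Create a `WriterTextSplitter` instance with the `strategy` parameter set to one of the following:

- `llm_split`: uses a language model for precise semantic splitting
- `fast_split`: uses a heuristic-based approach for quick splitting
- `hybrid_split`: combines both approaches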


In [None]:
from langchain_writer.text_splitter import WriterTextSplitter

splitter = WriterTextSplitter(strategy="fast_split")
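
If the strategy comes from configuration rather than a literal, it can help to validate it before constructing the splitter. The following is a minimal sketch, not part of `langchain-writer`: the helper `resolve_strategy` and the environment variable `WRITER_SPLIT_STRATEGY` are hypothetical names introduced only for illustration, and the set of valid values is taken from the list above.

```python
import os

# The three strategy names listed in this notebook.
VALID_STRATEGIES = ("llm_split", "fast_split", "hybrid_split")


def resolve_strategy(default="fast_split", env_var="WRITER_SPLIT_STRATEGY"):
    """Return a validated strategy name.

    `env_var` is a hypothetical variable name chosen for this sketch;
    it is not read by langchain-writer itself.
    """
    strategy = os.getenv(env_var, default)
    if strategy not in VALID_STRATEGIES:
        raise ValueError(
            f"Unknown strategy {strategy!r}; expected one of {VALID_STRATEGIES}"
        )
    return strategy


# Usage (requires langchain-writer and WRITER_API_KEY):
# splitter = WriterTextSplitter(strategy=resolve_strategy())
```

Failing early on an unrecognized name keeps a typo in configuration from surfacing only when the first request is sent.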

## Usage
`WriterTextSplitter` can be used synchronously or asynchronously.

### Synchronous usage
To use `WriterTextSplitter` synchronously, call the `split_text` method with the text you want to split:

In [None]:
text = """Reeeeeeeeeeeeeeeeeeeeeaally long text you want to divide into smaller chunks. For example you can add a poem multiple times:
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
"""

chunks = splitter.split_text(text)
chunks

Print the length of `chunks` to see how many chunks were created:

In [None]:
print(len(chunks))
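
The endpoint handles documents of up to 4000 words, so longer inputs need to be divided before they are sent. The sketch below shows one possible way to do that by grouping paragraphs into batches under a word budget. The helper `batch_by_words` is a hypothetical name, and counting words by whitespace is an assumption; Writer may count words differently, so a lower `max_words` leaves a safety margin.

```python
def batch_by_words(text, max_words=4000):
    """Group paragraphs into batches of at most `max_words` words.

    Words are counted by splitting on whitespace (an assumption about
    how the limit is measured). Paragraphs are separated by blank lines.
    """
    batches, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        if not paragraph.strip():
            continue  # skip empty paragraphs
        words = paragraph.split()
        if len(words) > max_words:
            # A single oversized paragraph is cut into fixed word windows;
            # line breaks inside it are not preserved.
            pieces = [
                " ".join(words[i : i + max_words])
                for i in range(0, len(words), max_words)
            ]
        else:
            pieces = [paragraph]
        for piece in pieces:
            n = len(piece.split())
            # Start a new batch when this piece would exceed the budget.
            if current and count + n > max_words:
                batches.append("\n\n".join(current))
                current, count = [], 0
            current.append(piece)
            count += n
    if current:
        batches.append("\n\n".join(current))
    return batches


# Usage with the splitter defined earlier (requires a Writer API key):
# all_chunks = [c for batch in batch_by_words(text) for c in splitter.split_text(batch)]
```

Batching on paragraph boundaries keeps each request self-contained, but context is not shared across batches, so chunks at a batch boundary may be less coherent than those inside one.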

### Asynchronous usage
To use `WriterTextSplitter` asynchronously, call the `asplit_text` method with the text you want to split:

In [None]:
async_chunks = await splitter.asplit_text(text)
async_chunks

Print the length of `async_chunks` to see how many chunks were created:

In [None]:
print(len(async_chunks))
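
The top-level `await` above works in a notebook because an event loop is already running. In a plain script, the coroutine has to be driven explicitly, and the async method also makes it possible to split several texts concurrently. The following is a minimal sketch under those assumptions: `split_many` and `max_concurrency` are hypothetical names, and the only splitter method it relies on is `asplit_text`.

```python
import asyncio


async def split_many(splitter, texts, max_concurrency=4):
    """Split several texts concurrently; results keep the input order.

    `splitter` is any object with an async `asplit_text(text)` method,
    such as the WriterTextSplitter instance created earlier.
    """
    # Cap the number of in-flight requests.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def split_one(text):
        async with semaphore:
            return await splitter.asplit_text(text)

    return await asyncio.gather(*(split_one(t) for t in texts))


# In a script (outside a notebook), drive it with asyncio.run:
# all_chunks = asyncio.run(split_many(splitter, [text, text]))
```

The semaphore is there to avoid sending an unbounded number of simultaneous requests; a suitable limit depends on your account's rate limits, which this sketch does not know.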

## API reference
For detailed documentation of all `WriterTextSplitter` features and configurations, see the [API reference](https://python.langchain.com/api_reference/writer/text_splitter/langchain_writer.text_splitter.WriterTextSplitter.html#langchain_writer.text_splitter.WriterTextSplitter).

## Additional resources
The [Writer docs](https://dev.writer.com/home) have information about Writer models (including costs, context windows, and supported input types) and tools.