# How to split by character

This is the simplest method.  
It splits the text on a given separator string (`"\n\n"` by default). Chunk length is measured in number of characters.

Features
1. How the text is split: by a single separator string.
2. How the chunk size is measured: by number of characters.
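
The two points above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation. The function name `split_by_separator` is hypothetical, and overlap is deliberately left out to keep the idea visible: split on one separator string, then greedily merge the pieces back together while the character count stays within `chunk_size`.

```python
def split_by_separator(text: str, separator: str = "\n\n", chunk_size: int = 1000) -> list[str]:
    """Simplified illustration: split on one separator, merge pieces up to chunk_size characters."""
    pieces = [p for p in text.split(separator) if p]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for piece in pieces:
        # Cost of appending this piece, including a separator if the chunk is non-empty.
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            # The piece does not fit: close the current chunk and start a new one.
            chunks.append(separator.join(current))
            current, current_len = [], 0
            added = len(piece)
        current.append(piece)
        current_len += added
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc\n\ndddd"
chunks = split_by_separator(sample, chunk_size=10)
print(chunks)  # ['aaaa\n\nbbbb', 'cccc\n\ndddd']
```

In this sketch, a single piece longer than `chunk_size` is emitted as one oversized chunk, because there is no second separator to fall back on.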

Output methods
- To get the string content directly: use `.split_text`.
- To create LangChain `Document` objects (e.g., for use in downstream tasks): use `.create_documents`.

In [None]:
%pip install -qU langchain-text-splitters

In [1]:
from langchain_text_splitters import CharacterTextSplitter

# Load the sample document
with open("state_of_the_union.txt") as f:
    state_of_the_union = f.read()

# Initialize the CharacterTextSplitter
text_splitter = CharacterTextSplitter(
    separator="\n\n",        # Separator (default is "\n\n")
    chunk_size=1000,         # Maximum chunk size, in characters
    chunk_overlap=200,       # Size of the overlap between chunks
    length_function=len,     # Function used to measure chunk size
    is_separator_regex=False # Do not interpret the separator as a regex
)

# Create LangChain Document objects
texts = text_splitter.create_documents([state_of_the_union])
print(texts[0])  # Show the first chunk


page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.'
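
How `chunk_size` and `chunk_overlap` interact can be pictured with a small pure-Python sketch. This is a simplified illustration under stated assumptions, not the library's code: the helper name `merge_with_overlap` is hypothetical, and lengths are measured with `len` only. When a chunk is full, it is emitted, and trailing pieces are carried into the next chunk as long as they fit within the overlap budget.

```python
def merge_with_overlap(pieces, separator, chunk_size, chunk_overlap):
    """Simplified illustration: merge pieces into chunks, carrying trailing pieces forward as overlap."""
    chunks = []
    current = []

    def joined_len(parts):
        # Character length of the parts once joined by the separator.
        return len(separator.join(parts))

    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # Drop leading pieces until what remains fits the overlap budget
            # and leaves room for the incoming piece.
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [piece]) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


pieces = ["aaaa", "bbbb", "cccc", "dddd"]
overlapping = merge_with_overlap(pieces, " ", chunk_size=9, chunk_overlap=4)
print(overlapping)  # ['aaaa bbbb', 'bbbb cccc', 'cccc dddd']
disjoint = merge_with_overlap(pieces, " ", chunk_size=9, chunk_overlap=0)
print(disjoint)     # ['aaaa bbbb', 'cccc dddd']
```

Overlap here works on whole pieces, so the amount actually shared between neighboring chunks can be smaller than `chunk_overlap` when no trailing piece fits within the budget.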


Use `.create_documents` to propagate the metadata associated with each document to the output chunks:

In [2]:
metadatas = [{"document": 1}, {"document": 2}]
documents = text_splitter.create_documents(
    [state_of_the_union, state_of_the_union], metadatas=metadatas
)
print(documents[0])

page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.' metadata={'document': 1}
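
The idea of metadata propagation can be sketched without the library. In the example below, `SimpleDocument` and `create_simple_documents` are hypothetical stand-ins invented for illustration, and the splitting is reduced to a plain `str.split`. The point is only that every chunk inherits the metadata of the input text it came from.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a document object: chunk text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def create_simple_documents(texts, metadatas=None, separator="\n\n"):
    """Split each text and attach that text's metadata to every resulting chunk."""
    metadatas = metadatas or [{} for _ in texts]
    docs = []
    for text, meta in zip(texts, metadatas):
        for chunk in text.split(separator):
            if chunk:
                # Copy the dict so chunks do not share one mutable object.
                docs.append(SimpleDocument(page_content=chunk, metadata=dict(meta)))
    return docs


docs = create_simple_documents(
    ["a\n\nb", "c"], metadatas=[{"document": 1}, {"document": 2}]
)
for doc in docs:
    print(doc.page_content, doc.metadata)
```

Giving each chunk its own copy of the metadata is a choice made in this sketch so that later edits to one chunk's metadata do not leak into its siblings.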


To get the string content directly, use `.split_text`:

In [7]:
text_splitter.split_text(state_of_the_union)[0]

'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.'