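# Building an Agent with Vector Stores

This notebook shows how to build an agent that uses vector stores.

This is an interesting use case because it combines:
1. The reasoning ability of an agent
2. Vector stores exposed as tools

Together, these let us build an agent that interacts with a vector database autonomously.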

In [None]:
# Imports used throughout this notebook (unused imports removed)
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAI, OpenAIEmbeddings

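## Set Up the Vector Stores

For demonstration purposes, we start with a short sample text about a few US states. In a real application you would use your own data.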

In [17]:
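# This is the data we want to run question answering over
state_text = """
Arizona is known for its mountainous landscapes, such as the Grand Canyon and
the red rock formations of Sedona. Most of the state has a desert climate.

California has a long Pacific coastline and well-known landmarks such as the
Golden Gate Bridge in San Francisco and Hollywood in Los Angeles. The state is
known for its technology and entertainment industries.

Texas is the second-largest state in the United States. It is known for its
cowboy culture and oil industry. The state's largest city is Houston.

New York includes New York City, the largest city in the United States.
The state is world-famous for landmarks such as Wall Street, Broadway, and Central Park.

Florida is known for its warm weather and beaches. The state is also
home to many theme parks, such as Disney World and Universal Studios.

Hawaii is the only US state made up entirely of islands. It is known for its
volcanoes, tropical beaches, and distinctive culture.
"""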

# Split the text into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.create_documents([state_text])

# Index the chunks in a Chroma vector store using OpenAI embeddings
embeddings = OpenAIEmbeddings()
store = Chroma.from_documents(docs, embeddings, collection_name="state-info")
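
To make the role of the vector store concrete, here is a minimal, dependency-free sketch of what a store does: embed each chunk, embed the query, and return the closest chunks. `ToyVectorStore`, `embed`, and `cosine` are hypothetical names for illustration only. The word-count "embedding" is a stand-in for `OpenAIEmbeddings`, and none of this reflects how Chroma is implemented.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store that keeps (text, vector) pairs."""

    def __init__(self, texts):
        self.entries = [(t, embed(t)) for t in texts]

    def similarity_search(self, query: str, k: int = 1):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


chunks = [
    "Texas is known for cowboy culture and the oil industry.",
    "Hawaii is known for volcanoes and tropical beaches.",
    "Florida is home to theme parks such as Disney World.",
]
toy_store = ToyVectorStore(chunks)
top = toy_store.similarity_search("Which state has volcanoes?", k=1)
print(top)
```

A real embedding model captures meaning rather than exact word overlap, but the retrieval step (rank by similarity, keep the top `k`) has the same shape.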

In [18]:
from langchain_community.document_loaders import TextLoader

# Load the document
# Assumption: doc_path points to a local copy of the State of the Union transcript.
doc_path = "state_of_the_union.txt"
loader = TextLoader(doc_path)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# Create the vector store
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings, collection_name="state-of-union")

Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.


In [4]:
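from langchain.chains import RetrievalQA

llm = OpenAI(temperature=0)  # assumed setting; this LLM is shared by both QA chains and the agent

# Create the retrieval QA chain over the State of the Union index
state_of_union = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=docsearch.as_retriever()
)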
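
The `chain_type="stuff"` option means that all retrieved chunks are placed ("stuffed") into a single prompt. The following is a rough sketch of that idea only. `build_stuff_prompt` is a hypothetical helper, and the prompt wording is an illustrative assumption, not the template LangChain actually uses.

```python
def build_stuff_prompt(question: str, documents: list[str]) -> str:
    """Concatenate every retrieved chunk into one prompt (the "stuff" idea)."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks standing in for what a retriever would return.
retrieved = ["Chunk A about the speech.", "Chunk B about the speech."]
prompt = build_stuff_prompt("What was said?", retrieved)
print(prompt)
```

Because everything goes into one prompt, this approach is simple but only works while the retrieved chunks fit in the model's context window.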

In [5]:
from langchain_community.document_loaders import WebBaseLoader

In [6]:
# Load documents from a web page
loader = WebBaseLoader("https://beta.ruff.rs/docs/faq/")

In [7]:
# Load, split, and index the documents, then build a QA chain over them
docs = loader.load()
ruff_texts = text_splitter.split_documents(docs)
ruff_db = Chroma.from_documents(ruff_texts, embeddings, collection_name="ruff")
ruff = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=ruff_db.as_retriever()
)

Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.


## Create the Agent

In [None]:
# Import the components needed to build the agent
from langchain.agents import AgentType, Tool, initialize_agent
from langchain_openai import OpenAI

In [None]:
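# Define the list of tools
tools = [
    Tool(
        name="State of Union QA System",
        func=state_of_union.run,
        description="useful for when you need to answer questions about the most recent State of the Union address. Input should be a fully formed question.",
    ),
    Tool(
        name="Ruff QA System",
        func=ruff.run,
        description="useful for when you need to answer questions about ruff (a Python linter). Input should be a fully formed question.",
    ),
]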

In [None]:
# Build the agent. Here we use the default agent type (zero-shot ReAct).
# See the documentation for the full list of options.
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [None]:
# Test a question about the State of the Union address
agent.run(
    "What did Biden say about Ketanji Brown Jackson in the State of the Union address?"
)



> Entering new AgentExecutor chain...
I need to find out what Biden said about Ketanji Brown Jackson in the State of the Union address.
Action: State of Union QA System
Action Input: What did Biden say about Ketanji Brown Jackson in the State of the Union address?
Observation: Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.
Thought: I now know the final answer
Final Answer: Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.

> Finished chain.


"Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence."
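
The trace above follows a plain-text convention: the model emits either an `Action:` / `Action Input:` pair or a `Final Answer:` line, and the executor dispatches on that. Below is a minimal sketch of such a parser. `parse_step` and `ACTION_PATTERN` are hypothetical names, and this is a simplified illustration rather than LangChain's own output parser.

```python
import re

# Matches "Action: <tool name>" followed on the next line by "Action Input: <input>".
ACTION_PATTERN = re.compile(
    r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL
)


def parse_step(llm_text: str):
    """Return ("final", answer) or ("action", tool_name, tool_input)."""
    if "Final Answer:" in llm_text:
        return ("final", llm_text.split("Final Answer:")[-1].strip())
    match = ACTION_PATTERN.search(llm_text)
    if match is None:
        raise ValueError(f"Could not parse LLM output: {llm_text!r}")
    return ("action", match.group(1).strip(), match.group(2).strip())


step = parse_step(
    "I need to find out what Biden said.\n"
    "Action: State of Union QA System\n"
    "Action Input: What did Biden say about Ketanji Brown Jackson?"
)
final = parse_step("I now know the final answer\nFinal Answer: 42")
print(step)
print(final)
```

The parsed tool name has to match a `Tool.name` exactly, which is why the names in the tool list and in the trace must agree.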

In [None]:
# Test a question about Ruff
agent.run("Why use ruff over flake8?")



> Entering new AgentExecutor chain...
I need to find out the advantages of using ruff over flake8
Action: Ruff QA System
Action Input: What are the advantages of using ruff over flake8?
Observation: Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.
Thought: I now know the final answer
Final Answer: Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.

> Finished chain.

'Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.'

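## Use the Agent Solely as a Router

If you intend to use the agent only as a router and want to return the result of the `RetrievalQA` chain directly, you can set `return_direct=True` on each tool.

Note that in the examples above the agent did some additional work after querying the `RetrievalQA` chain: it took one more LLM step to restate the observation as its final answer. With `return_direct=True` you skip that step and return the tool's result as-is.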
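
The effect of `return_direct` can be illustrated with a toy control flow. This is a sketch under stated assumptions: `ToyTool`, `run_toy_agent`, `fake_qa`, and `fake_llm` are hypothetical names, the LLM step is simulated by a plain function, and the real agent executor is considerably more involved.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyTool:
    """Hypothetical stand-in for a tool with a return_direct flag."""
    name: str
    func: Callable[[str], str]
    return_direct: bool = False


def run_toy_agent(tool: ToyTool, question: str, summarize: Callable[[str], str]) -> str:
    """One routing step: call the tool, then optionally post-process its output."""
    observation = tool.func(question)
    if tool.return_direct:
        # Router mode: hand the tool output straight back to the caller.
        return observation
    # Default mode: a further (here simulated) LLM step produces the final answer.
    return summarize(observation)


def fake_qa(question: str) -> str:
    return f"raw answer to: {question}"


def fake_llm(observation: str) -> str:
    return f"Final Answer: {observation}"


direct = run_toy_agent(ToyTool("qa", fake_qa, return_direct=True), "why ruff?", fake_llm)
default = run_toy_agent(ToyTool("qa", fake_qa), "why ruff?", fake_llm)
print(direct)
print(default)
```

In router mode the caller gets the tool's text unchanged, which saves one LLM call per question but gives up the agent's chance to combine or rephrase results.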

In [None]:
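# Define the list of tools (with return_direct=True)
tools = [
    Tool(
        name="State of Union QA System",
        func=state_of_union.run,
        description="useful for when you need to answer questions about the most recent State of the Union address. Input should be a fully formed question.",
        return_direct=True,
    ),
    Tool(
        name="Ruff QA System",
        func=ruff.run,
        description="useful for when you need to answer questions about ruff (a Python linter). Input should be a fully formed question.",
        return_direct=True,
    ),
]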

agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [None]:
# Test a question about the State of the Union address
agent.run(
    "What did Biden say about Ketanji Brown Jackson in the State of the Union address?"
)



> Entering new AgentExecutor chain...
I need to find out what Biden said about Ketanji Brown Jackson in the State of the Union address.
Action: State of Union QA System
Action Input: What did Biden say about Ketanji Brown Jackson in the State of the Union address?
Observation: Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.

> Finished chain.


" Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence."

In [None]:
# Test a question about Ruff
agent.run("Why use ruff over flake8?")



> Entering new AgentExecutor chain...
I need to find out the advantages of using ruff over flake8
Action: Ruff QA System
Action Input: What are the advantages of using ruff over flake8?
Observation: Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.

> Finished chain.


' Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.'

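## Multi-Hop Vector Store Reasoning

Because vector stores are easy to use as tools in an agent, it is also straightforward to answer multi-hop questions that depend on vector stores with the existing agent framework.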

In [None]:
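# Define the list of tools
tools = [
    Tool(
        name="State of Union QA System",
        func=state_of_union.run,
        description="useful for when you need to answer questions about the most recent State of the Union address. Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.",
    ),
    Tool(
        name="Ruff QA System",
        func=ruff.run,
        description="useful for when you need to answer questions about ruff (a Python linter). Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.",
    ),
]

# Initialize the agent
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

# Test a multi-hop question
agent.run(
    "What tool does ruff use to run over Jupyter Notebooks? Did the president mention that tool in the State of the Union?"
)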
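
The tool descriptions above ask for fully formed questions without vague pronouns because each tool call is independent: the second lookup only works if the first answer is written into the second query. A minimal sketch of that two-hop pattern follows. `multi_hop`, `first_lookup`, and `second_lookup` are hypothetical, and the canned answer `"ToolX"` is a placeholder, not a claim about what Ruff or the speech actually contains.

```python
from typing import Callable, Tuple


def multi_hop(
    first_tool: Callable[[str], str],
    second_tool: Callable[[str], str],
    first_question: str,
    followup_template: str,
) -> Tuple[str, str, str]:
    """Answer a two-part question by feeding hop 1's answer into hop 2's query."""
    first_answer = first_tool(first_question)
    # Substitute the concrete answer so the second query contains no vague pronouns.
    second_question = followup_template.format(answer=first_answer)
    second_answer = second_tool(second_question)
    return first_answer, second_question, second_answer


def first_lookup(question: str) -> str:
    # Placeholder answer standing in for a QA chain over one vector store.
    return "ToolX"


def second_lookup(question: str) -> str:
    # Echoes the query so the rewritten question is visible in the result.
    return f"searched for: {question}"


hops = multi_hop(
    first_lookup,
    second_lookup,
    "What tool does ruff use to run over Jupyter Notebooks?",
    "Did the president mention {answer} in the State of the Union?",
)
print(hops)
```

In the real agent the LLM performs this rewriting itself between tool calls, guided by the tool descriptions.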