<a target="_blank" href="https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/integrations/callbacks/uptrain.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# UpTrain

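> UpTrain [[GitHub](https://github.com/uptrain-ai/uptrain) || [website](https://uptrain.ai/) || [docs](https://docs.uptrain.ai/getting-started/introduction)] is an open-source platform for evaluating and improving LLM applications. It provides grades for 20+ preconfigured checks (covering language, code, and embedding use cases), performs root cause analysis on failure cases, and gives guidance on how to resolve them.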

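## UpTrain Callback Handler

This notebook shows how the UpTrain callback handler integrates into your pipeline and runs a range of evaluations. We have chosen a few evaluations that we consider well suited to evaluating chains. These evaluations run automatically, and the results are displayed in the output. More details on UpTrain's evaluations can be found [here](https://github.com/uptrain-ai/uptrain?tab=readme-ov-file#pre-built-evaluations-we-offer-).

A few retrievers from LangChain are used for the demonstration: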

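### 1. **Vanilla RAG**:
RAG plays a crucial role in retrieving context and generating responses. To check its performance and response quality, we run the following evaluations:

- **[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance)**: Determines whether the context retrieved for the query is relevant to it.
- **[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)**: Assesses whether the LLM is hallucinating or providing incorrect information.
- **[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness)**: Checks whether the response contains all the information requested by the query.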

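### 2. **Multi Query Generation**:
MultiQueryRetriever creates multiple variants of a question that have a similar meaning to the original question. Given the added complexity, we include the previous evaluations and add:

- **[Multi Query Accuracy](https://docs.uptrain.ai/predefined-evaluations/query-quality/multi-query-accuracy)**: Checks that the generated queries mean the same as the original query.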

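### 3. **Context Compression and Reranking**:
Reranking reorders nodes by their relevance to the query and keeps the top n nodes. Because the number of nodes can shrink once reranking is complete, we run the following evaluations:

- **[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)**: Checks whether the order of the reranked nodes is more relevant to the query than the original order.
- **[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)**: Checks whether the reduced set of nodes still provides all the required information.

Together, these evaluations check the robustness and effectiveness of the RAG, MultiQueryRetriever, and reranking steps in the chain.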

## Install Dependencies

In [1]:
%pip install -qU langchain langchain_openai langchain-community uptrain faiss-cpu flashrank

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

Note: you may need to restart the kernel to use updated packages.


NOTE: you can also install `faiss-gpu` instead of `faiss-cpu` if you want to use the GPU-enabled version of the library.

## Import Libraries

In [2]:
from getpass import getpass

from langchain.chains import RetrievalQA
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_community.callbacks.uptrain_callback import UpTrainCallbackHandler
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers.string import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.runnables.passthrough import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import (
    RecursiveCharacterTextSplitter,
)

## Load the Documents

In [3]:
loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()

## Split the Documents into Chunks

In [4]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
chunks = text_splitter.split_documents(documents)
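
To make the `chunk_size` and `chunk_overlap` arguments concrete, here is a minimal sketch of fixed-size character chunking. It is a simplified stand-in, not the `RecursiveCharacterTextSplitter` algorithm, which additionally prefers to break on separators such as paragraph and line boundaries. The `split_fixed` helper and the sample string are hypothetical.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. This is a toy
    illustration only, not the library's splitting logic.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij"
no_overlap = split_fixed(sample, chunk_size=4)
with_overlap = split_fixed(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

With `chunk_overlap=0`, as in the cell above, each character lands in exactly one chunk; a positive overlap repeats the tail of one chunk at the head of the next.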

## Create the Retriever

In [5]:
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(chunks, embeddings)
retriever = db.as_retriever()
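
Conceptually, a vector-store retriever embeds the query and returns the chunks whose embeddings are closest to it. The sketch below illustrates that idea in plain Python with cosine similarity. The three-dimensional vectors, the chunk texts, and the `top_k` helper are made up for illustration; FAISS uses its own index structures and may use a different distance measure.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k entries most similar to query_vec."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, embedding) pairs standing in for embedded chunks.
toy_index = [
    ("chunk about the Supreme Court nominee", [0.9, 0.1, 0.0]),
    ("chunk about the economy", [0.1, 0.9, 0.0]),
    ("chunk about infrastructure", [0.0, 0.2, 0.9]),
]
hits = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print(hits)
```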

## Define the LLM

In [6]:
llm = ChatOpenAI(temperature=0, model="gpt-4")

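## Setup

UpTrain provides you with:
1. Dashboards with advanced drill-down and filtering options
1. Insights and common topics among failing cases
1. Observability and real-time monitoring of production data
1. Regression testing through integration with your CI/CD pipelines

You can choose between the following options when evaluating with UpTrain:
### 1. **UpTrain's Open-Source Software (OSS)**:
You can use the open-source evaluation service to evaluate your model. In this case, you need to provide an OpenAI API key, because UpTrain uses GPT models to evaluate the responses generated by the LLM. You can get your key [here](https://platform.openai.com/account/api-keys).

To view your evaluations in the UpTrain dashboard, set it up by running the following commands in your terminal: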

```bash
git clone https://github.com/uptrain-ai/uptrain
cd uptrain
bash run_uptrain.sh
```

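This starts the UpTrain dashboard on your local machine. You can access it at `http://localhost:3000/dashboard`.

Parameters:
- key_type="openai"
- api_key="OPENAI_API_KEY"
- project_name="PROJECT_NAME"


### 2. **UpTrain Managed Service and Dashboards**:
Alternatively, you can use UpTrain's managed service to evaluate your models. You can create a free UpTrain account [here](https://uptrain.ai/) and get free trial credits. If you want more trial credits, [book a call with the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).

The benefits of using the managed service are:
1. No need to set up the UpTrain dashboard on your local machine.
1. Access to many LLMs without needing their API keys.

Once you have run the evaluations, you can view them in the UpTrain dashboard at `https://dashboard.uptrain.ai/dashboard`.

Parameters:
- key_type="uptrain"
- api_key="UPTRAIN_API_KEY"
- project_name="PROJECT_NAME"


**Note:** `project_name` is the name of the project under which the evaluations appear in the UpTrain dashboard.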
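
As a rough sketch, the two parameter sets above can be assembled and sanity-checked before they are passed to the handler. The `uptrain_handler_kwargs` helper is hypothetical and not part of UpTrain or LangChain, and the key and project values are the same placeholders used in the lists above.

```python
VALID_KEY_TYPES = {"openai", "uptrain"}  # the two options described above


def uptrain_handler_kwargs(key_type: str, api_key: str, project_name: str) -> dict:
    """Validate and bundle the arguments listed under 'Parameters'."""
    if key_type not in VALID_KEY_TYPES:
        raise ValueError(f"key_type must be one of {sorted(VALID_KEY_TYPES)}, got {key_type!r}")
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {"key_type": key_type, "api_key": api_key, "project_name": project_name}


# Placeholders only: substitute your real key and project name.
oss_kwargs = uptrain_handler_kwargs("openai", "OPENAI_API_KEY", "PROJECT_NAME")
managed_kwargs = uptrain_handler_kwargs("uptrain", "UPTRAIN_API_KEY", "PROJECT_NAME")

# With the packages from the install step available, the handler would then
# be created along these lines:
# handler = UpTrainCallbackHandler(**oss_kwargs)
print(oss_kwargs)
print(managed_kwargs)
```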

## Setting the API Key

The notebook will prompt you to enter the API key. You can choose between the OpenAI API key and the UpTrain API key by changing the `KEY_TYPE` value in the cell below.

In [None]:
KEY_TYPE = "openai"  # or "uptrain"
API_KEY = getpass()
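
If you would rather not type the key on every run, one option is to read it from an environment variable and fall back to the prompt. This is a sketch under assumptions: the `resolve_api_key` helper is hypothetical, and the environment variable names (in particular `UPTRAIN_API_KEY`) are conventions assumed here rather than something UpTrain requires.

```python
import os
from getpass import getpass

# Assumed variable names; adjust them to your own environment.
ENV_VAR_BY_KEY_TYPE = {"openai": "OPENAI_API_KEY", "uptrain": "UPTRAIN_API_KEY"}


def resolve_api_key(key_type, env=None, prompt=getpass):
    """Return the key from the environment, or ask for it interactively."""
    env = os.environ if env is None else env
    if key_type not in ENV_VAR_BY_KEY_TYPE:
        raise ValueError(f"unsupported key_type: {key_type!r}")
    var_name = ENV_VAR_BY_KEY_TYPE[key_type]
    value = env.get(var_name)
    # Only prompt when the variable is missing or empty.
    return value if value else prompt(f"{var_name}: ")


# In the notebook this could replace the getpass() call above:
# API_KEY = resolve_api_key(KEY_TYPE)
```

The `env` and `prompt` arguments exist only so the function can be exercised without touching the real environment or blocking on input.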

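# 1. Vanilla RAG

The UpTrain callback handler automatically captures the query, context, and response once they are generated, and runs the following three evaluations on the response *(graded from 0 to 1)*:
- **[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance)**: Checks whether the context retrieved for the query is relevant to it.
- **[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)**: Checks how factually accurate the response is.
- **[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness)**: Checks whether the response contains all the information the query asks for.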

In [None]:
# Create the RAG prompt
template = """Answer the question based only on the following context, which can include text and tables:
{context}
Question: {question}
"""
rag_prompt_text = ChatPromptTemplate.from_template(template)

# Create the chain
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt_text
    | llm
    | StrOutputParser()
)

# Create the UpTrain callback handler
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}

# Invoke the chain with a query
query = "What did the president say about Ketanji Brown Jackson"
docs = chain.invoke(query, config=config)

2024-04-17 17:03:44.969 | INFO     | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:05.809 | INFO     | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!



Question: What did the president say about Ketanji Brown Jackson
Response: The president mentioned that he had nominated Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago. He described her as one of the nation's top legal minds who will continue Justice Breyer’s legacy of excellence. He also mentioned that she is a former top litigator in private practice, a former federal public defender, and comes from a family of public school educators and police officers. He described her as a consensus builder and noted that since her nomination, she has received a broad range of support from various groups, including the Fraternal Order of Police and former judges appointed by both Democrats and Republicans.

Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 1.0


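# 2. Multi Query Generation

The **MultiQueryRetriever** addresses the problem that a RAG pipeline might not return the best set of documents for a given query. It generates multiple queries with the same meaning as the original query and then fetches documents for each of them.

To evaluate this retriever, UpTrain runs the following evaluation:
- **[Multi Query Accuracy](https://docs.uptrain.ai/predefined-evaluations/query-quality/multi-query-accuracy)**: Checks whether the generated queries mean the same as the original query.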
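
The fetch-and-merge step described above can be pictured as taking the unique union of the documents returned for each query variant. The sketch below shows that merge with a stubbed retriever; the `retrieve_unique_union` helper, the document IDs, and the stub results are hypothetical, and the real retriever works with `Document` objects rather than strings.

```python
from typing import Callable, Iterable


def retrieve_unique_union(queries: Iterable[str], retrieve: Callable[[str], list]) -> list:
    """Fetch documents for every query and keep each document once, in first-seen order."""
    seen = set()
    merged = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Stub results keyed by query variants of the kind the LLM generates.
FAKE_RESULTS = {
    "How did the president comment on Ketanji Brown Jackson?": ["doc_a", "doc_b"],
    "What were the president's remarks regarding Ketanji Brown Jackson?": ["doc_b", "doc_c"],
    "What statements has the president made about Ketanji Brown Jackson?": ["doc_a", "doc_d"],
}
merged_docs = retrieve_unique_union(FAKE_RESULTS, FAKE_RESULTS.__getitem__)
print(merged_docs)
```

Because variants that drift from the original meaning pull in unrelated documents, the Multi Query Accuracy check matters before the merged context reaches the LLM.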

In [None]:
# Create the retriever
multi_query_retriever = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)

# Create the UpTrain callback
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}

# Create the RAG prompt
template = """Answer the question based only on the following context, which can include text and tables:
{context}
Question: {question}
"""
rag_prompt_text = ChatPromptTemplate.from_template(template)

chain = (
    {"context": multi_query_retriever, "question": RunnablePassthrough()}
    | rag_prompt_text
    | llm
    | StrOutputParser()
)

# Invoke the chain with a query
question = "What did the president say about Ketanji Brown Jackson"
docs = chain.invoke(question, config=config)

2024-04-17 17:04:10.675 | INFO     | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:16.804 | INFO     | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!



Question: What did the president say about Ketanji Brown Jackson
Multi Queries:
  - How did the president comment on Ketanji Brown Jackson?
  - What were the president's remarks regarding Ketanji Brown Jackson?
  - What statements has the president made about Ketanji Brown Jackson?

Multi Query Accuracy Score: 0.5


2024-04-17 17:04:22.027 | INFO     | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:44.033 | INFO     | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!



Question: What did the president say about Ketanji Brown Jackson
Response: The president mentioned that he had nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago. He described her as one of the nation's top legal minds who will continue Justice Breyer’s legacy of excellence. He also mentioned that since her nomination, she has received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.

Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 1.0


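# 3. Context Compression and Reranking

Reranking reorders nodes by their relevance to the query and keeps the top n nodes. Because the number of nodes can shrink once reranking is complete, we run the following evaluations:
- **[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)**: Checks whether the order of the reranked nodes is more relevant to the query than the original order.
- **[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)**: Checks whether the reduced set of nodes still provides all the required information.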
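
The reorder-then-truncate step can be sketched in a few lines. Here a simple word-overlap score stands in for the relevance model; it is not how FlashRank scores passages, and the `overlap_score` and `rerank_top_n` helpers and the sample nodes are hypothetical.

```python
def overlap_score(query: str, text: str) -> float:
    """Fraction of query words that also appear in text (toy relevance score)."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank_top_n(query: str, nodes: list[str], top_n: int = 3) -> list[str]:
    """Sort nodes by descending relevance to the query and keep the first top_n."""
    ranked = sorted(nodes, key=lambda node: overlap_score(query, node), reverse=True)
    return ranked[:top_n]


nodes = [
    "The economy added jobs last year",
    "The president nominated Ketanji Brown Jackson",
    "Ketanji Brown Jackson is a former public defender",
    "Infrastructure spending will increase",
]
kept = rerank_top_n("what did the president say about Ketanji Brown Jackson", nodes, top_n=2)
print(kept)
```

The two evaluations map onto the two halves of `rerank_top_n`: Context Reranking looks at the new order, and Context Conciseness looks at what survives the truncation.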

In [None]:
# Create the retriever
compressor = FlashrankRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

# Create the chain
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)

# Create the UpTrain callback
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}

# Invoke the chain with a query
query = "What did the president say about Ketanji Brown Jackson"
result = chain.invoke(query, config=config)

2024-04-17 17:04:46.462 | INFO     | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:04:53.561 | INFO     | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!



Question: What did the president say about Ketanji Brown Jackson

Context Conciseness Score: 0.0
Context Reranking Score: 1.0


2024-04-17 17:04:56.947 | INFO     | uptrain.framework.evalllm:evaluate_on_server:378 - Sending evaluation request for rows 0 to <50 to the Uptrain
2024-04-17 17:05:16.551 | INFO     | uptrain.framework.evalllm:evaluate:367 - Local server not running, start the server to log data and visualize in the dashboard!



Question: What did the president say about Ketanji Brown Jackson
Response: The President mentioned that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago. He described her as one of the nation's top legal minds who will continue Justice Breyer’s legacy of excellence.

Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 0.5
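
Since every check is graded from 0 to 1, one simple way to read a run is to list the checks that fall short of a chosen threshold. The sketch below applies this to the scores printed for the reranking run; the `checks_below` helper and the threshold of 1.0 are illustrative assumptions, not an UpTrain feature or recommendation.

```python
def checks_below(scores: dict[str, float], threshold: float) -> dict[str, float]:
    """Return the checks whose score is strictly below threshold, in input order."""
    return {name: value for name, value in scores.items() if value < threshold}


# Scores copied from the reranking run output above.
reranking_run = {
    "Context Conciseness": 0.0,
    "Context Reranking": 1.0,
    "Context Relevance": 1.0,
    "Factual Accuracy": 1.0,
    "Response Completeness": 0.5,
}
flagged = checks_below(reranking_run, threshold=1.0)
print(flagged)
```

For this run the helper flags Context Conciseness and Response Completeness, the two checks that may deserve a closer look in the dashboard.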


# UpTrain's Dashboard and Insights

Here is a short video showcasing the dashboard and the insights:

![langchain_uptrain.gif](https://uptrain-assets.s3.ap-south-1.amazonaws.com/images/langchain/langchain_uptrain.gif)