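# Module 1: RAG (Retrieval-Augmented Generation)

This module uses an Amazon Bedrock embedding model and TiDB Cloud Starter vector search to retrieve data, then uses a large language model (LLM) to generate an answer to the question.

Here we show how to build a RAG application with `pytidb` and TiDB Cloud Starter.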

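> **Note:**
>
> - The environment variables `SERVERLESS_CLUSTER_HOST`, `SERVERLESS_CLUSTER_PORT`, `SERVERLESS_CLUSTER_USERNAME`, `SERVERLESS_CLUSTER_PASSWORD`, and `SERVERLESS_CLUSTER_DATABASE_NAME` are already set in this environment.
> - This lab has also been granted permission to use Amazon Bedrock. To run this code outside the TiDB Labs platform, configure the environment variables and Bedrock access yourself beforehand.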
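
When running outside TiDB Labs, it can help to confirm that the five connection variables are present before connecting. The following is a minimal sketch; the helper name `missing_vars` is hypothetical and not part of `pytidb`.

```python
import os

# The five connection variables this module expects.
REQUIRED_VARS = [
    "SERVERLESS_CLUSTER_HOST",
    "SERVERLESS_CLUSTER_PORT",
    "SERVERLESS_CLUSTER_USERNAME",
    "SERVERLESS_CLUSTER_PASSWORD",
    "SERVERLESS_CLUSTER_DATABASE_NAME",
]


def missing_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All TiDB connection variables are set.")
```

This check covers only the database variables; Amazon Bedrock credentials have to be configured separately.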

## Install dependencies

Click the triangular Run button to the left of a cell to execute its code.

In [None]:
%pip install -q \
    pytidb==0.0.10.dev1 \
    boto3==1.38.23 \
    litellm \
    pandas

## Initialize the database client

In [None]:
import os

from litellm import completion
from typing import Optional, Any
from pytidb import TiDBClient
from pytidb.schema import TableModel, Field
from pytidb.embeddings import EmbeddingFunction

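# Connect to the TiDB Cloud Starter cluster using the preset environment variables.
db = TiDBClient.connect(
    host=os.getenv("SERVERLESS_CLUSTER_HOST"),
    # Fall back to 4000, the default TiDB port, if the variable is unset.
    port=int(os.getenv("SERVERLESS_CLUSTER_PORT", "4000")),
    username=os.getenv("SERVERLESS_CLUSTER_USERNAME"),
    password=os.getenv("SERVERLESS_CLUSTER_PASSWORD"),
    database=os.getenv("SERVERLESS_CLUSTER_DATABASE_NAME"),
    enable_ssl=True,
)

# Amazon Titan Text Embeddings V2, served through Amazon Bedrock.
embedding_model = "bedrock/amazon.titan-embed-text-v2:0"

text_embedding_function = EmbeddingFunction(
    embedding_model,
    timeout=60
)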

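## Prepare the context

In this example, the context is a set of documents. We use the Amazon Bedrock embedding model to compute a vector for each document and store it in TiDB.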

In [None]:
table_name = "documents"


class Document(TableModel, table=True):
    __tablename__ = table_name
    __table_args__ = {"extend_existing": True}
    id: int | None = Field(default=None, primary_key=True)
    text: str = Field(max_length=1024)
    # The vector column is populated from the `text` field by the embedding function.
    embedding: Optional[Any] = text_embedding_function.VectorField(
        source_field="text",
    )

documents = [
    Document(id=1, text="TiDB is an open-source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads."),
    Document(id=2, text="TiFlash is the key component that makes TiDB essentially a Hybrid Transactional/Analytical Processing (HTAP) database. As a columnar storage extension of TiKV, TiFlash provides both good isolation level and strong consistency guarantee."),
    Document(id=3, text="TiKV is a distributed and transactional key-value database, which provides transactional APIs with ACID compliance. With the implementation of the Raft consensus algorithm and consensus state stored in RocksDB, TiKV guarantees data consistency between multiple replicas and high availability."),
]

table = db.create_table(schema=Document, if_exists="overwrite")
table.bulk_insert(documents)

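## Retrieve by vector cosine distance

Retrieve the relevant documents from TiDB by comparing the question's vector with the document vectors.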

In [None]:
question = "what is TiKV?"

results = table.search(question).limit(1)
results.to_pandas()
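
To make the ranking criterion in this section concrete, here is a minimal, self-contained sketch of cosine distance in plain Python. The two-dimensional vectors and document names are made up for illustration; real embeddings have many more dimensions, and the database computes the distance itself.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors standing in for the question and document embeddings.
question_vec = [1.0, 0.0]
doc_vecs = {
    "doc_a": [2.0, 0.0],  # same direction as the question
    "doc_b": [0.0, 3.0],  # orthogonal to the question
    "doc_c": [1.0, 1.0],  # 45 degrees from the question
}

# A smaller distance means a more relevant document.
ranked = sorted(doc_vecs, key=lambda name: cosine_distance(question_vec, doc_vecs[name]))
print(ranked)  # ['doc_a', 'doc_c', 'doc_b']
```

Cosine distance depends only on the angle between vectors, so `doc_a` ranks first even though its magnitude differs from the question vector's.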

## Generate the answer

In [None]:
from litellm import completion

llm_model = "bedrock/us.amazon.nova-lite-v1:0"

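# Execute the search and use the retrieved document text as the prompt context.
context = "\n".join(results.to_pandas()["text"])

messages = [
    {"role": "system", "content": f"Answer the question carefully using the following context:\n{context}"},
    {"role": "user", "content": question}
]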

llm_response = completion(
    model=llm_model,
    messages=messages,
)

print(llm_response.choices[0].message.content)
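
If more than one document is retrieved, the prompt-assembly step can be factored into a small helper. The sketch below is one possible approach, not part of `pytidb` or `litellm`; the function name `build_messages` and the prompt wording are assumptions for illustration.

```python
def build_messages(question, contexts):
    """Assemble chat messages that ground the answer in retrieved text."""
    # Number each retrieved passage so they stay distinguishable in the prompt.
    context_block = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(contexts, start=1)
    )
    system_prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context_block}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


messages = build_messages(
    "what is TiKV?",
    ["TiKV is a distributed and transactional key-value database."],
)
print(messages[0]["content"])
```

The returned list has the same shape as the `messages` argument passed to `completion` above, so it can be used in its place.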