# Improve document indexing with HyDE
This notebook goes over how to use Hypothetical Document Embeddings (HyDE), as described in [this paper](https://arxiv.org/abs/2212.10496).

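At a high level, HyDE is an embedding technique that takes a query, generates a hypothetical answer to it, embeds that generated document, and uses the resulting embedding in place of the query's own.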

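To use HyDE, we therefore need to provide a base embedding model and an LLMChain that can generate those documents. The HyDE class ships with several default prompts (see the paper for details), but we can also create our own.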
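Before wiring up the real components, the flow described above can be sketched in plain Python. This is only an illustration under stated assumptions: `toy_embed` and `fake_llm` are hypothetical stand-ins for the base embedding model and the LLM, and `hyde_embed_query` is a made-up helper name, not part of LangChain.

```python
import hashlib
import math
from typing import Callable, List


def toy_embed(text: str, dim: int = 64) -> List[float]:
    """Hypothetical stand-in for a base embedding model:
    a hashed bag-of-words vector, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fake_llm(query: str) -> str:
    """Hypothetical stand-in for an LLM call: returns a canned answer."""
    return "The Taj Mahal is a marble mausoleum in Agra, India."


def hyde_embed_query(
    query: str,
    generate: Callable[[str], str],
    embed: Callable[[str], List[float]],
) -> List[float]:
    # Step 1: generate a hypothetical document that answers the query.
    hypothetical_doc = generate(query)
    # Step 2: embed the generated document instead of the raw query.
    return embed(hypothetical_doc)


query_vector = hyde_embed_query("Where is the Taj Mahal?", fake_llm, toy_embed)
print(len(query_vector))
```

The vector that comes back describes the hypothetical answer rather than the question itself, which is the point of the technique: an answer-shaped text tends to sit closer to real answer passages in embedding space.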

In [1]:
from langchain.chains import HypotheticalDocumentEmbedder, LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI, OpenAIEmbeddings

In [2]:
base_embeddings = OpenAIEmbeddings()
llm = OpenAI()

In [3]:
# Load with `web_search` prompt
embeddings = HypotheticalDocumentEmbedder.from_llm(llm, base_embeddings, "web_search")

In [4]:
# Now we can use it as any embedding class!
result = embeddings.embed_query("Where is the Taj Mahal?")

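## Multiple generations
We can also generate multiple documents and then combine their embeddings. By default, they are combined by taking the average. To do this, we configure the LLM that generates the documents to return multiple completions.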
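As a rough illustration of the averaging step, the sketch below combines several embeddings by taking their element-wise mean. The function name `combine_by_mean` and the tiny two-dimensional vectors are assumptions made for the example; real embeddings have hundreds or thousands of dimensions.

```python
from typing import List


def combine_by_mean(embeddings: List[List[float]]) -> List[float]:
    """Element-wise mean of several equally sized embedding vectors."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    # zip(*embeddings) walks the vectors column by column.
    return [sum(column) / len(embeddings) for column in zip(*embeddings)]


# Four made-up embeddings, one per generated document.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
combined = combine_by_mean(vectors)
print(combined)  # [0.5, 0.5]
```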

In [5]:
multi_llm = OpenAI(n=4, best_of=4)

In [6]:
embeddings = HypotheticalDocumentEmbedder.from_llm(
    multi_llm, base_embeddings, "web_search"
)

In [7]:
result = embeddings.embed_query("Where is the Taj Mahal?")

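## Using our own prompts
Besides the preconfigured prompts, we can easily write our own and use them in the LLMChain that generates the documents. This is useful when we know which domain the queries will come from, because we can tune the prompt to generate text that more closely resembles that domain.

In the example below, we tune it to generate text about a State of the Union address (because we will use that in the next example).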

In [8]:
prompt_template = """Please answer the user's question about the most recent state of the union address
Question: {question}
Answer:"""
prompt = PromptTemplate(input_variables=["question"], template=prompt_template)
llm_chain = LLMChain(llm=llm, prompt=prompt)
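
To get a feel for the text the LLM receives, the template can be filled in by hand. The sketch below uses plain `str.format` rather than LangChain, on the assumption that it mirrors the default f-string-style substitution for a simple template like this one.

```python
# Same template text as in the cell above, filled with plain str.format.
prompt_template = """Please answer the user's question about the most recent state of the union address
Question: {question}
Answer:"""

filled = prompt_template.format(
    question="What did the president say about Ketanji Brown Jackson"
)
print(filled)
```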

In [9]:
embeddings = HypotheticalDocumentEmbedder(
    llm_chain=llm_chain, base_embeddings=base_embeddings
)

In [10]:
result = embeddings.embed_query(
    "What did the president say about Ketanji Brown Jackson"
)

## Using HyDE
Now that we have HyDE, we can use it like any other embedding class! Here it is used to find similar passages in the State of the Union example.
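
Conceptually, the vector store compares the query embedding (here, the embedding of the hypothetical document) against the embedding of each stored chunk and returns the closest ones. The sketch below ranks chunks by cosine similarity. The metric is an assumption for illustration: the distance actually used depends on how the vector store is configured. The vectors and the helper name `top_k` are made up.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: List[Sequence[float]],
    k: int = 1,
) -> List[Tuple[int, float]]:
    """Return (chunk index, score) pairs for the k most similar chunks."""
    scored = [
        (i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(chunk_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up embeddings for three text chunks and one query.
chunk_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vector = [0.9, 0.1, 0.0]

best = top_k(query_vector, chunk_vectors, k=2)
print([index for index, _ in best])  # [0, 2]
```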

In [11]:
from langchain_chroma import Chroma
from langchain_text_splitters import CharacterTextSplitter

with open("../../state_of_the_union.txt") as f:
    state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)

In [12]:
docsearch = Chroma.from_texts(texts, embeddings)

query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query)

Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.


In [13]:
print(docs[0].page_content)

In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. 

We cannot let this happen. 

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
