# How to do "self-querying" retrieval

:::info Prerequisites

This guide assumes familiarity with the following concepts:

- [Retrievers](/docs/concepts/retrievers)
- [Vector stores](/docs/concepts/vectorstores)

:::

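A self-querying retriever is, as the name suggests, a retriever that can query itself. Given any natural language query, it uses a large language model (LLM) to write a structured query and then applies that structured query to its underlying vector store. This lets the retriever do more than compare the user's query semantically with the contents of the stored documents: it can also extract filters on the stored documents' metadata from the query and execute them.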
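To make the idea concrete, here is a minimal, self-contained TypeScript sketch of what a structured query might contain: a search string for the semantic part and a list of metadata comparisons for the filter part. The `StructuredQuerySketch` and `Comparison` types and the `matches` helper are hypothetical illustrations, not LangChain's actual classes, and the example filter is hand-written rather than generated by an LLM.

```typescript
// Hypothetical types for illustration; NOT LangChain's structured-query classes.
type Metadata = Record<string, string | number | undefined>;

interface Comparison {
  attribute: string;
  operator: "eq" | "lt" | "gt";
  value: string | number;
}

interface StructuredQuerySketch {
  query: string; // semantic part: compared with document content via embeddings
  filter: Comparison[]; // metadata part: every comparison must hold (implicit AND)
}

// A hand-written example of what an LLM might produce for
// "science fiction movies about dinosaurs released before 2000".
const example: StructuredQuerySketch = {
  query: "dinosaurs",
  filter: [
    { attribute: "genre", operator: "eq", value: "science fiction" },
    { attribute: "year", operator: "lt", value: 2000 },
  ],
};

// Returns true when a document's metadata satisfies every comparison.
function matches(metadata: Metadata, filter: Comparison[]): boolean {
  return filter.every(({ attribute, operator, value }) => {
    const actual = metadata[attribute];
    if (actual === undefined) return false; // a missing field never matches
    if (operator === "eq") return actual === value;
    // In this sketch, "lt" and "gt" only apply to numbers
    if (typeof actual !== "number" || typeof value !== "number") return false;
    return operator === "lt" ? actual < value : actual > value;
  });
}
```

Documents that pass `matches` would then be ranked by how similar their content is to the `query` string.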

![](../../static/img/self_querying.jpeg)

:::info

For documentation on the vector stores that support self-querying, see [Integrations](/docs/integrations/retrievers/self_query).

:::

## Get started

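For demonstration purposes, we'll use an in-memory, unoptimized vector store. When building a real application, replace it with a supported, production-ready vector store.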
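An unoptimized in-memory store can answer a query by scoring every stored vector against the query vector and keeping the best matches. The sketch below shows that brute-force idea with cosine similarity. The three-dimensional vectors are made-up stand-ins for real embeddings, and `similaritySearch` is a hypothetical helper, not the `MemoryVectorStore` implementation.

```typescript
// Each entry pairs some text with a toy "embedding"; real vectors come from an embeddings model.
type Entry = { text: string; vector: number[] };

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force search: score every entry, sort by descending score, keep the top k.
function similaritySearch(store: Entry[], query: number[], k: number): Entry[] {
  return store
    .map((entry) => ({ entry, score: cosine(entry.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ entry }) => entry);
}

const store: Entry[] = [
  { text: "dinosaurs", vector: [1, 0, 0] },
  { text: "dreams", vector: [0, 1, 0] },
  { text: "toys", vector: [0, 0, 1] },
];

const top = similaritySearch(store, [0.9, 0.1, 0], 1);
```

The cost of this approach grows linearly with the number of stored documents, which is one reason production vector stores use dedicated indexes instead.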

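The self-querying retriever requires the [`peggy`](https://www.npmjs.com/package/peggy) package as a peer dependency. The examples below also import from the `langchain` package and use OpenAI models:

```{=mdx}
import Npm2Yarn from '@theme/Npm2Yarn';

<Npm2Yarn>
  peggy langchain @langchain/openai @langchain/core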
</Npm2Yarn>
```

We've created a small demo set of documents that contain summaries of movies:

```typescript
import "peggy";
import { Document } from "@langchain/core/documents";

/**
 * First, we create a bunch of documents. You can load your own documents here instead.
 * Each document has a pageContent and a metadata field. Make sure your metadata matches the AttributeInfo below.
 */
const docs = [
  new Document({
    pageContent:
      "A bunch of scientists bring back dinosaurs and mayhem breaks loose",
    metadata: { year: 1993, rating: 7.7, genre: "science fiction", length: 122 },
  }),
  new Document({
    pageContent:
      "Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
    metadata: { year: 2010, director: "Christopher Nolan", rating: 8.2, length: 148 },
  }),
  new Document({
    pageContent:
      "A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
    metadata: { year: 2006, director: "Satoshi Kon", rating: 8.6 },
  }),
  new Document({
    pageContent:
      "A bunch of normal-sized women are supremely wholesome and some men pine after them",
    metadata: { year: 2019, director: "Greta Gerwig", rating: 8.3, length: 135 },
  }),
  new Document({
    pageContent: "Toys come alive and have a blast doing so",
    metadata: { year: 1995, genre: "animated", length: 77 },
  }),
  new Document({
    pageContent: "Three men walk into the Zone, three men walk out of the Zone",
    metadata: {
      year: 1979,
      director: "Andrei Tarkovsky",
      genre: "science fiction",
      rating: 9.9,
    },
  }),
];
```
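Because the structured query is written from the attribute descriptions you declare, a metadata key that is misspelled or left undeclared cannot be filtered on. A quick check like the hypothetical `undeclaredKeys` helper below (not part of LangChain) can catch such mismatches before you build the retriever. The `runtime` key in the sample is a deliberate typo for `length`.

```typescript
// Hypothetical helper, not part of LangChain: lists metadata keys that do not
// appear among the declared attribute names.
type Meta = Record<string, unknown>;

function undeclaredKeys(metadatas: Meta[], declared: string[]): string[] {
  const known = new Set(declared);
  const extra = new Set<string>();
  for (const metadata of metadatas) {
    for (const key of Object.keys(metadata)) {
      if (!known.has(key)) extra.add(key);
    }
  }
  return Array.from(extra);
}

// Attribute names matching the AttributeInfo list used in this guide.
const declared = ["genre", "year", "director", "rating", "length"];

const sample: Meta[] = [
  { year: 1993, rating: 7.7, genre: "science fiction", length: 122 },
  { year: 2010, director: "Christopher Nolan", rating: 8.2, runtime: 148 }, // typo: should be "length"
];

const problems = undeclaredKeys(sample, declared);
```

Note that a document simply lacking a declared field (like the 2006 movie above, which has no `length`) is not reported; it just won't match filters on that field.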

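### Creating our self-querying retriever

Now we can instantiate our retriever. To do this, we need to provide some information up front about the metadata fields our documents support, along with a short description of the document contents.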
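The attribute information and the content description end up in the prompt that asks the LLM to write a structured query. The sketch below shows one plausible way such a schema could be rendered as text. `describeSchema`, `AttributeSketch`, and the output format are assumptions for illustration and do not reproduce LangChain's actual query-constructor prompt.

```typescript
// Hypothetical shape mirroring the name/description/type fields of an attribute.
interface AttributeSketch {
  name: string;
  description: string;
  type: string;
}

// Render the content description and attributes as plain text for a prompt.
function describeSchema(documentContents: string, attributes: AttributeSketch[]): string {
  const lines = attributes.map((a) => `- ${a.name} (${a.type}): ${a.description}`);
  return [`Data source content: ${documentContents}`, "Attributes:", ...lines].join("\n");
}

const schemaText = describeSchema("Brief summary of a movie", [
  { name: "year", description: "The year the movie was released", type: "number" },
  { name: "length", description: "The length of the movie in minutes", type: "number" },
]);
```

Clear, specific descriptions matter because the LLM only knows what this text tells it about each field.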

```typescript
import { OpenAIEmbeddings, OpenAI } from "@langchain/openai";
import { FunctionalTranslator } from "@langchain/core/structured_query";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { SelfQueryRetriever } from "langchain/retrievers/self_query";
import type { AttributeInfo } from "langchain/chains/query_constructor";

/**
 * We define the attributes we want to be able to query on.
 * In this case, we want to be able to query on the genre, year, director, rating, and length of the movie.
 * We also provide a description of each attribute and the type of the attribute.
 * This is used to generate the query prompts.
 */
const attributeInfo: AttributeInfo[] = [
  {
    name: "genre",
    description: "The genre of the movie",
    type: "string or array of strings",
  },
  {
    name: "year",
    description: "The year the movie was released",
    type: "number",
  },
  {
    name: "director",
    description: "The director of the movie",
    type: "string",
  },
  {
    name: "rating",
    description: "The rating of the movie (1-10)",
    type: "number",
  },
  {
    name: "length",
    description: "The length of the movie in minutes",
    type: "number",
  },
];

/**
 * Next, we instantiate a vector store. This is where we store the embeddings of the documents.
 * We also need to provide an embeddings object. This is used to embed the documents.
 */
const embeddings = new OpenAIEmbeddings();
// The LLM that turns natural language questions into structured queries
const llm = new OpenAI();
// A short description of what each document's pageContent contains
const documentContents = "Brief summary of a movie";
const vectorStore = await MemoryVectorStore.fromDocuments(docs, embeddings);
const selfQueryRetriever = SelfQueryRetriever.fromLLM({
  llm,
  vectorStore,
  documentContents,
  attributeInfo,
  /**
   * We need to use a translator that translates the queries into a
   * filter format that the vector store can understand. We provide a basic
   * translator here, but you can create your own translator by extending the
   * BaseTranslator abstract class. Note that the vector store needs to support
   * filtering on the metadata attributes you want to query on.
   */
  structuredQueryTranslator: new FunctionalTranslator(),
});
```
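Conceptually, a translator walks the structured query (comparisons combined with logical operators such as `and` and `or`) and emits a filter in the vector store's native format. For an in-memory store, that format can simply be a function over a document, which is the idea behind a functional translator. The following sketch compiles a hypothetical `QueryNode` tree into a metadata predicate; the tree shape and `toPredicate` are illustrative assumptions, not LangChain's `StructuredQuery` types or the `FunctionalTranslator` source.

```typescript
// Hypothetical query tree for illustration; not LangChain's StructuredQuery classes.
type Value = string | number;
type Meta = Record<string, Value | undefined>;
type QueryNode =
  | { kind: "comparison"; comparator: "eq" | "lt" | "gt"; attribute: string; value: Value }
  | { kind: "operation"; operator: "and" | "or"; args: QueryNode[] };

// Compile the tree into a plain predicate over document metadata.
function toPredicate(node: QueryNode): (m: Meta) => boolean {
  if (node.kind === "operation") {
    const children = node.args.map((child) => toPredicate(child));
    return node.operator === "and"
      ? (m: Meta) => children.every((child) => child(m))
      : (m: Meta) => children.some((child) => child(m));
  }
  const { comparator, attribute, value } = node;
  return (m: Meta) => {
    const actual = m[attribute];
    if (actual === undefined) return false; // missing metadata never matches
    if (comparator === "eq") return actual === value;
    // In this sketch, "lt" and "gt" only apply to numbers
    if (typeof actual !== "number" || typeof value !== "number") return false;
    return comparator === "lt" ? actual < value : actual > value;
  };
}

// "Movies rated higher than 8.5 that came out before 2000"
const query: QueryNode = {
  kind: "operation",
  operator: "and",
  args: [
    { kind: "comparison", comparator: "gt", attribute: "rating", value: 8.5 },
    { kind: "comparison", comparator: "lt", attribute: "year", value: 2000 },
  ],
};

const isMatch = toPredicate(query);
```

Applied to the demo metadata, `isMatch` would keep only the 1979 movie: the 2006 movie is rated above 8.5 but fails the year condition.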

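### Testing it out

Now we can actually try out our retriever!

We can ask questions like "Which movies are less than 90 minutes?" or "Which movies are rated higher than 8.5?".
We can also ask questions like "Which movies are either comedy or drama and are less than 90 minutes?".
The retriever uses the LLM to turn each question into a structured query, and its translator then converts that query into a vector store filter that is used when retrieving documents.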
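To see what such filters do on their own, the sketch below applies hand-written predicates for three of the questions to the demo metadata defined earlier. The predicates are our assumption of what the LLM might generate, not output captured from the retriever, and only the metadata filtering step is simulated (no embeddings or LLM calls).

```typescript
// Shape of the demo metadata; optional fields are absent on some documents.
interface MovieMeta {
  year: number;
  rating?: number;
  genre?: string;
  director?: string;
  length?: number;
}

// Metadata of the six demo documents defined earlier in this guide.
const metadatas: MovieMeta[] = [
  { year: 1993, rating: 7.7, genre: "science fiction", length: 122 },
  { year: 2010, director: "Christopher Nolan", rating: 8.2, length: 148 },
  { year: 2006, director: "Satoshi Kon", rating: 8.6 },
  { year: 2019, director: "Greta Gerwig", rating: 8.3, length: 135 },
  { year: 1995, genre: "animated", length: 77 },
  { year: 1979, director: "Andrei Tarkovsky", genre: "science fiction", rating: 9.9 },
];

// Hand-written predicates we assume the LLM might derive from each question.
const filters: Array<[string, (m: MovieMeta) => boolean]> = [
  ["Which movies are less than 90 minutes?", (m) => m.length !== undefined && m.length < 90],
  ["Which movies are rated higher than 8.5?", (m) => m.rating !== undefined && m.rating > 8.5],
  ["Which movies are directed by Greta Gerwig?", (m) => m.director === "Greta Gerwig"],
];

// For each question, collect the release years of the movies that pass its filter.
const results = filters.map(([question, filter]) => ({
  question,
  years: metadatas.filter(filter).map((m) => m.year),
}));

for (const { question, years } of results) {
  console.log(question, years);
}
```

The matching years (1995; 2006 and 1979; 2019) correspond to the documents returned by the retriever in the runs below.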

```typescript
await selfQueryRetriever.invoke(
  "Which movies are less than 90 minutes?"
);
```

```text
[
  Document {
    pageContent: "Toys come alive and have a blast doing so",
    metadata: { year: 1995, genre: "animated", length: 77 }
  }
]
```

```typescript
await selfQueryRetriever.invoke(
  "Which movies are rated higher than 8.5?"
);
```

```text
[
  Document {
    pageContent: "A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception"... 16 more characters,
    metadata: { year: 2006, director: "Satoshi Kon", rating: 8.6 }
  },
  Document {
    pageContent: "Three men walk into the Zone, three men walk out of the Zone",
    metadata: {
      year: 1979,
      director: "Andrei Tarkovsky",
      genre: "science fiction",
      rating: 9.9
    }
  }
]
```

```typescript
await selfQueryRetriever.invoke(
  "Which movies are directed by Greta Gerwig?"
);
```

```text
[
  Document {
    pageContent: "A bunch of normal-sized women are supremely wholesome and some men pine after them",
    metadata: { year: 2019, director: "Greta Gerwig", rating: 8.3, length: 135 }
  }
]
```

```typescript
await selfQueryRetriever.invoke(
  "Which movies are either comedy or drama and are less than 90 minutes?"
);
```

```text
[
  Document {
    pageContent: "Toys come alive and have a blast doing so",
    metadata: { year: 1995, genre: "animated", length: 77 }
  }
]
```

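## Next steps

You've now seen how to use a `SelfQueryRetriever` to generate vector store filters from an original question.

Next, you can check out the [list of vector stores](/docs/integrations/retrievers/self_query/) that currently support self-querying.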