# Clarifai

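>[Clarifai](https://www.clarifai.com/) is an AI platform that covers the full AI lifecycle: data exploration, data labeling, model training, evaluation, and inference. Once inputs have been uploaded, a Clarifai application can be used as a vector database.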

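This notebook shows how to use the functionality of the `Clarifai` vector database. The examples demonstrate text semantic search. Clarifai also supports semantic search over images and video frames, as well as localized search (see [Rank](https://docs.clarifai.com/api-guide/search/rank)) and attribute search (see [Filter](https://docs.clarifai.com/api-guide/search/filter)).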

To use Clarifai, you must have an account and a Personal Access Token (PAT) key.
[Check here](https://clarifai.com/settings/security) to get or create a PAT.

# Dependencies

In [None]:
# Install required dependencies
%pip install --upgrade --quiet clarifai langchain-community langchain-text-splitters

# Imports
Here we set the Personal Access Token. You can find your PAT under settings/security on the platform.

In [1]:
# Log in and get your PAT from https://clarifai.com/settings/security
from getpass import getpass

CLARIFAI_PAT = getpass()

 ········
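When the notebook runs non-interactively, an interactive prompt is inconvenient. The sketch below reads the token from an environment variable first and only falls back to a prompt. The helper name `resolve_pat` and the variable name `CLARIFAI_PAT` are assumptions made for illustration, not something this notebook defines; check the Clarifai SDK version you have installed before relying on a particular variable name.

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def resolve_pat(
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
    var_name: str = "CLARIFAI_PAT",  # assumed variable name
) -> str:
    """Return the PAT from the environment, prompting only if it is unset.

    Hypothetical helper for illustration; not part of Clarifai or LangChain.
    """
    pat = env.get(var_name, "").strip()
    if not pat:
        # Fall back to an interactive prompt when the variable is missing or empty.
        pat = prompt("Clarifai PAT: ").strip()
    if not pat:
        raise ValueError("A Personal Access Token is required.")
    return pat


# Example with an injected environment and prompt, so nothing is read from the terminal.
example_pat = resolve_pat(env={"CLARIFAI_PAT": "my-token"}, prompt=lambda _: "")
```

In the notebook itself, `CLARIFAI_PAT = resolve_pat()` would take the place of the bare `getpass()` call.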


In [3]:
# Import the required modules
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Clarifai
from langchain_text_splitters import CharacterTextSplitter

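# Setup
Set the user ID and app ID of the application the text data will be uploaded to. Note: when creating that application, select a base workflow suited to indexing text documents, such as the Language-Understanding workflow.

You first need to create an account on [Clarifai](https://clarifai.com/login) and then create an application.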

In [24]:
USER_ID = "USERNAME_ID"
APP_ID = "APPLICATION_ID"
NUMBER_OF_DOCS = 2

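## From Texts

Create a Clarifai vector store from a list of texts. This section uploads each text, together with its metadata, to a Clarifai application. The application can then be used for semantic search to find relevant texts.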

In [16]:
texts = [
    "I really enjoy spending time with you",
    "I hate spending time with my dog",
    "I want to go for a run",
    "I went to the movies yesterday",
    "I love playing soccer with my friends",
]

metadatas = [
    {"id": i, "text": text, "source": "book 1", "category": ["books", "modern"]}
    for i, text in enumerate(texts)
]

Alternatively, you can give the inputs custom input IDs.

In [17]:
idlist = ["text1", "text2", "text3", "text4", "text5"]
metadatas = [
    {"id": idlist[i], "text": text, "source": "book 1", "category": ["books", "modern"]}
    for i, text in enumerate(texts)
]
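Because custom IDs are paired with texts by position, a mismatched list length or a repeated ID is an easy mistake to make. The sketch below builds the same metadata while checking both conditions up front. `build_metadatas` is a hypothetical helper, and the uniqueness check rests on the assumption that input IDs must be unique within an app.

```python
from typing import Any, Dict, List


def build_metadatas(
    texts: List[str], ids: List[str], source: str, category: List[str]
) -> List[Dict[str, Any]]:
    """Pair each text with its custom ID, failing early on mismatched input.

    Hypothetical helper for illustration; not part of Clarifai or LangChain.
    """
    if len(texts) != len(ids):
        raise ValueError(f"Got {len(texts)} texts but {len(ids)} ids.")
    if len(set(ids)) != len(ids):
        # Assumption: input IDs must be unique within a Clarifai app.
        raise ValueError("Custom input IDs must be unique.")
    return [
        {"id": input_id, "text": text, "source": source, "category": list(category)}
        for input_id, text in zip(ids, texts)
    ]


sample_texts = ["I want to go for a run", "I went to the movies yesterday"]
sample_metadatas = build_metadatas(
    sample_texts, ["text1", "text2"], source="book 1", category=["books", "modern"]
)
```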

In [27]:
# Optionally, the PAT can be passed to the vector store with the `pat` argument (e.g. pat=CLARIFAI_PAT).
clarifai_vector_db = Clarifai(
    user_id=USER_ID,
    app_id=APP_ID,
    number_of_docs=NUMBER_OF_DOCS,
)

Upload the data to the Clarifai app.

In [None]:
# Upload with metadata and custom input IDs.
response = clarifai_vector_db.add_texts(texts=texts, ids=idlist, metadatas=metadatas)

# Alternative: upload without metadata and without custom IDs (both are optional).
# Not recommended, because you will not be able to filter searches by metadata.
# Left commented out so the same texts are not uploaded twice.
# response = clarifai_vector_db.add_texts(texts=texts)

You can also create the Clarifai vector DB store and ingest all the inputs into your app in a single step:

In [None]:
clarifai_vector_db = Clarifai.from_texts(
    user_id=USER_ID,
    app_id=APP_ID,
    texts=texts,
    metadatas=metadatas,
)

Search for similar texts with the similarity search function.

In [31]:
docs = clarifai_vector_db.similarity_search("I would like to see you")
docs

[Document(page_content='I really enjoy spending time with you', metadata={'text': 'I really enjoy spending time with you', 'id': 'text1', 'source': 'book 1', 'category': ['books', 'modern']})]
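Conceptually, a similarity search compares an embedding of the query with the stored embeddings and returns the closest matches, up to `number_of_docs`. The sketch below illustrates that ranking step with cosine similarity over hand-written two-dimensional vectors. The vectors, the `top_k` helper, and the choice of cosine similarity are all assumptions made for illustration; in Clarifai the embeddings come from the app's base workflow, and this is not its implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], stored: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k stored IDs most similar to the query, best first."""
    scored = [(input_id, cosine_similarity(query, vec)) for input_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings keyed by input ID; real embeddings have far more dimensions.
toy_embeddings = {
    "text1": [0.9, 0.1],
    "text2": [0.1, 0.9],
    "text3": [0.7, 0.7],
}
ranked = top_k([1.0, 0.0], toy_embeddings, k=2)
```

Here `k` plays the role that `number_of_docs` plays for the vector store.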

You can also filter the search results by metadata.

In [29]:
# You can do a lot of powerful filtering within an app by leveraging metadata filters.
# This one limits the similarity query to texts whose "source" key matches the value "book 1".
book1_similar_docs = clarifai_vector_db.similarity_search(
    "I would love to see you", filter={"source": "book 1"}
)

# You can also use lists in the input's metadata and then select inputs that match an item in the list. This is useful for categories, as below:
book_category_similar_docs = clarifai_vector_db.similarity_search(
    "I would love to see you", filter={"category": ["books"]}
)
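One way to read the two filters above is as a predicate over each input's metadata: a scalar filter value must equal the stored value, and a list filter value must be found in the stored list. The local sketch below models that reading. `matches_filter` is a hypothetical function, and treating a multi-item filter list as "every item must be present" is an assumption; the server-side behavior is described in the Filter documentation and may differ.

```python
from typing import Any, Dict, List


def matches_filter(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if the metadata satisfies every key in the filter.

    Hypothetical local model of metadata filtering; not Clarifai's implementation.
    """
    for key, wanted in flt.items():
        if key not in metadata:
            return False
        stored = metadata[key]
        if isinstance(wanted, list):
            # Assumption: every requested item must appear in the stored list.
            if not isinstance(stored, list) or not all(item in stored for item in wanted):
                return False
        elif stored != wanted:
            return False
    return True


records: List[Dict[str, Any]] = [
    {"id": "text1", "source": "book 1", "category": ["books", "modern"]},
    {"id": "text2", "source": "blog", "category": ["web"]},
]
book1_only = [r["id"] for r in records if matches_filter(r, {"source": "book 1"})]
books_category = [r["id"] for r in records if matches_filter(r, {"category": ["books"]})]
```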

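## From Documents

Create a Clarifai vector store from a list of Documents. This section uploads each document, together with its metadata, to a Clarifai application. The application can then be used for semantic search to find relevant documents.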

In [None]:
loader = TextLoader("your_local_file_path.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
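The splitter breaks a long document into pieces of roughly `chunk_size` characters before they are uploaded. The function below is a simplified stand-in for that idea, assuming the text is split on blank lines and the pieces are merged greedily. It is not `CharacterTextSplitter`'s actual algorithm, and `split_into_chunks` is a hypothetical name.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size.

    Simplified illustration only. A single piece longer than chunk_size is kept
    whole rather than cut mid-piece.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either the merged text still fits, or this piece starts a new chunk.
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
sample_chunks = split_into_chunks(sample, chunk_size=36)
```

With `chunk_overlap=0`, as in the cell above, consecutive chunks share no text, which is also how this sketch behaves.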

In [10]:
USER_ID = "USERNAME_ID"
APP_ID = "APPLICATION_ID"
NUMBER_OF_DOCS = 4

Create a Clarifai vector DB instance and ingest all the documents into the Clarifai app.

In [None]:
clarifai_vector_db = Clarifai.from_documents(
    user_id=USER_ID,
    app_id=APP_ID,
    documents=docs,
    number_of_docs=NUMBER_OF_DOCS,
)

In [None]:
docs = clarifai_vector_db.similarity_search("Texts related to population")
docs

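## From an Existing App

Clarifai provides tools for adding data to applications (essentially projects) through the API or the UI. Most users will already have done this before working with LangChain, so this example runs searches against the data in an existing app. See our [API docs](https://docs.clarifai.com/api-guide/data/create-get-update-delete) and [UI docs](https://docs.clarifai.com/portal-guide/data). The Clarifai application can then be used for semantic search to find relevant documents.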

In [7]:
USER_ID = "USERNAME_ID"
APP_ID = "APPLICATION_ID"
NUMBER_OF_DOCS = 4

In [9]:
clarifai_vector_db = Clarifai(
    user_id=USER_ID,
    app_id=APP_ID,
    number_of_docs=NUMBER_OF_DOCS,
)

In [None]:
docs = clarifai_vector_db.similarity_search(
    "Texts related to ammunition and President Wilson"
)

In [51]:
docs[0].page_content

"President Wilson, generally acclaimed as the leader of the world's democracies,\nphrased for civilization the arguments against autocracy in the great peace conference\nafter the war. The President headed the American delegation to that conclave of world\nre-construction. With him as delegates to the conference were Robert Lansing, Secretary\nof State; Henry White, former Ambassador to France and Italy; Edward M. House and\nGeneral Tasker H. Bliss.\nRepresenting American Labor at the International Labor conference held in Paris\nsimultaneously with the Peace Conference were Samuel Gompers, president of the\nAmerican Federation of Labor; William Green, secretary-treasurer of the United Mine\nWorkers of America; John R. Alpine, president of the Plumbers' Union; James Duncan,\npresident of the International Association of Granite Cutters; Frank Duffy, president of\nthe United Brotherhood of Carpenters and Joiners, and Frank Morrison, secretary of the\nAmerican Federation of Labor.\nEstim