forked from langchain-ai/langchain
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
add baidu cloud vectorsearch document (langchain-ai#12928)
**Description:** Add BaiduCloud VectorSearch document with implement of BESVectorSearch in langchain vectorstores --------- Co-authored-by: wemysschen <root@icoding-cwx.bcc-szzj.baidu.com>
- Loading branch information
Showing
2 changed files
with
161 additions
and
1 deletion.
There are no files selected for viewing
Large diffs are not rendered by default.
Oops, something went wrong.
160 changes: 160 additions & 0 deletions
160
docs/docs/integrations/vectorstores/baiducloud_vector_search.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,160 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Biadu Cloud ElasticSearch VectorSearch\n", | ||
"\n", | ||
">[Baidu Cloud VectorSearch](https://cloud.baidu.com/doc/BES/index.html?from=productToDoc) is a fully managed, enterprise-level distributed search and analysis service which is 100% compatible to open source. Baidu Cloud VectorSearch provides low-cost, high-performance, and reliable retrieval and analysis platform level product services for structured/unstructured data. As a vector database , it supports multiple index types and similarity distance methods. \n", | ||
"\n", | ||
">`Baidu Cloud ElasticSearch` provides a privilege management mechanism, for you to configure the cluster privileges freely, so as to further ensure data security.\n", | ||
"\n", | ||
"This notebook shows how to use functionality related to the `Baidu Cloud ElasticSearch VectorStore`.\n", | ||
"To run, you should have an [Baidu Cloud ElasticSearch](https://cloud.baidu.com/product/bes.html) instance up and running:\n", | ||
"\n", | ||
"Read the [help document](https://cloud.baidu.com/doc/BES/s/8llyn0hh4 ) to quickly familiarize and configure Baidu Cloud ElasticSearch instance." | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"After the instance is up and running, follow these steps to split documents, get embeddings, connect to the baidu cloud elasticsearch instance, index documents, and perform vector retrieval." | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We need to install the following Python packages first." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"#!pip install elasticsearch == 7.11.0" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"First, we want to use `QianfanEmbeddings` so we have to get the Qianfan AK and SK. Details for QianFan is related to [Baidu Qianfan Workshop](https://cloud.baidu.com/product/wenxinworkshop)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import os\n", | ||
"import getpass\n", | ||
"\n", | ||
"os.environ['QIANFAN_AK'] = getpass.getpass(\"Your Qianfan AK:\")\n", | ||
"os.environ['QIANFAN_SK'] = getpass.getpass(\"Your Qianfan SK:\")" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Secondly, split documents and get embeddings." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain.document_loaders import TextLoader\n", | ||
"\n", | ||
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n", | ||
"documents = loader.load()\n", | ||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", | ||
"docs = text_splitter.split_documents(documents)\n", | ||
"\n", | ||
"from langchain.embeddings import QianfanEmbeddingsEndpoint\n", | ||
"embeddings = QianfanEmbeddingsEndpoint()" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Then, create a Baidu ElasticeSearch accessable instance." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Create a bes instance and index docs.\n", | ||
"from langchain.vectorstores import BESVectorStore\n", | ||
"bes = BESVectorStore.from_documents(\n", | ||
" documents=docs, embedding=embeddings, bes_url=\"your bes cluster url\", index_name=\"your vector index\"\n", | ||
")\n", | ||
"bes.client.indices.refresh(index=\"your vector index\")" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Finally, Query and retrive data" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"query = \"What did the president say about Ketanji Brown Jackson\"\n", | ||
"docs = bes.similarity_search(query)\n", | ||
"print(docs[0].page_content)" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Please feel free to contact <liuboyao@baidu.com> if you encounter any problems during use, and we will do our best to support you." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"name": "python", | ||
"version": "3.9.17" | ||
}, | ||
"orig_nbformat": 4, | ||
"vscode": { | ||
"interpreter": { | ||
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49" | ||
} | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |