# Faiss
> **Facebook AI Similarity Search**

## Synchronous


**[Async version](https://python.langchain.com/docs/integrations/vectorstores/async_faiss)**

In [2]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import TextLoader

In [4]:
loader = TextLoader('./input/state_of_the_union.txt')

In [5]:
documents = loader.load()

In [6]:
len(documents)

1

In [7]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

In [8]:
docs = text_splitter.split_documents(documents)

In [9]:
embeddings = OpenAIEmbeddings()

In [10]:
db = FAISS.from_documents(docs, embeddings)
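Conceptually, `FAISS.from_documents` embeds each chunk and keeps three structures that reappear later in this notebook: the FAISS index holding one vector per row, a `docstore` mapping document IDs to `Document`s, and `index_to_docstore_id` mapping each row position to a document ID. The plain-Python sketch below models only that bookkeeping. `ToyDocument`, `toy_embed` (a letter-count "embedding" standing in for `OpenAIEmbeddings`) and `build_store` are hypothetical, and a Python list stands in for the real FAISS index.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for langchain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedding: counts of the letters a-z (not a real model)."""
    text = text.lower()
    return [float(text.count(chr(c))) for c in range(ord("a"), ord("z") + 1)]


def build_store(docs: list[ToyDocument]):
    """Hypothetical sketch of the bookkeeping, not LangChain's code."""
    vectors = []               # row i plays the role of FAISS index row i
    docstore = {}              # document ID -> document
    index_to_docstore_id = {}  # row position -> document ID
    for position, doc in enumerate(docs):
        doc_id = str(uuid.uuid4())  # random UUID strings, like the IDs shown later
        vectors.append(toy_embed(doc.page_content))
        docstore[doc_id] = doc
        index_to_docstore_id[position] = doc_id
    return vectors, docstore, index_to_docstore_id


vectors, docstore, index_to_docstore_id = build_store(
    [ToyDocument("foo"), ToyDocument("bar")]
)
# A search hit at row 1 is resolved back to its document in two lookups.
print(docstore[index_to_docstore_id[1]].page_content)  # bar
```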

### 1. Search by query string

In [11]:
query = "What did the president say about Ketanji Brown Jackson"

In [12]:
docs = db.similarity_search(query)

In [13]:
len(docs)

4

In [15]:
print(docs[0].page_content)

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.


### 2. Search by query string, with scores

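`similarity_search_with_score`:
1. The score is an L2 distance, so a lower score means a closer match. (The default `IndexFlatL2` index actually reports the squared L2 distance.)
2. It returns each document together with its score, not just the documents.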
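A plain-Python sketch of what these scores mean, assuming the default Euclidean setup: every stored vector is scored by its squared L2 distance to the query and the results are sorted ascending, so an exact match scores 0.0 (as with the `foo` example further down). Squaring changes the values but not the ranking. The vectors and the `search_with_score` helper are hypothetical.

```python
import math


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance, the quantity IndexFlatL2 reports."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_with_score(query, rows, k=4):
    """Hypothetical exhaustive search: (row, score) pairs, best (lowest) first."""
    scored = [(i, squared_l2(query, vec)) for i, vec in enumerate(rows)]
    scored.sort(key=lambda pair: pair[1])  # lower distance = closer match
    return scored[:k]


rows = [[0.0, 1.0], [1.0, 0.0], [3.0, 4.0]]
results = search_with_score([1.0, 0.0], rows, k=2)
print(results)  # [(1, 0.0), (0, 2.0)]

# Taking the square root gives the plain L2 distance; the order is unchanged.
print([math.sqrt(score) for _, score in results])  # [0.0, 1.4142135623730951]
```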

In [16]:
docs_and_scores = db.similarity_search_with_score(query)

In [18]:
for doc, score in docs_and_scores:
    print(doc, score)

page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.' metadata={'source': './input/state_of_the_union.txt'} 0.36921751
page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school

### 3. Search by vector

In [19]:
embedding_vector = embeddings.embed_query(query)

In [20]:
for doc, score in db.similarity_search_with_score_by_vector(embedding_vector):
    print(doc, score)
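Text search in the FAISS store is, in effect, "embed the query, then search by vector", so computing the vector once with `embed_query` and reusing it avoids repeated calls to the embedding API when the same query is searched several times. A minimal caching sketch: `CountingEmbedder` is a hypothetical stand-in for `OpenAIEmbeddings` that only counts calls, and `functools.lru_cache` is just one simple way to memoize.

```python
from functools import lru_cache


class CountingEmbedder:
    """Hypothetical stand-in for an embeddings client; counts API calls."""

    def __init__(self):
        self.calls = 0

    def embed_query(self, text: str) -> list[float]:
        self.calls += 1
        return [float(len(text)), float(text.count(" "))]


embedder = CountingEmbedder()


@lru_cache(maxsize=1024)
def cached_query_vector(text: str) -> tuple[float, ...]:
    # Return a tuple so the shared cached vector cannot be mutated by callers.
    return tuple(embedder.embed_query(text))


query = "What did the president say about Ketanji Brown Jackson"
for _ in range(3):
    vector = cached_query_vector(query)  # only the first call hits the embedder

print(embedder.calls)  # 1
```

To use a cached vector with the store, pass `list(vector)` to `db.similarity_search_with_score_by_vector`.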

page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.' metadata={'source': './input/state_of_the_union.txt'} 0.36912858
page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school

### 4. Saving and loading

In [21]:
db.save_local('./faiss_index')

In [22]:
new_db = FAISS.load_local('./faiss_index', embeddings)
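`save_local` writes a folder rather than a single file. In recent LangChain versions (worth checking against your installed one), the default `index_name="index"` produces `index.faiss` (the raw FAISS index) and `index.pkl` (a pickle of the docstore and `index_to_docstore_id`); newer releases also require `allow_dangerous_deserialization=True` in `load_local` because of that pickle. The sketch below imitates the two-file layout in plain Python. `save_store` and `load_store` are hypothetical helpers, and a JSON file stands in for the FAISS binary format.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_store(folder, vectors, docstore, index_to_docstore_id, index_name="index"):
    """Hypothetical two-file layout: vectors + pickled bookkeeping."""
    path = Path(folder)
    path.mkdir(parents=True, exist_ok=True)
    # Stand-in for the binary index.faiss file
    (path / f"{index_name}.vectors.json").write_text(json.dumps(vectors))
    # Docstore and row->ID mapping are pickled together, like index.pkl
    with open(path / f"{index_name}.pkl", "wb") as f:
        pickle.dump((docstore, index_to_docstore_id), f)


def load_store(folder, index_name="index"):
    path = Path(folder)
    vectors = json.loads((path / f"{index_name}.vectors.json").read_text())
    # Only unpickle files you created yourself: pickle can execute code.
    with open(path / f"{index_name}.pkl", "rb") as f:
        docstore, index_to_docstore_id = pickle.load(f)
    return vectors, docstore, index_to_docstore_id


with tempfile.TemporaryDirectory() as tmp:
    save_store(tmp, [[0.0, 1.0]], {"id-1": "foo"}, {0: "id-1"})
    restored = load_store(tmp)
    files = sorted(p.name for p in Path(tmp).iterdir())

print(files)     # ['index.pkl', 'index.vectors.json']
print(restored)  # ([[0.0, 1.0]], {'id-1': 'foo'}, {0: 'id-1'})
```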

### 5. Serializing and deserializing to bytes

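1. `save_local` takes up a lot of disk space.
2. Serialization is lightweight: it produces a single `bytes` object, which makes it a good option if you want to persist the vector store in a SQL database.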

In [None]:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

**Serialization**

In [23]:
pkl = db.serialize_to_bytes()

**Deserialization**

In [24]:
db = FAISS.deserialize_from_bytes(embeddings=embeddings, serialized=pkl)
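Because `serialize_to_bytes` returns a single `bytes` object, storing it in a SQL database (the use case noted above) comes down to writing a BLOB. A minimal sketch with Python's built-in `sqlite3`, assuming a hypothetical `vector_stores` table; `pickle.dumps` of a small dict stands in for `pkl` so the snippet runs without LangChain or an API key. The serialized store is pickle-based, so only deserialize bytes from a source you trust.

```python
import pickle
import sqlite3

# Stand-in for: pkl = db.serialize_to_bytes()
pkl = pickle.dumps({"index_to_docstore_id": {0: "id-1"}, "docstore": {"id-1": "foo"}})

conn = sqlite3.connect(":memory:")  # use a file path for real persistence
conn.execute(
    "CREATE TABLE vector_stores (name TEXT PRIMARY KEY, payload BLOB NOT NULL)"
)
conn.execute(
    "INSERT OR REPLACE INTO vector_stores (name, payload) VALUES (?, ?)",
    ("state_of_the_union", pkl),
)
conn.commit()

(payload,) = conn.execute(
    "SELECT payload FROM vector_stores WHERE name = ?", ("state_of_the_union",)
).fetchone()
conn.close()

# In the notebook: db = FAISS.deserialize_from_bytes(embeddings=embeddings, serialized=payload)
restored = pickle.loads(payload)
print(restored["docstore"])  # {'id-1': 'foo'}
```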

### 6. Merging indexes

In [25]:
db1 = FAISS.from_texts(["foo"], embeddings)
db2 = FAISS.from_texts(['bar'], embeddings)

In [26]:
db1.docstore._dict

{'1c3c6c3d-d931-44dc-b147-96cddd259275': Document(page_content='foo')}

In [27]:
db2.docstore._dict

{'3d91c136-36bf-462a-9248-14adec957622': Document(page_content='bar')}

In [28]:
db1.merge_from(db2)

In [29]:
db1.docstore._dict

{'1c3c6c3d-d931-44dc-b147-96cddd259275': Document(page_content='foo'),
 '3d91c136-36bf-462a-9248-14adec957622': Document(page_content='bar')}
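A plain-Python sketch of the bookkeeping `merge_from` has to do: the other store's vectors are appended after the existing rows, its row positions are shifted by the current length, and its docstore entries are added. A clash on document IDs is treated as an error here, mirroring LangChain's in-memory docstore, which raises `ValueError` when adding IDs that already exist. `ToyStore` and `merge_into` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical stand-in for a FAISS vector store's state."""
    vectors: list = field(default_factory=list)
    docstore: dict = field(default_factory=dict)
    index_to_docstore_id: dict = field(default_factory=dict)


def merge_into(target: ToyStore, source: ToyStore) -> None:
    """Append `source` into `target` in place; `source` is left unchanged."""
    overlap = set(target.docstore) & set(source.docstore)
    if overlap:  # check before mutating so a failed merge changes nothing
        raise ValueError(f"document IDs already present: {sorted(overlap)}")
    offset = len(target.index_to_docstore_id)
    target.vectors.extend(source.vectors)  # source rows land after target rows
    for position, doc_id in source.index_to_docstore_id.items():
        target.index_to_docstore_id[offset + position] = doc_id
        target.docstore[doc_id] = source.docstore[doc_id]


db1 = ToyStore([[1.0]], {"id-foo": "foo"}, {0: "id-foo"})
db2 = ToyStore([[2.0]], {"id-bar": "bar"}, {0: "id-bar"})
merge_into(db1, db2)
print(db1.index_to_docstore_id)  # {0: 'id-foo', 1: 'id-bar'}
```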

### 7. Similarity search with a filter

> `fetch_k`: the number of documents fetched before filtering.

In [30]:
from langchain.schema import Document

In [31]:
list_of_documents = [
    Document(page_content="foo", metadata=dict(page=1)),
    Document(page_content="bar", metadata=dict(page=1)),
    Document(page_content="foo", metadata=dict(page=2)),
    Document(page_content="barbar", metadata=dict(page=2)),
    Document(page_content="foo", metadata=dict(page=3)),
    Document(page_content="bar burr", metadata=dict(page=3)),
    Document(page_content="foo", metadata=dict(page=4)),
    Document(page_content="bar bruh", metadata=dict(page=4)),
]

In [52]:
db = FAISS.from_documents(list_of_documents, embeddings)

In [33]:
results_with_scores = db.similarity_search_with_score("foo")

In [34]:
for doc, score in results_with_scores:
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}")

Content: foo, Metadata: {'page': 1}, Score: 0.0
Content: foo, Metadata: {'page': 2}, Score: 0.0
Content: foo, Metadata: {'page': 3}, Score: 0.0
Content: foo, Metadata: {'page': 4}, Score: 0.0


> Now we make the same query call but we filter for only `page = 1`

In [35]:
results_with_scores = db.similarity_search_with_score("foo", filter=dict(page=1))
for doc, score in results_with_scores:
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}")

Content: foo, Metadata: {'page': 1}, Score: 1.4206954801920801e-05
Content: bar, Metadata: {'page': 1}, Score: 0.3131061792373657


> The same filter also works with `max_marginal_relevance_search`.

In [36]:
results = db.max_marginal_relevance_search("foo", filter=dict(page=1))
for doc in results:
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}")

Content: foo, Metadata: {'page': 1}
Content: bar, Metadata: {'page': 1}
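Maximal marginal relevance (MMR) trades relevance against diversity: it repeatedly picks the candidate that maximizes `lambda_mult * sim(query, d) - (1 - lambda_mult) * max sim(d, already picked)`. With the filter above only the two page-1 documents are candidates, so MMR can only return those two. A plain-Python sketch of the selection loop, using cosine similarity and `lambda_mult=0.5` (assumed here to match LangChain's default); the function `mmr` is hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query, candidates, k=4, lambda_mult=0.5):
    """Hypothetical MMR: return indices of `candidates` in selection order."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,  # nothing picked yet: pure relevance
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)  # ties go to the earliest candidate
        selected.append(best)
        remaining.remove(best)
    return selected


# Candidates 0 and 1 are duplicates; MMR skips the duplicate in favour of 2.
candidates = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(mmr([2.0, 1.0], candidates, k=2))  # [0, 2]
```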


> The `fetch_k` parameter is the number of documents fetched before filtering.

In [37]:
results = db.similarity_search("foo", filter=dict(page=1), k=1, fetch_k=4)
for doc in results:
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}")

Content: foo, Metadata: {'page': 1}
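The filter is applied after the nearest-neighbour search rather than inside FAISS: the store fetches the `fetch_k` closest rows, drops those whose metadata does not match every key in `filter`, and keeps the first `k` survivors. One consequence is that a small `fetch_k` can return fewer than `k` results even when more matching documents exist. A plain-Python sketch of that post-filtering step; `filtered_search` and the precomputed ranking are hypothetical, and matching is plain equality per key.

```python
def filtered_search(ranked, metadata_filter, k=4, fetch_k=20):
    """ranked: (content, metadata) pairs already sorted by distance, best first."""
    candidates = ranked[:fetch_k]  # step 1: nearest fetch_k rows, no filtering yet
    matches = [
        (content, metadata)
        for content, metadata in candidates
        if all(metadata.get(key) == value for key, value in metadata_filter.items())
    ]  # step 2: keep rows whose metadata matches every filter key
    return matches[:k]  # step 3: truncate to k


# Hypothetical ranking for the query "foo": all four "foo" rows come first.
ranked = [
    ("foo", {"page": 1}), ("foo", {"page": 2}), ("foo", {"page": 3}),
    ("foo", {"page": 4}), ("bar", {"page": 1}), ("barbar", {"page": 2}),
]
print(filtered_search(ranked, {"page": 1}, k=1, fetch_k=4))  # [('foo', {'page': 1})]
# With fetch_k=4 the page-1 "bar" row is never fetched, so k=2 still yields one hit.
print(len(filtered_search(ranked, {"page": 1}, k=2, fetch_k=4)))  # 1
print(len(filtered_search(ranked, {"page": 1}, k=2)))  # 2
```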


### 8. Deleting and adding documents

In [53]:
db.index_to_docstore_id

{0: 'a992967c-761b-4fe6-9f64-5aa7ed88d545',
 1: 'ba1094b9-b7c8-437c-a1db-33c1a39fdfe1',
 2: 'b31d8eac-3d9c-4d26-8244-2272a14b9122',
 3: 'a8dc35f4-6334-42ae-b76d-87fb5a123674',
 4: '7a95ca75-8b31-4744-8a7b-f41915b09c6e',
 5: 'a3f06aaf-096b-48fe-ba16-36a3da942f65',
 6: '9aa83411-5832-41c1-8fff-091e13f2cd5e',
 7: '3d8263fd-7c18-444c-b5e4-b3d73b5abf91'}

In [54]:
db.index_to_docstore_id[0]

'a992967c-761b-4fe6-9f64-5aa7ed88d545'

In [55]:
db.delete([db.index_to_docstore_id[0]])

True

In [56]:
0 in db.index_to_docstore_id

True

In [57]:
db.index_to_docstore_id

{0: 'ba1094b9-b7c8-437c-a1db-33c1a39fdfe1',
 1: 'b31d8eac-3d9c-4d26-8244-2272a14b9122',
 2: 'a8dc35f4-6334-42ae-b76d-87fb5a123674',
 3: '7a95ca75-8b31-4744-8a7b-f41915b09c6e',
 4: 'a3f06aaf-096b-48fe-ba16-36a3da942f65',
 5: '9aa83411-5832-41c1-8fff-091e13f2cd5e',
 6: '3d8263fd-7c18-444c-b5e4-b3d73b5abf91'}

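> The entry for `a992967c-761b-4fe6-9f64-5aa7ed88d545` is gone, and the remaining IDs have been renumbered starting from 0, which is why `0 in db.index_to_docstore_id` is still `True`.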
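Comparing the two `index_to_docstore_id` outputs, every ID after the deleted one moved down by one position. That keeps the mapping aligned with the FAISS index: removing a vector from a flat index compacts it, so later rows shift down. A plain-Python sketch of that bookkeeping; `delete_ids` is a hypothetical helper and the short IDs stand in for the UUIDs.

```python
def delete_ids(vectors, docstore, index_to_docstore_id, ids):
    """Hypothetical: return copies of the three structures with `ids` removed."""
    ids = set(ids)
    doc_id_to_row = {doc_id: row for row, doc_id in index_to_docstore_id.items()}
    rows_to_drop = {doc_id_to_row[doc_id] for doc_id in ids}  # KeyError for unknown IDs
    kept_rows = [row for row in sorted(index_to_docstore_id) if row not in rows_to_drop]
    new_vectors = [vectors[row] for row in kept_rows]  # later rows shift down
    new_docstore = {k: v for k, v in docstore.items() if k not in ids}
    new_mapping = {  # renumber surviving rows 0, 1, 2, ...
        new_row: index_to_docstore_id[old_row]
        for new_row, old_row in enumerate(kept_rows)
    }
    return new_vectors, new_docstore, new_mapping


vectors = [[0.0], [1.0], [2.0]]
docstore = {"a": "foo", "b": "bar", "c": "baz"}
mapping = {0: "a", 1: "b", 2: "c"}
vectors, docstore, mapping = delete_ids(vectors, docstore, mapping, ["a"])
print(mapping)       # {0: 'b', 1: 'c'}
print(0 in mapping)  # True
```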

In [58]:
db.add_documents?

Signature: db.add_documents(documents: 'List[Document]', **kwargs: 'Any') -> 'List[str]'
Docstring:
Run more documents through the embeddings and add to the vectorstore.

Args:
    documents (List[Document]: Documents to add to the vectorstore.

Returns:
    List[str]: List of IDs of the added texts.
File:      /opt/conda/envs/preventloss/lib/python3.9/site-packages/langchain/schema/vectorstore.py
Type:      method

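Per the docstring, `add_documents` embeds new documents, adds them to the store and returns their IDs; in this version the base implementation splits the documents into texts and metadatas and hands them to `add_texts`. For the FAISS store that means appending rows after the existing ones and extending `index_to_docstore_id` from the current length. A plain-Python sketch of that append step; `add_texts_sketch` and the dict-based store are hypothetical, and like `add_texts` it takes optional `metadatas` and `ids`, generating UUID strings when no IDs are given.

```python
import uuid


def add_texts_sketch(store, texts, embed, metadatas=None, ids=None):
    """Hypothetical: append texts to a toy store dict and return their IDs."""
    texts = list(texts)
    metadatas = metadatas or [{} for _ in texts]
    ids = ids or [str(uuid.uuid4()) for _ in texts]
    if not (len(texts) == len(metadatas) == len(ids)):  # sanity check
        raise ValueError("texts, metadatas and ids must have the same length")
    start = len(store["index_to_docstore_id"])  # new rows go after existing ones
    for offset, (text, metadata, doc_id) in enumerate(zip(texts, metadatas, ids)):
        store["vectors"].append(embed(text))
        store["docstore"][doc_id] = (text, metadata)
        store["index_to_docstore_id"][start + offset] = doc_id
    return ids


store = {"vectors": [[3.0]], "docstore": {"id-0": ("foo", {})}, "index_to_docstore_id": {0: "id-0"}}
new_ids = add_texts_sketch(
    store, ["bar", "baz"], embed=lambda t: [float(len(t))],
    metadatas=[{"page": 5}, {"page": 5}], ids=["id-1", "id-2"],
)
print(new_ids)                        # ['id-1', 'id-2']
print(store["index_to_docstore_id"])  # {0: 'id-0', 1: 'id-1', 2: 'id-2'}
```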
In [59]:
db.add_texts?

Signature:
db.add_texts(
    texts: 'Iterable[str]',
    metadatas: 'Optional[List[dict]]' = None,
    ids: 'Optional[List[str]]' = None,
    **kwargs: 'Any',
) -> 'List[str]'
Docstring:
Run more texts through the embeddings and add to the vectorstore.

Args:
    texts: Iterable of strings to add to the vectorstore.
    metadatas: Optional list of metadatas associated with the texts.
    ids: Optional list of unique IDs.

Returns:
    List of ids from adding the texts into the vectorstore.
File:      /opt/conda/envs/preventloss/lib/pyt