#### FAISS
FAISS (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search sets of vectors of any size, including sets that may not fit in RAM, along with supporting code for evaluation and parameter tuning.
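FAISS's simplest index, `IndexFlatL2`, does exact (brute-force) search. It compares the query against every stored vector and returns the `k` smallest squared Euclidean distances together with their positions. In FAISS itself this is `index = faiss.IndexFlatL2(d)`, `index.add(xb)`, `D, I = index.search(xq, k)`. The NumPy sketch below only illustrates what that flat index computes. It is not FAISS, which adds optimized kernels and approximate indexes (IVF, HNSW, PQ) for large collections. The random data is made up for the example.

```python
import numpy as np

def flat_l2_search(database: np.ndarray, queries: np.ndarray, k: int):
    """Exact k-nearest-neighbour search by squared Euclidean distance (illustrative sketch)."""
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, evaluated for every (query, vector) pair
    q_sq = np.sum(queries ** 2, axis=1, keepdims=True)    # shape (nq, 1)
    x_sq = np.sum(database ** 2, axis=1)                  # shape (nb,)
    distances = q_sq - 2.0 * queries @ database.T + x_sq  # shape (nq, nb)
    # Positions of the k smallest distances for each query, in ascending order
    indices = np.argsort(distances, axis=1)[:, :k]
    return np.take_along_axis(distances, indices, axis=1), indices

rng = np.random.default_rng(0)
xb = rng.random((1000, 64)).astype("float32")  # 1,000 database vectors of dimension 64
xq = xb[:3] + 0.01                             # three queries very close to the first three vectors
D, I = flat_l2_search(xb, xq, k=4)
print(I[:, 0])  # nearest neighbour of each query
```

Because every vector is scanned, the result is exact but the cost grows linearly with the collection size. That trade-off is what FAISS's approximate indexes are designed to avoid.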


In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("speech.txt", encoding="utf-8")  # decode as UTF-8 so characters such as € and é load correctly
document = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=30)

docs = text_splitter.split_documents(document)
docs

Created a chunk of size 1042, which is longer than the specified 1000


[Document(metadata={'source': 'speech.txt'}, page_content="Cristiano Ronaldo dos Santos Aveiro is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards, a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player. He has won 33 trophies in his career, including seven league titles, five UEFA Champions Leagues, the UEFA European Championship and the UEFA Nations League. Ronaldo holds the records for most appearances (183), goals (140) and assists (42) in the Champions League, most appearances (30), assists (8), goals in the European Championship (14), international goals (130) and international appearances (212). He is one of the few players to have made over 1,200 professional career appearances, the most by an outfield player, and has scored o
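The "Created a chunk of size 1042" warning above comes from how `CharacterTextSplitter` works. It splits the text on a single separator (by default `"\n\n"`) and then greedily merges the pieces back together up to `chunk_size` characters. A piece that is already longer than `chunk_size` cannot be broken further with that separator, so it is kept whole and the splitter warns. The function below is a simplified, hypothetical reimplementation of the merging step for illustration only. It ignores `chunk_overlap` and is not LangChain's code.

```python
def merge_splits(text: str, separator: str = "\n\n", chunk_size: int = 1000) -> list[str]:
    """Simplified sketch: split on `separator`, then greedily pack pieces into chunks.

    Ignores chunk_overlap. A single piece longer than chunk_size cannot be split
    further by this separator, so it becomes an oversized chunk.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for piece in pieces:
        # Cost of adding this piece, including the separator used to rejoin it
        extra = len(piece) + (len(separator) if current else 0)
        if current and current_len + extra > chunk_size:
            chunks.append(separator.join(current))  # flush the full chunk
            current, current_len = [], 0
            extra = len(piece)
        current.append(piece)
        current_len += extra
    if current:
        chunks.append(separator.join(current))
    for chunk in chunks:
        if len(chunk) > chunk_size:
            print(f"Oversized chunk: {len(chunk)} > {chunk_size}")
    return chunks

sample = "a" * 40 + "\n\n" + "b" * 30 + "\n\n" + "c" * 120
chunks = merge_splits(sample, chunk_size=100)
print([len(c) for c in chunks])
```

The first two paragraphs fit together in one chunk. The 120-character paragraph has no `"\n\n"` inside it, so it stays whole, which is the same situation as the 1042-character chunk in `speech.txt`.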

In [2]:
hf_embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = FAISS.from_documents(docs, hf_embeddings)
db



<langchain_community.vectorstores.faiss.FAISS at 0x268383fd360>

In [10]:
## querying the db
query = "When did Cristiano Ronaldo win back-to-back Ballon d'Or awards?"
result = db.similarity_search(query)
result[0].page_content

"Ronaldo began his senior career with Sporting CP, before signing with Manchester United in 2003, winning the FA Cup in his first season. He would also go on to win three consecutive Premier League titles, the Champions League and the FIFA Club World Cup; at age 23, he won his first Ballon d'Or. Ronaldo was the subject of the then-most expensive association football transfer when he signed for Real Madrid in 2009 in a transfer worth €94 million (£80 million). He became a key contributor and formed an attacking trio with Karim Benzema and Gareth Bale which was integral to the team winning four Champions Leagues from 2014 to 2018, including La Décima. During this period, he won back-to-back Ballons d'Or in 2013 and 2014, and again in 2016 and 2017, and was runner-up three times behind Lionel Messi, his perceived career rival. He also became the club's all-time top goalscorer and the all-time top scorer in the Champions League, and finished as the competition's top scorer for six cons

### As a Retriever

We can also convert the vector store into a retriever. This makes it easy to plug into other LangChain components, most of which expect a retriever rather than a vector store.
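Conceptually, a retriever is a thin wrapper: it stores a search function plus fixed search settings and exposes a single query-in, documents-out method (`invoke`). In LangChain, settings are passed through `as_retriever`, e.g. `db.as_retriever(search_kwargs={"k": 2})`. The `ToyRetriever` class and the keyword-overlap `keyword_search` function below are hypothetical stand-ins that mimic that shape without embeddings. They are not LangChain's `VectorStoreRetriever`.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToyRetriever:
    """Hypothetical stand-in for a vector-store retriever: a search function plus fixed kwargs."""
    search_fn: Callable[..., list]
    search_kwargs: dict = field(default_factory=dict)

    def invoke(self, query: str) -> list:
        # Forward the query and the stored settings (e.g. k) to the underlying search
        return self.search_fn(query, **self.search_kwargs)

corpus = [
    "Ronaldo won the Ballon d'Or in 2013",
    "FAISS indexes dense vectors",
    "Chroma is a vector database",
]

def keyword_search(query: str, k: int = 4) -> list[str]:
    # Toy scoring: rank documents by the number of shared lowercase words (not embeddings)
    q_words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: len(q_words & set(doc.lower().split())), reverse=True)
    return ranked[:k]

toy_retriever = ToyRetriever(keyword_search, {"k": 1})
print(toy_retriever.invoke("When did Ronaldo win the Ballon d'Or"))
```

Fixing the search settings when the retriever is created means downstream code only ever calls `invoke(query)`, which is why chains can accept any retriever interchangeably.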

In [11]:
retriever = db.as_retriever()
retriever.invoke(query)

[Document(metadata={'source': 'speech.txt'}, page_content="Ronaldo began his senior career with Sporting CP, before signing with Manchester United in 2003, winning the FA Cup in his first season. He would also go on to win three consecutive Premier League titles, the Champions League and the FIFA Club World Cup; at age 23, he won his first Ballon d'Or. Ronaldo was the subject of the then-most expensive association football transfer when he signed for Real Madrid in 2009 in a transfer worth €94 million (£80 million). He became a key contributor and formed an attacking trio with Karim Benzema and Gareth Bale which was integral to the team winning four Champions Leagues from 2014 to 2018, including La Décima. During this period, he won back-to-back Ballons d'Or in 2013 and 2014, and again in 2016 and 2017, and was runner-up three times behind Lionel Messi, his perceived career rival. He also became the club's all-time top goalscorer and the all-time top scorer in the Champions League,

In [12]:
docs_and_score = db.similarity_search_with_score(query)
docs_and_score

[(Document(metadata={'source': 'speech.txt'}, page_content="Ronaldo began his senior career with Sporting CP, before signing with Manchester United in 2003, winning the FA Cup in his first season. He would also go on to win three consecutive Premier League titles, the Champions League and the FIFA Club World Cup; at age 23, he won his first Ballon d'Or. Ronaldo was the subject of the then-most expensive association football transfer when he signed for Real Madrid in 2009 in a transfer worth €94 million (£80 million). He became a key contributor and formed an attacking trio with Karim Benzema and Gareth Bale which was integral to the team winning four Champions Leagues from 2014 to 2018, including La Décima. During this period, he won back-to-back Ballons d'Or in 2013 and 2014, and again in 2016 and 2017, and was runner-up three times behind Lionel Messi, his perceived career rival. He also became the club's all-time top goalscorer and the all-time top scorer in the Champions League
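The number paired with each document by `similarity_search_with_score` is a distance, not a similarity, so a lower score means a closer match. Assuming the store uses a flat L2 index, FAISS reports the *squared* Euclidean distance. If the embeddings are also unit-normalized (an assumption here, not something the notebook checks), that distance maps directly onto cosine similarity via ‖a − b‖² = 2 − 2·cos(a, b). The NumPy check below uses random 384-dimensional vectors (the output size of all-MiniLM-L6-v2) instead of real embeddings, and the helper `unit` is introduced just for this example.

```python
import numpy as np

def unit(v: np.ndarray) -> np.ndarray:
    # Scale a vector to length 1, as normalized sentence embeddings would be
    return v / np.linalg.norm(v)

rng = np.random.default_rng(42)
dim = 384  # embedding size of all-MiniLM-L6-v2
query_vec = unit(rng.normal(size=dim))
candidates = {
    "near": unit(query_vec + 0.01 * rng.normal(size=dim)),  # small perturbation of the query
    "far": unit(rng.normal(size=dim)),                      # unrelated random direction
}

results = {}
for name, doc_vec in candidates.items():
    sq_l2 = float(np.sum((query_vec - doc_vec) ** 2))  # what a flat L2 index would report
    cosine = float(query_vec @ doc_vec)
    results[name] = (sq_l2, 2.0 - 2.0 * cosine)
    print(f"{name}: squared L2 = {sq_l2:.4f}, 2 - 2*cos = {2.0 - 2.0 * cosine:.4f}")
```

Under these assumptions the distance always lies between 0 (identical direction) and 4 (opposite direction), so sorting by ascending distance gives the same order as sorting by descending cosine similarity.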

#### Vector store with ChromaDB

In [13]:
# Older, deprecated import path: from langchain_community.vectorstores import Chroma
from langchain_chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [14]:
text_splitter_recursive = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
splits = text_splitter_recursive.split_documents(document)
splits

[Document(metadata={'source': 'speech.txt'}, page_content="Cristiano Ronaldo dos Santos Aveiro is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards, a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player. He has won 33 trophies in his career, including seven league titles, five UEFA Champions Leagues,"),
 Document(metadata={'source': 'speech.txt'}, page_content='seven league titles, five UEFA Champions Leagues, the UEFA European Championship and the UEFA Nations League. Ronaldo holds the records for most appearances (183), goals (140) and assists (42) in the Champions League, most appearances (30), assists (8), goals in the European Championship (14), international goals (130) and international appearances (212). He is one of the few p
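`RecursiveCharacterTextSplitter` tries a list of separators in order (by default `["\n\n", "\n", " ", ""]`). Any piece that is still longer than `chunk_size` is split again with the next, finer separator, and the empty string acts as a last-resort character cut. That is why it does not produce the oversized-chunk warning seen with `CharacterTextSplitter`. The function below is a simplified, hypothetical sketch of that recursion for illustration. It omits `chunk_overlap` and is not LangChain's implementation.

```python
def recursive_split(text: str, chunk_size: int, separators=("\n\n", "\n", " ", "")) -> list[str]:
    """Simplified sketch of recursive splitting (no chunk_overlap)."""
    if len(text) <= chunk_size:
        return [text]
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut into fixed-size character windows
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        if len(piece) > chunk_size:
            # Too long on its own: flush what we have, then recurse with a finer separator
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
        elif not current:
            current = piece
        elif len(current) + len(sep) + len(piece) <= chunk_size:
            current = current + sep + piece  # piece still fits in the current chunk
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

parts = recursive_split("alpha beta gamma" + "\n\n" + "x" * 25, chunk_size=10)
print(parts)
```

The real splitter also re-adds up to `chunk_overlap` characters from the end of one chunk at the start of the next. You can see this in the output above, where the second chunk begins with "seven league titles, five UEFA Champions Leagues,", which is the tail of the first chunk.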

In [15]:
chroma_db = Chroma.from_documents(documents=splits, embedding=hf_embeddings)
chroma_db

<langchain_chroma.vectorstores.Chroma at 0x26871ffbd60>

In [22]:
query_text = "How many Champions League titles have Cristiano Ronaldo won?"
query_result = chroma_db.similarity_search(query=query_text)
query_result[0]

Document(metadata={'source': 'speech.txt'}, page_content='seven league titles, five UEFA Champions Leagues, the UEFA European Championship and the UEFA Nations League. Ronaldo holds the records for most appearances (183), goals (140) and assists (42) in the Champions League, most appearances (30), assists (8), goals in the European Championship (14), international goals (130) and international appearances (212). He is one of the few players to have made over 1,200 professional career appearances, the most by an outfield player, and has scored over 890')

In [23]:
query_result_with_score = chroma_db.similarity_search_with_score(query=query_text)
query_result_with_score

[(Document(metadata={'source': 'speech.txt'}, page_content='seven league titles, five UEFA Champions Leagues, the UEFA European Championship and the UEFA Nations League. Ronaldo holds the records for most appearances (183), goals (140) and assists (42) in the Champions League, most appearances (30), assists (8), goals in the European Championship (14), international goals (130) and international appearances (212). He is one of the few players to have made over 1,200 professional career appearances, the most by an outfield player, and has scored over 890'),
  0.6790083646774292),
 (Document(metadata={'source': 'speech.txt'}, page_content="Cristiano Ronaldo dos Santos Aveiro is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards, a record three UEFA Men's Player of the Year Awards, and four Europea

In [25]:
# Retriever option
retriever_chroma = chroma_db.as_retriever()
retriever_chroma.invoke(query_text)

[Document(metadata={'source': 'speech.txt'}, page_content='seven league titles, five UEFA Champions Leagues, the UEFA European Championship and the UEFA Nations League. Ronaldo holds the records for most appearances (183), goals (140) and assists (42) in the Champions League, most appearances (30), assists (8), goals in the European Championship (14), international goals (130) and international appearances (212). He is one of the few players to have made over 1,200 professional career appearances, the most by an outfield player, and has scored over 890'),
 Document(metadata={'source': 'speech.txt'}, page_content="Cristiano Ronaldo dos Santos Aveiro is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards, a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most 