# **Retrieval**

In LangChain, retrieval is the process of fetching relevant pieces of information from a collection of documents or other data sources in response to a query. It typically relies on embedding-based similarity search to return the documents or text segments that best match the query's content and context.

In [None]:
%%capture
# update or install the necessary libraries
!pip install --upgrade langchain langchain_community langchain_aws pypdf tiktoken faiss-cpu
!pip install --upgrade python-dotenv

In [None]:
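import os
from dotenv import load_dotenv

# Load environment variables from the .env file into os.environ
load_dotenv()

# Fail early with a clear message if an AWS setting is missing
# (assigning a missing value back into os.environ would raise a TypeError)
required = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise EnvironmentError(f"Missing environment variables: {', '.join(missing)}")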

# **Vectorstore retrieval**

Vectorstore retrieval is a technique that involves storing and retrieving data using vector representations of text. It enables efficient similarity searches by converting text into high-dimensional vectors and then finding the closest vectors in the database. This method is particularly useful for applications requiring fast and accurate information retrieval based on semantic similarity.
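
To make the idea of "closest vectors" concrete, here is a minimal sketch in plain Python. It assumes a toy word-count embedding (`toy_embed`, a hypothetical stand-in, not what a real embedding model such as Titan does) and ranks a few made-up texts by cosine similarity to a query. A real vector store applies the same principle to learned, high-dimensional vectors and uses an index to avoid comparing against every entry.

```python
import math
from collections import Counter


def toy_embed(text, vocabulary):
    """Hypothetical stand-in for an embedding model: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, docs, k=2):
    """Embed the query and every document, then return the k best (score, text) pairs."""
    vocabulary = sorted({word for text in docs + [query] for word in text.lower().split()})
    query_vec = toy_embed(query, vocabulary)
    scored = [(cosine_similarity(query_vec, toy_embed(doc, vocabulary)), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy corpus, invented for illustration only
docs = ["large white mushroom", "poisonous mushroom", "fast red car"]
results = top_k("white mushroom", docs, k=2)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

The text that shares the most words with the query ranks first, and the unrelated text scores zero and is dropped from the top two.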



In [None]:
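# FAISS now lives in langchain_community; the langchain.vectorstores path is deprecated
from langchain_community.vectorstores import FAISS

# Folder where the index could be saved; not used in the cells below
persist_directory = 'docs/faiss/'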

In [None]:
from langchain_aws import BedrockEmbeddings

embedding = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v2:0"
)

In [None]:
texts = [
    """The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).""",
    """A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.""",
    """A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.""",
]

In [None]:
smalldb = FAISS.from_texts(texts, embedding=embedding)

In [None]:
question = "Tell me about all-white mushrooms with large fruiting bodies"

# **Similarity Search**

Similarity search is a method used to find items in a database that are most similar to a given query item based on certain criteria. It involves comparing vector representations of items and retrieving those with the smallest distance or highest similarity score. This approach is commonly used in applications like document retrieval, image matching, and recommendation systems.



In [None]:
smalldb.similarity_search(question, k=2)

In [None]:
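# Maximum marginal relevance (MMR): fetch the 3 closest texts,
# then keep the 2 that best balance relevance and diversity
smalldb.max_marginal_relevance_search(question, k=2, fetch_k=3)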
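
The MMR call above trades off relevance against redundancy. The following is a minimal sketch of that greedy selection in plain Python, not LangChain's actual implementation. The vectors are invented for illustration, `mmr_select` is a hypothetical helper name, and `lambda_mult` mirrors the parameter of the same name on `max_marginal_relevance_search` (1.0 means pure relevance, lower values favor diversity).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k candidate indices by maximal marginal relevance."""
    selected = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:
        best_index, best_score = None, -math.inf
        for i in remaining:
            # Reward similarity to the query...
            relevance = cosine_similarity(query_vec, candidate_vecs[i])
            # ...and penalize similarity to anything already selected
            redundancy = max(
                (cosine_similarity(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
        remaining.remove(best_index)
    return selected


# Toy 2-D vectors, invented for illustration only
query = [1.0, 0.0]
candidates = [
    [1.0, 0.1],   # highly relevant
    [1.0, 0.12],  # near-duplicate of the first candidate
    [0.6, -0.8],  # less relevant, but points in a different direction
]

print(mmr_select(query, candidates, k=2, lambda_mult=0.5))  # diversity-aware choice
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # relevance only
```

With `lambda_mult=0.5` the near-duplicate is skipped in favor of the more distinct third candidate, while `lambda_mult=1.0` reduces to an ordinary top-k by similarity.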

# **Let's Do an Activity**

## **Objective**

Explore different retrieval techniques in LangChain to fetch relevant documents based on queries.

## **Scenario**

You are building a document retrieval system for a research library. This activity will help you understand and implement various retrieval methods to efficiently fetch documents related to specific topics or queries.

## **Steps**

* Load Documents
* Embedding and Vector Store
* Similarity Search
* Maximum Marginal Relevance (MMR)
* Metadata Filtering
* Alternative Retrieval Methods
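
As a starting point for the metadata filtering step, here is a minimal sketch in plain Python of the general idea: restrict the candidates by metadata first, then rank what remains by similarity. The records, the `source` field, the vectors, and the `filtered_search` helper are all hypothetical and exist only for illustration. In the activity itself you would normally pass a filter to the vector store's search method instead of filtering by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filtered_search(query_vec, records, metadata_filter, k=2):
    """Keep records matching every key/value in metadata_filter, then rank by similarity."""
    matches = [
        record
        for record in records
        if all(record["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]
    matches.sort(key=lambda record: cosine_similarity(query_vec, record["vector"]), reverse=True)
    return matches[:k]


# Toy records with invented vectors and file names
records = [
    {"text": "Death cap toxicity", "metadata": {"source": "toxicology.pdf"}, "vector": [0.9, 0.1]},
    {"text": "Fruiting body anatomy", "metadata": {"source": "mycology.pdf"}, "vector": [1.0, 0.0]},
    {"text": "Spore dispersal", "metadata": {"source": "mycology.pdf"}, "vector": [0.2, 0.8]},
]

hits = filtered_search([1.0, 0.0], records, {"source": "mycology.pdf"}, k=2)
print([hit["text"] for hit in hits])
```

The toxicology record is close to the query vector but is excluded because its metadata does not match the filter.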