## **Vector Database**

#### **PyPDFDirectoryLoader**:

* **Purpose**: Designed to load all PDF files from a specified directory.
* **Use Case**: When you have a directory containing multiple PDFs and want to process all of them in a single operation.
* **Functionality**:

  * It iterates through the directory.
  * Loads all files with a `.pdf` extension.
  * Returns their content as a list of `Document` objects, one per PDF page, that can be processed in LangChain.
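The directory scan described above can be approximated with the standard library. This is only an illustration of the behaviour, not the loader's actual implementation; `list_pdf_paths` is a hypothetical helper name, and matching the extension case-insensitively is a choice made for this sketch.

```python
from pathlib import Path


def list_pdf_paths(directory: str) -> list[Path]:
    """Return the PDF files directly inside `directory`, sorted by path."""
    root = Path(directory)
    # Keep regular files only; compare the suffix in lowercase so that
    # `.PDF` files are picked up too (an assumption of this sketch).
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Calling `list_pdf_paths("data")` would list the files that a directory-level loader then has to open one by one.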

In [5]:
from langchain_community.document_loaders import PyPDFDirectoryLoader

directory_loader=PyPDFDirectoryLoader(r"data")
len(directory_loader.load())

17

In [6]:
directory_loader.load()

[Document(metadata={'producer': 'PyPDF2', 'creator': 'PyPDF', 'creationdate': '', 'subject': 'Neural Information Processing Systems http://nips.cc/', 'publisher': 'Curran Associates, Inc.', 'language': 'en-US', 'created': '2017', 'eventtype': 'Poster', 'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superiorin quality while being more parallelizable and requiring significantly less timeto train. Our single model with 165 million parameters, achieves 27.5 BLEU onEnglish-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On 

#### **PyPDFLoader**:

* **Purpose**: Designed to load content from a single PDF file.
* **Use Case**: When you want to process a specific PDF file rather than all PDFs in a directory.
* **Functionality**:

  * Takes a path to a single PDF file.
  * Reads its content and returns it as a list of `Document` objects, one per page.
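In practice the loader returns a list with one document per page, which is why the length printed in the next cell is a page count. The shape of that list can be pictured with a rough stand-in; `PageDocument` and `split_into_page_documents` are hypothetical names for this sketch, not LangChain classes, and the `source` and `page` metadata keys are assumed here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PageDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_into_page_documents(pages: list[str], source: str) -> list[PageDocument]:
    """Wrap each page's text in its own document with a 0-based page index."""
    return [
        PageDocument(page_content=text, metadata={"source": source, "page": i})
        for i, text in enumerate(pages)
    ]


docs = split_into_page_documents(["Title page", "Introduction"], "paper.pdf")
print(len(docs))         # 2
print(docs[1].metadata)  # {'source': 'paper.pdf', 'page': 1}
```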


In [12]:
from langchain_community.document_loaders import PyPDFLoader

# Forward slashes work on Windows, macOS, and Linux alike
loader = PyPDFLoader("data/NIPS-2017-attention-is-all-you-need-Paper.pdf")
len(loader.load())

11

In [13]:
loader.load()

[Document(metadata={'producer': 'PyPDF2', 'creator': 'PyPDF', 'creationdate': '', 'subject': 'Neural Information Processing Systems http://nips.cc/', 'publisher': 'Curran Associates, Inc.', 'language': 'en-US', 'created': '2017', 'eventtype': 'Poster', 'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superiorin quality while being more parallelizable and requiring significantly less timeto train. Our single model with 165 million parameters, achieves 27.5 BLEU onEnglish-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On 

In [14]:
documents=loader.load()
print(documents[0].page_content)

Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.com
Noam Shazeer∗
Google Brain
noam@google.com
Niki Parmar∗
Google Research
nikip@google.com
Jakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.com
Aidan N. Gomez∗†
University of Toronto
aidan@cs.toronto.edu
Łukasz Kaiser ∗
Google Brain
lukaszkaiser@google.com
Illia Polosukhin∗‡
illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experiments on two machine translation tasks show these models to
be superior in quality while being more parallelizable and requiring signiﬁcantly
less time to train. Our model 

In [15]:
documents[0].metadata

{'producer': 'PyPDF2',
 'creator': 'PyPDF',
 'creationdate': '',
 'subject': 'Neural Information Processing Systems http://nips.cc/',
 'publisher': 'Curran Associates, Inc.',
 'language': 'en-US',
 'created': '2017',
 'eventtype': 'Poster',
 'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superiorin quality while being more parallelizable and requiring significantly less timeto train. Our single model with 165 million parameters, achieves 27.5 BLEU onEnglish-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-

In [16]:
for doc in documents:
    print(doc.page_content)
    print("\n")
    print("##################################################")
    

In [17]:
from dotenv import load_dotenv
load_dotenv()

True

In [18]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
google_embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

In [19]:
import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

#### 1. **`dim = len(google_embeddings.embed_query(page_one))`**

* **What does this do?**

  * The `google_embeddings.embed_query(page_one)` function converts the text in `page_one` into a vector (a numerical representation of the text).
  * The `len()` function gets the size of that vector (how many numbers are in it).
* **Why is this important?**
  The size (or "dimension") of the vector is needed to create a FAISS index. FAISS requires all vectors to have the same dimension.
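As a small illustration of why the dimension matters, the sketch below uses a made-up `toy_embed` function. It is an assumption standing in for `google_embeddings.embed_query` and is not a real embedding model; it only shows that every vector must have the same length before it can go into one index.

```python
def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical embedding: counts characters into `dim` buckets."""
    vector = [0.0] * dim
    for ch in text.lower():
        vector[ord(ch) % dim] += 1.0
    return vector


texts = ["attention is all you need", "the transformer architecture"]
vectors = [toy_embed(t) for t in texts]

# Same idea as len(google_embeddings.embed_query(page_one))
dim = len(vectors[0])

# An index created for `dim` can only accept vectors of exactly that length.
assert all(len(v) == dim for v in vectors)
print(dim)  # 8
```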

---

#### 2. **`faiss_index = faiss.IndexFlatIP(dim)`**

* **What does this do?**

  * This creates a FAISS index for similarity search.
  * The `IndexFlatIP` type scores similarity with the **Inner Product (IP)**: the sum of the element-wise products of two vectors, where a larger value means a closer match. When the vectors have unit length, the inner product equals cosine similarity.
  * The `dim` parameter tells FAISS how many numbers each vector contains (768 for the embedding model used here).

* **Why is this important?**
  The FAISS index will store all the vectors so you can efficiently search for similar ones later.
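To make inner-product scoring concrete, here is a brute-force sketch in plain Python. It mirrors what a flat inner-product index does conceptually (score every stored vector against the query and keep the top `k`), but it is not FAISS code; `inner_product`, `normalize`, and `search_top_k` are illustrative names.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v] if norm else list(v)


def search_top_k(query, stored, k):
    """Return (position, score) pairs for the k highest inner products."""
    scores = [(i, inner_product(query, v)) for i, v in enumerate(stored)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


stored = [normalize(v) for v in ([1.0, 0.0], [0.6, 0.8], [0.0, 1.0])]
query = normalize([0.9, 0.1])
top = search_top_k(query, stored, k=2)
print([position for position, _ in top])  # [0, 1]
```

Normalizing before indexing is a design choice: without it, long vectors can score highly simply because of their magnitude.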

In [25]:
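page_one = documents[0].page_content  # first page of the PDF (index 0)
dim = len(google_embeddings.embed_query(page_one))  # embedding dimension
print(dim)

faiss_index = faiss.IndexFlatIP(dim)  # flat (exhaustive) inner-product index
print(faiss_index)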

768
<faiss.swigfaiss_avx2.IndexFlatIP; proxy of <Swig Object of type 'faiss::IndexFlatIP *' at 0x0000017F7305EAF0> >


#### 1. **FAISS**

* **What is FAISS?**
  FAISS (Facebook AI Similarity Search) is a library designed to efficiently find similar vectors in a large dataset. It’s commonly used for tasks like searching for documents, images, or any data represented as vectors.

* **Why use FAISS?**
  When you have many pieces of data converted into numerical representations (vectors), FAISS helps find the most relevant or similar ones quickly.

---

#### 2. **Key Components**

The `FAISS` object is being created with several components that work together:

##### a. **`embedding_function=google_embeddings`**

* **What is an embedding function?**
  An embedding function is a tool that converts text, images, or other data into vectors (numerical arrays). These vectors are created in such a way that similar data points (e.g., texts with similar meanings) have vectors that are close together.
* **What does `google_embeddings` do?**
  It is the `GoogleGenerativeAIEmbeddings` instance created earlier with the `models/text-embedding-004` model. It turns documents and queries into vectors (768 numbers each in this notebook). For example:

  ```plaintext
  "apple" -> [0.1, 0.2, 0.9, ...] (vector representation)
  ```

##### b. **`index=faiss_index`**

* **What is `faiss_index`?**
  This is the data structure from FAISS where the vectors are stored. It’s like a database but optimized for searching similar vectors efficiently.
* **Why use it?**
  The FAISS index enables fast similarity searches, which is especially useful for large datasets.

##### c. **`docstore=InMemoryDocstore()`**

* **What is `docstore`?**
  A document store is where the actual data (e.g., full documents or text) is stored.
* **Why use `InMemoryDocstore`?**
  `InMemoryDocstore` keeps the data in memory (RAM) for quick access. It’s simple and fast but not suitable for very large datasets because it depends on your machine’s memory capacity.

##### d. **`index_to_docstore_id={}`**

* **What is `index_to_docstore_id`?**
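  This is a dictionary that maps each vector's integer position in the FAISS index to the ID of the corresponding document in the document store. It starts empty and is filled as documents are added.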
* **Why use it?**
  When FAISS finds a vector in the index, it needs to know which document in the `docstore` the vector represents. This mapping ensures you can retrieve the corresponding document.
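The relationship between index positions, document IDs, and the document store can be pictured with plain dictionaries. This is a simplified model of the bookkeeping, not LangChain's internals; all names below are illustrative.

```python
import uuid

docstore = {}              # document ID -> original text
index_to_docstore_id = {}  # position in the vector index -> document ID
index_vectors = []         # stand-in for the vectors held by the index


def add_document(text, vector):
    """Store the vector and the text, and record which position maps to which ID."""
    doc_id = str(uuid.uuid4())
    position = len(index_vectors)
    index_vectors.append(vector)
    docstore[doc_id] = text
    index_to_docstore_id[position] = doc_id
    return doc_id


add_document("Page about the encoder", [1.0, 0.0])
add_document("Page about the decoder", [0.0, 1.0])

# Suppose the index reports that position 1 is the best match:
best_position = 1
print(docstore[index_to_docstore_id[best_position]])  # Page about the decoder
```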

---

#### 3. **Purpose of the Code**

The `FAISS(...)` constructor call in this notebook creates a **FAISS-powered vector search system** with the following workflow:

1. **Embed:** Use `google_embeddings` to turn your documents into vectors.
2. **Store:** Save these vectors in `faiss_index` for quick similarity searching.
3. **Document Access:** Keep the original documents in `InMemoryDocstore` so that you can retrieve them after finding similar vectors.
4. **Mapping:** Use `index_to_docstore_id` to link vectors in the FAISS index with documents in the `docstore`.

---

#### 4. **How It Works in Practice**

Here’s an example of what this setup allows you to do:

1. You have a dataset of articles.
2. You convert each article into a vector using `google_embeddings`.
3. Store these vectors in `faiss_index` and the full articles in `InMemoryDocstore`.
4. When a user asks a question (query), it’s converted into a vector using `google_embeddings`.
5. FAISS finds the most similar vectors (articles) in `faiss_index`.
6. You use the mapping (`index_to_docstore_id`) to retrieve the actual articles from `InMemoryDocstore`.
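The six steps above can be sketched end to end without any external services. Everything here is a toy: `toy_embed` is a hypothetical word-count embedding over a tiny fixed vocabulary, and the list and dictionaries stand in for `faiss_index`, `InMemoryDocstore`, and `index_to_docstore_id`.

```python
VOCAB = ["attention", "encoder", "decoder", "training", "bleu"]


def toy_embed(text):
    """Hypothetical embedding: how often each vocabulary word occurs."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


# Step 1: a small dataset of articles (ID -> text plays the docstore role).
articles = {
    "a1": "the encoder maps inputs and the decoder generates outputs",
    "a2": "training took days and reached a high bleu score",
}

# Steps 2-3: embed each article, store its vector, and record the mapping.
index_vectors, index_to_id = [], {}
for doc_id, text in articles.items():
    index_to_id[len(index_vectors)] = doc_id
    index_vectors.append(toy_embed(text))


def similarity_search(query, k=1):
    """Steps 4-6: embed the query, rank by inner product, fetch the texts."""
    q = toy_embed(query)
    ranked = sorted(
        range(len(index_vectors)),
        key=lambda i: sum(a * b for a, b in zip(q, index_vectors[i])),
        reverse=True,
    )
    return [articles[index_to_id[i]] for i in ranked[:k]]


results = similarity_search("how do the encoder and decoder work")
print(results)  # ['the encoder maps inputs and the decoder generates outputs']
```

A real embedding model captures meaning rather than exact word overlap, so this sketch only demonstrates the data flow, not retrieval quality.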

---

#### 5. **Simplified Analogy**

Think of the whole setup as a library system:

* **FAISS Index:** The catalog cards, organized for quick searches.
* **Document Store:** The shelves with actual books.
* **Embedding Function:** The tool that describes each book in numerical form for cataloging.
* **Mapping:** The link between catalog cards and books.

This setup allows you to quickly find and retrieve books (documents) based on their description (vector similarity).


In [26]:
vector_store = FAISS(
    embedding_function=google_embeddings,
    index=faiss_index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)

In [27]:
vector_store.add_documents(documents)

['c4a25ce5-646b-44cd-af04-0c4b8edfac45',
 '3ac0bcaf-412e-48a7-97f8-92e58c762ca0',
 '85324418-da65-4c43-acde-9675c84319b3',
 '8ffbd719-9657-43f9-9c24-d8f7cb2c4594',
 'a64c15ef-6d41-4afb-b476-cd2b6d94bcbd',
 '43c74cfe-3a0f-4b8e-b631-360c6c2e26b2',
 'ec96e57f-1ec7-4f1d-8680-3b51952c6c6c',
 'de14fe13-f5f2-40b8-9151-7f2ed33857d1',
 '2785a100-8b83-4a4a-b19b-8524cc8886fc',
 '42ae0dba-101c-4992-a3bc-1670a26bc63e',
 'ab565e51-d728-4fdf-a6ce-dd18908146c2']

In [42]:
search=vector_store.similarity_search(
    query="what is the model architecture?",
    k=2
)

In [35]:
for i in search:
    print(i.page_content)
    print("-----------------------------------------------")

Figure 1: The Transformer - model architecture.
wise fully connected feed-forward network. We employ a residual connection [10] around each of
the two sub-layers, followed by layer normalization [ 1]. That is, the output of each sub-layer is
LayerNorm(x+ Sublayer(x)), where Sublayer(x) is the function implemented by the sub-layer
itself. To facilitate these residual connections, all sub-layers in the model, as well as the embedding
layers, produce outputs of dimension dmodel = 512.
Decoder: The decoder is also composed of a stack of N = 6identical layers. In addition to the two
sub-layers in each encoder layer, the decoder inserts a third sub-layer, which performs multi-head
attention over the output of the encoder stack. Similar to the encoder, we employ residual connections
around each of the sub-layers, followed by layer normalization. We also modify the self-attention
sub-layer in the decoder stack to prevent positions from attending to subsequent positions. This
masking, combined 

In [41]:
from langchain_groq import ChatGroq

# Step 1: Combine the retrieved search results into a single context string
context = "\n".join(result.page_content for result in search)

# Step 2: Initialize the ChatGroq model (reads GROQ_API_KEY from the environment)
groq_model = ChatGroq(model="llama-3.3-70b-versatile")

# Step 3: Build a prompt from the context and the same query used for retrieval
query = "what is the model architecture?"
prompt = f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"

# Step 4: Send the prompt to the Groq model and keep the answer text
response = groq_model.invoke(prompt).content

print(response)


The model architecture is that of a Transformer, which consists of an encoder and a decoder. 

- The encoder is composed of a stack of N = 6 identical layers, each containing two sub-layers: a self-attention mechanism and a position-wise fully connected feed-forward network. 
- The decoder is also composed of a stack of N = 6 identical layers, each containing three sub-layers: a self-attention mechanism, an encoder-decoder attention mechanism, and a position-wise fully connected feed-forward network. 
- All sub-layers in the model produce outputs of dimension dmodel = 512. 
- The model uses residual connections around each sub-layer, followed by layer normalization. 
- The model also uses multi-head attention, with h = 8 parallel attention layers, or heads. 
- The dimension of each head is dk = dv = dmodel/h = 64. 
- The model uses positional encodings to inject information about the relative or absolute position of the tokens in the sequence. 
- The model uses learned embeddings to co