# RAG with Access Control

**(with OpenAI LLMs)**

## Objectives

At the end of the experiment, you will be able to:

1. Load documents
2. Split the documents into chunks
3. Embed the chunks and store them in a vector DB
4. Retrieve the chunks relevant to a query
 * Addressing Diversity
 * Addressing Specificity
5. Connect to an LLM to get a final grounded answer

## Introduction

> **RAG diagram:**
>
> <img src='https://drive.google.com/uc?id=1sCVvpsmtZEU1WSK1FFGMGHbEjrgtCNLi'>

> **Vector Store and Retrieval:**
>
> <img src='https://drive.google.com/uc?id=1_zX5gtSNrV8Qdx7Nz4_gMR8dCwvxCDS7' width=750px>

> **Embedding Model:**
>
> <img src='https://drive.google.com/uc?id=1HnvjGJ4HmpS-0wndpH-Q8cKMwIwWkTUe'>

> **Retrieval in Action:**
>
> <img src='https://drive.google.com/uc?id=1ry2TWFsewwqYP3Lw9muuPmbyuQqXwnYV' width=800px>

> **Example workflow with embedding model:**
>
> <img src='https://drive.google.com/uc?id=1zTuMMX54L2HrnmCYktTxVfMVrkIz8w15' width=600px>

### Install Dependencies

In [1]:
%%capture
!pip -q install openai
!pip -q install langchain-openai
!pip -q install langchain-core
!pip -q install langchain-community
!pip -q install sentence-transformers
!pip -q install langchain-huggingface
!pip -q install langchain-chroma
!pip -q install chromadb
!pip -q install pypdf

### Import Required Packages

In [2]:
import os
import openai
import numpy as np
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import ChatOpenAI
from langchain_chroma import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

#### **Provide your OpenAI API key**

In [3]:
# Read OpenAI key from Colab Secrets

from google.colab import userdata

api_key = userdata.get('OPENAI_KEY')           # <-- change this as per your secret's name
os.environ['OPENAI_API_KEY'] = api_key
openai.api_key = os.getenv('OPENAI_API_KEY')

### Load LLM

In [4]:
# Load Model

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

In [None]:
# General query
response = llm.invoke("How to learn programming? give 5 points")
print(response.content)

Learning programming can be an exciting and rewarding journey. Here are five key points to help you get started:

1. **Choose a Programming Language**: Start with a beginner-friendly language such as Python, JavaScript, or Ruby. Python is often recommended for its readability and versatility, making it suitable for various applications, from web development to data science.

2. **Utilize Online Resources**: Take advantage of online platforms like Codecademy, freeCodeCamp, Coursera, or edX. These platforms offer structured courses, tutorials, and exercises that can help you learn at your own pace.

3. **Practice Regularly**: Consistent practice is crucial for mastering programming. Work on small projects, solve coding challenges on platforms like LeetCode or HackerRank, and contribute to open-source projects to apply what you’ve learned.

4. **Join a Community**: Engage with other learners and experienced programmers through forums like Stack Overflow, Reddit, or local coding meetups. P

### **Loading the documents**

[PDF Loader](https://python.langchain.com/docs/how_to/document_loader_pdf/)

In [5]:
# Download PDFs
!gdown https://drive.google.com/uc?id=1Wy00e_FEBVwMx-jZBklNk9dzEW9a-LHc        # pca_d1.pdf  -  PCA (Principal Component Analysis)
!gdown https://drive.google.com/uc?id=1gMv6Ew7oGCPD0CA4D5iN_zAUBWY-SSJQ        # ens_d2.pdf  -  Ensemble Methods

Downloading...
From: https://drive.google.com/uc?id=1Wy00e_FEBVwMx-jZBklNk9dzEW9a-LHc
To: /content/pca_d1.pdf
100% 204k/204k [00:00<00:00, 56.7MB/s]
Downloading...
From: https://drive.google.com/uc?id=1gMv6Ew7oGCPD0CA4D5iN_zAUBWY-SSJQ
To: /content/ens_d2.pdf
100% 110k/110k [00:00<00:00, 63.1MB/s]


In [6]:
# Run the download cell above first (or upload the PDFs to /content manually), then run this cell

from langchain_community.document_loaders import PyPDFLoader

# Load PDF
loaders = [
    PyPDFLoader("/content/pca_d1.pdf"),
    PyPDFLoader("/content/ens_d2.pdf"),
]

docs = []
for loader in loaders:
    docs.extend(loader.load())


In [None]:
len(docs)        # total number of pages loaded from the two PDFs

5

In [None]:
docs

[Document(metadata={'source': '/content/pca_d1.pdf', 'page': 0}, page_content=' \n1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical object then dimensions are merely a basis of view, like where is the data located when \nit is observed from horizontal axis or vertical axis. \nAs the dimensions of data increases, the difficulty to visualize it and perform computations on \nit also increases. So, how to reduce the dimensions of a data:- \n• Remove the redundant dimensions \n• Only keep the most important dimensions  \nLet us first try to understand some terms:- \nVariance : It is a measure of the variability or it simply measures how spread the data set is.  \nMathematically, i

In [None]:
print(docs[0].page_content)

 
1 
 
 
N 
 
1 Principal Component Analysis 
In real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  
data and find various patterns in it or use it to train some machine learning models.  One way to  
think about dimensions is that suppose you have an data point x , if we consider this data point as 
a physical object then dimensions are merely a basis of view, like where is the data located when 
it is observed from horizontal axis or vertical axis. 
As the dimensions of data increases, the difficulty to visualize it and perform computations on 
it also increases. So, how to reduce the dimensions of a data:- 
• Remove the redundant dimensions 
• Only keep the most important dimensions  
Let us first try to understand some terms:- 
Variance : It is a measure of the variability or it simply measures how spread the data set is.  
Mathematically, it is the average squared deviation from the mean score. We use the following 
formula to compute 

### **Splitting of document**

[Recursively split by character](https://python.langchain.com/docs/how_to/recursive_text_splitter/)

[Split by character](https://python.langchain.com/docs/how_to/character_text_splitter/)

In [7]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [8]:
# Split
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 50
)

In [9]:
splits = text_splitter.split_documents(docs)

print(len(splits))
print(len(splits[0].page_content) )
splits[0].page_content

19
443


'1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical object then dimensions are merely a basis of view, like where is the data located when'

In [None]:
splits[0]

Document(metadata={'source': '/content/pca_d1.pdf', 'page': 0}, page_content='1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical object then dimensions are merely a basis of view, like where is the data located when')
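
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size splitting with overlap. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter also prefers to cut at separators such as paragraph breaks, newlines, and spaces, which is presumably why the first chunk above has 443 characters rather than exactly 500. The helper name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size character windows.

    Each window starts (chunk_size - chunk_overlap) characters after the
    previous one, so consecutive chunks share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.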

### **Embeddings**

Let's take our splits and embed them.

In [10]:
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(model='text-embedding-3-small')

In [None]:
embedding

OpenAIEmbeddings(client=<openai.resources.embeddings.Embeddings object at 0x790d91275210>, async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x790d912778b0>, model='text-embedding-3-small', dimensions=None, deployment='text-embedding-ada-002', openai_api_version=None, openai_api_base=None, openai_api_type=None, openai_proxy=None, embedding_ctx_length=8191, openai_api_key=SecretStr('**********'), openai_organization=None, allowed_special=None, disallowed_special=None, chunk_size=1000, max_retries=2, request_timeout=None, headers=None, tiktoken_enabled=True, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, retry_min_seconds=4, retry_max_seconds=20, http_client=None, http_async_client=None, check_embedding_ctx_length=True)

### **Vectorstores**

In [11]:
from langchain_chroma import Chroma       # Light-weight and in memory

In [12]:
persist_directory = 'docs/chroma/'
!rm -rf ./docs/chroma  # remove old database files if any

In [13]:
vectordb = Chroma.from_documents(
    documents=splits,                    # splits we created earlier
    embedding=embedding,
    persist_directory=persist_directory, # directory where the index is persisted
)

In [14]:
print(vectordb._collection.count()) # same as number of splits

19


### **Similarity Search in Vector store**

In vector databases, the chunks relevant to a query are retrieved with **similarity search techniques**, primarily nearest neighbor search over the embedding vectors.

Here are some common approaches:

>**Approximate Nearest Neighbor (ANN) Search:** Vector databases frequently use ANN algorithms, which trade a small amount of accuracy for a much faster search for vectors close to the query vector.
>
>Popular **ANN** algorithms and libraries include:
>
>1. HNSW (Hierarchical Navigable Small World graph): a graph-based approach that finds approximate nearest neighbors using a multi-layered graph structure.
>
>2. Faiss: an open-source library developed by Facebook that implements several algorithms for fast similarity search, such as Product Quantization and inverted file (IVF) indexes.
>
>3. Annoy (Approximate Nearest Neighbors Oh Yeah): developed by Spotify, it uses a forest of random projection trees for approximate nearest neighbor search.
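
As a reference point for what ANN methods approximate, the sketch below runs an exact (brute-force) nearest neighbor search with cosine similarity over a few toy 2-D vectors. The vectors and function names are made up for illustration; real embeddings have hundreds or thousands of dimensions, which is where the exhaustive scan becomes too slow and ANN indexes pay off.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Score every stored vector against the query and return the k best indices."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy "embeddings" (hypothetical values)
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6], [-1.0, 0.0]]
toy_query = [0.9, 0.1]

top2 = exact_top_k(toy_query, toy_vectors, k=2)
print(top2)  # [0, 2]
```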


In [None]:
question = "How does ensemble method works?"

In [15]:
docs = vectordb.similarity_search(question, k=6)     # k --> number of Document objects to return


In [None]:
print(len(docs))

for i in range(len(docs)):
    print(docs[i].page_content)
    print('='*140)

6
Why use Ensemble Methods? 
Ensemble Methods are used in order to: 
• decrease variance (bagging) 
• decrease bias (boosting) 
• improve predictions (stacking) 
 
Bagging 
Bagging actually refers to Bootstrap Aggregators. 
Bagging tests multiple models on the data by sampling and replacing data i.e it utilizes bootstrap - 
ping. In turn, this reduces the noise and variance by utilizing multiple samples. Each hypothesis
considered. The product is bought by the user when the combined ratings of the group is positive. 
The user gets a fairer idea about the product when all the ratings are combined. 
Here, the combination of ratings is done so that the decision making process of the user is made  
easy. 
Ensemble Methods refer to combining many different machine learning models in order to get a  
more powerful prediction. 
Thus, ensemble methods increase the accuracy of the predictions.
1  
 
Ensemble Methods 
Let us consider a real world situation which uses Ensemble Methods, which is, 

### **Edge cases where failure may happen**

1. Lack of diversity: semantic search returns the chunks most similar to the query, but does not enforce diversity among them.

    - If the same content is indexed more than once (for example, a duplicate copy of `ens_d2.pdf`), the top results can be identical chunks, e.g. `docs[0]` and `docs[1]`.

  **Addressing Diversity - MMR (Maximum Marginal Relevance)**

Maximum Marginal Relevance (MMR) is a method used to retrieve relevant items to a query while avoiding redundancy. It does this by ensuring a balance between relevancy and diversity in the items retrieved.

<img src='https://miro.medium.com/v2/resize:fit:828/format:webp/1*U-9mPt5tBfPBPrwC4_oD1w.png'>
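
The balance between relevance and diversity can be sketched in a few lines. The following is a simplified, greedy version of MMR on toy 2-D vectors, not LangChain's or Chroma's implementation; the parameter name `lambda_mult` is borrowed from LangChain's MMR search, and the vectors are invented so that two of them are exact duplicates.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Greedy MMR: repeatedly pick the candidate with the best trade-off between
    relevance to the query and redundancy with what is already selected."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Highest similarity to any already selected item (0 if none yet)
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query_vec = [1.0, 0.0]
candidate_vecs = [
    [1.0, 0.1],   # 0: very relevant
    [1.0, 0.1],   # 1: exact duplicate of 0
    [0.6, -0.8],  # 2: less relevant, but different from 0
    [0.0, 1.0],   # 3: irrelevant
]

# Plain similarity ranking keeps the duplicate
plain_top2 = sorted(range(len(candidate_vecs)),
                    key=lambda i: cosine(query_vec, candidate_vecs[i]),
                    reverse=True)[:2]
mmr_top2 = mmr_select(query_vec, candidate_vecs, k=2)

print(plain_top2)  # [0, 1]
print(mmr_top2)    # [0, 2]
```

With `lambda_mult=1.0` the redundancy term vanishes and the selection reduces to plain similarity ranking; lower values push harder toward diverse results.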

In [16]:
question = 'How ensemble method works?'
docs = vectordb.similarity_search(question, k=3)     # Without MMR

print(len(docs))

for i in range(len(docs)):
    print(docs[i].page_content)
    print('='*140)

3
Why use Ensemble Methods? 
Ensemble Methods are used in order to: 
• decrease variance (bagging) 
• decrease bias (boosting) 
• improve predictions (stacking) 
 
Bagging 
Bagging actually refers to Bootstrap Aggregators. 
Bagging tests multiple models on the data by sampling and replacing data i.e it utilizes bootstrap - 
ping. In turn, this reduces the noise and variance by utilizing multiple samples. Each hypothesis
considered. The product is bought by the user when the combined ratings of the group is positive. 
The user gets a fairer idea about the product when all the ratings are combined. 
Here, the combination of ratings is done so that the decision making process of the user is made  
easy. 
Ensemble Methods refer to combining many different machine learning models in order to get a  
more powerful prediction. 
Thus, ensemble methods increase the accuracy of the predictions.
1  
 
Ensemble Methods 
Let us consider a real world situation which uses Ensemble Methods, which is, 

**Example 1. Addressing Diversity - MMR-Maximum Marginal Relevance**

In [17]:
docs_with_mmr = vectordb.max_marginal_relevance_search(question, k=3, fetch_k=6)   # With MMR

print(len(docs_with_mmr))

for i in range(len(docs_with_mmr)):
    print(docs_with_mmr[i].page_content)
    print('='*140)

3
Why use Ensemble Methods? 
Ensemble Methods are used in order to: 
• decrease variance (bagging) 
• decrease bias (boosting) 
• improve predictions (stacking) 
 
Bagging 
Bagging actually refers to Bootstrap Aggregators. 
Bagging tests multiple models on the data by sampling and replacing data i.e it utilizes bootstrap - 
ping. In turn, this reduces the noise and variance by utilizing multiple samples. Each hypothesis
1  
 
Ensemble Methods 
Let us consider a real world situation which uses Ensemble Methods, which is, when a user wants 
to buy a new product. Many users who have already purchased that product will have given either  
positive or negative ratings. If in the group, many users have given positive ratings, then the 
combined rating will be positive. Instead of a single rating, the ratings of the group of users is
2 
 
 
 
So, what does Principal Component Analysis (PCA) do? 
PCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  


2. Lack of specificity: the question may concern one particular document, but the retrieved chunks (and therefore the answer) may include information from other documents.

  **Addressing Specificity: Working with metadata - Manually**

  **Working with metadata using self-query retriever - Automatically**

**Example 2. Addressing Specificity: Working with metadata - Manually**

In [18]:
# Without metadata information
question = "What is variance?"

docs = vectordb.similarity_search(question, k=5)

for doc in docs:
    print(doc.metadata)    # metadata records which document and page each chunk came from

{'page': 0, 'source': '/content/pca_d1.pdf'}
{'page': 0, 'source': '/content/ens_d2.pdf'}
{'page': 0, 'source': '/content/pca_d1.pdf'}
{'page': 1, 'source': '/content/pca_d1.pdf'}
{'page': 1, 'source': '/content/pca_d1.pdf'}


We can filter the results based on metadata.

In [19]:
# With metadata information
question = "what is the role of variance in pca?"
docs = vectordb.similarity_search(
    question,
    k=5,
    filter={"source":'/content/ens_d2.pdf'}     # manually passing metadata, using metadata filter.
)

for doc in docs:
    print(doc.metadata)

{'page': 0, 'source': '/content/ens_d2.pdf'}
{'page': 0, 'source': '/content/ens_d2.pdf'}
{'page': 1, 'source': '/content/ens_d2.pdf'}
{'page': 1, 'source': '/content/ens_d2.pdf'}
{'page': 0, 'source': '/content/ens_d2.pdf'}
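
Conceptually, a metadata filter is a predicate applied to each chunk's metadata so that only matching chunks are eligible for retrieval. The sketch below imitates that idea for the two filter shapes used in this notebook (plain equality and `$in`). It is a simplified illustration under that assumption, not Chroma's actual filter engine, and the function name `matches` is hypothetical.

```python
def matches(metadata: dict, flt: dict) -> bool:
    """Return True if the metadata satisfies every condition in the filter.

    Supported conditions (a small subset, for illustration only):
      {"key": value}             -> equality
      {"key": {"$in": [v1, v2]}} -> membership
    """
    for key, condition in flt.items():
        value = metadata.get(key)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op == "$in":
                    if value not in operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator in this sketch: {op}")
        elif value != condition:
            return False
    return True


toy_chunks = [
    {"source": "/content/pca_d1.pdf", "page": 0},
    {"source": "/content/ens_d2.pdf", "page": 0},
    {"source": "/content/ens_d2.pdf", "page": 1},
]

only_ens = [m for m in toy_chunks if matches(m, {"source": "/content/ens_d2.pdf"})]
either = [m for m in toy_chunks
          if matches(m, {"source": {"$in": ["/content/pca_d1.pdf", "/content/ens_d2.pdf"]}})]

print(len(only_ens), len(either))  # 2 3
```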


In [20]:
# With metadata information + MMR

docs_with_mmr = vectordb.max_marginal_relevance_search(question,
                                                       k=2,
                                                       fetch_k=5,
                                                       filter={"source":'/content/ens_d2.pdf'}     # manually passing metadata, using metadata filter.
                                                       )

In [21]:
for i in range(len(docs_with_mmr)):
    print(docs_with_mmr[i].page_content)
    print('='*140)

models. 
 
Variance 
Variance quantifies how the predictions made on same observation are different from each other. A  
high variance model will over -fit on your training population and perform badly on any observation  
beyond training. Thus, we aim at low variance.
subset of features is selected, further randomizing the tree. 
As a result, the bias of the forest increases slightly, but due to the averaging of less correlated  
trees, its variance decreases, resulting in an overall better model.


[**Addressing Specificity -Automatically: Working with metadata using self-query retriever**](https://python.langchain.com/docs/how_to/self_query/)

## **Retrieval**

**[Vectorstore as a retriever](https://python.langchain.com/docs/how_to/vectorstore_retriever/)**

**Better Approach**

In [22]:
# Without MMR
question = "What is principal component analysis?"
retriever = vectordb.as_retriever(search_kwargs={"k": 3})
docs = retriever.invoke(question)
docs

[Document(metadata={'page': 1, 'source': '/content/pca_d1.pdf'}, page_content='2 \n \n \n \nSo, what does Principal Component Analysis (PCA) do? \nPCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  \northogonal (and hence linearly independent) and ranked according to the variance of data along  \nthem. It means more important principle axis occurs first. (more important = more variance/more  \nspread out data) \n \nHow does PCA work? \n• Calculate the covariance matrix X of data points.'),
 Document(metadata={'page': 0, 'source': '/content/pca_d1.pdf'}, page_content='1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical object then di

In [23]:
# With MMR
retriever = vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 2, "fetch_k":5})
docs = retriever.invoke(question)
docs

[Document(metadata={'page': 1, 'source': '/content/pca_d1.pdf'}, page_content='2 \n \n \n \nSo, what does Principal Component Analysis (PCA) do? \nPCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  \northogonal (and hence linearly independent) and ranked according to the variance of data along  \nthem. It means more important principle axis occurs first. (more important = more variance/more  \nspread out data) \n \nHow does PCA work? \n• Calculate the covariance matrix X of data points.'),
 Document(metadata={'page': 0, 'source': '/content/pca_d1.pdf'}, page_content='1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical object then di

In [24]:
# With MMR + metadata filter (single source)
retriever = vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 2, "fetch_k":5, "filter": {"source": '/content/pca_d1.pdf'}})
docs = retriever.invoke(question)
docs

[Document(metadata={'page': 1, 'source': '/content/pca_d1.pdf'}, page_content='2 \n \n \n \nSo, what does Principal Component Analysis (PCA) do? \nPCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  \northogonal (and hence linearly independent) and ranked according to the variance of data along  \nthem. It means more important principle axis occurs first. (more important = more variance/more  \nspread out data) \n \nHow does PCA work? \n• Calculate the covariance matrix X of data points.'),
 Document(metadata={'page': 0, 'source': '/content/pca_d1.pdf'}, page_content='1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical object then di

In [25]:
# With MMR + metadata filter (multiple sources via $in)
filter_criteria = {"source": {"$in": ['/content/pca_d1.pdf', '/content/ens_d2.pdf']}}

retriever = vectordb.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 5,
        "fetch_k": 7,
        "filter": filter_criteria
    }
)

# Invoke the retriever with a question
docs = retriever.invoke(question)
docs

[Document(metadata={'page': 1, 'source': '/content/pca_d1.pdf'}, page_content='2 \n \n \n \nSo, what does Principal Component Analysis (PCA) do? \nPCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  \northogonal (and hence linearly independent) and ranked according to the variance of data along  \nthem. It means more important principle axis occurs first. (more important = more variance/more  \nspread out data) \n \nHow does PCA work? \n• Calculate the covariance matrix X of data points.'),
 Document(metadata={'page': 0, 'source': '/content/pca_d1.pdf'}, page_content='1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical object then di

## **Augmentation**

In [26]:
from langchain_core.prompts import PromptTemplate                                    # To format prompts
from langchain_core.output_parsers import StrOutputParser                            # to transform the output of an LLM into a more usable format
from langchain_core.runnables import RunnableParallel, RunnablePassthrough           # Required by LCEL (LangChain Expression Language)

In [27]:
# Build prompt
template = """Please use the following context to respond to the question provided at the end.

1. If you are unsure about the answer, simply state that you don’t know. Avoid making assumptions or guesses.
2. Always conclude your response with: "Thanks for asking!"
3. If the context includes "Access Restricted," respond with: "Your Access is Restricted."

Context:
{context}

Question: {question}

Response:"""



QA_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)

## **Creating final RAG Chain**

> <img src='https://www.pinecone.io/_next/image/?url=https%3A%2F%2Fcdn.sanity.io%2Fimages%2Fvr8gru94%2Fproduction%2F63f8a8482c9ec06a8d7d1041514f87c06dd108a9-3442x942.png&w=3840&q=75' width=1200px>

[[Image source](https://www.pinecone.io/learn/series/langchain/langchain-expression-language/)]

The figure above describes the LCEL flow using `RunnableParallel` and `RunnablePassthrough`.

A Runnable is a **unit of execution** in the LangChain framework. It represents a specific task or operation that can be performed.

Examples of Runnables include data transformations, computations, or any other operation that can be **expressed** in LCEL (LangChain Expression Language).

[RunnableLambda](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.RunnableLambda.html) is a LangChain abstraction that turns a Python function into a **pipe-compatible** Runnable.

[RunnablePassthrough](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.passthrough.RunnablePassthrough.html) on its own passes inputs through unchanged. It is typically **used in conjunction with [RunnableParallel](https://python.langchain.com/v0.1/docs/expression_language/interface/#parallelism)** to pass data through to a new key in the map.

The **RunnableParallel** object allows us to define multiple values and operations, and run them all in parallel.

The **RunnablePassthrough** object is used as a “passthrough” that takes any input to the current component ('retrieval' in the figure above) and allows us to provide it in the component output via the “question” key or any other custom key.
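
To build intuition for how these pieces fit together, here is a toy re-creation of the idea in plain Python. The names `Step`, `parallel`, and `passthrough` are hypothetical stand-ins, not LangChain classes, and unlike `RunnableParallel` this sketch runs its branches one after another. It only illustrates how `|` composition and a keyed map of steps can feed a prompt.

```python
class Step:
    """Toy stand-in for a Runnable: wraps a function and supports `|` chaining."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b  ->  a new step that feeds a's output into b
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(steps: dict) -> Step:
    """Apply every step to the same input and collect the results by key.
    (Sequential here; the real RunnableParallel can run branches concurrently.)"""
    return Step(lambda value: {key: step.invoke(value) for key, step in steps.items()})


# Passes its input through unchanged, like RunnablePassthrough
passthrough = Step(lambda value: value)

# Fake components for the demo
fake_retrieve = Step(lambda q: [f"chunk about {q}"])
build_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")

toy_chain = parallel({"context": fake_retrieve, "question": passthrough}) | build_prompt
toy_prompt = toy_chain.invoke("PCA")
print(toy_prompt)
# Context: ['chunk about PCA']
# Question: PCA
```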

In [28]:
def create_retriever_with_filter(refined_question, user_name):
    """ `user1` has access to `pca_d1.pdf`, and
        `user2` has access to `ens_d2.pdf` """

    if user_name == "user1":
        files = ['/content/pca_d1.pdf']
    elif user_name == "user2":
        files = ['/content/ens_d2.pdf']
    else:
        return "Access Restricted"

    # Define your filter criteria dynamically
    filter_criteria = {"source": {"$in": files}}

    # Initialize the retriever with filtering
    retriever = vectordb.as_retriever(
        search_type="mmr",
        search_kwargs={
            "k": 5,
            "fetch_k": 7,
            "filter": filter_criteria  # Include the dynamic filter argument
        }
    )

    return retriever.invoke(refined_question)


In [33]:
retrieved_context = create_retriever_with_filter("What is principal component analysis?", "user1")
retrieved_context

[Document(metadata={'page': 1, 'source': '/content/pca_d1.pdf'}, page_content='2 \n \n \n \nSo, what does Principal Component Analysis (PCA) do? \nPCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  \northogonal (and hence linearly independent) and ranked according to the variance of data along  \nthem. It means more important principle axis occurs first. (more important = more variance/more  \nspread out data) \n \nHow does PCA work? \n• Calculate the covariance matrix X of data points.'),
 Document(metadata={'page': 0, 'source': '/content/pca_d1.pdf'}, page_content='1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical object then di

In [30]:
# Example usage
retrieved_context_1 = create_retriever_with_filter("What is principal component analysis?", "user2")
retrieved_context_1

[Document(metadata={'page': 0, 'source': '/content/ens_d2.pdf'}, page_content='considered. The product is bought by the user when the combined ratings of the group is positive. \nThe user gets a fairer idea about the product when all the ratings are combined. \nHere, the combination of ratings is done so that the decision making process of the user is made  \neasy. \nEnsemble Methods refer to combining many different machine learning models in order to get a  \nmore powerful prediction. \nThus, ensemble methods increase the accuracy of the predictions.'),
 Document(metadata={'page': 1, 'source': '/content/ens_d2.pdf'}, page_content='subset of features is selected, further randomizing the tree. \nAs a result, the bias of the forest increases slightly, but due to the averaging of less correlated  \ntrees, its variance decreases, resulting in an overall better model.'),
 Document(metadata={'page': 0, 'source': '/content/ens_d2.pdf'}, page_content='has the same weight as all the others. No

In [31]:
# Example usage
retrieved_context_2 = create_retriever_with_filter("What is principal component analysis?", "user3")
retrieved_context_2

'Access Restricted'
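
As the number of users grows, the `if`/`elif` chain above becomes hard to maintain. One possible alternative, sketched here under the assumption that permissions can be expressed as a simple user-to-files table, is to look the user up in a dictionary and build the filter from the result. `USER_FILES` and `build_source_filter` are hypothetical names; in a real system the table would come from an identity or permissions service.

```python
# Hypothetical permissions table: which source files each user may query
USER_FILES = {
    "user1": ["/content/pca_d1.pdf"],
    "user2": ["/content/ens_d2.pdf"],
}


def build_source_filter(user_name: str):
    """Return a metadata filter for the user's files, or None if the user has no access."""
    files = USER_FILES.get(user_name)
    if not files:
        return None
    return {"source": {"$in": list(files)}}


print(build_source_filter("user1"))  # {'source': {'$in': ['/content/pca_d1.pdf']}}
print(build_source_filter("user3"))  # None
```

Returning `None` for unknown users lets the caller decide how to respond (for example, by returning the "Access Restricted" marker) before any retrieval happens.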

#### Refine Question

In [29]:
# Refine the user's question with the LLM before retrieval,
# e.g. "What is pca" -> "What is Principal Component Analysis?"

from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

def refine_question(question):
    refine_prompt = PromptTemplate(
        input_variables=["question"],
        template="Refine the following question to be more specific and technically accurate:\n\nQuestion: {question}\n\nRefined Question:"
    )
    # LCEL pipeline: prompt -> LLM -> plain string
    refine_chain = refine_prompt | llm | StrOutputParser()
    refined_question = refine_chain.invoke({"question": question})
    return refined_question


In [34]:
import warnings
# Suppress specific LangChain deprecation warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

# Example usage with question refinement
user_question = "What is pca"

refined_question = refine_question(user_question)

print(f"Original question: {user_question}")
print(f"Refined question: {refined_question}\n")

retrieved_context = create_retriever_with_filter(refined_question, "user1")
retrieved_context

Original question: What is pca
Refined question: What is Principal Component Analysis (PCA), and how is it used for dimensionality reduction in data analysis?



[Document(metadata={'page': 1, 'source': '/content/pca_d1.pdf'}, page_content='2 \n \n \n \nSo, what does Principal Component Analysis (PCA) do? \nPCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  \northogonal (and hence linearly independent) and ranked according to the variance of data along  \nthem. It means more important principle axis occurs first. (more important = more variance/more  \nspread out data) \n \nHow does PCA work? \n• Calculate the covariance matrix X of data points.'),
 Document(metadata={'page': 2, 'source': '/content/pca_d1.pdf'}, page_content='This defines the goal of PCA:- \n1. Find linearly independent dimensions which can losslessly represent the data points. \n2. Those newly found dimensions should allow us to predict/reconstruct the original dimensions.'),
 Document(metadata={'page': 0, 'source': '/content/pca_d1.pdf'}, page_content='1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis t

In [35]:
from langchain_core.runnables import RunnableLambda

retrieval = RunnableParallel(
    {
        "context": RunnableLambda(lambda x: create_retriever_with_filter(x["question"], x["username"])),
        "question": RunnableLambda(lambda x: x["question"])
        }
    )

In [36]:
retrieval.invoke({"question": "What is Principal component analysis ?",
                  "username": "user1"})

{'context': [Document(metadata={'page': 1, 'source': '/content/pca_d1.pdf'}, page_content='2 \n \n \n \nSo, what does Principal Component Analysis (PCA) do? \nPCA finds a new set of dimensions (or a set of basis of views) such that all the dimensions are  \northogonal (and hence linearly independent) and ranked according to the variance of data along  \nthem. It means more important principle axis occurs first. (more important = more variance/more  \nspread out data) \n \nHow does PCA work? \n• Calculate the covariance matrix X of data points.'),
  Document(metadata={'page': 0, 'source': '/content/pca_d1.pdf'}, page_content='1 \n \n \nN \n \n1 Principal Component Analysis \nIn real world data analysis tasks we analyze complex data i.e. multi dimensional data. We plot the  \ndata and find various patterns in it or use it to train some machine learning models.  One way to  \nthink about dimensions is that suppose you have an data point x , if we consider this data point as \na physical o

In [37]:
# RAG Chain

rag_chain = (retrieval                     # Retrieval
             | QA_PROMPT                   # Augmentation
             | llm                         # Generation
             | StrOutputParser()
             )
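
To make the data flow through the chain explicit, here is a minimal plain-Python sketch of what the `|` composition does conceptually. It does not use LangChain: `parallel` and `pipe` are hypothetical helpers, and `fake_retrieve`, `fake_prompt`, and `fake_llm` are made-up stand-ins for `create_retriever_with_filter`, `QA_PROMPT`, and `llm`.

```python
from typing import Callable, Dict, List


def fake_retrieve(question: str, username: str) -> List[str]:
    """Hypothetical stand-in for create_retriever_with_filter."""
    return [f"chunk visible to {username} about: {question}"]


def parallel(branches: Dict[str, Callable[[dict], object]]) -> Callable[[dict], dict]:
    """Mimic RunnableParallel: every branch receives the same input dict."""
    return lambda x: {key: fn(x) for key, fn in branches.items()}


def pipe(*steps: Callable) -> Callable:
    """Mimic the | operator: feed each step's output into the next step."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


retrieval_sketch = parallel({
    "context": lambda x: fake_retrieve(x["question"], x["username"]),
    "question": lambda x: x["question"],
})


def fake_prompt(inputs: dict) -> str:
    """Stand-in for QA_PROMPT: fill context and question into one string."""
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}"


def fake_llm(prompt: str) -> str:
    """Stand-in for the chat model: just reports the prompt size."""
    return f"(answer based on a prompt of {len(prompt)} characters)"


# Retrieval -> Augmentation -> Generation -> parse to string
chain_sketch = pipe(retrieval_sketch, fake_prompt, fake_llm, str)

result = chain_sketch({"question": "What is PCA?", "username": "user1"})
print(result)
```

Note that the retrieval step returns only `context` and `question`, so the username is used for filtering and never reaches the prompt.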

The output below is without question refinement.

In [38]:
# Example usage of the complete chain without question refinement
user_input = {"question": user_question, "username": "user1"}

response = rag_chain.invoke(user_input)
response

'Principal Component Analysis (PCA) is a statistical technique used to reduce the dimensionality of data while preserving as much variance as possible. It finds a new set of orthogonal dimensions (or basis views) that are linearly independent and ranked according to the variance of the data along them. The main goals of PCA are to identify linearly independent dimensions that can represent the data points without loss and to allow for the prediction or reconstruction of the original dimensions. The process involves calculating the covariance matrix of the data, determining the eigenvectors and eigenvalues, sorting the eigenvectors by their eigenvalues, and transforming the original data into a reduced set of dimensions.\n\nThanks for asking!'

The output below is with question refinement.

In [39]:
# Example usage of the complete chain with question refinement
user_input = {"question": refined_question, "username": "user1"}

response = rag_chain.invoke(user_input)
response

'Principal Component Analysis (PCA) is a statistical technique used to analyze complex, multi-dimensional data by transforming it into a new set of dimensions that are orthogonal (linearly independent) and ranked according to the variance of the data along these dimensions. The primary goals of PCA are to find linearly independent dimensions that can represent the data points without loss and to enable the prediction or reconstruction of the original dimensions.\n\nThe process of PCA involves several steps:\n1. Calculating the covariance matrix of the data points.\n2. Computing the eigenvectors and corresponding eigenvalues of the covariance matrix.\n3. Sorting the eigenvectors based on their eigenvalues in decreasing order.\n4. Selecting the top k eigenvectors to form the new k dimensions.\n5. Transforming the original n-dimensional data points into these k dimensions.\n\nBy focusing on the dimensions with the highest variance, PCA effectively reduces the dimensionality of the data wh

#### Examples for testing

In [40]:
response = rag_chain.invoke({"question": refined_question,
                             "username": "user2"})

response

'I don’t know. Thanks for asking!'

In [41]:
response = rag_chain.invoke({"question": "What is Variance?",
                             "username": "user1"})

response

'Variance is a measure of the variability of a data set; it quantifies how spread out the data points are from the mean score. Mathematically, it is calculated using the formula:\n\n\\[ \\text{var}(x) = \\frac{\\Sigma(x_i - \\bar{x})^2}{N} \\]\n\nwhere \\( x_i \\) represents the individual data points, \\( \\bar{x} \\) is the mean of the data, and \\( N \\) is the number of data points. \n\nThanks for asking!'

In [42]:
response = rag_chain.invoke({"question": "What is Variance ?",
                             "username": "user2"})

response

'Variance quantifies how the predictions made on the same observation are different from each other. A high variance model will overfit on your training population and perform badly on any observation beyond training. Thus, we aim at low variance. \n\nThanks for asking!'

In [43]:
response = rag_chain.invoke({"question": "What is Variance ?",
                             "username": "user3"})

response

'Your Access is Restricted. Thanks for asking!'

In [44]:
response = rag_chain.invoke({"question": "What is Principal component analysis?",
                             "username": "user2"})

response

'I don’t know. Thanks for asking!'

In [45]:
response = rag_chain.invoke({"question": "How ensemble method works?",
                             "username": "user2"})

print(response)

Ensemble methods work by combining multiple models to improve predictions. They can decrease variance (through techniques like bagging), decrease bias (through methods like boosting), and enhance overall prediction accuracy (through stacking). 

For example, in bagging, multiple models are trained on different samples of the data, which helps to reduce noise and variance. In boosting, models are trained sequentially, with each new model focusing on the errors made by the previous ones, thereby reducing bias. Stacking involves training multiple models and then combining their predictions to improve the final output.

Thanks for asking!


In [46]:
# For a query that is not covered by the documents
response = rag_chain.invoke({"question": "Who is the CEO of OpenAI ",
                             "username": "user1"})

print(response)

I don’t know. Thanks for asking!


In [47]:
# The same out-of-scope query from a user whose access is restricted
response = rag_chain.invoke({"question": "Who is the CEO of OpenAI ",
                             "username": "user3"})

print(response)

Your Access is Restricted. Thanks for asking!
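
The test results above follow from per-user filtering: a user only gets answers grounded in the documents that user may read. The sketch below illustrates that idea in plain Python. It is an assumption about the mechanism, not the notebook's actual `create_retriever_with_filter`: `Chunk`, `ACCESS_MAP`, `filter_chunks`, and `build_context` are hypothetical names, and the file name assigned to `user2` is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Chunk:
    """A retrieved text chunk with metadata, loosely modelled on a Document."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Hypothetical access map: which source files each user may read.
# The file name for user2 is invented for illustration.
ACCESS_MAP: Dict[str, Set[str]] = {
    "user1": {"/content/pca_d1.pdf"},
    "user2": {"/content/ensemble_notes.pdf"},
}

RESTRICTED_MESSAGE = "Your Access is Restricted."


def filter_chunks(chunks: List[Chunk], username: str) -> List[Chunk]:
    """Keep only the chunks whose 'source' the user may read."""
    allowed = ACCESS_MAP.get(username, set())
    return [c for c in chunks if c.metadata.get("source") in allowed]


def build_context(chunks: List[Chunk], username: str) -> str:
    """Return the visible text, or a restriction notice for unknown users."""
    if username not in ACCESS_MAP:
        return RESTRICTED_MESSAGE
    visible = filter_chunks(chunks, username)
    return "\n".join(c.text for c in visible)


corpus = [
    Chunk("PCA finds orthogonal dimensions.", {"source": "/content/pca_d1.pdf"}),
    Chunk("Bagging reduces variance.", {"source": "/content/ensemble_notes.pdf"}),
]

for user in ("user1", "user2", "user3"):
    print(user, "->", repr(build_context(corpus, user)))
```

In this sketch `user2` never sees the PCA chunk, which matches the "I don't know" answer to the PCA question for that user. With a real vector store the filter is usually pushed into the query itself (for example through the metadata `filter` argument of LangChain's Chroma search) rather than applied after retrieval.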


### Gradio Interface for Retrieval Augmented Generation with Access Control


In [48]:
%%capture
!pip install gradio

In [50]:
import gradio as gr

def rag_interface(question, username):
    refined_question = refine_question(question)
    user_input = {"question": refined_question, "username": username}
    response = rag_chain.invoke(user_input)
    return refined_question, response

iface = gr.Interface(
    fn=rag_interface,
    inputs=[
        gr.Textbox(label="Enter your question"),
        gr.Textbox(label="Enter your username")
    ],
    outputs=[
        gr.Textbox(label="Refined Question"),
        gr.Textbox(label="Response")
    ],
    title="Retrieval Augmented Generation with Access Control",
    description="Ask a question, provide your username, and get a refined question with the response.",
    allow_flagging="never"
)

iface.launch()



Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://a49fbd64dcb4669c4d.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
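
Because the interface is reachable through a public URL, it may be worth guarding the callback against empty inputs and runtime errors. The following is a minimal sketch under stated assumptions: `make_safe_interface` is a hypothetical helper, the user-facing messages are illustrative, and the two lambdas are stubs that stand in for `refine_question` and `rag_chain.invoke` so that the block runs on its own.

```python
from typing import Callable, Tuple


def make_safe_interface(
    refine: Callable[[str], str],
    answer: Callable[[dict], str],
) -> Callable[[str, str], Tuple[str, str]]:
    """Wrap the RAG call with basic input checks and error handling."""
    def safe_interface(question: str, username: str) -> Tuple[str, str]:
        question = (question or "").strip()
        username = (username or "").strip()
        if not question:
            return "", "Please enter a question."
        if not username:
            return "", "Please enter a username."
        try:
            refined = refine(question)
            response = answer({"question": refined, "username": username})
        except Exception as exc:  # show the failure in the UI instead of crashing
            return question, f"Something went wrong: {exc}"
        return refined, response
    return safe_interface


# Stubs so the sketch runs on its own; in the notebook these would be
# refine_question and rag_chain.invoke.
demo = make_safe_interface(
    refine=lambda q: q.rstrip("?").strip() + "?",
    answer=lambda x: f"answer for {x['username']}: {x['question']}",
)

print(demo("  What is PCA ", "user1"))
print(demo("", "user1"))
```

The returned function has the same `(question, username) -> (refined_question, response)` shape as `rag_interface`, so it could be passed as `fn` to `gr.Interface` unchanged.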




[**Details of Chroma through LangChain**](https://python.langchain.com/docs/integrations/vectorstores/chroma/)

## Reusing Vector DB

### **Download the vector DB**

In [None]:
# Zip the entire folder
!zip -r /content/docs.zip /content/docs

  adding: content/docs/ (stored 0%)
  adding: content/docs/chroma/ (stored 0%)
  adding: content/docs/chroma/7b1ffc7b-8801-4a1d-b316-29daad0146a6/ (stored 0%)
  adding: content/docs/chroma/7b1ffc7b-8801-4a1d-b316-29daad0146a6/length.bin (deflated 98%)
  adding: content/docs/chroma/7b1ffc7b-8801-4a1d-b316-29daad0146a6/link_lists.bin (stored 0%)
  adding: content/docs/chroma/7b1ffc7b-8801-4a1d-b316-29daad0146a6/data_level0.bin (deflated 100%)
  adding: content/docs/chroma/7b1ffc7b-8801-4a1d-b316-29daad0146a6/header.bin (deflated 61%)
  adding: content/docs/chroma/chroma.sqlite3 (deflated 61%)


In [None]:
from google.colab import files
files.download("/content/docs.zip")


### **Upload the vector DB from the previous step and unzip it**

In [None]:
# -o overwrites existing files without prompting (the prompt would otherwise stall the cell)
!unzip -o /content/docs.zip -d /

Archive:  /content/docs.zip
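
The extraction step can also be done from Python with the standard-library `zipfile` module, which overwrites existing files without stopping to ask. This is a minimal sketch: `extract_archive` is a hypothetical helper, and the demo builds a tiny placeholder archive in a temporary directory instead of touching `/content`.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(zip_path: str, dest_dir: str) -> list:
    """Extract every member of zip_path into dest_dir, overwriting files."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Self-contained demo in a temporary directory; in Colab the call would be
# extract_archive("/content/docs.zip", "/").
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "docs.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("content/docs/chroma/chroma.sqlite3", "placeholder")

    out_dir = Path(tmp) / "out"
    first = extract_archive(str(zip_path), str(out_dir))
    second = extract_archive(str(zip_path), str(out_dir))  # overwrites silently
    extracted_ok = (out_dir / "content/docs/chroma/chroma.sqlite3").exists()

print(first, extracted_ok)
```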

In [None]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

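# The embedding model must match the one used when the vector DB was built;
# otherwise query vectors and stored vectors are not comparable.
embedding = OpenAIEmbeddings(model='text-embedding-3-small')

# Reopen the persisted store from the unzipped folder (path relative to /content)
vectordb = Chroma(persist_directory='docs/chroma/',
                  embedding_function=embedding)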
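
Before reopening the store, it can help to confirm that the directory really holds a persisted Chroma database, since a wrong path typically yields an empty store rather than an error. The check below is a small sketch that relies only on the `chroma.sqlite3` file seen in the zip listing; `looks_like_chroma_dir` is a hypothetical helper, and the demo uses a temporary directory with an empty placeholder file.

```python
import tempfile
from pathlib import Path


def looks_like_chroma_dir(persist_directory: str) -> bool:
    """Return True if the directory contains Chroma's SQLite file."""
    return (Path(persist_directory) / "chroma.sqlite3").is_file()


# Demo against a temporary directory; in the notebook the argument
# would be 'docs/chroma/'.
with tempfile.TemporaryDirectory() as tmp:
    empty_ok = looks_like_chroma_dir(tmp)
    (Path(tmp) / "chroma.sqlite3").write_bytes(b"")
    filled_ok = looks_like_chroma_dir(tmp)

print(empty_ok, filled_ok)
```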