<h1 style="color: brown; font-size: xx-large">RAG Implementation</h1>

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

### Let us start by creating a sample corpus for our use case

In [1]:
sample_corpus = [
    "Read a book at a cozy café and get lost in a new story.",
    "Take a pottery class and create something unique with your hands.",
    "Join a cooking class and learn how to make gourmet dishes.",
    "Go for a bike ride through the countryside and enjoy the open air.",
    "Attend a painting workshop and unleash your creative side.",
    "Take a weekend road trip to explore nearby towns and landmarks.",
    "Visit the palace in the city.",
    "Go to a comedy club and enjoy an evening of laughter.",
    "Take a photography tour and capture beautiful landscapes.",
    "Volunteer at a local shelter and make a difference in the community."
]

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

### Now we need a similarity measure to compare the user input with our corpus so that the most relevant document is returned

### Jaccard similarity uses set theory to measure the similarity between texts, so each text first has to be preprocessed (lowercased and split into words) and then converted to a set before the comparison

In [2]:
def preprocess_text(text):
    return text.lower().split()

def get_jaccard_similarity(query, doc):
    # Preprocessing
    query_terms = set(preprocess_text(query))
    doc_terms = set(preprocess_text(doc))
    
    # Intersection and Union of the sets for similarity checks
    intersect_terms = query_terms.intersection(doc_terms)
    union_terms = query_terms.union(doc_terms)
    
    # Handle the case where both sets are empty so that the denominator is not zero
    if len(union_terms) == 0:
        return 0  
    
    return len(intersect_terms) / len(union_terms)
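
As a quick sanity check, the Jaccard score (size of the intersection divided by size of the union) can be traced by hand on one query and one corpus entry. The sketch below is self-contained and repeats the same lowercase-and-split tokenization as above; the helper name `jaccard` is only an illustrative stand-in for `get_jaccard_similarity`.

```python
def jaccard(query, doc):
    # Same tokenization as preprocess_text: lowercase, then split on whitespace
    a = set(query.lower().split())
    b = set(doc.lower().split())
    union = a | b
    # Guard against an empty union (both texts empty)
    return len(a & b) / len(union) if union else 0.0

query = "I like comedy"
doc = "Go to a comedy club and enjoy an evening of laughter."

shared = set(query.lower().split()) & set(doc.lower().split())
print(shared)               # the only shared token is 'comedy'
print(jaccard(query, doc))  # 1 shared token / 13 distinct tokens
```

Note that splitting on whitespace leaves punctuation attached, so a token such as `laughter.` would not match `laughter` in a query.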

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

### Now that we have our helper functions, we can retrieve the best-matching document

In [3]:
def get_response(query, corpus):
    similarity_score_list = []
    for doc in corpus:
        score = get_jaccard_similarity(query, doc)
        similarity_score_list.append(score)
        
    best_match_index = similarity_score_list.index(max(similarity_score_list))
    return corpus[best_match_index]
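
One property of `get_response` worth keeping in mind: when no document shares a word with the query, every score is 0 and the first corpus entry is returned. A possible variation, sketched below under the assumption that returning several scored candidates is acceptable, ranks the documents and drops those with no overlap. The names `get_top_matches`, `k`, and `min_score` are hypothetical and not part of the original notebook.

```python
def jaccard(query, doc):
    # Same lowercase-and-split tokenization as in the notebook
    a = set(query.lower().split())
    b = set(doc.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def get_top_matches(query, corpus, k=3, min_score=0.0):
    # Score every document, then sort from most to least similar
    scored = [(jaccard(query, doc), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep at most k documents whose score exceeds min_score
    # (with the default 0.0, this drops documents sharing no token with the query)
    return [(score, doc) for score, doc in scored[:k] if score > min_score]

corpus = [
    "Visit the palace in the city.",
    "Go to a comedy club and enjoy an evening of laughter.",
    "Take a photography tour and capture beautiful landscapes.",
]
print(get_top_matches("I like comedy", corpus))
print(get_top_matches("swimming", corpus))  # no overlap, so an empty list
```

An empty result lets the caller decide what to do when nothing relevant was retrieved, instead of silently passing an unrelated document to the LLM.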

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

### Our simple retriever is ready and we can test it on a sample user input

Let's say that the chatbot asks the user the question \
    **"What is an activity of your interest?"**\
\
The user's response is **"I like comedy"**

In [4]:
user_input = "I like comedy"

In [5]:
get_response(user_input, sample_corpus)

'Go to a comedy club and enjoy an evening of laughter.'

As the user likes comedy, the suggestion was to go to a comedy club

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

<h3 style="color: blue; font-size: x-large">The retrieval implementation is taken care of and can now be paired with an LLM</h3>

In [6]:
relevant_document = get_response(user_input, sample_corpus)

### The retrieval implemented so far fetches the document most relevant to the user input; we now have to feed it to the LLM so that it can generate its response to the user's query

### The prompt that is fed to the LLM has to be defined

In [7]:
llm_prompt = f"""
You are an assistant providing activity suggestions. Answer in one short sentence only.
Here is the suggested activity: {relevant_document}
The user mentioned: {user_input}
Compile a recommendation to the user in general based on the recommended activity and the user input in one short sentence only.
"""

### We can use Hugging Face to load the LLM

In [8]:
from transformers import pipeline


In [9]:
generator = pipeline("text-generation", model="openai-community/gpt2")

### Let us generate the response using the prompt provided

In [14]:
response = generator(llm_prompt, max_length=200, num_return_sequences=1)
response

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "\nYou are an assistant providing activity suggestions. Answer in one short sentence only.\nHere is the suggested activity: Go to a comedy club and enjoy an evening of laughter.\nThe user mentioned: I like comedy\nCompile a recommendation to the user in general based on the recommended activity and the user input in one short sentence only.\nYou are an assistant providing activity suggestions. Answer in one short sentence only.\nNow, at this point, you would like to answer more questions. How do you create such a recommendation? Well, the problem is... If I don't get anything out of it, I can't give it to the user. Also, it is not feasible for me to share it as a resource.\nNow as expected, the task becomes a challenge. There are several ways about what to do. This is mainly when you are working on your own work and want to help others. You can use this information to help others. You can also tell yourself that you"}]

### The user only needs the generated text, so let us extract just that field

In [15]:
response[0]['generated_text']

"\nYou are an assistant providing activity suggestions. Answer in one short sentence only.\nHere is the suggested activity: Go to a comedy club and enjoy an evening of laughter.\nThe user mentioned: I like comedy\nCompile a recommendation to the user in general based on the recommended activity and the user input in one short sentence only.\nYou are an assistant providing activity suggestions. Answer in one short sentence only.\nNow, at this point, you would like to answer more questions. How do you create such a recommendation? Well, the problem is... If I don't get anything out of it, I can't give it to the user. Also, it is not feasible for me to share it as a resource.\nNow as expected, the task becomes a challenge. There are several ways about what to do. This is mainly when you are working on your own work and want to help others. You can use this information to help others. You can also tell yourself that you"

### The response is off-topic because GPT-2 is an older base model that was not trained to follow instructions, so it simply continues the prompt; a newer instruction-tuned model should yield a better result