# RAG App

The aim is to build a RAG-based (retrieval-augmented generation) chat app over the hotel analysis report and test it.

## Installs

In [18]:
! pip install langchain langchain_chroma langchain_ollama langchain_core langchain_groq langchain_text_splitters load_dotenv langchain_community



## Imports

In [1]:

from langchain_ollama import OllamaEmbeddings
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_groq import ChatGroq
from load_dotenv import load_dotenv
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser


import os

load_dotenv()

True

### Paths 

In [2]:

CHROMA_PATH = 'chromadb'
DATA_PATH = 'data'

In [3]:
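# Use the maintained langchain_chroma package (installed above) instead of the deprecated langchain.vectorstores path
from langchain_chroma import Chroma

# Open (or create) a Chroma store persisted on disk at CHROMA_PATH
db = Chroma(
    persist_directory=CHROMA_PATH,
)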



## Data (Report) Preprocessing

In [4]:
import re
import string

def clean_text(text):

    # Convert text to lowercase
    text = text.lower()
    
    # Remove Extra Newlines
    text = re.sub(r'\n+', '\n', text)
    
    return text

# Load the text file
with open('data/final_hotel_bookings_report.txt', 'r', encoding='utf-8') as file:
    text = file.read()

# Clean the text
cleaned_text = clean_text(text)

# Make sure the output directory exists, then save the cleaned text to a file
os.makedirs('data/cleaned', exist_ok=True)
with open('data/cleaned/final_hotel_bookings_report_cleaned.txt', 'w', encoding='utf-8') as file:
    file.write(cleaned_text)

print("Text cleaning completed successfully.")


Text cleaning completed successfully.
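
To make the effect of the cleaning step concrete, here is a small sketch that applies the same two operations (lowercasing, collapsing runs of newlines) to a made-up sample string. The sample text is an assumption for illustration and is not read from the report file.

```python
import re

def clean_text(text):
    # Same two steps as the notebook cell: lowercase, then collapse newline runs
    text = text.lower()
    text = re.sub(r'\n+', '\n', text)
    return text

# Hypothetical sample, shaped like the report's markdown
sample = "Hotel Bookings Report\n\n\n**Key Metrics**\n\nTotal Bookings: 119,390"
cleaned = clean_text(sample)
print(repr(cleaned))
```

Note that collapsing `\n\n` into `\n` removes paragraph breaks, and lowercasing discards case information (for example in country codes), so both choices may affect how the text is later split and retrieved.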


## Chunking

In [5]:
def split_documents(documents: list[Document]) -> list[Document]:
    # Split each document into ~200-character chunks that overlap by 30 characters
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=200,
        chunk_overlap=30,
        length_function=len,
        is_separator_regex=False,
    )
    return text_splitter.split_documents(documents)
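
The interaction between `chunk_size=200` and `chunk_overlap=30` can be illustrated with a plain sliding window. This is a deliberately simplified sketch, not how `RecursiveCharacterTextSplitter` works internally: the real splitter breaks on separators such as paragraph and line breaks first, so its chunk boundaries will differ. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=200, chunk_overlap=30):
    # Each new chunk starts (chunk_size - chunk_overlap) characters after the previous one
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

# 500 characters of dummy text
text = "".join(str(i % 10) for i in range(500))
chunks = sliding_window_chunks(text)

print([len(c) for c in chunks])
# The tail of one chunk is repeated at the head of the next
print(chunks[0][-30:] == chunks[1][:30])
```

The overlap is what lets a sentence that straddles a boundary still appear whole in at least one chunk, at the cost of some duplicated text in the index.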



In [6]:
# Load the cleaned text file (written as UTF-8 above)
text_loader = TextLoader('./data/cleaned/final_hotel_bookings_report_cleaned.txt', encoding='utf-8')
documents = text_loader.load()
chunks = split_documents(documents)

chunks[0]

Document(metadata={'source': './data/cleaned/final_hotel_bookings_report_cleaned.txt'}, page_content='hotel bookings report\nai-generated analysis:')

In [7]:
# Inspect every chunk to verify how the document was split
for chunk in chunks:
    print(chunk)
    print('---')

page_content='hotel bookings report
ai-generated analysis:' metadata={'source': './data/cleaned/final_hotel_bookings_report_cleaned.txt'}
---
page_content='content='**hotel bookings report**\n\n**executive summary**\n\nthis report provides an overview of the hotel bookings data from july 2015 to august 2017. the report highlights key metrics, trends,' metadata={'source': './data/cleaned/final_hotel_bookings_report_cleaned.txt'}
---
page_content='key metrics, trends, and insights that can inform business decisions and improve hotel operations.\n\n**key metrics**\n\n* **total bookings**: 119,390\n* **non-canceled bookings**: 75,166 (63% of' metadata={'source': './data/cleaned/final_hotel_bookings_report_cleaned.txt'}
---
page_content='bookings**: 75,166 (63% of total bookings)\n* **cancellation rate**: 37.04%\n* **average daily rate (adr)**: $99.99\n* **average adults per booking**: 1.86\n\n**revenue analysis**\n\n* **monthly' metadata={'source': './data/cleaned/final_hotel_bookings_repo

## Embedding chunked data

In [8]:
# ollama (local) embeddings
local_embeddings = OllamaEmbeddings(model="nomic-embed-text:latest")

# Create a new vector store from the chunks (not persisted, since no persist_directory is passed)
vectorstore = Chroma.from_documents(documents=chunks, embedding=local_embeddings)

retriever = vectorstore.as_retriever()
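
Conceptually, the retriever embeds the question and returns the chunks whose vectors are closest to it. The sketch below shows that idea with hand-written 3-dimensional vectors and cosine similarity. Both are assumptions for illustration: real `nomic-embed-text` vectors are much larger, and the distance metric Chroma actually uses is not shown in this notebook.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot product over the product of norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    # Score every chunk against the query and keep the k best matches
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index.items()]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

# Toy index: chunk text -> made-up embedding vector
toy_index = {
    "average daily rate (adr): $99.99": [0.9, 0.1, 0.0],
    "cancellation rate: 37.04%": [0.1, 0.9, 0.0],
    "total bookings: 119,390": [0.2, 0.2, 0.9],
}

query_vec = [0.8, 0.2, 0.1]  # pretend embedding of a question about price
results = top_k(query_vec, toy_index, k=2)
print(results)
```

With short 200-character chunks, an answer depends heavily on which few chunks rank highest, which is worth keeping in mind when reading the test outputs further down.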

## LLM Calling on Knowledge base

In [9]:
llm = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # The API key is read from the GROQ_API_KEY environment variable (set it in the .env file)
)
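
Since the model client depends on an API key in the environment, a quick check before constructing it can give a clearer error than a failed request. The helper below is a hypothetical sketch; `require_env` and the `DEMO_API_KEY` variable are names invented for this example.

```python
import os

def require_env(name):
    # Fail early with a readable message if the variable is missing or empty
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to the .env file or the shell environment")
    return value

# Demo with a throwaway variable so the example runs anywhere
os.environ["DEMO_API_KEY"] = "placeholder"
print(require_env("DEMO_API_KEY"))

# In this notebook the equivalent call would be require_env("GROQ_API_KEY"),
# placed after load_dotenv() and before ChatGroq(...)
```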

In [10]:
rag_template = """You are 'Lula', a QA system and hotel booking assistant that answers questions about hotel bookings.
{context}
Question: {question}
Note: Do not include any extra information or disclaimers in the answer.
"""
rag_prompt = ChatPromptTemplate.from_template(rag_template)
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)
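
In the chain above, `retriever` returns a list of `Document` objects, so the `{context}` slot receives their string representation, metadata included. A common alternative is to join only the `page_content` of each chunk before it reaches the prompt. The sketch below uses a hypothetical `FakeDocument` stand-in so it runs without LangChain; `format_docs` is likewise an illustrative name.

```python
from dataclasses import dataclass

@dataclass
class FakeDocument:
    # Hypothetical stand-in for a LangChain Document, which exposes page_content
    page_content: str

def format_docs(docs):
    # Keep only the chunk text, separated by blank lines
    return "\n\n".join(doc.page_content for doc in docs)

docs = [
    FakeDocument("total bookings: 119,390"),
    FakeDocument("cancellation rate: 37.04%"),
]
context = format_docs(docs)
print(context)

# In the notebook's chain this could be wired in as (assumption, not run here):
#   {"context": retriever | format_docs, "question": RunnablePassthrough()}
```

Passing cleaner context may change the model's answers, so the tests would need to be rerun after such a change.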

In [11]:
# Test 1
response = rag_chain.invoke("What is the revenue of the hotel?")
print(response)

The city hotel generated $14,394,410.18 in revenue, while the resort hotel generated $11,601,850.23.


In [12]:
# Test 2
response = rag_chain.invoke("Who are you?")
print(response)

I am Lula, a hotel booking assistant.


In [13]:
# Test 3
response = rag_chain.invoke("Show me total revenue for July 2016?")
print(response)

1525019.05


In [14]:
# Test 4
response = rag_chain.invoke("Which location had the highest booking cancellation ?")
print(response)

City hotel had the highest booking cancellation.


In [15]:
# Test 5
response = rag_chain.invoke("What is the average price of a hotel booking?")
print(response)

$99.99


In [16]:
# Test 6
response = rag_chain.invoke("Draw major insights from the all the info you know about the hotel bookings.")
print(response)

Based on the provided hotel bookings report, here are the major insights:

- The hotel had a total of 119,390 bookings.
- Out of these bookings, 75,166 were non-canceled, which accounts for 63% of the total bookings.
- The report provides valuable insights into the hotel bookings data, highlighting opportunities for growth and improvement.
- By implementing the recommended strategies, the hotel can increase revenue.
- The report highlights key metrics, trends, and insights that can inform business decisions and improve hotel operations.
- The hotel bookings report provides an overview of the hotel bookings data from July 2015 to August 2017.


In [17]:
# Test 7
response = rag_chain.invoke("hotel and country, where there is a need for improvement? and why?")
print(response)

Based on the provided report, the hotel and country that require improvement are Germany (DEU) with 6.10% of bookings. 

The need for improvement in Germany is due to the low percentage of bookings, indicating a potential opportunity for growth.
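
The manual tests above could be turned into a small repeatable check. The sketch below runs a list of questions through any callable and verifies that an expected substring appears in each answer. The expected strings are taken from the outputs recorded above; `run_checks` and `stub_ask` are hypothetical names, and the stub replaces the real chain so the example runs without a model.

```python
def run_checks(ask, cases):
    # ask: any callable mapping a question string to an answer string
    results = []
    for question, expected in cases:
        answer = ask(question)
        results.append((question, expected.lower() in answer.lower()))
    return results

cases = [
    ("Who are you?", "Lula"),
    ("What is the average price of a hotel booking?", "$99.99"),
]

def stub_ask(question):
    # Canned answers copied from the recorded outputs of Test 2 and Test 5
    canned = {
        "Who are you?": "I am Lula, a hotel booking assistant.",
        "What is the average price of a hotel booking?": "$99.99",
    }
    return canned[question]

results = run_checks(stub_ask, cases)
for question, passed in results:
    print("PASS" if passed else "FAIL", "-", question)

# With the real chain: run_checks(rag_chain.invoke, cases)
```

Substring checks are a coarse measure; they confirm that a key figure is present but not that the rest of the answer is grounded in the report.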
