<h3>Rainbow 365 modelling</h3>
This notebook aims to find a suitable model for suggesting the fruit and vegetable colours missing from a shopping cart in order to achieve the 5-colour diet, and for suggesting recipes. The chosen model will be used to build a chatbot.

## Contents
1. [Install Packages](#Install-Packages)
2. [Import Libraries and set OpenAI API Key](#Import-Libraries-and-set-OpenAI-API-Key)
3. [Set Filepath and Load Data](#Set-Filepath-and-Load-Data)
4. [Build Index](#Build-Index)
5. [Build Index with Service Context](#Build-Index-with-Service-Context)
6. [Evaluation with GPT-3.5-Turbo](#Evaluation-with-GPT-3.5-Turbo)
7. [Evaluation with GPT-4](#Evaluation-with-GPT-4)
8. [Evaluation with GPT-4-1106](#Evaluation-with-GPT-4-1106)
9. [Results](#Results)
10. [Models evaluation - Chosen model is gpt-4-1106.](#Models-evaluation---Chosen-model-is-gpt-4-1106.)

<h3>Install Packages </h3>

In [177]:
#install the packages needed for OpenAI LLM modelling (uncomment the lines below to run)
# !pip install llama_index==0.8.64
# !pip install openai==1.10.0
# !pip install spacy
# !pip install pypdf
# !pip install sentence-transformers
# !pip install ragas
# %pip install ipywidgets==7.7.5

<h3> Import Libraries and set OpenAI API Key</h3>

In [233]:
from llama_index import Document, GPTVectorStoreIndex, ServiceContext, download_loader, VectorStoreIndex
from llama_index.readers import BeautifulSoupWebReader, SimpleDirectoryReader
from llama_index.llms import OpenAI
from llama_index.evaluation import DatasetGenerator
from llama_index.response.notebook_utils import display_response
import openai
from pathlib import Path
import os
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness
import pandas as pd
#input the OpenAI API key here between the double quotation marks
os.environ['OPENAI_API_KEY'] = ""
openai.api_key = os.getenv("OPENAI_API_KEY")

<h3>Set Filepath and Load Data</h3>

In [236]:
#locate the data folder relative to the current working directory
current_dir = os.getcwd()
data_dir = os.path.join(current_dir, "data")

In [238]:
#load the question-and-answer dataset as a list of Document objects (embedding happens later, when the index is built)
PagedCSVReader = download_loader("PagedCSVReader")
loader = PagedCSVReader(encoding="utf-8")
docs = loader.load_data(file=Path(data_dir) / "data.csv")

<h3>Build Index</h3>

In [241]:
#display the loaded documents (not yet embedded, hence embedding=None)
docs

[Document(id_='22fb60fc-62d5-4a75-a5fe-8d3c5aca71a7', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='2182c5a08d66429d05647c81f8255c9969ade0bc120058f7805412bf1abff3ff', text="Question: I have an apple, an avocado, and a banana in my cart. Does this meet the 5-color diet, and what recipe can I make?\nAnswer: You're missing blue/purple. Try adding blueberries for a balanced diet. You can make a fruit salad with apple, avocado, banana, and blueberries.", start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='f49d598c-264f-443b-bc39-7e73d2d92953', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='6e1f274d9090f14b5375ac24e570d3f7c246cc7e7d6adbc74549691aafd0745b', text="Question: I have pomegranates, oranges, and green apples in my cart. Does this meet 

<h3>Build Index with Service Context</h3>
With the data loaded, we can construct the index for the chatbot. LlamaIndex offers four main index types: Summary Index, Vector Store Index, Tree Index and Keyword Table Index. Here we use the Vector Store Index, one of the most commonly used types, which embeds each document and retrieves the most similar ones at query time.
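To make the retrieval step of a vector store index more concrete, here is a minimal sketch of top-k retrieval by cosine similarity. It is not the LlamaIndex implementation: the function names and the three-dimensional toy vectors are assumptions made for illustration, whereas real embeddings come from an embedding model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Hypothetical toy embeddings standing in for three indexed documents.
toy_doc_vecs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
toy_query = [0.9, 0.1, 0.0]

nearest = top_k(toy_query, toy_doc_vecs, k=2)
print(nearest)
```

The `k=2` default mirrors the `similarity_top_k=2` setting used for the query engines later in this notebook.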

In [244]:
#build a vector store index with gpt-4 set as the LLM in the service context, then persist the index to disk
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-4", temperature=0)
)
index = GPTVectorStoreIndex.from_documents(documents=docs, service_context=service_context)
index.storage_context.persist(persist_dir="")

<h3> Evaluation with GPT-3.5-Turbo </h3>
<ul><li>Evaluation questions were created in relation to the shopping cart context, using the prompt below.</li>
<li>Prompt: give me a list of questions that are similar to the following question and make use of the items in the attached file: i have an apple, a broccoli, a banana in my cart. tell me if this meets the 5-colour diet and if not what is missing in the 5 colour diet and what suggested recipe that can incorporate the items.</li></ul>

In [246]:
#retrieving the evaluation questions
questions = []
with open("data/modified_eval_questions.txt", "r") as f:
    for line in f:
        questions.append(line.strip())
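As a hedged variation on the cell above, the sketch below wraps the loading step in a helper that also skips blank lines, so that an empty line in the file does not become an empty evaluation question. The helper name `load_questions` and the demo file contents are assumptions for illustration. Note that the list is zero-indexed, so `questions[2]` is the third question.

```python
import os
import tempfile


def load_questions(path):
    """Return one stripped question per non-empty line of a text file."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Demonstration on a temporary file with hypothetical contents.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_questions.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("First question?\n\nSecond question?\nThird question?\n")
    demo_questions = load_questions(demo_path)

print(demo_questions[2])
```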

In [248]:
#build an index and query engine backed by gpt-3.5-turbo for the evaluation
gpt_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0), context_window=2048
)
index = VectorStoreIndex.from_documents(docs, service_context=gpt_context)
query_engine = index.as_query_engine(similarity_top_k=2)

In [249]:
#generating and storing the answers to the evaluation questions using the gpt-3.5-turbo model.
contexts = []
answers = []

for question in questions:
    response = query_engine.query(question)
    contexts.append([x.node.get_content() for x in response.source_nodes])
    answers.append(str(response))

In [251]:
#calculate the answer relevancy and faithfulness scores for the answers generated by gpt-3.5-turbo
ds = Dataset.from_dict(
    {
        "question": questions,
        "answer": answers,
        "contexts": contexts,
    }
)


result = evaluate(ds,[answer_relevancy, faithfulness])
print(result)

{'answer_relevancy': 0.9026, 'faithfulness': 0.3433}


In [252]:
#print the gpt-3.5-turbo answer to the third evaluation question (questions[2])
query_engine = index.as_query_engine(service_context=gpt_context)
response_35_turbo = query_engine.query(questions[2])

print(response_35_turbo)

You should add bananas to achieve the 5-color diet. To try a recipe, consider making a mixed fruit smoothie with pomegranates, oranges, blueberries, and bananas.


<h3>Evaluation with GPT-4</h3>
<ul><li>The same evaluation questions are used as in the GPT-3.5-Turbo evaluation.</li></ul>

In [254]:
#build an index and query engine backed by gpt-4 for the evaluation
gpt_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-4", temperature=0), context_window=2048
)
index = VectorStoreIndex.from_documents(docs, service_context=gpt_context)
query_engine = index.as_query_engine(similarity_top_k=2)

In [255]:
#generating and storing the answers to the evaluation questions using the gpt-4 model
contexts = []
answers = []

for question in questions:
    response = query_engine.query(question)
    contexts.append([x.node.get_content() for x in response.source_nodes])
    answers.append(str(response))

In [256]:
#calculate the answer relevancy and faithfulness scores for the answers generated by gpt-4
ds = Dataset.from_dict(
    {
        "question": questions,
        "answer": answers,
        "contexts": contexts,
    }
)

result = evaluate(ds,[answer_relevancy, faithfulness])
print(result)

{'answer_relevancy': 0.8014, 'faithfulness': 0.2857}


In [257]:
#print the gpt-4 answer to the third evaluation question (questions[2])
query_engine = index.as_query_engine(service_context=gpt_context)
response_4 = query_engine.query(questions[2])

print(response_4)

You're missing green and white/tan. Add green apples and bananas to your cart. You could try making a fruit salad with pomegranates, oranges, blueberries, green apples, and bananas.


<h3>Evaluation with GPT-4-1106 </h3>
<ul><li>The same evaluation questions are used as in the GPT-3.5-Turbo evaluation.</li></ul>

In [259]:
#build an index and query engine backed by gpt-4-1106-preview for the evaluation
gpt_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-4-1106-preview", temperature=0), context_window=2048
)
index = VectorStoreIndex.from_documents(docs, service_context=gpt_context)
query_engine = index.as_query_engine(similarity_top_k=2)

In [260]:
#generating and storing the answers to the evaluation questions using the gpt-4-1106 model
contexts = []
answers = []

for question in questions:
    response = query_engine.query(question)
    contexts.append([x.node.get_content() for x in response.source_nodes])
    answers.append(str(response))

In [261]:
#calculate the answer relevancy and faithfulness scores for the answers generated by gpt-4-1106
ds = Dataset.from_dict(
    {
        "question": questions,
        "answer": answers,
        "contexts": contexts,
    }
)

result = evaluate(ds,[answer_relevancy, faithfulness])
print(result)

{'answer_relevancy': 0.8955, 'faithfulness': 0.3375}


In [262]:
#print the gpt-4-1106 answer to the third evaluation question (questions[2])
query_engine = index.as_query_engine(service_context=gpt_context)
response_4_1106 = query_engine.query(questions[2])

print(response_4_1106)

To achieve the 5-color diet, you should add something white/tan and green. Consider adding bananas and green apples. A recipe you can try is a fruit salad with pomegranates, oranges, blueberries, bananas, and green apples.


<h3> Results </h3>
For the evaluation, we compare the models on the following two metrics, and then compare their responses to the same question:

- `answer_relevancy` - Measures how relevant the generated answer is to the prompt. The score is low if the answer is incomplete or contains redundant information. It is quantified by estimating how likely an LLM is to generate the given question from the generated answer. Values range from 0 to 1; higher is better.
- `faithfulness` - Measures the factual consistency of the generated answer against the given context. This is a multi-step process: statements are first extracted from the generated answer, and each statement is then verified against the context. The score is scaled to the range 0 to 1; higher is better.
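The final step of the faithfulness calculation described above can be sketched as a simple ratio: the share of answer statements that are supported by the context. This is only an illustration of the arithmetic, not the ragas implementation. In ragas an LLM extracts and verifies the statements, whereas here the statements and verdicts are hand-labelled, hypothetical examples.

```python
def faithfulness_score(verdicts):
    """Fraction of answer statements judged as supported by the context."""
    if not verdicts:
        return 0.0
    return sum(1 for supported in verdicts if supported) / len(verdicts)


# Hypothetical statements from one answer, with hand-assigned verdicts.
# A recipe that does not appear in the retrieved context counts as unsupported.
toy_verdicts = {
    "The cart is missing green.": True,
    "The cart is missing white/tan.": True,
    "A fruit salad can be made with these items.": False,
}

toy_score = faithfulness_score(list(toy_verdicts.values()))
print(round(toy_score, 4))
```

Under this view, an answer whose colour advice is grounded in the context but whose recipe is not will score below 1 even when the answer is correct.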

In [279]:
#consolidate the results and present them in a table
models = ["gpt-3.5-turbo", "gpt-4", "gpt-4-1106"]
relevancy_scores = [0.9026, 0.8014, 0.8955]
#faithfulness scores as printed by the evaluate() calls above
faithfulness_scores = [0.3433, 0.2857, 0.3375]

#combined score: simple arithmetic mean of relevancy and faithfulness
ragas_scores = [(r + f) / 2 for r, f in zip(relevancy_scores, faithfulness_scores)]

df = pd.DataFrame({
    "Model": models,
    "Relevancy": relevancy_scores,
    "Faithfulness": faithfulness_scores,
    "Ragas Score": ragas_scores
})

print(df)

           Model  Relevancy  Faithfulness  Ragas Score
0  gpt-3.5-turbo     0.9026        0.3433      0.62295
1          gpt-4     0.8014        0.2857      0.54355
2     gpt-4-1106     0.8955        0.3375      0.61650

In [265]:
#printing all 3 responses to the same question by the 3 models for comparison
print(response_35_turbo)
print(response_4)
print(response_4_1106)

You should add bananas to achieve the 5-color diet. To try a recipe, consider making a mixed fruit smoothie with pomegranates, oranges, blueberries, and bananas.
You're missing green and white/tan. Add green apples and bananas to your cart. You could try making a fruit salad with pomegranates, oranges, blueberries, green apples, and bananas.
To achieve the 5-color diet, you should add something white/tan and green. Consider adding bananas and green apples. A recipe you can try is a fruit salad with pomegranates, oranges, blueberries, bananas, and green apples.


<h3>Models evaluation - Chosen model is gpt-4-1106.</h3>
<ul>
    <li>All relevancy scores are high, with values above 0.8. Faithfulness is low, but the answers given by the models are still mostly correct.</li>
    <li>Faithfulness is low because the recipes come from the model's own knowledge rather than from the dataset that was provided. Ideally, the recipe answers would have been drawn from the dataset.</li>
    <li>As such, we look at the responses themselves to assess which model is best for this purpose at this point in time.</li>
    <li>The question used to compare the responses is: My cart already has pomegranates, oranges, and blueberries. What else should I add to achieve the 5 colour diet, and what recipe can I try?</li>
    <li>The response from gpt-3.5-turbo does not mention the missing green colour. That leaves the other two models, and since their answers are essentially the same, I would pick gpt-4-1106 because its response is more conversational.</li>
</ul>