# Custom Chatbot Project

## Project objective

This project makes a small customization to an OpenAI model by adding custom data as context in the prompt.


## OpenAI Setting

In [None]:
!pip install git+https://github.com/openai/whisper.git

In [82]:
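# OpenAI settings
# Note: this notebook uses the pre-1.0 `openai` Python interface
# (openai.Embedding.create, openai.Completion.create, openai.embeddings_utils).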
import openai
openAIKey= 'REPLACE_BY_THE_KEY'
openai.api_key = openAIKey

import tiktoken

## Data Selection

The custom data is "2023_fashion_trends.csv". It contains information from 2023, so it adds information the model does not have: the training data for GPT-3.5 extends only up to Sep 2021 (as of Jan 2024, https://platform.openai.com/docs/models/gpt-3-5).

## Data Wrangling

In [115]:
import pandas as pd

dataPath = './data/2023_fashion_trends.csv'

orig_df = pd.read_csv(dataPath)

print(orig_df)

df = pd.DataFrame()
# Raw string so the regex escapes (\.) are not interpreted as string escapes
df['magazine'] = orig_df['URL'].str.extract(r'www\.(.*?)\.com')
df['Trends'] = orig_df['Trends']
df['text'] = df['magazine'] + ":" + df['Trends']
print(df)


                                                  URL  \
0   https://www.refinery29.com/en-us/fashion-trend...   
1   https://www.refinery29.com/en-us/fashion-trend...   
2   https://www.refinery29.com/en-us/fashion-trend...   
3   https://www.refinery29.com/en-us/fashion-trend...   
4   https://www.refinery29.com/en-us/fashion-trend...   
..                                                ...   
77  https://www.whowhatwear.com/spring-summer-2023...   
78  https://www.whowhatwear.com/spring-summer-2023...   
79  https://www.whowhatwear.com/spring-summer-2023...   
80  https://www.whowhatwear.com/spring-summer-2023...   
81  https://www.whowhatwear.com/spring-summer-2023...   

                                               Trends  \
0   2023 Fashion Trend: Red. Glossy red hues took ...   
1   2023 Fashion Trend: Cargo Pants. Utilitarian w...   
2   2023 Fashion Trend: Sheer Clothing. "Bare it a...   
3   2023 Fashion Trend: Denim Reimagined. From dou...   
4   2023 Fashion Trend: Shine 
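
The `magazine` column is built from a non-greedy capture between `www.` and `.com`. As a minimal sketch of what that pattern does, the snippet below applies the same regular expression with the standard `re` module. The helper name `extract_magazine` and the sample URLs are illustrative assumptions, not rows from the CSV.

```python
import re

# Same pattern as in the data wrangling cell, written as a raw string.
MAGAZINE_PATTERN = re.compile(r"www\.(.*?)\.com")

def extract_magazine(url):
    """Return the text between 'www.' and the next '.com', or None if absent."""
    match = MAGAZINE_PATTERN.search(url)
    return match.group(1) if match else None

# Illustrative URLs (assumptions, not taken from the dataset).
sample_urls = [
    "https://www.example-magazine.com/trends/2023",
    "https://example.org/no-www-prefix",
]
magazines = [extract_magazine(u) for u in sample_urls]
print(magazines)  # ['example-magazine', None]
```

A URL without a `www.` prefix or a `.com` suffix does not match. In pandas such a row would likely end up with a missing `magazine` value, and therefore a missing `text` value after concatenation, so it may be worth checking for missing values before requesting embeddings.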

In [117]:
EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
batch_size = 100
embeddings = []
for i in range(0, len(df), batch_size):
    # Send text data to OpenAI model to get embeddings

    inputText = df.iloc[i:i+batch_size]["text"]

    response = openai.Embedding.create(
        input=inputText.tolist(),
        engine=EMBEDDING_MODEL_NAME
    )
    
    # Add embeddings to list
    embeddings.extend([data["embedding"] for data in response["data"]])

# Add embeddings list to dataframe
df["embeddings"] = embeddings
df

Unnamed: 0,magazine,Trends,text,embeddings
0,refinery29,2023 Fashion Trend: Red. Glossy red hues took ...,refinery29:2023 Fashion Trend: Red. Glossy red...,"[-0.02681100182235241, -0.030414843931794167, ..."
1,refinery29,2023 Fashion Trend: Cargo Pants. Utilitarian w...,refinery29:2023 Fashion Trend: Cargo Pants. Ut...,"[-0.009645801968872547, -0.03390726447105408, ..."
2,refinery29,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...",refinery29:2023 Fashion Trend: Sheer Clothing....,"[-0.014845591969788074, -0.02283831685781479, ..."
3,refinery29,2023 Fashion Trend: Denim Reimagined. From dou...,refinery29:2023 Fashion Trend: Denim Reimagine...,"[-0.02278187870979309, -0.01047559641301632, 0..."
4,refinery29,2023 Fashion Trend: Shine For The Daytime. The...,refinery29:2023 Fashion Trend: Shine For The D...,"[-0.0104591129347682, 0.0003251482849009335, 0..."
...,...,...,...,...
77,whowhatwear,"If lime green isn't your vibe, rest assured th...","whowhatwear:If lime green isn't your vibe, res...","[-0.0035650806967169046, -0.016749152913689613..."
78,whowhatwear,"""As someone who can clearly (not fondly) remem...","whowhatwear:""As someone who can clearly (not f...","[-0.016724731773138046, -0.00880926102399826, ..."
79,whowhatwear,"""Combine this design shift with the fact that ...","whowhatwear:""Combine this design shift with th...","[-0.021235937252640724, -0.025398779660463333,..."
80,whowhatwear,Thought party season ended at the stroke of mi...,whowhatwear:Thought party season ended at the ...,"[-0.021053023636341095, -0.019290756434202194,..."
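
The embedding loop sends rows to the API in slices of `batch_size`. The sketch below only illustrates how `range(0, n, batch_size)` partitions the row indices; `batch_slices` is a hypothetical helper and no API call is made.

```python
def batch_slices(n_items, batch_size):
    """Return (start, stop) index pairs that cover n_items in order."""
    return [
        (start, min(start + batch_size, n_items))
        for start in range(0, n_items, batch_size)
    ]

# The printed frame has indices 0 to 81, i.e. 82 rows, so with
# batch_size = 100 a single request covers the whole dataset.
print(batch_slices(82, 100))   # [(0, 82)]

# A hypothetical larger dataset would be split across several requests.
print(batch_slices(250, 100))  # [(0, 100), (100, 200), (200, 250)]
```

Because the batches are processed in order and the results are appended with `extend`, the final `embeddings` list lines up with the dataframe rows as long as each response returns its items in input order.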


## Custom Query Completion


In [118]:
from openai.embeddings_utils import get_embedding, distances_from_embeddings

def get_rows_sorted_by_relevance(question, df):

    EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
    
    # Get embeddings for the question text
    question_embeddings = get_embedding(question, engine=EMBEDDING_MODEL_NAME)
    
    df_copy = df.copy()
    df_copy["distances"] = distances_from_embeddings(
        question_embeddings,
        df_copy["embeddings"].values,
        distance_metric="cosine"
    )
    
    # Sort the copied dataframe by the distances and return it
    # (shorter distance = more relevant so we sort in ascending order)
    df_copy.sort_values("distances", ascending=True, inplace=True)
    return df_copy
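
The ranking above relies on cosine distance, where a smaller value means the row points in a direction closer to the question. The following is a minimal pure-Python sketch of that idea, assuming the `cosine` metric is the usual 1 minus cosine similarity. The 2-D vectors are toy stand-ins for real embeddings, and `cosine_distance` is an illustrative helper rather than the library function.

```python
import math

def cosine_distance(a, b):
    """Cosine distance, defined here as 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings (assumption for illustration).
question_vec = [1.0, 0.0]
rows = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}

# Ascending sort: the smallest distance (most relevant) comes first.
ranked = sorted(rows, key=lambda name: cosine_distance(question_vec, rows[name]))
print(ranked)  # ['same direction', 'orthogonal', 'opposite']
```

Vector length does not affect the result: `[2.0, 0.0]` is at distance 0 from `[1.0, 0.0]` because only the direction is compared.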


In [119]:
# create prompt
# (uses `tiktoken` imported earlier and `get_rows_sorted_by_relevance` defined above)


def create_prompt(question, df, max_token_count, withData=True):
    tokenizer = tiktoken.get_encoding("cl100k_base")
    prompt_template = """
Answer the question based on the context below

Context:

{}

---

Question: {}
Answer:
    """

    current_token_count = len(tokenizer.encode(prompt_template)) + len(tokenizer.encode(question))

    context = []
    if withData:
        for text in get_rows_sorted_by_relevance(question, df)["text"].values:
            text_token_count = len(tokenizer.encode(text))
            current_token_count += text_token_count

            if current_token_count <= max_token_count:
                context.append(text)
            else:
                break

    return prompt_template.format("\n\n###\n\n".join(context), question)
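
`create_prompt` fills the context greedily: it walks the rows from most to least relevant and stops at the first one that would push the prompt past the token budget. The sketch below isolates that selection logic. To keep it runnable offline, `count_tokens` is a hypothetical stand-in that counts whitespace-separated words instead of calling `tiktoken`, and `select_context` is an illustrative name.

```python
def count_tokens(text):
    """Hypothetical stand-in for len(tokenizer.encode(text)): counts words."""
    return len(text.split())

def select_context(ranked_texts, question, template_tokens, max_token_count):
    """Keep texts, most relevant first, until one would exceed the budget."""
    used = template_tokens + count_tokens(question)
    selected = []
    for text in ranked_texts:
        used += count_tokens(text)
        if used > max_token_count:
            break  # stop at the first text that does not fit
        selected.append(text)
    return selected

# Made-up texts, already sorted by relevance (assumption for illustration).
ranked_texts = [
    "red is back",
    "cargo pants everywhere this year",
    "sheer layers",
]
picked = select_context(
    ranked_texts, "what is trending?", template_tokens=5, max_token_count=15
)
print(picked)  # ['red is back']
```

In this example the third text would still have fit within the budget, but it is never considered because the loop breaks on the second one. That matches the behaviour of `create_prompt`, which favours the most relevant rows over packing the budget as tightly as possible.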



In [120]:
# create answer 

COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"

def answer_question(
    question, df, max_prompt_tokens=1800, max_answer_tokens=300, withData=True
):
    """
    Given a question, a dataframe containing rows of text, and a maximum
    number of desired tokens in the prompt and response, return the
    answer to the question according to an OpenAI Completion model
    
    If the model produces an error, return an empty string
    """
    
    prompt = create_prompt(question, df, max_prompt_tokens, withData)
   
    try:
        response = openai.Completion.create(
            model=COMPLETION_MODEL_NAME,
            prompt=prompt,
            max_tokens=max_answer_tokens
        )
        return response["choices"][0]["text"].strip()
    except Exception as e:
        print('error')
        print(e)
        return ""

## Custom Performance Demonstration

The cells below demonstrate the performance of the custom query using two questions. For each question, the answer from a basic `Completion` model query is shown first, followed by the answer from the custom query.

### Question 1

In [121]:
# basic answer (withData=False: the prompt template is sent with an empty context)

question1 = "what are the fashion trends in 2023?"
withCustomData = False

print(answer_question(question1, df, 2000, 1000, withData=withCustomData))

Based on the given context, it is not possible to accurately answer the question as the context does not provide any information about fashion trends in 2023.


In [122]:
# custom answer
withCustomData = True

print(answer_question(question1, df, 2000, 1000, withData=withCustomData))

1. Sheer clothing
2. Daytime shine
3. Reds
4. Cargo pants
5. Denim reimagined
6. Indie sleaze
7. Perfectly cut trousers
8. Simplicity and everyday dressing
9. Cobalt Blue
10. Maxi skirts
11. Pinstripe tailoring
12. The tailored look.


### Question 2

In [129]:
# basic answer

question2 = "how the descriptions for the fashion trends in 2023 are different between Glamour and Vogue? Please highlight the examples mentioned by each. If you do not have any examples, please tell so"
withCustomData = False

print(answer_question(question2, df, 2000, 1000, withData=withCustomData))

Unfortunately, I do not have any examples as I am a language AI and do not have access to current or future fashion trends. I suggest consulting fashion magazines like Glamour and Vogue for more information on the different descriptions for fashion trends in 2023.


In [131]:
# custom answer
withCustomData = True

print(answer_question(question2, df, 2000, 1000, withData=withCustomData))

Glamour's description of the 2023 fashion trends focuses on a specific decade, the '90s, with their mention of "dopamine-dressing" and "Phoebe Philo." They also highlight the trend of subdued looks and wardrobe essentials such as tailored blazers, formfitting tees, and loose trousers.

Vogue, on the other hand, offers a more diverse range of trends, such as the modern boho trend seen on the runways of Jil Sander, Marni, Prada, and Dries van Noten. They also mention the trend of detailed denim, using Altuzarra's maxi skirt as an example. Additionally, they mention a "liquid silver" trend, as well as a focus on shine for daytime wear.
