# Data Wrangling
#### Dataset: 2023 Fashion Trends

- I chose the **2023 Fashion Trends** dataset primarily because it covers a single, recent period (2023), which suits a custom chatbot. Fashion changes quickly, so a dataset limited to 2023 keeps the chatbot's answers tied to that year's trends rather than to older ones.

- The dataset provides structured information, including clearly outlined trends and corresponding source URLs. This structure allows me to efficiently perform meaningful data wrangling and effectively use the dataset for further analysis.

- Additionally, the detailed textual descriptions drawn from several websites make this dataset well suited to generating embeddings. These embeddings let my custom RAG-style chatbot retrieve relevant context for questions about 2023 fashion trends, and the source URLs make it possible to trace each answer back to its origin for validation.

- In conclusion, the relevance, diversity of sources, and suitability for embedding-based approaches make the **2023 Fashion Trends** dataset an appropriate choice for my task.


Initial Imports

In [1]:
import pandas as pd
from pathlib import Path
import os

from dotenv import load_dotenv
load_dotenv()

# Custom Functions
from fncs.utilities import (create_openai_client,
                            response_generator,
                            prompt_builder,
                            calculate_total_cost
                            )
from fncs.retrieval import create_embeddings_batch

# Load environment vars:
base_url_voc = os.getenv("OPENAI_BASE_VOC")
api_key_voc = os.getenv("OPENAI_API_VOC")

emb_name = 'text-embedding-3-large'

openai_client = create_openai_client(api_key=api_key_voc, base_url=base_url_voc)
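
Because `os.getenv` returns `None` when a variable is missing, a misconfigured `.env` file would only surface later, when the client is first used. The helper below is a minimal sketch of failing fast instead; `require_env` is a hypothetical name and not part of the notebook's `fncs` package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value
```

It could be used in place of the plain lookups, for example `api_key_voc = require_env("OPENAI_API_VOC")`.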

Getting the project path:

In [2]:
proj_dir = Path.cwd()

Reading csv file:

In [3]:
df = pd.read_csv(proj_dir / "data" / "2023_fashion_trends.csv")

In [4]:
df.head()

Unnamed: 0,URL,Trends,Source
0,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Red. Glossy red hues took ...,7 Fashion Trends That Will Take Over 2023 — Sh...
1,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Cargo Pants. Utilitarian w...,7 Fashion Trends That Will Take Over 2023 — Sh...
2,https://www.refinery29.com/en-us/fashion-trend...,"2023 Fashion Trend: Sheer Clothing. ""Bare it a...",7 Fashion Trends That Will Take Over 2023 — Sh...
3,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Denim Reimagined. From dou...,7 Fashion Trends That Will Take Over 2023 — Sh...
4,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Shine For The Daytime. The...,7 Fashion Trends That Will Take Over 2023 — Sh...


In [5]:
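# Extract the host from the URL, e.g. "www.refinery29.com"
df['URL_processed'] = df['URL'].str.split('/').str[2]
# Keep the site name; assumes hosts of the form "www.<name>.<tld>"
df['URL_processed'] = df['URL_processed'].str.split('.').str[1]
df[['URL_processed']].tail(3)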

Unnamed: 0,URL_processed
79,whowhatwear
80,whowhatwear
81,whowhatwear
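
The string splits above depend on every host having the form `www.<name>.<tld>`. As a hedged alternative, the sketch below uses the standard library's `urllib.parse` and also handles hosts without a `www.` prefix. The function name `extract_site_name` is an assumption for illustration, and the approach is still naive: a host such as `shop.example.com` would yield `shop`.

```python
from urllib.parse import urlparse


def extract_site_name(url: str) -> str:
    """Return the first host label after an optional 'www.' prefix.

    Illustrative helper, not part of the original notebook.
    """
    # hostname is lowercased and excludes any port; it is None for malformed URLs
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host.split(".")[0]
```

Applied to the dataframe, this would look like `df['URL_processed'] = df['URL'].map(extract_site_name)`.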


In [6]:
df[['URL_processed','URL','Source', 'Trends']].head()

Unnamed: 0,URL_processed,URL,Source,Trends
0,refinery29,https://www.refinery29.com/en-us/fashion-trend...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Red. Glossy red hues took ...
1,refinery29,https://www.refinery29.com/en-us/fashion-trend...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Cargo Pants. Utilitarian w...
2,refinery29,https://www.refinery29.com/en-us/fashion-trend...,7 Fashion Trends That Will Take Over 2023 — Sh...,"2023 Fashion Trend: Sheer Clothing. ""Bare it a..."
3,refinery29,https://www.refinery29.com/en-us/fashion-trend...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Denim Reimagined. From dou...
4,refinery29,https://www.refinery29.com/en-us/fashion-trend...,7 Fashion Trends That Will Take Over 2023 — Sh...,2023 Fashion Trend: Shine For The Daytime. The...


Creating a 'text' feature. This column combines everything the custom RAG-style chatbot will need later: the site name, the source title, the trend description, and the source URL.

In [7]:
# Using string format method
text_chunk = \
"""
Fashion trends according to {brand}

Source Title: {source}

{trends}

Source URL: {url}
"""

df['text'] = df.apply(lambda row: text_chunk.format(
    brand=row['URL_processed'],
    source=row['Source'],
    url=row['URL'],
    trends=row['Trends']), axis=1)

# Show the first chunk as a sanity check
print(df['text'].iloc[0])


Fashion trends according to refinery29

Source Title: 7 Fashion Trends That Will Take Over 2023 — Shop Them Now

2023 Fashion Trend: Red. Glossy red hues took over the Fall 2023 runways ranging from Sandy Liang and PatBo to Tory Burch and Wiederhoeft. Think: Juicy reds with vibrant orange undertones that would look just as good in head-to-toe looks (see: a pantsuit) as accent accessory pieces (shoes, handbags, jewelry).

Source URL: https://www.refinery29.com/en-us/fashion-trends-2023



Generating Embeddings from the feature 'text'

In [8]:
final_df, cost = create_embeddings_batch(
    client=openai_client,
    deployment_name=emb_name,
    batch_size=10,
    df=df,
    chunk_column='text'
)
print(f'Total Cost(eur): {cost}')

Total Cost(eur): 0.0015385
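
The implementation of `create_embeddings_batch` lives in `fncs.retrieval` and is not shown here. As a rough sketch of the general idea behind `batch_size`, the function below sends texts to an embedding function in fixed-size groups and collects the results in order. The names `embed_in_batches` and `embed_fn` are assumptions for illustration, and the sketch does not attempt the cost calculation.

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 10,
) -> List[List[float]]:
    """Embed texts in batches, preserving input order.

    Illustrative sketch only; not the notebook's create_embeddings_batch.
    embed_fn takes a list of strings and returns one vector per string.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        embeddings.extend(embed_fn(batch))
    return embeddings


# With the OpenAI Python client (v1+), embed_fn could plausibly be:
#   lambda batch: [item.embedding for item in
#                  openai_client.embeddings.create(model=emb_name, input=batch).data]
```

Passing the embedding call in as a function keeps the batching logic testable without network access, for example with `df['text'].tolist()` as `texts`.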


In [9]:
final_df[['text','embeddings']].head(2)

Unnamed: 0,text,embeddings
0,\nFashion trends according to refinery29\n\nSo...,"[-0.06195216625928879, -0.007596897892653942, ..."
1,\nFashion trends according to refinery29\n\nSo...,"[-0.0732918530702591, -0.014976361766457558, -..."


Saving the final dataframe with only two features: the text chunks and their embeddings

In [10]:
final_df[['text','embeddings']].to_csv(proj_dir / "data" / "2023_fashion_trends_embeddings.csv", index=False)
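
One caveat when reading this file back: CSV stores each embedding as text, so `pd.read_csv` returns strings such as `"[-0.06, ...]"` rather than lists. The sketch below shows one way to restore them, assuming the `embeddings` column held plain Python lists of floats when it was saved (NumPy arrays serialize differently and would need another parser). The name `load_embeddings_csv` is hypothetical.

```python
import ast
import io

import pandas as pd


def load_embeddings_csv(path_or_buffer) -> pd.DataFrame:
    """Read a CSV with 'text' and 'embeddings' columns and parse embeddings into lists.

    Illustrative helper; assumes embeddings were saved as Python lists of floats.
    """
    loaded = pd.read_csv(path_or_buffer)
    # literal_eval safely parses the "[...]" string back into a list of floats
    loaded["embeddings"] = loaded["embeddings"].apply(ast.literal_eval)
    return loaded


# Round-trip demonstration with a tiny in-memory dataframe (made-up values)
demo = pd.DataFrame({"text": ["a", "b"], "embeddings": [[0.1, -0.2], [0.3, 0.4]]})
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)
restored = load_embeddings_csv(buffer)
```

For larger datasets, a format that preserves list columns natively, such as Parquet or pickle, avoids this parsing step.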