# Retail Lab (Prompt Engineering)

**Learning Objectives:**
  * Practice basic prompt engineering
  * Gain exposure to retail-related datasets

## Context of the datasets

### 1. There are three datasets: `articles.csv.zip`, `customers.csv.zip` and `transactions2020.csv.zip`

#### 2. The Articles dataset contains information about the available products.
#### 3. The Customers dataset contains information about registered customers.
#### 4. The Transactions dataset contains purchases of articles made by customers.



## 1. Library Import

In [1]:
!pip install openai

Collecting openai
  Downloading openai-1.30.3-py3-none-any.whl (320 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 320.6/320.6 kB 3.3 MB/s eta 0:00:00
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75.6/75.6 kB 5.9 MB/s eta 0:00:00
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.9/77.9 kB 6.4 MB/s eta 0:00:00
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.3/58.3 kB 4.4 MB/s eta 0:00:00
Installing collected packages: h11, httpcore, httpx, openai
Successfully installed h11-0.14.0 httpcore-1.0.5 ht

In [2]:
import pandas as pd
import warnings
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt

In [3]:
from openai import OpenAI
import os

In [4]:
warnings.simplefilter('ignore')

# You need your own [OpenAI API key](https://platform.openai.com/account/api-keys)

In [5]:
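# Read by the OpenAI client created below.
# Replace the placeholder with your own key, and avoid sharing notebooks that contain a real key.
os.environ["OPENAI_API_KEY"] = 'YOU-NEED-YOUR-OWN-KEY'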

In [6]:
MODEL="gpt-4o"

client = OpenAI(
  api_key=os.environ['OPENAI_API_KEY'],  # this is also the default, it can be omitted
)
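Rather than pasting the key into the notebook, it can be set in the environment (for example in your shell before starting Jupyter) and read from there. The helper below is a minimal, hypothetical sketch (the name `get_openai_api_key` is not part of the `openai` package); it only checks that `OPENAI_API_KEY` is set and is not still the placeholder used above.

```python
import os

# Placeholder value used earlier in this notebook
PLACEHOLDER_KEY = "YOU-NEED-YOUR-OWN-KEY"


def get_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in env_var, or raise if it is missing or still the placeholder."""
    key = os.environ.get(env_var, "").strip()
    if not key or key == PLACEHOLDER_KEY:
        raise RuntimeError(f"Set {env_var} to your own OpenAI API key first.")
    return key
```

With it, the client can be created as `client = OpenAI(api_key=get_openai_api_key())`, which fails early with a clear message instead of on the first API call.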

## 2. Data loading and DataFrame creation

In [7]:
Articles=pd.read_csv("https://github.com/thousandoaks/Python4DS-I/raw/main/datasets/articles.csv.zip")

In [8]:
Articles.head(3)

Unnamed: 0,article_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,graphical_appearance_no,graphical_appearance_name,colour_group_code,colour_group_name,...,department_name,index_code,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,garment_group_name,detail_desc
0,108775015,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,9,Black,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
1,108775044,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,10,White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
2,108775051,108775,Strap top (1),253,Vest top,Garment Upper body,1010017,Stripe,11,Off White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.


In [9]:
Articles.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105542 entries, 0 to 105541
Data columns (total 25 columns):
 #   Column                        Non-Null Count   Dtype 
---  ------                        --------------   ----- 
 0   article_id                    105542 non-null  int64 
 1   product_code                  105542 non-null  int64 
 2   prod_name                     105542 non-null  object
 3   product_type_no               105542 non-null  int64 
 4   product_type_name             105542 non-null  object
 5   product_group_name            105542 non-null  object
 6   graphical_appearance_no       105542 non-null  int64 
 7   graphical_appearance_name     105542 non-null  object
 8   colour_group_code             105542 non-null  int64 
 9   colour_group_name             105542 non-null  object
 10  perceived_colour_value_id     105542 non-null  int64 
 11  perceived_colour_value_name   105542 non-null  object
 12  perceived_colour_master_id    105542 non-null  int64 
 13 

In [10]:
Customers=pd.read_csv("https://github.com/thousandoaks/Python4DS-I/raw/main/datasets/customers.csv.zip")

In [11]:
Customers.sample(3)

Unnamed: 0,customer_id,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
1167395,d9ccc10d943a661c2439938aefe032c91985955cbd9979...,,,ACTIVE,NONE,22.0,703e158f13675538bc50599b5f02015456081bf76cfb10...
556360,67d46f74c4218cc58d5a08ef5bd2d52a1903c2cb77f232...,,,ACTIVE,NONE,25.0,24fe52f1f1d621f9211382e3d68520fa8f013ae8137e3b...
499990,5d640b3aa3b42ca7dd2d1ff159e355ba734b1533c3041c...,1.0,1.0,ACTIVE,Regularly,32.0,2c29ae653a9282cce4151bd87643c907644e09541abc28...


In [12]:
Customers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1371980 entries, 0 to 1371979
Data columns (total 7 columns):
 #   Column                  Non-Null Count    Dtype  
---  ------                  --------------    -----  
 0   customer_id             1371980 non-null  object 
 1   FN                      476930 non-null   float64
 2   Active                  464404 non-null   float64
 3   club_member_status      1365918 non-null  object 
 4   fashion_news_frequency  1355969 non-null  object 
 5   age                     1356119 non-null  float64
 6   postal_code             1371980 non-null  object 
dtypes: float64(3), object(4)
memory usage: 73.3+ MB
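`Customers.info()` shows that `FN` and `Active` are non-null in only about a third of the 1,371,980 rows. A quick way to quantify this is the share of missing values per column. The sketch below runs on a small made-up frame with the same column names; on the real data the equivalent call is `Customers.isna().mean()`. Whether a missing `FN` or `Active` means "no" or "unknown" is not documented here, so any imputation would be an assumption.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking some Customers columns (values are made up for illustration)
customers = pd.DataFrame({
    "customer_id": ["a", "b", "c", "d"],
    "FN": [np.nan, 1.0, np.nan, np.nan],
    "Active": [np.nan, 1.0, np.nan, 1.0],
    "age": [22.0, 25.0, np.nan, 32.0],
})

# Fraction of missing values per column, largest first
missing_share = customers.isna().mean().sort_values(ascending=False)
print(missing_share)
```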


In [13]:
Transactions=pd.read_csv("https://github.com/thousandoaks/Python4DS-I/raw/main/datasets/transactions2020.csv.zip")

In [14]:
Transactions.sample(3)

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id
3363012,2020-08-06,3daa295ffaebddbdd71f00a012b2c8abdfdecedf16649f...,892280003,0.042356,2
4891615,2020-09-15,41ee8be69d5f5554f98861ead7d13d6c0fa269e0730d77...,865938002,0.025407,1
1894951,2020-07-03,492fe741df106361d15eb420d47ed3b285ee61b41f8c14...,881192001,0.027102,2


In [15]:
Transactions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5151470 entries, 0 to 5151469
Data columns (total 5 columns):
 #   Column            Dtype  
---  ------            -----  
 0   t_dat             object 
 1   customer_id       object 
 2   article_id        int64  
 3   price             float64
 4   sales_channel_id  int64  
dtypes: float64(1), int64(2), object(2)
memory usage: 196.5+ MB
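`Transactions.info()` reports `t_dat` as `object`, i.e. the dates are stored as strings. Converting them to `datetime64` enables date arithmetic and grouping by period. A minimal sketch on toy rows (dates and prices copied from the samples above); with the real file, `pd.read_csv(..., parse_dates=['t_dat'])` performs the conversion at load time.

```python
import pandas as pd

# Toy rows in the same format as the Transactions dataset
transactions = pd.DataFrame({
    "t_dat": ["2020-06-01", "2020-07-03", "2020-09-15"],
    "price": [0.016932, 0.027102, 0.025407],
})

# Parse the ISO-formatted strings into datetime64 values
transactions["t_dat"] = pd.to_datetime(transactions["t_dat"], format="%Y-%m-%d")

# Example of what this enables: number of transactions per month
per_month = transactions.groupby(transactions["t_dat"].dt.month).size()
print(per_month)
```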


## 3. Merging DataFrames

#### 3.1. Transactions-Articles


In [16]:
Transactions.head(3)

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id
0,2020-06-01,00075ef36696a7b4ed8c83e22a4bf7ea7c90ee110991ec...,844198001,0.016932,2
1,2020-06-01,000b31552d3785c79833262bbeefa484cbc43d7b612b3c...,777016001,0.030492,1
2,2020-06-01,002d8d26c9414c981c012c6f5e4b2de7ffd3bc568c4574...,820507001,0.010153,2


In [17]:
Articles.head(3)

Unnamed: 0,article_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,graphical_appearance_no,graphical_appearance_name,colour_group_code,colour_group_name,...,department_name,index_code,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,garment_group_name,detail_desc
0,108775015,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,9,Black,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
1,108775044,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,10,White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
2,108775051,108775,Strap top (1),253,Vest top,Garment Upper body,1010017,Stripe,11,Off White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.


In [18]:
# Left-merge on the shared key article_id: every transaction is kept and enriched with its article attributes.
# The result is stored in a new DataFrame.
TransactionsAndArticles = pd.merge(Transactions, Articles, how='left', on='article_id')
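A left merge keeps every row of `Transactions`, so the result should have exactly as many rows as `Transactions`, provided `article_id` is unique in `Articles`. Two `pd.merge` options make these assumptions explicit: `validate='many_to_one'` raises a `MergeError` if the right-hand key is duplicated, and `indicator=True` adds a `_merge` column that flags transactions with no matching article. The sketch uses toy frames; the id `999999999` is made up to show an unmatched row.

```python
import pandas as pd

transactions = pd.DataFrame({
    "article_id": [844198001, 777016001, 844198001, 999999999],
    "price": [0.016932, 0.030492, 0.016932, 0.010000],
})
articles = pd.DataFrame({
    "article_id": [844198001, 777016001],
    "prod_name": ["Saturn trs (J)", "Cisco skirt"],
})

merged = pd.merge(
    transactions,
    articles,
    how="left",
    on="article_id",
    validate="many_to_one",  # fail fast if article_id repeats in `articles`
    indicator=True,          # adds `_merge`: 'both' or 'left_only'
)

# Transactions whose article_id was not found in `articles`
unmatched = int((merged["_merge"] == "left_only").sum())
print(len(merged), unmatched)
```

On the real data the same check would be `pd.merge(Transactions, Articles, how='left', on='article_id', validate='many_to_one')`; if `article_id` were not unique in `Articles`, the call would raise instead of silently duplicating transactions.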

#### 3.2. Transactions-Articles-Customers

In [19]:
TransactionsAndArticles.head(3)

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,...,department_name,index_code,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,garment_group_name,detail_desc
0,2020-06-01,00075ef36696a7b4ed8c83e22a4bf7ea7c90ee110991ec...,844198001,0.016932,2,844198,Saturn trs (J),296,Pyjama bottom,Nightwear,...,Nightwear,B,Lingeries/Tights,1,Ladieswear,62,"Womens Nightwear, Socks & Tigh",1017,"Under-, Nightwear",Pyjama bottoms in sweatshirt fabric with wide ...
1,2020-06-01,000b31552d3785c79833262bbeefa484cbc43d7b612b3c...,777016001,0.030492,1,777016,Cisco skirt,275,Skirt,Garment Lower body,...,Trousers & Skirt,A,Ladieswear,1,Ladieswear,18,Womens Trend,1009,Trousers,"Calf-length skirt in softly draping, patterned..."
2,2020-06-01,002d8d26c9414c981c012c6f5e4b2de7ffd3bc568c4574...,820507001,0.010153,2,820507,Charlotte Hipster Primula,286,Underwear bottom,Underwear,...,Expressive Lingerie,B,Lingeries/Tights,1,Ladieswear,61,Womens Lingerie,1017,"Under-, Nightwear","Hipster briefs in lace with a mid waist, lined..."


In [20]:
Customers.head(3)

Unnamed: 0,customer_id,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
0,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,,,ACTIVE,NONE,49.0,52043ee2162cf5aa7ee79974281641c6f11a68d276429a...
1,0000423b00ade91418cceaf3b26c6af3dd342b51fd051e...,,,ACTIVE,NONE,25.0,2973abc54daa8a5f8ccfe9362140c63247c5eee03f1d93...
2,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,,,ACTIVE,NONE,24.0,64f17e6a330a85798e4998f62d0930d14db8db1c054af6...


In [21]:
# Left-merge on the shared key customer_id: every transaction is kept and enriched with its customer attributes.
# The result is stored in a new DataFrame.
TransactionsAndArticlesAndCustomers = pd.merge(TransactionsAndArticles, Customers, how='left', on='customer_id')

In [22]:
TransactionsAndArticlesAndCustomers.head(3)

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,...,section_name,garment_group_no,garment_group_name,detail_desc,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
0,2020-06-01,00075ef36696a7b4ed8c83e22a4bf7ea7c90ee110991ec...,844198001,0.016932,2,844198,Saturn trs (J),296,Pyjama bottom,Nightwear,...,"Womens Nightwear, Socks & Tigh",1017,"Under-, Nightwear",Pyjama bottoms in sweatshirt fabric with wide ...,,,ACTIVE,NONE,40.0,0c0e15f8fa88a1d4aa6ca8a0b4a8289ca1affbaebdea22...
1,2020-06-01,000b31552d3785c79833262bbeefa484cbc43d7b612b3c...,777016001,0.030492,1,777016,Cisco skirt,275,Skirt,Garment Lower body,...,Womens Trend,1009,Trousers,"Calf-length skirt in softly draping, patterned...",1.0,1.0,ACTIVE,Regularly,59.0,2c29ae653a9282cce4151bd87643c907644e09541abc28...
2,2020-06-01,002d8d26c9414c981c012c6f5e4b2de7ffd3bc568c4574...,820507001,0.010153,2,820507,Charlotte Hipster Primula,286,Underwear bottom,Underwear,...,Womens Lingerie,1017,"Under-, Nightwear","Hipster briefs in lace with a mid waist, lined...",,,ACTIVE,NONE,23.0,8d4ceb946237cf52ce5c2a1a71d1221fde77627a52d661...


#### 3.3. Sampling the merged DataFrame
#### We keep a random sample of only 15 rows to limit the number of OpenAI API calls (each inference below sends one request per row)

In [23]:
RetailDataFrame=TransactionsAndArticlesAndCustomers.sample(15)
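`sample(15)` draws a different set of rows each time the cell runs, so the API results below cannot be reproduced exactly. Passing `random_state` fixes the draw. A minimal sketch on a toy frame; the seed `42` is arbitrary, and a seeded run on the real data will most likely select different rows from the ones shown in this notebook.

```python
import pandas as pd

df = pd.DataFrame({"x": range(100)})

# Same seed -> same 15 rows on every run
sample_a = df.sample(15, random_state=42)
sample_b = df.sample(15, random_state=42)
print(sample_a.index.equals(sample_b.index))
```

Applied here, that would be `RetailDataFrame = TransactionsAndArticlesAndCustomers.sample(15, random_state=42)`.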

In [24]:
pd.options.display.max_colwidth = 250
RetailDataFrame.head(2).T

Unnamed: 0,1578228,889633
t_dat,2020-06-27,2020-06-18
customer_id,e2feac3bd72f05ea47521239aae336d8ba70b25ca48d803b8b004a960bd560ae,f5a5e3ebbadb3804a12b0ba04461b3717582bb4fcc370b9e1dbf2d722708a7eb
article_id,874064001,816563006
price,0.06778,0.025407
sales_channel_id,1,2
product_code,874064,816563
prod_name,Pineapple jumpsuit,Drizzle
product_type_no,267,272
product_type_name,Jumpsuit/Playsuit,Trousers
product_group_name,Garment Full body,Garment Lower body
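The inference functions in the next section send one request per row. Many rows share the same `detail_desc` (the first three rows of `Articles` all read "Jersey top with narrow shoulder straps."), and the same article can be bought many times, so calling the model once per distinct description and mapping the answers back can save requests on larger samples. `apply_once_per_unique` is a hypothetical helper, not part of pandas, and it assumes that identical descriptions should receive identical answers.

```python
from typing import Callable

import pandas as pd


def apply_once_per_unique(series: pd.Series, func: Callable[[str], str]) -> pd.Series:
    """Call func once per distinct non-missing value of series and map the results back."""
    unique_values = series.dropna().unique()
    results = {value: func(value) for value in unique_values}
    # Rows whose value is missing (or not in `results`) become NaN
    return series.map(results)
```

Usage with a function defined below: `apply_once_per_unique(RetailDataFrame['detail_desc'], fabric_finder)`.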


## 4. Generative AI-assisted processing

### 4.1. Inferring (Type of Fabric)
#### Given a product description, we want to infer the fabric the garment is made of

In [25]:
RetailDataFrame['detail_desc']

1578228           Jumpsuit in an airy viscose and linen weave with a deep V-neck and buttons down the front. Short sleeves with turn-ups, a detachable tie belt at the waist and wide, ankle-length legs with patch pockets at the sides.
889633                                                                     Ankle-length trousers in a softly draping weave. High waist with elastication at the back, pleats at the front, diagonal side pockets and wide, straight legs.
2486804                                                                Cap-sleeved top in softly draping jersey with a double-layered stand-up collar that has gathers at the front and an opening with a button at the back of the neck.
4491597                                                                                                    Gently tailored jacket in a crêpe weave with notch lapels, gathered, 3/4-length sleeves and side pockets. No fasteners. Lined.
4221137                                                         

In [26]:
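def fabric_finder(productdescription):
    """Ask the model which fabric a garment is made of, given its product description."""

    # The description is delimited with triple backticks so the model can separate it from the instructions
    prompt = f"""
    Given the following product description, your task is to infer the fabric the garment was made of. Provide just the name of the fabric. If you don't know the answer, just say I don't know: ```{productdescription}```

    """

    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        temperature=0  # reduce randomness; repeated runs are still not guaranteed to match exactly
    )
    return response.choices[0].message.content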
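`fabric_finder` mixes two concerns: building the prompt and calling the API. Building the prompt in a separate, pure function (hypothetical name `build_fabric_prompt`) makes it possible to inspect and test prompts without spending API calls. The triple backticks around the description are the same delimiter technique used above: they mark where the instructions end and the data begins.

```python
def build_fabric_prompt(product_description: str) -> str:
    """Return the fabric-inference prompt with the description delimited by triple backticks."""
    return (
        "Given the following product description, your task is to infer the fabric "
        "the garment was made of. Provide just the name of the fabric. "
        "If you don't know the answer, just say I don't know: "
        f"```{product_description}```"
    )


print(build_fabric_prompt("Thong briefs in cotton jersey with a wide lace trim at the top."))
```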

In [27]:
fabric_finder('Thong briefs in cotton jersey with a wide lace trim at the top. Low waist, lined gusset, narrow sides and a string back.')

'Cotton'

In [28]:
RetailDataFrame['detail_desc'].apply(fabric_finder)

1578228    Viscose and linen
889633          I don't know
2486804         I don't know
4491597         I don't know
4221137         I don't know
58592                Viscose
170855                Cotton
2428589               Cotton
64898           I don't know
1781155               Cotton
522605                 Denim
633417                 Denim
3390368         I don't know
4758875         I don't know
1028899         I don't know
Name: detail_desc, dtype: object
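The model answered "I don't know" for 8 of the 15 rows. Turning that sentinel into a proper missing value makes the column easy to filter and count. A sketch on a few of the answers above; it matches the exact sentinel string the prompt asked for, so a slightly different reply (for example with a trailing period) would not be caught without extra normalisation.

```python
import pandas as pd

# A few of the answers returned above
answers = pd.Series(["Viscose and linen", "I don't know", "Cotton", "I don't know", "Denim"])

# Treat the model's "I don't know" as a missing value (where() keeps True positions, NaN elsewhere)
cleaned = answers.str.strip()
fabrics = cleaned.where(cleaned != "I don't know")

# Fraction of rows for which the model named a fabric
coverage = fabrics.notna().mean()
print(coverage)
```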

### 4.2. Inferring (Product category)
#### Given a product description, we want to infer the category the product belongs to

In [29]:
def garment_finder(productdescription):
    """Ask the model for the product category, given a product description."""

    # Same structure as fabric_finder; only the task described in the prompt changes
    prompt = f"""
    Given the following product description, your task is to infer the category of product being described. Provide just the name of the category. If you don't know the answer, just say I don't know: ```{productdescription}```

    """

    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        temperature=0
    )
    return response.choices[0].message.content
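`fabric_finder` and `garment_finder` differ only in their prompt; the API call is duplicated. One way to factor it out is a small helper (hypothetical name `get_completion`) that takes the client as an argument. It uses the same `client.chat.completions.create(...)` call and response access as the functions above; passing the client in also lets you test the code with a stand-in object instead of the real API.

```python
def get_completion(client, prompt: str, model: str = "gpt-4o", temperature: float = 0) -> str:
    """Send a single user message and return the text of the first choice."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content
```

With it, each task function reduces to building its prompt and returning `get_completion(client, prompt, MODEL)`.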

In [30]:
garment_finder('Thong briefs in cotton jersey with a wide lace trim at the top. Low waist, lined gusset, narrow sides and a string back.')

'Lingerie'

In [31]:
RetailDataFrame['detail_desc'].apply(garment_finder)

1578228            Clothing
889633                Pants
2486804    Women's Clothing
4491597              Blazer
4221137            Clothing
58592                 Dress
170855     Women's Clothing
2428589            Clothing
64898                 Dress
1781155            Clothing
522605             Clothing
633417             Clothing
3390368            Swimwear
4758875            Trousers
1028899             Jewelry
Name: detail_desc, dtype: object
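The categories above are inconsistent: the ankle-length trousers of row 889633 come back as "Pants" while row 4758875 gets "Trousers", and several rows receive generic labels such as "Clothing" or "Women's Clothing". A common prompt-engineering remedy is to give the model a closed list of labels and then validate the reply against it. The list below is illustrative only; in this dataset a natural candidate is the set of values in `Articles['product_group_name']` or `Articles['product_type_name']`, which would also let you compare the model's answers with the existing labels. Both function names are hypothetical.

```python
# Illustrative label set (an assumption, not taken from the dataset)
ALLOWED_CATEGORIES = ("Dress", "Trousers", "Shorts", "Top", "Jacket", "Swimwear", "Lingerie", "Accessories")


def build_category_prompt(product_description: str, categories=ALLOWED_CATEGORIES) -> str:
    """Ask for exactly one label from a closed list."""
    options = ", ".join(categories)
    return (
        "Given the following product description, classify the product into exactly one "
        f"of these categories: {options}. Answer with the category name only. "
        "If none fits, answer Unknown: "
        f"```{product_description}```"
    )


def normalize_category(answer: str, categories=ALLOWED_CATEGORIES) -> str:
    """Map the model's reply onto the closed list; anything else becomes 'Unknown'."""
    # Drop surrounding spaces, a trailing period and surrounding double quotes
    cleaned = answer.strip().rstrip(".").strip('"').strip()
    lookup = {c.lower(): c for c in categories}
    return lookup.get(cleaned.lower(), "Unknown")
```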

### 4.3. Expansion (Product Name)
#### Given a product description, we want to find a suitable name for the product

In [32]:
def name_finder(productdescription):
    """Ask the model for a product name aimed at adults."""

    prompt = f"""
    Given the following product description, your task is to find a suitable name for it. The name must be appealing to adults. Provide just the name: ```{productdescription}```

    """

    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        temperature=0
    )
    return response.choices[0].message.content

In [33]:
name_finder('Thong briefs in cotton jersey with a wide lace trim at the top. Low waist, lined gusset, narrow sides and a string back.')

'Lace Elegance Thong'

In [34]:
def name_finder(productdescription):
    """Redefinition of name_finder: the prompt now asks for a catchy name aimed at teenagers."""

    prompt = f"""
    Given the following product description, your task is to find a suitable name for it. The name must be catchy and appealing to teenagers. Provide just the name: ```{productdescription}```

    """

    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        temperature=0
    )
    return response.choices[0].message.content

In [35]:
name_finder('Thong briefs in cotton jersey with a wide lace trim at the top. Low waist, lined gusset, narrow sides and a string back.')

'Lace Embrace Thong'
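Redefining `name_finder` to change the audience works, but the audience can also be a parameter. The sketch below (hypothetical `build_name_prompt`) only builds the prompt text; the returned string would be sent with the same `client.chat.completions.create` call as above. Note that the two prompts above produced very similar names ('Lace Elegance Thong' vs 'Lace Embrace Thong'), so changing the audience wording alone may have only a small effect.

```python
def build_name_prompt(product_description: str, audience: str = "adults", catchy: bool = False) -> str:
    """Build the product-naming prompt for a given target audience."""
    style = "catchy and appealing" if catchy else "appealing"
    return (
        "Given the following product description, your task is to find a suitable name for it. "
        f"The name must be {style} to {audience}. Provide just the name: "
        f"```{product_description}```"
    )
```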

In [36]:
RetailDataFrame['detail_desc'].apply(name_finder)

1578228                 "VibeStride Jumpsuit"
889633                   "ChillWave Trousers"
2486804               "Chic Collar Crush Top"
4491597              "ChillWave Crêpe Blazer"
4221137          CozyChic Ribbed Polo Sweater
58592                "VibeFlare V-Neck Dress"
170855               EcoChic Long Sleeve Tees
2428589                  "ChillFlex Crop Top"
64898              "Puff 'n' Slit Chic Dress"
1781155                       "VibeTribe Tee"
522605           Denim Edge High-Waist Shorts
633417                  DenimFlex Vibe Shorts
3390368    "WaveChic Low-Rise Cutaway Bikini"
4758875                   TrendTaper Trousers
1028899              Leaf & Loop Duo Necklace
Name: detail_desc, dtype: object
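Some of the generated names come back wrapped in double quotes and others do not. The replies can be normalised with pandas string methods; asking for the name "without quotation marks" in the prompt is another option, although the model is not guaranteed to follow it. A sketch using three of the names above:

```python
import pandas as pd

names = pd.Series(['"VibeStride Jumpsuit"', "Denim Edge High-Waist Shorts", '"VibeTribe Tee"'])

# Remove surrounding whitespace, then any leading/trailing double quotes
clean_names = names.str.strip().str.strip('"')
print(clean_names.tolist())
```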