# GenerativeAI4DS-I
## Lab. Retail Image Understanding

## What I hope you'll get out of this lab
* Be aware of the potential of generative AI to process product images

This notebook explores how to leverage GPT-4o to tag & caption images.

GPT-4o is multimodal: we can pass it an image together with context about what the image represents, and prompt it to output tags or a description of the image. The descriptions can then be refined by a language model (GPT-4o again in this notebook) to generate captions.

Generating text from images is useful in many settings, especially search.
We will illustrate a search use case in this notebook by using generated keywords and product captions to search for products, both from a text input and from an image input.

As an example, we will use a dataset of Amazon furniture items, tag them with relevant keywords and generate short, descriptive captions.
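To make the search idea concrete, here is a minimal sketch of ranking products by cosine similarity between a query embedding and caption embeddings. The vectors are made-up toy values, and `cosine` and `rank_by_similarity` are hypothetical helpers written for this illustration, not part of the notebook. With real embeddings, the `cosine_similarity` function from scikit-learn (imported below) could replace the hand-written one.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, item_vecs):
    """Return (item_name, score) pairs sorted from most to least similar."""
    scores = [(name, cosine(query_vec, vec)) for name, vec in item_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional "embeddings" (assumed values, for illustration only)
toy_items = {
    "shoe rack": [0.9, 0.1, 0.0],
    "dining chairs": [0.1, 0.9, 0.1],
    "doormat": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]  # stands in for the embedding of a search query

ranking = rank_by_similarity(toy_query, toy_items)
print(ranking[0][0])  # shoe rack
```

The same ranking step works whether the query vector comes from a text query or from a caption generated for a query image, as long as both are embedded with the same model.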

In [1]:
!pip install openai
!pip install scikit-learn

Collecting openai
  Downloading openai-1.30.3-py3-none-any.whl (320 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-p

In [2]:
from IPython.display import Image, display
import json  # needed by show_json below
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from openai import OpenAI
import os

In [3]:
def show_json(obj):
    display(json.loads(obj.model_dump_json()))

In [4]:
# We need this to load the files onto google colab
!git clone https://github.com/thousandoaks/GenerativeAI4DS-I.git

Cloning into 'GenerativeAI4DS-I'...
remote: Enumerating objects: 113, done.
remote: Counting objects: 100% (113/113), done.
remote: Compressing objects: 100% (107/107), done.
remote: Total 113 (delta 39), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (113/113), 3.73 MiB | 14.88 MiB/s, done.
Resolving deltas: 100% (39/39), done.


# 1. Get your [OpenAI API Key](https://platform.openai.com/account/api-keys)

In [5]:
# Used by the OpenAI client created in the next cell
os.environ["OPENAI_API_KEY"] = "YOU-NEED-YOUR-OWN-KEY"

In [6]:
client = OpenAI(
  api_key=os.environ['OPENAI_API_KEY'],  # this is also the default, it can be omitted
)
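Because the cell above ships with a placeholder string, it is easy to run the rest of the notebook without a real key and only find out at the first API call. A small guard can fail earlier with a clearer message. This is a sketch only: `get_api_key` and `PLACEHOLDER` are hypothetical names introduced here, not part of the OpenAI library.

```python
import os

# The placeholder string used in the cell that sets the environment variable
PLACEHOLDER = "YOU-NEED-YOUR-OWN-KEY"

def get_api_key(env=os.environ):
    """Return the key from the given mapping, or raise a clear error if it is unset."""
    key = env.get("OPENAI_API_KEY", "")
    if not key or key == PLACEHOLDER:
        raise RuntimeError("Set OPENAI_API_KEY to your own key before calling the API.")
    return key
```

Calling `get_api_key()` right after setting the environment variable would surface a missing key before any request is sent.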

In [7]:
# Loading dataset
dataset_path =  "/content/GenerativeAI4DS-I/datasets/amazon_furniture_dataset.csv"
df = pd.read_csv(dataset_path).head(5)
df.head()

Unnamed: 0,asin,url,title,brand,price,availability,categories,primary_image,images,upc,...,color,material,style,important_information,product_overview,about_item,description,specifications,uniq_id,scraped_at
0,B0CJHKVG6P,https://www.amazon.com/dp/B0CJHKVG6P,"GOYMFK 1pc Free Standing Shoe Rack, Multi-laye...",GOYMFK,$24.99,Only 13 left in stock - order soon.,"['Home & Kitchen', 'Storage & Organization', '...",https://m.media-amazon.com/images/I/416WaLx10j...,['https://m.media-amazon.com/images/I/416WaLx1...,,...,White,Metal,Modern,[],"[{'Brand': ' GOYMFK '}, {'Color': ' White '}, ...",['Multiple layers: Provides ample storage spac...,"multiple shoes, coats, hats, and other items E...","['Brand: GOYMFK', 'Color: White', 'Material: M...",02593e81-5c09-5069-8516-b0b29f439ded,2024-02-02 15:15:08
1,B0B66QHB23,https://www.amazon.com/dp/B0B66QHB23,"subrtex Leather ding Room, Dining Chairs Set o...",subrtex,,,"['Home & Kitchen', 'Furniture', 'Dining Room F...",https://m.media-amazon.com/images/I/31SejUEWY7...,['https://m.media-amazon.com/images/I/31SejUEW...,,...,Black,Sponge,Black Rubber Wood,[],,['【Easy Assembly】: Set of 2 dining room chairs...,subrtex Dining chairs Set of 2,"['Brand: subrtex', 'Color: Black', 'Product Di...",5938d217-b8c5-5d3e-b1cf-e28e340f292e,2024-02-02 15:15:09
2,B0BXRTWLYK,https://www.amazon.com/dp/B0BXRTWLYK,Plant Repotting Mat MUYETOL Waterproof Transpl...,MUYETOL,$5.98,In Stock,"['Patio, Lawn & Garden', 'Outdoor Décor', 'Doo...",https://m.media-amazon.com/images/I/41RgefVq70...,['https://m.media-amazon.com/images/I/41RgefVq...,,...,Green,Polyethylene,Modern,[],"[{'Brand': ' MUYETOL '}, {'Size': ' 26.8*26.8 ...","['PLANT REPOTTING MAT SIZE: 26.8"" x 26.8"", squ...",,"['Brand: MUYETOL', 'Size: 26.8*26.8', 'Item We...",b2ede786-3f51-5a45-9a5b-bcf856958cd8,2024-02-02 15:15:09
3,B0C1MRB2M8,https://www.amazon.com/dp/B0C1MRB2M8,"Pickleball Doormat, Welcome Doormat Absorbent ...",VEWETOL,$13.99,Only 10 left in stock - order soon.,"['Patio, Lawn & Garden', 'Outdoor Décor', 'Doo...",https://m.media-amazon.com/images/I/61vz1Igler...,['https://m.media-amazon.com/images/I/61vz1Igl...,,...,A5589,Rubber,Modern,[],"[{'Brand': ' VEWETOL '}, {'Size': ' 16*24INCH ...","['Specifications: 16x24 Inch ', "" High-Quality...",The decorative doormat features a subtle textu...,"['Brand: VEWETOL', 'Size: 16*24INCH', 'Materia...",8fd9377b-cfa6-5f10-835c-6b8eca2816b5,2024-02-02 15:15:10
4,B0CG1N9QRC,https://www.amazon.com/dp/B0CG1N9QRC,JOIN IRON Foldable TV Trays for Eating Set of ...,JOIN IRON Store,$89.99,Usually ships within 5 to 6 weeks,"['Home & Kitchen', 'Furniture', 'Game & Recrea...",https://m.media-amazon.com/images/I/41p4d4VJnN...,['https://m.media-amazon.com/images/I/41p4d4VJ...,,...,Grey Set of 4,Iron,X Classic Style,[],,['Includes 4 Folding Tv Tray Tables And one Co...,Set of Four Folding Trays With Matching Storag...,"['Brand: JOIN IRON', 'Shape: Rectangular', 'In...",bdc9aa30-9439-50dc-8e89-213ea211d66a,2024-02-02 15:15:11


# 2. Image Tagging
In this section, we'll use a vision-capable GPT-4 model to generate relevant tags for our products.

We'll use a simple zero-shot approach to extract keywords, and deduplicate those keywords using embeddings to avoid having multiple keywords that are too similar.
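The deduplication step can be sketched as follows: embed each keyword, then keep it only if it is not too similar to a keyword that was already kept. This is a minimal illustration, not the notebook's implementation. `dedupe_keywords`, the `embed` callable, the 0.9 threshold, and the toy vectors are all assumptions; in practice `embed` would wrap a call to an embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def dedupe_keywords(keywords, embed, threshold=0.9):
    """Keep a keyword only if its similarity to every kept keyword is below threshold."""
    kept, kept_vecs = [], []
    for kw in keywords:
        vec = embed(kw)
        if all(cosine(vec, other) < threshold for other in kept_vecs):
            kept.append(kw)
            kept_vecs.append(vec)
    return kept

# Toy stand-in for a real embedding model (hand-picked vectors, illustration only)
toy_vectors = {
    "sofa": [1.0, 0.0],
    "couch": [0.98, 0.2],
    "metal": [0.0, 1.0],
}

print(dedupe_keywords(["sofa", "couch", "metal"], toy_vectors.get))  # ['sofa', 'metal']
```

The first keyword seen wins, so the order of the input list decides which of two near-synonyms survives. The threshold would need tuning against real embeddings.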

We will use a combination of an image and the product title to avoid extracting keywords for other items that are depicted in the image - sometimes there are multiple items used in the scene and we want to focus on just the one we want to tag.


### 2.1. Extract keywords from images


In [8]:
system_prompt = '''
    You are an agent specialized in tagging images of furniture items, decorative items, or furnishings with relevant keywords that could be used to search for these items on a marketplace.

    You will be provided with an image and the title of the item that is depicted in the image, and your goal is to extract keywords for only the item specified.

    Keywords should be concise and in lower case.

    Keywords can describe things like:
    - Item type e.g. 'sofa bed', 'chair', 'desk', 'plant'
    - Item material e.g. 'wood', 'metal', 'fabric'
    - Item style e.g. 'scandinavian', 'vintage', 'industrial'
    - Item color e.g. 'red', 'blue', 'white'

    Only deduce material, style or color keywords when it is obvious that they make the item depicted in the image stand out.

    Return keywords in the format of an array of strings, like this:
    ['desk', 'industrial', 'metal']

'''

In [9]:
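def analyze_image(img_url, title):
    # Send the product image and its title; the model replies with a keyword list
    response = client.chat.completions.create(
        # GPT-4o, as stated in the introduction; gpt-4-vision-preview was a deprecated preview model
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": system_prompt
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        # the API expects an object with a "url" field, not a bare string
                        "image_url": {"url": img_url},
                    },
                ],
            },
            {
                "role": "user",
                "content": title
            }
        ],
        max_tokens=300,
        top_p=0.1
    )

    return response.choices[0].message.content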

#### Let's try with some examples

In [10]:
examples = df.iloc[:5]

In [11]:
for index, ex in examples.iterrows():
    url = ex['primary_image']
    img = Image(url=url)
    display(img)
    result = analyze_image(url, ex['title'])
    print(result)
    print("\n\n")

['shoe rack', 'free standing', 'multi-layer', 'metal', 'white', 'with hooks']





['dining chairs', 'set of 2', 'leather', 'black']





['plant repotting mat', 'waterproof', 'portable', 'foldable', 'easy to clean', 'green']





['doormat', 'absorbent', 'non-slip', 'brown', 'coir']





['tv tray table set', 'foldable', 'metal', 'grey']
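The replies above are plain strings that only look like Python lists. To use the keywords downstream (for example, to store them per product), they have to be parsed. Below is a minimal sketch using the standard library's `ast.literal_eval`. `parse_keywords` is a hypothetical helper, and returning an empty list on malformed replies is one possible design choice among several.

```python
import ast

def parse_keywords(raw):
    """Parse a model reply like "['desk', 'metal']" into a list of lowercase strings.

    Returns an empty list if the reply is not a plain list of strings.
    """
    try:
        value = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError):
        return []
    if isinstance(value, list) and all(isinstance(k, str) for k in value):
        return [k.strip().lower() for k in value]
    return []

print(parse_keywords("['doormat', 'absorbent', 'non-slip', 'brown', 'coir']"))
```

`ast.literal_eval` only evaluates literals, so unlike `eval` it will not execute code that happens to appear in a model reply.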





# 3. Caption generation
In this section, we'll use the same vision-capable model to generate a description of each image.

In [12]:
# Cleaning up dataset columns
selected_columns = ['title', 'primary_image', 'style', 'material', 'color', 'url']
df = df[selected_columns].copy()
df.head()

Unnamed: 0,title,primary_image,style,material,color,url
0,"GOYMFK 1pc Free Standing Shoe Rack, Multi-laye...",https://m.media-amazon.com/images/I/416WaLx10j...,Modern,Metal,White,https://www.amazon.com/dp/B0CJHKVG6P
1,"subrtex Leather ding Room, Dining Chairs Set o...",https://m.media-amazon.com/images/I/31SejUEWY7...,Black Rubber Wood,Sponge,Black,https://www.amazon.com/dp/B0B66QHB23
2,Plant Repotting Mat MUYETOL Waterproof Transpl...,https://m.media-amazon.com/images/I/41RgefVq70...,Modern,Polyethylene,Green,https://www.amazon.com/dp/B0BXRTWLYK
3,"Pickleball Doormat, Welcome Doormat Absorbent ...",https://m.media-amazon.com/images/I/61vz1Igler...,Modern,Rubber,A5589,https://www.amazon.com/dp/B0C1MRB2M8
4,JOIN IRON Foldable TV Trays for Eating Set of ...,https://m.media-amazon.com/images/I/41p4d4VJnN...,X Classic Style,Iron,Grey Set of 4,https://www.amazon.com/dp/B0CG1N9QRC


In [13]:
describe_system_prompt = '''
    You are a system generating descriptions for furniture items, decorative items, or furnishings on an e-commerce website.
    Provided with an image and a title, you will describe the main item that you see in the image, giving details but staying concise.
    You can describe unambiguously what the item is and its material, color, and style if clearly identifiable.
    If there are multiple items depicted, refer to the title to understand which item you should describe.
    '''

In [14]:
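def describe_image(img_url, title):
    # Send the product image and its title; the model replies with a short description
    response = client.chat.completions.create(
        # GPT-4o, as stated in the introduction; gpt-4-vision-preview was a deprecated preview model
        model="gpt-4o",
        temperature=0.2,
        messages=[
            {
                "role": "system",
                "content": describe_system_prompt
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        # the API expects an object with a "url" field, not a bare string
                        "image_url": {"url": img_url},
                    },
                ],
            },
            {
                "role": "user",
                "content": title
            }
        ],
        max_tokens=300,
    )

    return response.choices[0].message.content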

In [15]:
for index, row in examples.iterrows():
    print(f"{row['title'][:50]}{'...' if len(row['title']) > 50 else ''} - {row['url']} :\n")
    img_description = describe_image(row['primary_image'], row['title'])
    img = Image(url=row['primary_image'])
    display(img)
    print(f"{img_description}\n--------------------------\n")

GOYMFK 1pc Free Standing Shoe Rack, Multi-layer Me... - https://www.amazon.com/dp/B0CJHKVG6P :



This is a multi-functional free-standing shoe rack featuring a sleek metal construction with a white finish. It is designed with multiple layers to accommodate a variety of shoes, showcasing at least four tiers dedicated to shoe storage. Additionally, the rack includes a vertical section with 8 double hooks, suitable for hanging items such as hats, scarves, and bags. The overall design is clean and modern, making it a versatile piece for organizing footwear and accessories in a living room, bathroom, or hallway. The rack's open design allows for easy visibility and access to the stored items.
--------------------------

subrtex Leather ding Room, Dining Chairs Set of 2,... - https://www.amazon.com/dp/B0B66QHB23 :



This image features a set of two black dining chairs. The chairs are upholstered in a leather-like material, providing a sleek and sophisticated appearance. They have a contemporary design, with clean lines and minimal embellishments, which includes subtle stitching details on the backrest for added texture and interest. The chairs are supported by four straight legs that match the color of the upholstery, creating a uniform and elegant look. These chairs would make a stylish addition to any modern dining room setting.
--------------------------

Plant Repotting Mat MUYETOL Waterproof Transplanti... - https://www.amazon.com/dp/B0BXRTWLYK :



This is a Plant Repotting Mat designed for indoor gardening tasks such as transplanting and repotting plants. The mat is square-shaped, measuring approximately 26.8 inches by 26.8 inches, providing ample space for gardening work. It is made from a waterproof material in a vibrant green color, which makes it easy to spot dirt and debris. The edges of the mat are raised with built-in corner tabs to keep soil and water contained, reducing mess during plant maintenance. The mat is foldable, making it portable and convenient to store when not in use. It is also easy to clean, which is ideal for repeated use. This mat can be a practical accessory for gardeners, serving as a protective surface for soil changing and other gardening activities, and it can also be a thoughtful gift for those who enjoy caring for plants.
--------------------------

Pickleball Doormat, Welcome Doormat Absorbent Non-... - https://www.amazon.com/dp/B0C1MRB2M8 :



This is a rectangular doormat featuring a playful design that caters to pickleball enthusiasts. The mat has a natural coir fiber construction, known for its durability and excellent scraping properties to remove dirt and debris from shoes. The background color is the natural brown of coir, and it features black text that reads "it's a good day to play PICKLEBALL," with the word "PICKLEBALL" in bold, capital letters for emphasis. Below the text, there is a graphic of two crossed pickleball paddles in black.

The doormat measures 16x24 inches, making it a suitable size for placement in front of a standard doorway. It is described as absorbent and non-slip, which suggests it has a backing material that prevents it from sliding around on the floor, enhancing safety. It is versatile enough to be used as a floor mat in a bathroom or other areas of a home where a small, absorbent mat is needed. The design adds a welcoming and personalized touch to any entryway, especially for those who have a

This image showcases a set of four foldable TV trays with a stand, designed for eating or as snack tray tables. The tables are presented in a sleek grey finish, which gives them a modern and versatile look that can easily blend with various interior decor styles. Each tray features a rectangular tabletop with a wood grain texture, supported by a sturdy iron frame with an X-shaped base that allows for easy folding and storage. The stand included in the set provides a convenient way to organize and store the trays when not in use, making them an ideal solution for small spaces where versatility and efficiency are key. The overall design is functional and space-saving, perfect for casual dining, working on a laptop, or as additional surface space for entertaining guests.
--------------------------
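The doormat description above stops mid-sentence, which is consistent with the reply hitting the `max_tokens=300` limit. The Chat Completions response reports this through `finish_reason`, which is `"length"` when generation was cut off by the token limit. The sketch below shows the check. `is_truncated` is a hypothetical helper, and the `SimpleNamespace` objects are fake stand-ins for real API responses so the example runs without a key.

```python
from types import SimpleNamespace

def is_truncated(response):
    """True if the first choice stopped because it reached the token limit."""
    return response.choices[0].finish_reason == "length"

# Fake response objects standing in for real API responses (illustration only)
cut_off = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
complete = SimpleNamespace(choices=[SimpleNamespace(finish_reason="stop")])

print(is_truncated(cut_off), is_truncated(complete))  # True False
```

To use this check, `describe_image` would have to return the full response, or inspect it before returning the text. A truncated reply could then be retried with a larger `max_tokens` or with a prompt that asks for a shorter description.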

