### Dataset Creation

This dataset was designed to study three common silhouettes of women's tops: triangle/trapeze, hourglass, and rectangle. A garment's silhouette is the shape it makes when worn on the body. Silhouettes are central to personal style because different silhouettes flatter different body types. A silhouette is created by the cut of the garment and the material used to construct it, but also by the physique of the person wearing it.

As a result, consumers browsing online second-hand clothing markets often find it difficult to identify the styles and silhouettes of available items unless sellers state them explicitly or model the items.

This dataset consists of 1400 35x35 images of clothing items belonging to 4 classes: babydoll tops, bustier/corsets, shorts, and t-shirts. Each class contains 350 samples collected from the image gallery site Pinterest and bulk downloaded using *gallery-dl*. To reduce noise, *rembg* was used to remove the background from each image. Class labels were created, and the images were converted to grayscale and resized to 35x35 pixels using the preprocessing utilities in *TensorFlow*. The resulting dataset is a NumPy pixel array of shape (1400, 35, 35) with corresponding labels of shape (1400,). The notebook run shown below covers the three tops classes only: it finds 1305 files (435 per class) and flattens each image to 1225 values before saving.

The silhouettes of focus in this dataset are babydoll (triangle/trapeze), hourglass, and straight (rectangle), represented by the image classes babydoll tops, bustier/corsets, and t-shirts respectively. The shorts class was introduced to add complexity to the dataset.

In [1]:
import cv2

In [2]:
from rembg import remove 

In [3]:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np


In [9]:
import os

# Remove the background from every babydoll-top image with rembg
src_dir = '/Users/irisyu/My Python/604finalproject/images/babydoll_top'
dst_dir = '/Users/irisyu/My Python/604finalproject/rmbg_imgs/babydoll_top'

for file in os.listdir(src_dir):
    if file != '.DS_Store':  # skip the macOS folder-metadata file
        print(f'Processing {file}')
        with open(os.path.join(src_dir, file), 'rb') as i, \
                open(os.path.join(dst_dir, file), 'wb') as o:
            # read and write in one step so the built-in input() is not shadowed
            o.write(remove(i.read()))

Processing pinterest_699324648411120362.jpg
Processing pinterest_699324648411121068.jpg
Processing pinterest_699324648411193427.png
Processing pinterest_699324648411162179.jpg
Processing pinterest_699324648411146018.jpg
Processing pinterest_699324648411121083.jpg
Processing pinterest_699324648411120389.jpg
Processing pinterest_699324648411162421.jpg
Processing pinterest_699324648411193396.jpg
Processing pinterest_699324648411146391.jpg
Processing pinterest_699324648411162227.jpg
Processing pinterest_699324648411146385.jpg
Processing pinterest_699324648411145868.jpg
Processing pinterest_699324648411146408.jpg
Processing pinterest_699324648411146622.jpg
Processing pinterest_699324648411162795.jpg
Processing pinterest_699324648411120203.jpg
Processing pinterest_699324648411121109.jpg
Processing pinterest_699324648411146435.jpg
Processing pinterest_699324648411120001.jpg
Processing pinterest_699324648411146384.jpg
Processing pinterest_699324648411120015.jpg
Processing pinterest_69932464841

In [10]:
# Remove the background from every bustier/corset image with rembg
src_dir = '/Users/irisyu/My Python/604finalproject/images/bustier_corset'
dst_dir = '/Users/irisyu/My Python/604finalproject/rmbg_imgs/bustier_corset'

for file in os.listdir(src_dir):
    if file != '.DS_Store':  # skip the macOS folder-metadata file
        print(f'Processing {file}')
        with open(os.path.join(src_dir, file), 'rb') as i, \
                open(os.path.join(dst_dir, file), 'wb') as o:
            o.write(remove(i.read()))

Processing pinterest_699324648411119285.jpg
Processing pinterest_699324648411162637.jpg
Processing pinterest_699324648411167798.jpg
Processing pinterest_699324648411162838.jpg
Processing pinterest_699324648411162810.jpg
Processing pinterest_699324648411167940.png
Processing pinterest_699324648411162347.jpg
Processing pinterest_699324648411149471.png
Processing pinterest_699324648411169399.jpg
Processing pinterest_699324648411167808.jpg
Processing pinterest_699324648411146804.jpg
Processing pinterest_699324648411162597.jpg
Processing pinterest_699324648411163064.jpg
Processing pinterest_699324648411168125.jpg
Processing pinterest_699324648411163070.jpg
Processing pinterest_699324648411163058.jpg
Processing pinterest_699324648411149464.jpg
Processing pinterest_699324648411121055.jpg
Processing pinterest_699324648411119290.jpg
Processing pinterest_699324648411167996.jpg
Processing pinterest_699324648411119286.jpg
Processing pinterest_699324648411121057.jpg
Processing pinterest_69932464841

In [11]:
# Remove the background from every shorts image with rembg
src_dir = '/Users/irisyu/My Python/604finalproject/images/shorts'
dst_dir = '/Users/irisyu/My Python/604finalproject/rmbg_imgs/shorts'

for file in os.listdir(src_dir):
    if file != '.DS_Store':  # skip the macOS folder-metadata file
        print(f'Processing {file}')
        with open(os.path.join(src_dir, file), 'rb') as i, \
                open(os.path.join(dst_dir, file), 'wb') as o:
            o.write(remove(i.read()))

Processing pinterest_699324648411209012.jpg
Processing pinterest_699324648411208481.jpg
Processing pinterest_699324648411208654.jpg
Processing pinterest_699324648411208697.jpg
Processing pinterest_699324648411208873.jpg
Processing pinterest_699324648411208708.jpg
Processing pinterest_699324648411208085.jpg
Processing pinterest_699324648411208907.jpg
Processing pinterest_699324648411208522.jpg
Processing pinterest_699324648411209172.jpg
Processing pinterest_699324648411208287.jpg
Processing pinterest_699324648411210609.png
Processing pinterest_699324648411210387.jpg
Processing pinterest_699324648411208537.jpg
Processing pinterest_699324648411208912.jpg
Processing pinterest_699324648411208090.jpg
Processing pinterest_699324648411208084.jpg
Processing pinterest_699324648411208721.jpg
Processing pinterest_699324648411208866.jpg
Processing pinterest_699324648411208127.jpg
Processing pinterest_699324648411209007.jpg
Processing pinterest_699324648411208331.jpg
Processing pinterest_69932464841

In [57]:
# Remove the background from the images in the `last` folder;
# results go to images/processed rather than rmbg_imgs
src_dir = '/Users/irisyu/My Python/604finalproject/images/last'
dst_dir = '/Users/irisyu/My Python/604finalproject/images/processed'

for file in os.listdir(src_dir):
    if file != '.DS_Store':  # skip the macOS folder-metadata file
        print(f'Processing {file}')
        with open(os.path.join(src_dir, file), 'rb') as i, \
                open(os.path.join(dst_dir, file), 'wb') as o:
            o.write(remove(i.read()))

Processing pinterest_699324648411255465.jpg
Processing pinterest_699324648411255467.jpg


In [48]:
# Remove the background from the images in the `extratees` folder;
# results go to images/processed rather than rmbg_imgs
src_dir = '/Users/irisyu/My Python/604finalproject/images/extratees'
dst_dir = '/Users/irisyu/My Python/604finalproject/images/processed'

for file in os.listdir(src_dir):
    if file != '.DS_Store':  # skip the macOS folder-metadata file
        print(f'Processing {file}')
        with open(os.path.join(src_dir, file), 'rb') as i, \
                open(os.path.join(dst_dir, file), 'wb') as o:
            o.write(remove(i.read()))

Processing pinterest_699324648411254895.jpg
Processing pinterest_699324648411254671.jpg
Processing pinterest_699324648411254856.jpg
Processing pinterest_699324648411254506.jpg
Processing pinterest_699324648411254937.jpg
Processing pinterest_699324648411254843.jpg
Processing pinterest_699324648411254464.png
Processing pinterest_699324648411254909.jpg
Processing pinterest_699324648411254505.jpg
Processing pinterest_699324648411254511.png
Processing pinterest_699324648411254539.png
Processing pinterest_699324648411254934.jpg
Processing pinterest_699324648411254920.jpg
Processing pinterest_699324648411254868.jpg
Processing pinterest_699324648411254883.jpg
Processing pinterest_699324648411254475.jpg
Processing pinterest_699324648411255019.jpg
Processing pinterest_699324648411254850.png
Processing pinterest_699324648411254887.jpg
Processing pinterest_699324648411254717.jpg
Processing pinterest_699324648411254515.jpg
Processing pinterest_699324648411254528.jpg
Processing pinterest_69932464841
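
The background-removal cells above repeat the same loop with different folders. A minimal sketch of a reusable helper follows. `process_folder` and its `transform` parameter are hypothetical names that do not appear in the original notebook; taking the transform as an argument lets the sketch run without *rembg* installed.

```python
from pathlib import Path
from typing import Callable, List


def process_folder(src_dir, dst_dir, transform: Callable[[bytes], bytes]) -> List[str]:
    """Apply `transform` to every file in src_dir and write the result to dst_dir.

    Hypothetical helper, not part of the original notebook. Hidden files
    such as macOS's .DS_Store are skipped. Returns the processed file names.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    processed = []
    for path in sorted(src.iterdir()):
        if path.name.startswith('.') or not path.is_file():
            continue
        # transform receives the raw file bytes and returns the bytes to write
        (dst / path.name).write_bytes(transform(path.read_bytes()))
        processed.append(path.name)
    return processed
```

In the notebook's setting, the call would look like `process_folder(src, dst, remove)` with `remove` imported from *rembg*, since the cells above already pass raw bytes to `remove` and write its return value to disk.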

In [59]:
# use tf preprocessing to preprocess downloaded images

dataset = tf.keras.preprocessing.image_dataset_from_directory(
    '/Users/irisyu/My Python/604finalproject/rmbg_imgs',
    labels="inferred",
    label_mode="int",
    class_names=['babydoll_trapeze', 'bustier_hourglass', 'tshirt_straight'],
    color_mode="grayscale",
    batch_size=35,
    image_size=(35, 35),
    shuffle=False,
    seed=None,
    validation_split=None,
    subset=None,
    interpolation="bilinear",
    follow_links=False,
    crop_to_aspect_ratio=False,
)

Found 1305 files belonging to 3 classes.
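
With `labels="inferred"` and an explicit `class_names` list, the integer label of each image is the position of its subfolder name in that list. The sketch below only mimics that mapping in plain Python to make the encoding explicit. `label_of` and `label_for` are hypothetical names, not TensorFlow functions, and the example path is made up.

```python
class_names = ['babydoll_trapeze', 'bustier_hourglass', 'tshirt_straight']

# Label index = position of the subfolder name in class_names.
label_of = {name: index for index, name in enumerate(class_names)}


def label_for(relative_path: str) -> int:
    """Return the integer label for a path like 'tshirt_straight/example.jpg'."""
    folder = relative_path.split('/')[0]
    return label_of[folder]


print(label_of)
print(label_for('tshirt_straight/example.jpg'))
```

Under this encoding, label 0 is the babydoll/trapeze class, 1 is bustier/hourglass, and 2 is t-shirt/straight.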


In [60]:
# unbatch and convert to np.array for use outside of TensorFlow
dataset = dataset.unbatch()

# shuffle=False above keeps images and labels in the same order across both passes
images = np.asarray(list(dataset.map(lambda x, y: x)))
labels = np.asarray(list(dataset.map(lambda x, y: y)))

In [61]:
np.unique(labels,return_counts=True)

(array([0, 1, 2], dtype=int32), array([435, 435, 435]))

In [65]:
# flatten each 35x35 grayscale image into a row of 35 * 35 = 1225 values
img = images.reshape(-1, 1225)
print(img.shape)

(1305, 1225)
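
Flattening turns each 35x35 single-channel image into a row of 35 * 35 = 1225 values. The sketch below uses a random stand-in array, not the real data, and assumes the loaded images have shape (N, 35, 35, 1). It shows the reshape and how to recover a 2-D image, for example for plotting.

```python
import numpy as np

# Stand-in for the `images` array: 6 grayscale images of shape (35, 35, 1).
rng = np.random.default_rng(0)
images_demo = rng.integers(0, 256, size=(6, 35, 35, 1)).astype(np.float32)

# Flatten: one row of 35 * 35 * 1 = 1225 pixel values per image.
flat = images_demo.reshape(len(images_demo), -1)
print(flat.shape)

# Restore the first image to 2-D, e.g. for plt.imshow(first, cmap='gray').
first = flat[0].reshape(35, 35)
```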


In [66]:
# global mean centering 

mean = img.mean()
print('Mean: %.3f' % mean)
print('Min: %.3f, Max: %.3f' % (img.min(), img.max()))
# global centering of pixels
img = img - mean
# confirm it had the desired effect
mean = img.mean()
print('Mean: %.3f' % mean)
print('Min: %.3f, Max: %.3f' % (img.min(), img.max()))

Mean: 57.568
Min: 0.000, Max: 255.000
Mean: 0.000
Min: -57.568, Max: 197.432


In [67]:
# normalize pixels
img = img/255.0
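
Because the mean is subtracted before dividing by 255, the final values are not in [0, 1]. With the mean of 57.568 printed above, they fall between about -0.226 and 0.774. Any image passed to a model later needs the same shift and scale. The sketch below wraps both steps; `center_and_scale` is a hypothetical helper and the small arrays are made-up data.

```python
import numpy as np


def center_and_scale(pixels, mean=None):
    """Subtract a global mean, then divide by 255.

    Pass the training-set mean when transforming new images so that
    they receive the same shift. Returns (transformed, mean_used).
    """
    pixels = np.asarray(pixels, dtype=np.float32)
    if mean is None:
        mean = float(pixels.mean())
    return (pixels - mean) / 255.0, mean


train = np.array([[0.0, 255.0], [51.0, 102.0]])  # made-up pixel values
train_scaled, train_mean = center_and_scale(train)

# Reuse the training mean for an unseen image instead of recomputing it.
new_scaled, _ = center_and_scale(np.array([[102.0]]), mean=train_mean)
```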

In [68]:
# save to npy files (np is already imported above)
np.save('images.npy', img)

np.save('labels.npy', labels)
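
To use the saved files elsewhere, `np.load` restores both arrays. The sketch below round-trips small stand-in arrays through a temporary directory so that it runs on its own. With the real files, the loads would be `np.load('images.npy')` and `np.load('labels.npy')`.

```python
import os
import tempfile

import numpy as np

# Stand-in arrays with the same layout as the saved data:
# one flattened 35x35 image per row, one integer label per image.
img_demo = np.zeros((3, 1225), dtype=np.float32)
labels_demo = np.array([0, 1, 2], dtype=np.int32)

with tempfile.TemporaryDirectory() as tmp:
    np.save(os.path.join(tmp, 'images.npy'), img_demo)
    np.save(os.path.join(tmp, 'labels.npy'), labels_demo)

    X = np.load(os.path.join(tmp, 'images.npy'))
    y = np.load(os.path.join(tmp, 'labels.npy'))

# Rows of X and entries of y must line up one-to-one.
assert X.shape[0] == y.shape[0]
print(X.shape, y.shape)
```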