# Preliminary Proposal
## Motivation
Since middle school, like many other kids, I have spent a lot of time on YouTube watching makeup videos, from tutorials to reviews and everything in between. As the world of makeup has grown, its users have only become more diverse. In recent years there has been huge demand for inclusive shade ranges that run from the palest shades to the deepest, yet many makeup brands still fail to keep up with this demand. I therefore want to analyze which brands have diversified their shade ranges and explore the motivations behind these changes. I also aim to research and collect data on users' opinions of these ranges. My peers suggested looking at how the popularity of these products differs across countries, which I have added to my project.

## Research Questions
RQ1: How do the shade ranges differ across different countries' best sellers? (e.g., does the best-selling product in Nigeria have more dark shades, and does Japan's have more light shades?)

RQ2: For US best sellers, which products have the most inclusive shade ranges? 

RQ3: How do the shade ranges of these products from 2018 compare to their ranges now? 

## Methodology
RQ1: In my dataset, the lightness of each shade has been extracted using Adobe Photoshop. I will sort each product's shades into lightness categories and compare the number of shades in each category across the best sellers from the US, Japan, Nigeria, and India. For this question I will produce a table comparing those counts and attempt to visualize the shades alongside them.
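
As a rough sketch of the categorization step, the snippet below converts a hex color to a lightness value and counts shades per category. It assumes that the dataset's `L` column is close to CIE L* lightness computed from sRGB. The sample rows printed in this notebook are roughly consistent with that (for example, `f6a762` is listed with `L` = 75), but Photoshop's conversion may differ slightly, so the computed numbers should be treated as approximate. The names `hex_to_lightness`, `BIN_EDGES`, `BIN_LABELS`, and `count_by_bin` are my own and not part of the dataset.

```python
from bisect import bisect_right

# Lower edges of the 2nd to 5th lightness categories
BIN_EDGES = [29, 47, 65, 83]
BIN_LABELS = ['10-28', '29-46', '47-64', '65-82', '83-100']


def hex_to_lightness(hex_color):
    """Approximate CIE L* (0-100) of an sRGB hex color such as 'f6a762'."""
    hex_color = hex_color.lstrip('#')
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve to get linear-light values
    linear = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
              for c in channels]
    # Relative luminance Y
    y = 0.2126 * linear[0] + 0.7152 * linear[1] + 0.0722 * linear[2]
    # Convert luminance to CIE L*
    if y > 216 / 24389:
        return 116 * y ** (1 / 3) - 16
    return (24389 / 27) * y


def count_by_bin(lightness_values):
    """Count how many lightness values fall in each category."""
    counts = {label: 0 for label in BIN_LABELS}
    for value in lightness_values:
        # bisect_right finds which category's lower edge the value has passed
        counts[BIN_LABELS[bisect_right(BIN_EDGES, value)]] += 1
    return counts


# L values of three Hegai and Ester shades as listed in the dataset
print(count_by_bin([75, 58, 47]))
```

Returning the counts as a dictionary, rather than printing inside the function, makes it easier to put several products side by side in one table.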

RQ2: For this question I will reuse the lightness categories and compute an inclusivity score that takes into account both how widely a product's shades are spread across the categories and how many shades it offers. Since I do not have sales data, I will use the largest review count I can find online for each US best seller (on Sephora, Ulta, Target, etc.) as a proxy for popularity. Lastly, I will run a t-test to see whether the relationship between the inclusivity score and the number of reviews is statistically significant.
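
One way the significance check could look is a t-test on the Pearson correlation r between inclusivity score and review count, using t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. This is only a sketch of one possible test, not necessarily the one I will end up running. The function name `correlation_t_statistic` and the input numbers are placeholders I made up for illustration; they are not values from the dataset.

```python
import math


def correlation_t_statistic(xs, ys):
    """Return (r, t, degrees_of_freedom) for the Pearson correlation of xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError('need two equal-length samples with at least 3 points')
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError('correlation is undefined when a sample has no variation')
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1:
        # Perfect correlation: the t statistic is unbounded
        return r, math.copysign(math.inf, r), n - 2
    # Under the null hypothesis of no correlation, t has n - 2 degrees of freedom
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return r, t, n - 2


# Placeholder inputs only (NOT real inclusivity scores or review counts)
scores = [0.2, 0.4, 0.5, 0.7, 0.9]
reviews = [120, 300, 250, 800, 950]
r, t, dof = correlation_t_statistic(scores, reviews)
print(f'r = {r:.2f}, t = {t:.2f}, df = {dof}')
```

The resulting t value would then be compared against a t distribution with the reported degrees of freedom to get a p-value.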

RQ3: For this question I will have to do research outside the dataset to find out how many shades each product has now. I think a bar chart comparing the number of shades in 2018 with the number now (2022) will make the difference in shades, or lack thereof, evident.

## Related Work
An article on The Pudding titled Beauty Brawl, which can be found [here](https://pudding.cool/2018/06/makeup-shades/), uses the same dataset and does a similar analysis. The authors used Fenty Beauty's foundation range as a standard to compare other products against; I think this is because the product was brand new at the time and its shade range was one of the largest. They also used a lightness scale, which will be extremely helpful as a metric for comparing shades. They found that US best sellers have the largest range compared to other countries and that Fenty was their winner. I want to mimic the way this article visualizes the shades, as it does a great job of showing the distinction between lightness levels. I will build on this study specifically with RQ3: the data is from 2018, and I feel that showing how these ranges have changed in four years will give insight into how the makeup industry has evolved in terms of inclusivity.

## Data
The dataset I have chosen comes from Kaggle and is titled "Makeup Shades Dataset". It can be found [here](https://www.kaggle.com/datasets/shivamb/makeup-shades-dataset?datasetId=1735543). It contains information on popular brands, their shade ranges as hex colors, US best sellers, brands with POC and non-POC founders, and best sellers from a few other countries. It is released under a Creative Commons license and belongs to the public domain. The dataset is suitable because it contains many popular brands used in the makeup world today along with their most popular products, which means it should be representative of what the general public has access to. Using the hex colors, I hope to come up with some data visualizations as well. My only concern is that the dataset is from 2018, so its shade ranges may not be current, but I should be able to do further research to include what these ranges look like today.

## Unknowns and Dependencies
I decided to make my project less coding-heavy. After our ethnography, I feel I'm much better suited to conducting research in that way, and I'll be able to explore human motivations rather than relying on my somewhat shaky coding skills. I want to do a lot of research on this topic in addition to working with this dataset, which I fear may take more time than I have, but I will definitely try my best!


# Findings
### RQ3: How do the shade ranges of these products from 2018 compare to their ranges now? 
![download.png](attachment:07b4d908-34b8-47a2-a02e-0d21c60dbd5f.png)
I compared the number of shades in my 2018 dataset with the number of shades I could find for each product today and created a visualization of the comparison. I found that even though some of these brands already boasted 30 or 40 shades, which is considered extremely broad in the makeup industry, some of them still expanded their shade range. For example, MAC went from 42 shades to 63 and Fenty from 40 to 50. In choosing which brands to analyze, I took one from each country's best sellers list along with a BIPOC-recommended brand (Black Opal). Nykaa (an Indian best seller) and Black Opal were among the many brands that kept their shade range the same. You might expect a brand with only 5 or 12 shades to update its range, but that is not the case. Perhaps these companies have a targeted audience whose needs they already meet, so there is no need for expansion. You can see these targeted ranges in the images of Nykaa's and Black Opal's shade offerings.
![black_opal.PNG](attachment:eaa0544c-c683-4377-a9d9-08d82ffca397.PNG)

**Black Opal's TRUE COLOR Foundation**

![nykka.PNG](attachment:b76e755b-7362-4a88-91d2-09f32bd2862d.PNG)

**Nykaa's SKINgenius Foundation**

The following block of code creates a dictionary of the shade counts of different foundation products from 2018. Through external research, I found their current shade counts and created a bar chart to visualize the difference.

In [29]:
import pandas as pd

df = pd.read_csv('shades.csv')

# Creating a dictionary for RQ3 to find out initial shade ranges:
# each row of the dataset is one shade, so count the rows per brand
shade_count = {}
for brand in df['brand']:
    shade_count[brand] = shade_count.get(brand, 0) + 1
print(shade_count)

{'Maybelline': 54, 'bareMinerals': 29, 'Estée Lauder': 42, 'Revlon': 22, "L'Oréal": 36, 'Covergirl + Olay': 12, 'Fenty': 40, 'Iman': 8, 'Beauty Bakerie': 30, 'Black Up': 18, 'Black Opal': 12, 'Laws of Nature': 17, 'Lancôme': 40, 'MAC': 42, 'Bobbi Brown': 30, 'Make Up For Ever': 40, 'Hegai and Ester': 10, 'House of Tara': 11, 'Trim & Prissy': 13, 'Elsas Pro': 11, 'Kuddy': 5, 'RMK': 9, 'Addiction': 17, 'Shu Uemera': 11, 'Shiseido': 6, 'Kate': 6, 'IPSA': 6, 'Dior': 6, 'NARS': 13, 'Lakmé': 4, 'Colorbar': 3, 'Bharat & Doris': 7, 'Olivia': 4, 'Blue Heaven': 2, 'Lotus Herbals': 4, 'Nykaa': 5}
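
The comparison itself can be sketched as follows. The 2018 counts are taken from the dictionary above, and the current counts are the ones reported in the findings for these four brands. The helper names `shade_changes` and `plot_shade_changes` are my own, and this is an approximation of the chart rather than the exact code that produced the figure. The plotting helper assumes matplotlib is installed.

```python
# 2018 counts from shade_count; current counts from my external research
shades_2018 = {'MAC': 42, 'Fenty': 40, 'Black Opal': 12, 'Nykaa': 5}
shades_now = {'MAC': 63, 'Fenty': 50, 'Black Opal': 12, 'Nykaa': 5}


def shade_changes(before, after):
    """Return {brand: (before, after, difference)} for brands present in both dicts."""
    return {brand: (before[brand], after[brand], after[brand] - before[brand])
            for brand in before if brand in after}


def plot_shade_changes(before, after):
    """Draw a grouped bar chart of shade counts per brand (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the table works without it

    changes = shade_changes(before, after)
    brands = list(changes)
    positions = list(range(len(brands)))
    width = 0.4
    fig, ax = plt.subplots()
    # One bar per year, placed side by side around each brand's position
    ax.bar([p - width / 2 for p in positions], [changes[b][0] for b in brands],
           width, label='2018')
    ax.bar([p + width / 2 for p in positions], [changes[b][1] for b in brands],
           width, label='2022')
    ax.set_xticks(positions)
    ax.set_xticklabels(brands)
    ax.set_ylabel('Number of shades')
    ax.legend()
    return fig


# Print a small table of the change for each brand
for brand, (old, new, diff) in shade_changes(shades_2018, shades_now).items():
    print(f'{brand}: {old} -> {new} ({diff:+d})')
```

Calling `plot_shade_changes(shades_2018, shades_now)` returns the figure, which can then be displayed in the notebook or saved to a file.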


In [86]:
# Taking min and max of all lightness values in the dataset
# Range from 10-100, will split into 5 with 10-28, 29-46, 47-64, 65-82, 83-100
print('minimum: ' + str(df['L'].min()))
print('maximum: ' + str(df['L'].max()))

# Will select Fenty (America), Maybelline (America), Hegai and Ester (Nigeria), NARS (Japan), Bharat & Doris (India)
# House of Tara (Nigeria) is commented out for now
# Creating a dataframe for each selected product by filtering on its short code
fenty = df[df['product_short'] == 'pf']
maybelline = df[df['product_short'] == 'fmf']
hegai = df[df['product_short'] == 'pp']
# house = df[df['product_short'] == 'off']
nars = df[df['product_short'] == 'vm']
bharat = df[df['brand_short'] == 'bd']


# Function that takes in a dataframe and counts its shades in each lightness range
def lightness_values(dataframe):
    # Create dictionary with different ranges as keys and 0 as values
    lightness_dict = {'10-28': 0, '29-46': 0, '47-64': 0, '65-82': 0, '83-100': 0}
    for lightness in dataframe['L']:
        # Compare against the start of the next range so that fractional
        # values (e.g. 28.5) do not fall through to the last range
        if lightness < 29:
            lightness_dict['10-28'] += 1
        elif lightness < 47:
            lightness_dict['29-46'] += 1
        elif lightness < 65:
            lightness_dict['47-64'] += 1
        elif lightness < 83:
            lightness_dict['65-82'] += 1
        else:
            lightness_dict['83-100'] += 1
    print(lightness_dict)

# Running function on brand data frames
print('fenty')
lightness_values(fenty)
print('maybelline')
lightness_values(maybelline)
print('hegai')
lightness_values(hegai)
# print('house')
# lightness_values(house)
print('nars')
lightness_values(nars)
print('bharat')
lightness_values(bharat)


minimum: 11
maximum: 95
               brand brand_short        product product_short     hex     H  \
444  Hegai and Ester          he  Photo Perfect            pp  f6a762  28.0   
445  Hegai and Ester          he  Photo Perfect            pp  d76b4c  13.0   
446  Hegai and Ester          he  Photo Perfect            pp  ad543d  12.0   
447  Hegai and Ester          he  Photo Perfect            pp  c86835  21.0   
448  Hegai and Ester          he  Photo Perfect            pp  904134   8.0   
449  Hegai and Ester          he  Photo Perfect            pp  92524e   4.0   
450  Hegai and Ester          he  Photo Perfect            pp  bf6b44  19.0   
451  Hegai and Ester          he  Photo Perfect            pp  915831  24.0   
452  Hegai and Ester          he  Photo Perfect            pp  ea8944  25.0   
453  Hegai and Ester          he  Photo Perfect            pp  eebb7a  34.0   

        S     V   L  group  
444  0.60  0.96  75      5  
445  0.65  0.84  58      5  
446  0.65  0.68  47