# Shades of Makeup
### Lauren Nguyen
## Introduction
Since middle school, like many kids my age, I have spent a lot of time on YouTube watching makeup videos, from tutorials to reviews and everything in between. As the world of makeup has grown more expansive and widespread, makeup users have only continued to diversify. In recent years, there has been a huge demand for inclusive shade ranges that run from the palest shades to the deepest, yet many makeup brands still fail to keep up with this demand. I therefore want to analyze which brands have diversified their shade ranges and discover the motivations behind these changes. In addition, I aim to research and collect data on users' opinions of these ranges. My peers suggested looking at how the popularity of these products might differ across countries, which I have added to my project.

## Research Questions
RQ1: How do the shade ranges differ across different countries' best sellers? (e.g., does the best-selling product in Nigeria have more dark shades, and does Japan's have more light shades?)

RQ2: For US best sellers, which products have the most inclusive shade ranges? 

RQ3: How do the shade ranges of these products from 2018 compare to their ranges now? 

## Methodology
RQ1: In my dataset, the lightness of each shade has been extracted using Adobe Photoshop. I will sort the shades of each product into lightness categories and compare the number of shades in each category across the best sellers from the US, Japan, Nigeria, and India. For this question I will produce a stacked bar chart to compare those lightness categories and visualize the shades alongside them.

RQ2: For this question, I will use the lightness categories once more and compute an inclusivity score for each American best seller, which takes into account how the shades are spread across the categories as well as the number of shades. I will then present the inclusivity scores in a table so they are easy to compare.

RQ3: For this question I will have to conduct research outside of the dataset to find out how many shades each product has now. A bar chart comparing the number of shades in 2018 with the number now (2022) should make the difference in shades, or lack thereof, evident.

## Related Work
An article titled Beauty Brawl on The Pudding, which can be found [here](https://pudding.cool/2018/06/makeup-shades/), uses the same dataset and does a similar analysis. The authors used Fenty Beauty's foundation range as a standard to compare other products against; I think this is because, at the time, the product was brand new and its shade range was one of the largest. They also used a lightness scale, which will be extremely helpful as a metric for comparing shades. They found that US best sellers have the largest range compared to other countries and that Fenty was their winner. I want to mimic the way this article visualizes the shades, as it does a great job of showing the distinction between lightness levels. I will build on this study specifically with RQ3: since the data is from 2018, showing how these ranges have changed in 4 years will give us insight into how the makeup industry has evolved in terms of inclusivity.

## Data
The dataset I have chosen is the "Makeup Shades Dataset" from Kaggle, which can be found [here](https://www.kaggle.com/datasets/shivamb/makeup-shades-dataset?datasetId=1735543). It contains information on popular brands, their shade ranges as hex colors, US best sellers, brands with POC and non-POC founders, and a few other countries' best sellers. Its license is from Creative Commons and it belongs to the public domain. This dataset is suitable because it contains many popular brands used in the makeup world today along with their most popular products, which means it should be representative of what the general public has access to. Using the hex colors, I hope to be able to produce some data visualizations as well. My only concern is that the dataset is from 2018, so its shade ranges may not be current, but I should be able to do further research to find out what these ranges are today.


# Findings

## RQ1: How do the shade ranges differ across different countries' best sellers?

The following block of code computes the minimum and maximum lightness values, creates a new dataframe for each country's best seller, defines a function that sorts the shades of a foundation product into lightness categories, and finally calls that function on each best seller's dataframe.

In [46]:
import pandas as pd
df = pd.read_csv('shades.csv')

In [47]:
# Creating a list of all lightness values in the dataset
list_lightness = df['L'].tolist()

# Taking min and max of lightness values
# Range from 10-100, will split into 5 with 10-28, 29-46, 47-64, 65-82, 83-100
print('minimum: ' + str(min(list_lightness)))
print('maximum: ' + str(max(list_lightness)))

# Selecting Fenty (America), Maybelline (America), Hegai and Ester (Nigeria), Nars (Japan), Bharat & Doris (India)
# Creating a dataframe for each best seller with a boolean mask on its product (or brand) code
fenty = df[df['product_short'] == 'pf']
maybelline = df[df['product_short'] == 'fmf']
hegai = df[df['product_short'] == 'pp']
nars = df[df['product_short'] == 'vm']
bharat = df[df['brand_short'] == 'bd']

# Function that takes in a dataframe and counts how many of its shades fall into each
# lightness category, returning a dictionary keyed by category range
def lightness_values(dataframe):
    # Create dictionary with different ranges as keys and 0 as values
    lightness_dict = {'10-28': 0, '29-46': 0, '47-64': 0, '65-82': 0, '83-100': 0}
    for lightness in dataframe['L']:
        # Comparing against the next category's lower bound leaves no gaps,
        # so a fractional value such as 28.5 cannot fall through to '83-100'
        if lightness < 29:
            lightness_dict['10-28'] += 1
        elif lightness < 47:
            lightness_dict['29-46'] += 1
        elif lightness < 65:
            lightness_dict['47-64'] += 1
        elif lightness < 83:
            lightness_dict['65-82'] += 1
        else:
            lightness_dict['83-100'] += 1
    return lightness_dict

# Running function on brand data frames
print('fenty')
print(lightness_values(fenty))
print('maybelline')
print(lightness_values(maybelline))
print('hegai')
print(lightness_values(hegai))
print('nars')
print(lightness_values(nars))
print('bharat')
print(lightness_values(bharat))

# Lightness pickers from middle of shade ranges and their respective hex colors
19, 38, 56, 74, 92
# 4D2006
# 7B4E34
# B27949
# DAAD90
# F9E3D5

minimum: 11
maximum: 95
fenty
{'10-28': 1, '29-46': 7, '47-64': 8, '65-82': 16, '83-100': 8}
maybelline
{'10-28': 2, '29-46': 9, '47-64': 7, '65-82': 16, '83-100': 6}
hegai
{'10-28': 0, '29-46': 3, '47-64': 4, '65-82': 3, '83-100': 0}
nars
{'10-28': 0, '29-46': 0, '47-64': 2, '65-82': 9, '83-100': 2}
bharat
{'10-28': 0, '29-46': 0, '47-64': 0, '65-82': 6, '83-100': 1}


(19, 38, 56, 74, 92)

To answer this question, I first took the lightness value recorded for each hex shade and grouped those values into 5 categories: deep (11-28), dark (29-46), tan (47-64), medium (65-82), light (83-95). I then used a hex color selector to visualize each of these ranges, which can be seen in the graphic below.
![shade palette.PNG](attachment:0d948ab5-4c8c-4721-a43f-0074912819d7.PNG)
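
For readers without Photoshop, the lightness of a hex shade can also be approximated in code. The sketch below is only an approximation under one assumption: that the dataset's `L` column is the CIELAB lightness L\* on a 0-100 scale (the L channel of Lab color), which this notebook does not confirm. The helper name `hex_to_lightness` is hypothetical, and small differences from Photoshop's readings are possible.

```python
def hex_to_lightness(hex_color):
    """Approximate CIELAB lightness L* (0-100) of an sRGB hex color.

    Assumption: this mirrors the dataset's 'L' column; it is a sketch,
    not the method used to build the dataset.
    """
    hex_color = hex_color.lstrip('#')
    # Split into red, green, blue channels scaled to 0-1
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]

    # Undo the sRGB gamma curve to get linear light
    linear = []
    for c in channels:
        if c <= 0.04045:
            linear.append(c / 12.92)
        else:
            linear.append(((c + 0.055) / 1.055) ** 2.4)

    # Relative luminance Y (white = 1.0)
    r, g, b = linear
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b

    # CIELAB lightness from luminance
    delta = 6 / 29
    if y > delta ** 3:
        f = y ** (1 / 3)
    else:
        f = y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16


# The five hex colors picked for the middle of each lightness range
for hex_color in ['4D2006', '7B4E34', 'B27949', 'DAAD90', 'F9E3D5']:
    print(hex_color, round(hex_to_lightness(hex_color), 1))
```

If the assumption holds, the five printed values should land close to the range midpoints 19, 38, 56, 74, and 92.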

Building off those ranges, I took the lightness value in the middle of each range (19, 38, 56, 74, 92) and picked a hex color at that lightness, so that each bar segment in the bar graph is drawn in a color representing its category. Below is a visualization of the best sellers from various countries and the distribution of their shades amongst the different lightness categories.

![lightness bar chart.png](attachment:789a1905-398e-4e1a-97a1-af5a7625f67e.png)

From this visualization you can see that the medium category holds the most shades for every brand except Hegai and Ester (Nigeria), whose largest category is tan. Across the board, the deep category has by far the fewest shades (only 3 in total across the five products), and the dark category is missing entirely from the Japanese and Indian best sellers, which is inequitable and, unfortunately, to be expected. When comparing across countries, the American top sellers (Fenty and Maybelline) have the largest number of shades and are the only 2 of the 5 brands to have shades in the deep category. The non-American best sellers, Hegai and Ester (Nigeria), Nars (Japan), and Bharat & Doris (India), have much more targeted ranges, with all of their shades falling into 2 or 3 lightness categories. The Nigerian best seller covers the 3 *middle* categories (dark, tan, and medium), whereas the Japanese best seller ranges across the 3 *lightest* categories (tan, medium, and light), and the Indian best seller has almost every shade in the medium category.
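
The comparison above can be checked numerically. This is a minimal sketch that assumes the category counts printed earlier are typed in by hand; `category_shares` and `largest_category` are hypothetical helpers introduced only for this illustration.

```python
# Category counts copied from the printed output, ordered deep, dark, tan, medium, light
categories = ['deep', 'dark', 'tan', 'medium', 'light']
counts = {
    'Fenty (US)': [1, 7, 8, 16, 8],
    'Maybelline (US)': [2, 9, 7, 16, 6],
    'Hegai and Ester (Nigeria)': [0, 3, 4, 3, 0],
    'Nars (Japan)': [0, 0, 2, 9, 2],
    'Bharat & Doris (India)': [0, 0, 0, 6, 1],
}


def category_shares(shade_counts):
    """Return each category's percentage of a product's shades."""
    total = sum(shade_counts)
    return [100 * n / total for n in shade_counts]


def largest_category(shade_counts):
    """Return the name of the category holding the most shades."""
    top_index = shade_counts.index(max(shade_counts))
    return categories[top_index]


for product, shade_counts in counts.items():
    shares = [round(s, 1) for s in category_shares(shade_counts)]
    covered = sum(1 for n in shade_counts if n > 0)
    print(product, shares, 'largest:', largest_category(shade_counts),
          'categories covered:', covered)
```

Working with percentages rather than raw counts makes products with very different numbers of shades (40 for Fenty versus 7 for Bharat & Doris) directly comparable.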

From these findings, I'd say that these makeup shade ranges reflect not only the diversity of skin tones within each country but also its social ideals. What I mean by that is that in today's makeup world, a product could not become a best seller in America if it had a limited shade range. This is demonstrated by the fact that almost every YouTube video reviewing foundation and concealer products mentions shade ranges and their diversity, with products often praised for their extensive ranges or criticized for the lack thereof. In addition, while every country has a wide range of skin tones, America has the most skin tone diversity amongst the four countries, which is reflected in its best sellers' ranges.

## RQ2: For US best sellers, which products have the most inclusive shade ranges? 

### Inclusivity Score
#### The inclusivity score will take into account the number of shades a product has and how diverse those shades are across the 5 lightness categories. The inclusivity score will be out of 100 points. 
### How to calculate inclusivity score
- Products get one point per shade, capped at 50 points
- Up to 10 points are possible from each of the 5 lightness categories
- If at least 1/5 of a product's shades fall into a category, it gets the full 10 points for that category
- If less than 1/5 of its shades fall into a category, it receives proportionally fewer points

Example 1: if a brand has 50 shades and those 50 shades are distributed equally amongst the lightness categories, it receives an inclusivity score of 100

Example 2: if a brand has 30 shades and 2/5 of those shades are in medium, 1/5 in tan, 1/5 in dark, 1/10 in light, and 1/10 in deep, it receives an inclusivity score of 70

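**Equation:** min(number of shades, 50) + sum over the 5 lightness categories of min(10, 50 × shades in category / total shades)

The second part of the equation is computed once for each of the 5 lightness categories, and each category's contribution is capped at 10 points.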
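
As a quick check of the scoring rules, the two examples above can be worked through in code. This is a minimal sketch: `inclusivity_from_counts` is a hypothetical helper that scores a plain list of per-category shade counts, and the example counts are derived from the fractions given in the examples.

```python
def inclusivity_from_counts(category_counts):
    """Score a product from its shade counts per lightness category."""
    total_shades = sum(category_counts)
    # One point per shade, capped at 50
    shade_points = min(total_shades, 50)
    # Up to 10 points per category; 1/5 of the shades earns the full 10
    category_points = sum(min(10.0, 50 * n / total_shades) for n in category_counts)
    return shade_points + category_points


# Example 1: 50 shades spread evenly over the 5 categories
example_1 = [10, 10, 10, 10, 10]
# Example 2: 30 shades, ordered deep, dark, tan, medium, light
example_2 = [3, 6, 6, 12, 3]

print(inclusivity_from_counts(example_1))  # 100.0
print(inclusivity_from_counts(example_2))  # 70.0
```

In Example 2 the medium category holds 2/5 of the shades, which would be worth 20 points uncapped, but the cap limits it to 10; the light and deep categories earn 5 points each.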



The following block of code defines a function that calculates an inclusivity score according to the criteria listed above. It then creates new dataframes using masks for each of the US best sellers. Finally, it creates a new dataframe that holds each brand's name and its calculated inclusivity score and displays it as a table.

In [40]:
def calc_inclusivity_score(brand):
    total_shades = len(brand)
    # One point per shade, capped at 50
    shade_points = min(total_shades, 50)
    brand_dict = lightness_values(brand)
    category_scores = []
    for val in brand_dict.values():
        # Share of shades in this category, scaled so 1/5 of the range earns 10 points.
        # The uncapped total is used here so products with more than 50 shades
        # are still scored on their true proportions.
        category_score = min(50 * (val / total_shades), 10)
        category_scores.append(category_score)
    inclusivity_score = shade_points + sum(category_scores)
    return inclusivity_score

bareminerals = df[(df['product_short'] == 'pro')]
estee = df[(df['product_short'] == 'dw')]
revlon = df[(df['product_short'] == 'cs')]
loreal = df[(df['product_short'] == 'ipm')]

data = [['Fenty', calc_inclusivity_score(fenty)], ['Maybelline', calc_inclusivity_score(maybelline)], ['Bare Minerals', calc_inclusivity_score(bareminerals)], ['Estee Lauder', calc_inclusivity_score(estee)], ['Revlon', calc_inclusivity_score(revlon)], ['Loreal', calc_inclusivity_score(loreal)]]
inclusivity_df = pd.DataFrame(data, columns = ['Brand', 'Score'])
display(inclusivity_df)

| | Brand | Score |
|---|---|---|
| 0 | Fenty | 80.0 |
| 1 | Maybelline | 78.75 |
| 2 | Bare Minerals | 64.517241 |
| 3 | Estee Lauder | 77.47619 |
| 4 | Revlon | 46.545455 |
| 5 | Loreal | 57.0 |


The products amongst US best sellers that have the most inclusive shade ranges are the Fenty PRO FILT'R foundation and the Maybelline Fit Me foundation. By inclusive, I mean that these products not only have more shades available, but their shades also span all lightness categories. The Fenty foundation is sold at 38 dollars and the Maybelline foundation is around 8 dollars, so there are good options across multiple price points. Following the top two are Estee Lauder, Bare Minerals, Loreal, and finally Revlon.

## RQ3: How do the shade ranges of these products from 2018 compare to their ranges now?

The following block of code creates a dictionary of the 2018 shade counts for each brand in the dataset. Through external research, I then found their current shade ranges and created a bar chart to visualize the difference.

In [48]:
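# Creating a dictionary for RQ3 to find out initial shade ranges
# Note: shades are counted per brand, so a brand with several products in the dataset
# is counted across all of them (e.g. Maybelline: 54 here vs. 40 for Fit Me alone)
from collections import Counter

# Counter keeps brands in order of first appearance; dict() gives a plain dictionary
shade_count = dict(Counter(df['brand']))
print(shade_count)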

{'Maybelline': 54, 'bareMinerals': 29, 'Estée Lauder': 42, 'Revlon': 22, "L'Oréal": 36, 'Covergirl + Olay': 12, 'Fenty': 40, 'Iman': 8, 'Beauty Bakerie': 30, 'Black Up': 18, 'Black Opal': 12, 'Laws of Nature': 17, 'Lancôme': 40, 'MAC': 42, 'Bobbi Brown': 30, 'Make Up For Ever': 40, 'Hegai and Ester': 10, 'House of Tara': 11, 'Trim & Prissy': 13, 'Elsas Pro': 11, 'Kuddy': 5, 'RMK': 9, 'Addiction': 17, 'Shu Uemera': 11, 'Shiseido': 6, 'Kate': 6, 'IPSA': 6, 'Dior': 6, 'NARS': 13, 'Lakmé': 4, 'Colorbar': 3, 'Bharat & Doris': 7, 'Olivia': 4, 'Blue Heaven': 2, 'Lotus Herbals': 4, 'Nykaa': 5}


![download.png](attachment:07b4d908-34b8-47a2-a02e-0d21c60dbd5f.png)

I compared the number of shades found in my dataset from 2018 to the number of shades I could find for that product today. Even though some of these brands already boasted 30 or 40 shades, which is considered extremely broad in the makeup industry, some of them still expanded their shade range. For example, MAC went from 42 shades to 63 and Fenty from 40 to 50. In choosing which brands to analyze, I decided to take one from each country's best sellers list along with a BIPOC-recommended brand (Black Opal). Nykaa (an Indian best seller) and Black Opal were among the many brands that kept their shade range the same. You'd think a brand with only 5 or 12 shades would update its range; however, that is not the case. Perhaps these companies have a targeted audience whose needs they already meet, so they see no need for expansion. You can see these targeted ranges in the images of Nykaa's and Black Opal's shade offerings.

![black_opal.PNG](attachment:eaa0544c-c683-4377-a9d9-08d82ffca397.PNG)

**Black Opal's TRUE COLOR Foundation**

![nykka.PNG](attachment:b76e755b-7362-4a88-91d2-09f32bd2862d.PNG)

**Nykaa's SKINgenius Foundation**
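
To put numbers on the changes described in this section, here is a minimal sketch that computes the absolute and percentage change in shade counts. It uses only the four pairs of counts quoted in the text above, and `shade_change` is a hypothetical helper name.

```python
# Shade counts quoted in this section: (count in the 2018 dataset, count found in 2022)
shade_counts = {
    'MAC': (42, 63),
    'Fenty': (40, 50),
    'Black Opal': (12, 12),
    'Nykaa': (5, 5),
}


def shade_change(before, after):
    """Return the absolute and percentage change in number of shades."""
    added = after - before
    percent = 100 * added / before
    return added, percent


for brand, (before, after) in shade_counts.items():
    added, percent = shade_change(before, after)
    print(f'{brand}: {before} -> {after} ({added:+d} shades, {percent:+.0f}%)')
```

Seen this way, MAC grew its range by half and Fenty by a quarter, while Black Opal and Nykaa show no change.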

## Summary
The most significant limitation of this project is that the dataset is from 2018, which means it may not be representative of what the best sellers are today, and their shade ranges may also have changed. I accounted for this with research question 3, which compares the 2018 shade counts with today's.

To conclude, makeup companies still have much work to do in creating more inclusive shade ranges for their foundation products. This is especially true outside the US, where the best sellers' shade ranges are very targeted and exclusive. While American ranges are wider and have more variety, they are still lacking in deeper shades. However, to give credit to Fenty and MAC, both brands had large ranges in 2018 and have continued to expand them since, which is an example all brands can follow.