![trainers in a store](trainers.jpg)

The sports clothing and athleisure attire industry is huge, worth approximately [$193 billion in 2021](https://www.statista.com/statistics/254489/total-revenue-of-the-global-sports-apparel-market/), with a strong growth forecast over the next decade!

In this notebook, you will take on the role of a product analyst for an online sports clothing company. The company is specifically interested in how it can improve revenue. You will dive into product data such as pricing, reviews, descriptions, and ratings, as well as revenue and website traffic, to produce recommendations for its marketing and sales teams.

You've been provided with four datasets to investigate:

# brands.csv

| Columns | Description |
|---------|-------------|
| `product_id` | Unique product identifier |
| `brand` | Brand of the product | 

# finance.csv

| Columns | Description |
|---------|-------------|
| `product_id` | Unique product identifier |
| `listing_price` | Original price of the product | 
| `sale_price` | Discounted price of the product |
| `discount` | Discount off the listing price, as a decimal | 
| `revenue` | Revenue generated by the product |

# info.csv

| Columns | Description |
|---------|-------------|
| `product_name` | Name of the product | 
| `product_id` | Unique product identifier |
| `description` | Description of the product |

# reviews.csv

| Columns | Description |
|---------|-------------|
| `product_id` | Unique product identifier |
| `rating` | Average product rating | 
| `reviews` | Number of reviews for the product |
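
All four tables share the `product_id` key, so they can be combined into a single product-level table. The sketch below uses tiny hypothetical DataFrames in place of the CSV files and chains `pd.merge` with `functools.reduce`; the toy values and the `products` name are illustrative, not part of the original analysis. Passing `validate="one_to_one"` makes pandas raise a `MergeError` if a `product_id` is duplicated in either table being merged.

```python
from functools import reduce

import pandas as pd

# Hypothetical toy stand-ins for brands.csv, finance.csv, info.csv and reviews.csv
brands = pd.DataFrame({"product_id": ["P1", "P2"], "brand": ["Adidas", "Nike"]})
finance = pd.DataFrame({"product_id": ["P1", "P2"], "revenue": [120.0, 80.0]})
info = pd.DataFrame({"product_id": ["P1", "P2"], "product_name": ["Shoe", "Slipper"]})
reviews = pd.DataFrame({"product_id": ["P1", "P2"], "rating": [4.0, 3.5], "reviews": [10.0, 4.0]})

# Merge the tables pairwise on the shared key; validate guards against duplicate IDs
products = reduce(
    lambda left, right: pd.merge(left, right, on="product_id", how="inner", validate="one_to_one"),
    [brands, finance, info, reviews],
)
print(products)
```

Whether an inner or an outer merge fits better depends on whether products missing from one of the files should be kept.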

In [8]:
```python
import pandas as pd
import numpy as np

# Load the four datasets; every table is keyed by product_id
brands = pd.read_csv("Data/brands.csv")
finance = pd.read_csv("Data/finance.csv")
info = pd.read_csv("Data/info.csv")
reviews = pd.read_csv("Data/reviews.csv")
```

In [9]:
```python
brands.head()
```

| | product_id | brand |
|---|---|---|
| 0 | AH2430 | NaN |
| 1 | G27341 | Adidas |
| 2 | CM0081 | Adidas |
| 3 | B44832 | Adidas |
| 4 | D98205 | Adidas |


In [10]:
```python
finance.head()
```

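| | product_id | listing_price | sale_price | discount | revenue |
|---|---|---|---|---|---|
| 0 | AH2430 | NaN | NaN | NaN | NaN |
| 1 | G27341 | 75.99 | 37.99 | 0.5 | 1641.17 |
| 2 | CM0081 | 9.99 | 5.99 | 0.4 | 398.93 |
| 3 | B44832 | 69.99 | 34.99 | 0.5 | 2204.37 |
| 4 | D98205 | 79.99 | 39.99 | 0.5 | 5182.7 |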


In [11]:
```python
info.head()
```

| | product_name | product_id | description |
|---|---|---|---|
| 0 | NaN | AH2430 | NaN |
| 1 | Women's adidas Originals Sleek Shoes | G27341 | A modern take on adidas sport heritage, tailor... |
| 2 | Women's adidas Swim Puka Slippers | CM0081 | These adidas Puka slippers for women's come wi... |
| 3 | Women's adidas Sport Inspired Questar Ride Shoes | B44832 | Inspired by modern tech runners, these women's... |
| 4 | Women's adidas Originals Taekwondo Shoes | D98205 | This design is inspired by vintage Taekwondo s... |


In [12]:
```python
reviews.head()
```

| | product_id | rating | reviews |
|---|---|---|---|
| 0 | AH2430 | NaN | NaN |
| 1 | G27341 | 3.3 | 24.0 |
| 2 | CM0081 | 2.6 | 37.0 |
| 3 | B44832 | 4.1 | 35.0 |
| 4 | D98205 | 3.5 | 72.0 |
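
The first row of each preview (`AH2430`) is empty apart from its ID. A quick way to gauge how widespread this is, sketched here on a toy copy of the first three rows of `reviews.csv`, is to count missing values per column and list the rows where every non-key field is missing. The `missing_per_column` and `empty_rows` names are illustrative.

```python
import numpy as np
import pandas as pd

# Toy copy of the first three rows of reviews.csv, as shown in the preview above
reviews = pd.DataFrame({
    "product_id": ["AH2430", "G27341", "CM0081"],
    "rating": [np.nan, 3.3, 2.6],
    "reviews": [np.nan, 24.0, 37.0],
})

# Number of missing values in each column
missing_per_column = reviews.isna().sum()

# Products whose every field apart from product_id is missing
empty_rows = reviews[reviews.drop(columns="product_id").isna().all(axis=1)]

print(missing_per_column)
print(empty_rows["product_id"].tolist())
```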


In [13]:
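```python
# Outer-join brands and finance on product_id, then drop every row with a missing value.
# This removes products missing from either table as well as matched products with incomplete data.
brands_finance = pd.merge(left=brands, right=finance, on="product_id", how="outer").dropna()
brands_finance.head()
```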

| | product_id | brand | listing_price | sale_price | discount | revenue |
|---|---|---|---|---|---|---|
| 0 | 130690-017 | Nike | 0.0 | 159.95 | 0.0 | 6909.84 |
| 1 | 133000-106 | Nike | 0.0 | 119.95 | 0.0 | 0.0 |
| 2 | 280648 | Adidas | 29.99 | 29.99 | 0.0 | 2915.03 |
| 3 | 288022 | Adidas | 29.99 | 29.99 | 0.0 | 5128.29 |
| 4 | 310805-137 | Nike | 0.0 | 159.95 | 0.0 | 64203.93 |
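
An outer merge followed by `dropna()` removes two kinds of rows: products that appear in only one table, and matched products with any missing field. The toy example below (hypothetical IDs and values) uses `indicator=True`, a `pd.merge` option that adds a `_merge` column recording where each row came from, and compares the result with an inner merge, which drops only the unmatched products.

```python
import numpy as np
import pandas as pd

# Hypothetical toy tables: "P3" has no finance row, "P4" has no brand row,
# and "P2" is matched but has a missing revenue value
brands = pd.DataFrame({"product_id": ["P1", "P2", "P3"], "brand": ["Adidas", "Nike", "Adidas"]})
finance = pd.DataFrame({"product_id": ["P1", "P2", "P4"], "revenue": [100.0, np.nan, 50.0]})

# indicator=True adds a categorical _merge column: "both", "left_only" or "right_only"
outer = pd.merge(brands, finance, on="product_id", how="outer", indicator=True)
print(outer["_merge"].value_counts())

# Outer merge + dropna: drops unmatched rows AND matched rows with any missing value
outer_dropped = pd.merge(brands, finance, on="product_id", how="outer").dropna()
# Inner merge: drops unmatched rows only, so P2 survives with a NaN revenue
inner = pd.merge(brands, finance, on="product_id", how="inner")

print(outer_dropped["product_id"].tolist())
print(inner["product_id"].tolist())
```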


In [14]:
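```python
# Keep only Adidas and Nike products. .copy() makes an independent DataFrame, so adding
# the price_label column later does not trigger a SettingWithCopyWarning.
adidas_nike = brands_finance[brands_finance["brand"].isin(["Adidas", "Nike"])].copy()
adidas_nike.head()
```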

| | product_id | brand | listing_price | sale_price | discount | revenue |
|---|---|---|---|---|---|---|
| 0 | 130690-017 | Nike | 0.0 | 159.95 | 0.0 | 6909.84 |
| 1 | 133000-106 | Nike | 0.0 | 119.95 | 0.0 | 0.0 |
| 2 | 280648 | Adidas | 29.99 | 29.99 | 0.0 | 2915.03 |
| 3 | 288022 | Adidas | 29.99 | 29.99 | 0.0 | 5128.29 |
| 4 | 310805-137 | Nike | 0.0 | 159.95 | 0.0 | 64203.93 |
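
`Series.isin` expresses the same Adidas-or-Nike filter as the chained `|` comparison and stays short if more brands are added later. A minimal check on hypothetical toy data (the `df`, `with_or` and `with_isin` names are illustrative):

```python
import pandas as pd

# Hypothetical toy data with a third brand that should be filtered out
df = pd.DataFrame({"brand": ["Nike", "Adidas", "Puma", "Nike"], "revenue": [1.0, 2.0, 3.0, 4.0]})

# Two equivalent filters: chained | comparisons and Series.isin
with_or = df[(df["brand"] == "Adidas") | (df["brand"] == "Nike")]
with_isin = df[df["brand"].isin(["Adidas", "Nike"])].copy()  # .copy() makes later column assignment safe

print(with_or.equals(with_isin))
```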


In [23]:
```python
# Mean revenue and number of products for each brand
adidas_nike.groupby("brand").agg({"revenue": "mean", "product_id": "count"}).round(3)
```

| brand | revenue | product_id |
|---|---|---|
| Adidas | 4476.357 | 2575 |
| Nike | 1472.079 | 545 |
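
The dictionary passed to `agg` keeps the original column names, so the product count ends up under `product_id`. pandas' named aggregation (available since pandas 0.25) lets each output column be named directly. The toy values and the `summary` name below are illustrative only.

```python
import pandas as pd

# Hypothetical toy data, not the real finance figures
adidas_nike = pd.DataFrame({
    "product_id": ["a", "b", "c", "d"],
    "brand": ["Adidas", "Adidas", "Nike", "Nike"],
    "revenue": [100.0, 300.0, 50.0, 150.0],
})

# Named aggregation: each keyword becomes an output column, given as (input column, function)
summary = adidas_nike.groupby("brand").agg(
    mean_revenue=("revenue", "mean"),
    num_products=("product_id", "count"),
).round(3)
print(summary)
```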


In [16]:
```python
# Quartile boundaries of the listing price
twenty_fifth = adidas_nike["listing_price"].quantile(0.25)
median = adidas_nike["listing_price"].quantile(0.5)
seventy_fifth = adidas_nike["listing_price"].quantile(0.75)
maximum = adidas_nike["listing_price"].max()

labels = ["Budget", "Average", "Expensive", "Elite"]
bins = [0, twenty_fifth, median, seventy_fifth, maximum]

# pd.cut builds right-closed intervals, so the first bin is (0, twenty_fifth].
# Products with a listing_price of 0.0 fall outside every bin and get NaN.
adidas_nike["price_label"] = pd.cut(adidas_nike["listing_price"], bins=bins, labels=labels)
adidas_nike.head()
```

| | product_id | brand | listing_price | sale_price | discount | revenue | price_label |
|---|---|---|---|---|---|---|---|
| 0 | 130690-017 | Nike | 0.0 | 159.95 | 0.0 | 6909.84 | NaN |
| 1 | 133000-106 | Nike | 0.0 | 119.95 | 0.0 | 0.0 | NaN |
| 2 | 280648 | Adidas | 29.99 | 29.99 | 0.0 | 2915.03 | Budget |
| 3 | 288022 | Adidas | 29.99 | 29.99 | 0.0 | 5128.29 | Budget |
| 4 | 310805-137 | Nike | 0.0 | 159.95 | 0.0 | 64203.93 | NaN |
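
The `NaN` labels above come from how `pd.cut` treats the lowest edge: intervals are closed on the right by default, so the first bin is `(0, twenty_fifth]` and a listing price of exactly 0.0 is excluded. The sketch below, on hypothetical prices, shows that `include_lowest=True` would place 0.0 in the Budget bin instead. Whether that is desirable depends on whether a 0.0 listing price is a genuine price or a placeholder for missing data, which the dataset description does not say.

```python
import pandas as pd

# Hypothetical listing prices, including a 0.0 like the Nike rows above
prices = pd.Series([0.0, 29.99, 59.99, 89.99, 119.99])
labels = ["Budget", "Average", "Expensive", "Elite"]
bins = [0, prices.quantile(0.25), prices.quantile(0.5), prices.quantile(0.75), prices.max()]

# Default right-closed intervals: the first bin is (0, q25], so 0.0 gets NaN
default_cut = pd.cut(prices, bins=bins, labels=labels)
# include_lowest=True closes the first interval on the left, so 0.0 lands in "Budget"
inclusive_cut = pd.cut(prices, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({"price": prices, "default": default_cut, "include_lowest": inclusive_cut}))
```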


In [17]:
```python
# Number of products and mean revenue for each brand and price band.
# observed=False keeps every price_label category (and silences the pandas FutureWarning
# about the changing default for categorical groupers).
adidas_vs_nike = (
    adidas_nike.groupby(["brand", "price_label"], observed=False)
    .agg({"product_id": "count", "revenue": "mean"})
    .reset_index()
)
adidas_vs_nike = adidas_vs_nike.rename(columns={"product_id": "num_products", "revenue": "mean_revenue"})
adidas_vs_nike["mean_revenue"] = adidas_vs_nike["mean_revenue"].round(2)
adidas_vs_nike
```


| | brand | price_label | num_products | mean_revenue |
|---|---|---|---|---|
| 0 | Adidas | Budget | 574 | 2015.68 |
| 1 | Adidas | Average | 655 | 3035.3 |
| 2 | Adidas | Expensive | 759 | 4621.56 |
| 3 | Adidas | Elite | 587 | 8302.78 |
| 4 | Nike | Budget | 6 | 97.99 |
| 5 | Nike | Average | 8 | 675.59 |
| 6 | Nike | Expensive | 47 | 500.56 |
| 7 | Nike | Elite | 130 | 1367.45 |
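
In this long format, comparing the two brands within a price band means reading rows that sit four lines apart. One possible reshaping, sketched with the `mean_revenue` values copied from the table above, pivots the brands into columns. The `adidas_to_nike` ratio column is a hypothetical addition, not part of the original analysis.

```python
import pandas as pd

# mean_revenue values copied from the adidas_vs_nike output above
adidas_vs_nike = pd.DataFrame({
    "brand": ["Adidas"] * 4 + ["Nike"] * 4,
    "price_label": ["Budget", "Average", "Expensive", "Elite"] * 2,
    "mean_revenue": [2015.68, 3035.3, 4621.56, 8302.78, 97.99, 675.59, 500.56, 1367.45],
})

# One row per price band, one column per brand; reindex restores the Budget -> Elite order
wide = (
    adidas_vs_nike.pivot(index="price_label", columns="brand", values="mean_revenue")
    .reindex(["Budget", "Average", "Expensive", "Elite"])
)
# Hypothetical extra column: how many times larger Adidas mean revenue is than Nike's
wide["adidas_to_nike"] = (wide["Adidas"] / wide["Nike"]).round(2)
print(wide)
```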


In [18]:
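```python
# Combine product info with reviews and drop products with any missing field
info_reviews = pd.merge(left=info, right=reviews, on="product_id").dropna()
# Length of each product description, in characters
info_reviews["description_chars"] = info_reviews["description"].str.len()
info_reviews.head()
```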

| | product_name | product_id | description | rating | reviews | description_chars |
|---|---|---|---|---|---|---|
| 1 | Women's adidas Originals Sleek Shoes | G27341 | A modern take on adidas sport heritage, tailor... | 3.3 | 24.0 | 175 |
| 2 | Women's adidas Swim Puka Slippers | CM0081 | These adidas Puka slippers for women's come wi... | 2.6 | 37.0 | 172 |
| 3 | Women's adidas Sport Inspired Questar Ride Shoes | B44832 | Inspired by modern tech runners, these women's... | 4.1 | 35.0 | 264 |
| 4 | Women's adidas Originals Taekwondo Shoes | D98205 | This design is inspired by vintage Taekwondo s... | 3.5 | 72.0 | 288 |
| 5 | Women's adidas Sport Inspired Duramo Lite 2.0 ... | B75586 | Refine your interval training in these women's... | 1.0 | 45.0 | 221 |


In [19]:
```python
np.arange(0, info_reviews["description_chars"].max() + 100, 100)
```

```
array([  0, 100, 200, 300, 400, 500, 600, 700])
```

In [20]:
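```python
# 100-character bins covering the longest description (from the np.arange output above)
bins_description = [0, 100, 200, 300, 400, 500, 600, 700]
# Each bin is labeled by its upper edge: "100" means 1-100 characters, "200" means 101-200, and so on
labels_description = ["100", "200", "300", "400", "500", "600", "700"]

info_reviews["description_length"] = pd.cut(info_reviews["description_chars"], bins=bins_description, labels=labels_description)
info_reviews.head()
```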

| | product_name | product_id | description | rating | reviews | description_chars | description_length |
|---|---|---|---|---|---|---|---|
| 1 | Women's adidas Originals Sleek Shoes | G27341 | A modern take on adidas sport heritage, tailor... | 3.3 | 24.0 | 175 | 200 |
| 2 | Women's adidas Swim Puka Slippers | CM0081 | These adidas Puka slippers for women's come wi... | 2.6 | 37.0 | 172 | 200 |
| 3 | Women's adidas Sport Inspired Questar Ride Shoes | B44832 | Inspired by modern tech runners, these women's... | 4.1 | 35.0 | 264 | 300 |
| 4 | Women's adidas Originals Taekwondo Shoes | D98205 | This design is inspired by vintage Taekwondo s... | 3.5 | 72.0 | 288 | 300 |
| 5 | Women's adidas Sport Inspired Duramo Lite 2.0 ... | B75586 | Refine your interval training in these women's... | 1.0 | 45.0 | 221 | 300 |
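
The bin edges and labels above are typed out by hand from the `np.arange` output. A sketch that derives both from the data instead, so the bins still cover every description if the data changes, is shown below with hypothetical description lengths (the last value, 650, is made up to exercise the top bin).

```python
import numpy as np
import pandas as pd

# Hypothetical description lengths (the notebook uses info_reviews["description_chars"])
description_chars = pd.Series([175, 172, 264, 288, 221, 650])

# 100-character bin edges from 0 up to the first multiple of 100 >= the longest description
bins = np.arange(0, description_chars.max() + 100, 100)
# Label each bin by its upper edge, matching the "100", "200", ... convention above
labels = [str(edge) for edge in bins[1:]]

description_length = pd.cut(description_chars, bins=bins, labels=labels)
print(description_length.tolist())
```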


In [21]:
```python
# Mean rating and number of products in each description-length bin.
# "count" counts rows with a non-missing reviews value (i.e. products), not the total number of reviews.
# observed=False keeps every bin and silences the pandas FutureWarning for categorical groupers.
description_lengths = info_reviews.groupby("description_length", observed=False).agg({"rating": "mean", "reviews": "count"})
description_lengths["rating"] = description_lengths["rating"].round(2)
description_lengths = description_lengths.rename(columns={"rating": "mean_rating", "reviews": "num_reviews"})
description_lengths
```


| description_length | mean_rating | num_reviews |
|---|---|---|
| 100 | 2.26 | 7 |
| 200 | 3.19 | 526 |
| 300 | 3.28 | 1785 |
| 400 | 3.29 | 651 |
| 500 | 3.35 | 118 |
| 600 | 3.12 | 15 |
| 700 | 3.65 | 15 |
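
Two details of the aggregation above are easy to miss. The `"count"` aggregation on `reviews` counts products with a non-missing review count in each bin, whereas `"sum"` would give the total number of reviews. And because `description_length` is categorical, `observed=False` keeps empty bins in the result. The toy example below uses hypothetical values; `num_products` and `total_reviews` are illustrative column names.

```python
import pandas as pd

# Hypothetical rows: two products in the "200" bin, one in "300", none in "400"
df = pd.DataFrame({
    "description_length": pd.Categorical(["200", "200", "300"], categories=["200", "300", "400"]),
    "rating": [3.0, 4.0, 5.0],
    "reviews": [10.0, 30.0, 5.0],
})

summary = df.groupby("description_length", observed=False).agg(
    mean_rating=("rating", "mean"),      # NaN for the empty "400" bin
    num_products=("reviews", "count"),   # products per bin
    total_reviews=("reviews", "sum"),    # reviews summed across the bin's products
)
print(summary)
```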
