# Challenge

On Shopify, we have exactly 100 sneaker shops, and each of these shops sells only one model of shoe. We want to analyze the average order value (AOV). When we look at order data over a 30-day window, we naively calculate an AOV of $3145.13. Given that these shops sell sneakers, a relatively affordable item, something seems wrong with our analysis.

1. Think about what could be going wrong with our calculation, and about a better way to evaluate this data.
2. What metric would you report for this dataset?
3. What is its value?

In [1]:
# Import libraries
import pandas as pd
import hvplot.pandas  # registers the .hvplot accessor on pandas objects
from pathlib import Path

In [2]:
# Read in csv with 'created_at' column as index
file_path = Path('Resources/Shopify-Data-Science-Challenge.csv')
df = pd.read_csv(file_path, index_col='created_at', parse_dates=True)
df.head()

| created_at | order_id | shop_id | user_id | order_amount | total_items | payment_method |
|---|---|---|---|---|---|---|
| 2017-03-13 12:36:56 | 1 | 53 | 746 | 224 | 2 | cash |
| 2017-03-03 17:38:52 | 2 | 92 | 925 | 90 | 1 | cash |
| 2017-03-14 04:23:56 | 3 | 44 | 861 | 144 | 1 | cash |
| 2017-03-26 12:43:37 | 4 | 18 | 935 | 156 | 1 | credit_card |
| 2017-03-01 04:35:11 | 5 | 18 | 883 | 156 | 1 | credit_card |
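The loading step can be tried without the CSV file. The sketch below is only an illustration: it rebuilds three rows from the `head()` output in memory, and the column order in `sample_csv` is an assumption, since the layout of the real file is not shown. It uses only `index_col` and `parse_dates`, which should be enough for pandas to parse the index as datetimes.

```python
import io

import pandas as pd

# Three rows copied from the head() output; the column order is assumed,
# because the raw layout of the CSV file is not shown in this notebook.
sample_csv = io.StringIO(
    "order_id,shop_id,user_id,order_amount,total_items,payment_method,created_at\n"
    "1,53,746,224,2,cash,2017-03-13 12:36:56\n"
    "2,92,925,90,1,cash,2017-03-03 17:38:52\n"
    "3,44,861,144,1,cash,2017-03-14 04:23:56\n"
)

# parse_dates=True asks pandas to parse the index column as datetimes.
sample_df = pd.read_csv(sample_csv, index_col="created_at", parse_dates=True)

print(sample_df.index.dtype)
print(sample_df)
```

If the index dtype comes back as `object` rather than a datetime type, the timestamps were not parsed and the `.dt` accessor in the next cell would fail.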


In [3]:
# Reset index and drop times from datetime 'created_at' column
df.reset_index(inplace=True)
df['created_at'] = df['created_at'].dt.date

In [7]:
# Set 'created_at' column back to index
df.set_index('created_at', inplace=True)
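The reset, convert, and set-index sequence above can probably be collapsed into one step. This is a sketch of an alternative rather than the approach used in this notebook: `toy` is a hypothetical two-row stand-in for `df`, and `DatetimeIndex.normalize()` is swapped in for `.dt.date`. One difference to be aware of is that `normalize()` keeps a datetime index (with times set to midnight), while `.dt.date` yields plain Python `date` objects.

```python
import pandas as pd

# Hypothetical two-row stand-in for df, using timestamps from the output above.
toy = pd.DataFrame(
    {"order_amount": [224, 90]},
    index=pd.to_datetime(["2017-03-13 12:36:56", "2017-03-03 17:38:52"]),
)
toy.index.name = "created_at"

# normalize() sets every timestamp to midnight and keeps the datetime dtype,
# so date-based slicing and resampling remain available afterwards.
toy.index = toy.index.normalize()

print(toy)
```

Keeping a datetime index may make later grouping by day or week simpler, at the cost of showing `00:00:00` in some displays.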

In [8]:
# Check DataFrame
df.head()

| created_at | order_id | shop_id | user_id | order_amount | total_items | payment_method |
|---|---|---|---|---|---|---|
| 2017-03-13 | 1 | 53 | 746 | 224 | 2 | cash |
| 2017-03-03 | 2 | 92 | 925 | 90 | 1 | cash |
| 2017-03-14 | 3 | 44 | 861 | 144 | 1 | cash |
| 2017-03-26 | 4 | 18 | 935 | 156 | 1 | credit_card |
| 2017-03-01 | 5 | 18 | 883 | 156 | 1 | credit_card |


In [9]:
# Check nulls
df.isnull().sum()

order_id          0
shop_id           0
user_id           0
order_amount      0
total_items       0
payment_method    0
dtype: int64

In [10]:
# Compute the naive Average Order Value (AOV): the mean of all order amounts
df['order_amount'].mean()

3145.128
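One possible reason a mean lands far above typical values is that a few very large orders dominate it. The sketch below illustrates that effect only; the numbers in `toy_orders` are made up and are not taken from the dataset. It compares the mean with the median and with revenue per item sold.

```python
import pandas as pd

# Made-up amounts: four ordinary orders and one bulk order of 1000 items.
# These values are hypothetical and do not come from the challenge data.
toy_orders = pd.DataFrame(
    {
        "order_amount": [150, 160, 300, 140, 150000],
        "total_items": [1, 1, 2, 1, 1000],
    }
)

# Mean of order amounts: pulled upward by the single bulk order.
naive_aov = toy_orders["order_amount"].mean()

# Median of order amounts: the middle value, insensitive to extremes.
median_order = toy_orders["order_amount"].median()

# Total revenue divided by total items sold: an average price per item.
value_per_item = toy_orders["order_amount"].sum() / toy_orders["total_items"].sum()

print(f"mean order amount:   {naive_aov:.2f}")
print(f"median order amount: {median_order:.2f}")
print(f"revenue per item:    {value_per_item:.2f}")
```

Whether the same pattern explains the 3145.128 figure has to be checked against the actual data, for example by inspecting the distribution of `order_amount` and `total_items`.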