<img src="rupixen-Q59HmzK38eQ-unsplash.jpg" alt="Someone is trying to purchase a product online" width="500"/>

Online shopping decisions depend on how consumers engage with online store content. You work for a startup that has just launched a new online shopping website. The marketing team asks you, a new data scientist, to review a dataset of online shoppers' purchasing intentions gathered over the last year. Specifically, the team wants insights into customer browsing behavior in November and December, the busiest months for shoppers. You have decided to identify two groups of customers: those with a low purchase rate and returning customers. After identifying these groups, you want to estimate the probability that these customers will make purchases during a new marketing campaign, to help gauge potential success for next year's sales.

### Data description:

You are given a file, `online_shopping_session_data.csv`, that contains several columns describing each shopping session. Each shopping session corresponds to a single user.

|Column|Description|
|--------|-----------|
|`SessionID`|unique session ID|
|`Administrative`|number of pages visited related to the customer account|
|`Administrative_Duration`|total amount of time spent (in seconds) on administrative pages|
|`Informational`|number of pages visited related to the website and the company|
|`Informational_Duration`|total amount of time spent (in seconds) on informational pages|
|`ProductRelated`|number of pages visited related to available products|
|`ProductRelated_Duration`|total amount of time spent (in seconds) on product-related pages|
|`BounceRates`|average bounce rate of pages visited by the customer|
|`ExitRates`|average exit rate of pages visited by the customer|
|`PageValues`|average page value of pages visited by the customer|
|`SpecialDay`|closeness of the site visiting time to a specific special day|
|`Weekend`|indicator of whether the session is on a weekend|
|`Month`|month of the session date|
|`CustomerType`|customer type|
|`Purchase`|class label indicating whether the customer made a purchase|

## Questions
1. What are the purchase rates for online shopping sessions by customer type for **November** and **December**?
2. Among returning customers in November and December, which two page types show the strongest correlation in total time spent?
3. A new campaign for the returning customers will boost the purchase rate by 15%. What is the likelihood of achieving at least 100 sales out of 500 online shopping sessions for the returning customers?
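For question 2, one possible approach is to restrict to returning-customer sessions in November and December, compute pairwise correlations among the three `*_Duration` columns, and report the pair with the largest coefficient. The sketch below assumes Pearson correlation (the `DataFrame.corr` default) and takes "strongest" to mean largest in absolute value; `strongest_duration_pair` is a hypothetical helper, and the toy frame is made-up data, not rows from the dataset.

```python
from itertools import combinations

import pandas as pd

DURATION_COLS = ["Administrative_Duration", "Informational_Duration", "ProductRelated_Duration"]


def strongest_duration_pair(sessions: pd.DataFrame) -> tuple[str, str, float]:
    """Hypothetical helper: return the pair of duration columns with the largest
    absolute Pearson correlation among returning-customer Nov/Dec sessions."""
    mask = (sessions["CustomerType"] == "Returning_Customer") & sessions["Month"].isin(["Nov", "Dec"])
    corr = sessions.loc[mask, DURATION_COLS].corr()  # Pearson by default

    # Check each unordered pair once, skipping the diagonal of 1.0s
    pairs = [(a, b, float(corr.loc[a, b])) for a, b in combinations(DURATION_COLS, 2)]
    return max(pairs, key=lambda pair: abs(pair[2]))


# Made-up toy data; the last row is a new customer and is filtered out
toy = pd.DataFrame({
    "CustomerType": ["Returning_Customer"] * 5 + ["New_Customer"],
    "Month": ["Nov", "Dec", "Nov", "Dec", "Nov", "Nov"],
    "Administrative_Duration": [10.0, 20.0, 30.0, 40.0, 50.0, 999.0],
    "Informational_Duration": [5.0, 1.0, 4.0, 2.0, 3.0, 0.0],
    "ProductRelated_Duration": [100.0, 210.0, 290.0, 405.0, 500.0, 0.0],
})

col_a, col_b, r = strongest_duration_pair(toy)
print(col_a, col_b, round(r, 3))
```

In the notebook, calling `strongest_duration_pair(shopping_data)` would apply the same month and customer-type filters to the real data. If a rank-based measure is preferred, `corr(method="spearman")` is a drop-in alternative.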

```python
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Load and view your data
shopping_data = pd.read_csv("online_shopping_session_data.csv")
shopping_data.sample(10)
```

| |SessionID|Administrative|Administrative_Duration|Informational|Informational_Duration|ProductRelated|ProductRelated_Duration|BounceRates|ExitRates|PageValues|SpecialDay|Weekend|Month|CustomerType|Purchase|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|9578|9579|0|0.0|0|0.0|17|664.5|0.0|0.011765|0.0|0.0|False|Nov|Returning_Customer|0.0|
|7861|7862|1|114.8|0|0.0|10|951.433333|0.0|0.018182|0.0|0.0|False|Sep|Returning_Customer|0.0|
|2273|2274|0|0.0|6|451.5|28|674.434524|0.005882|0.017647|0.0|0.0|True|May|Returning_Customer|0.0|
|10509|10510|4|144.5|2|52.5|39|1299.5|0.004878|0.021138|0.0|0.0|False|Nov|Returning_Customer|0.0|
|4410|4411|0|0.0|0|0.0|62|1471.858333|0.060656|0.080437|0.0|1.0|True|May|Returning_Customer|0.0|
|2461|2462|1|5.0|0|0.0|92|1688.058761|0.004348|0.016494|0.0|0.0|True|May|Returning_Customer|0.0|
|6798|6799|3|213.0|0|0.0|2|338.0|0.0|0.0|112.414127|0.0|False|Oct|Returning_Customer|1.0|
|7319|7320|4|464.5|0|0.0|9|635.72381|0.0|0.002778|0.0|0.0|False|Jul|New_Customer|0.0|
|2924|2925|0|0.0|0|0.0|24|325.833333|0.0|0.020833|0.0|0.0|True|May|Returning_Customer|0.0|
|4852|4853|0|0.0|0|0.0|8|211.0|0.0|0.028571|0.0|0.0|False|May|New_Customer|0.0|


```python
# Keep only sessions from November and December
shopping_data_nov_dec = shopping_data[(shopping_data["Month"] == "Nov") | (shopping_data["Month"] == "Dec")]

# Equivalent, more concise filter using isin():
# shopping_data[shopping_data["Month"].isin(["Nov", "Dec"])]

shopping_data_nov_dec
```

| |SessionID|Administrative|Administrative_Duration|Informational|Informational_Duration|ProductRelated|ProductRelated_Duration|BounceRates|ExitRates|PageValues|SpecialDay|Weekend|Month|CustomerType|Purchase|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|5463|5464|1|39.200000|2|120.800|7|80.500000|0.000000|0.010000|0.000000|0.0|True|Nov|New_Customer|0.0|
|5464|5465|3|89.600000|0|0.000|57|1721.906667|0.000000|0.005932|204.007949|0.0|True|Nov|Returning_Customer|1.0|
|5467|5468|4|204.200000|0|0.000|31|652.376667|0.012121|0.016162|0.000000|0.0|False|Nov|Returning_Customer|0.0|
|5479|5480|0|0.000000|0|0.000|13|710.066667|0.000000|0.007692|72.522838|0.0|False|Nov|Returning_Customer|1.0|
|5494|5495|0|0.000000|0|0.000|24|968.692424|0.000000|0.000000|106.252517|0.0|False|Nov|Returning_Customer|1.0|
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|
|12049|12050|6|141.916667|2|1060.750|136|8777.613879|0.007160|0.016361|21.382576|0.0|False|Dec|Returning_Customer|1.0|
|12050|12051|6|183.250000|2|80.500|60|1883.444231|0.000000|0.020858|0.000000|0.0|True|Nov|Returning_Customer|1.0|
|12051|12052|10|297.833333|12|290.225|33|1467.654221|0.008000|0.023038|13.203310|0.0|True|Dec|Returning_Customer|0.0|
|12052|12053|0|0.000000|2|407.500|14|1043.150000|0.050000|0.081250|0.000000|0.0|True|Nov|Returning_Customer|0.0|
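The comment in the filtering cell says the `|` filter and the `isin` filter select the same rows. One quick way to confirm that kind of claim, shown here on a small made-up frame rather than the real dataset, is to compare the two results with `DataFrame.equals`:

```python
import pandas as pd

# Made-up stand-in for shopping_data
toy = pd.DataFrame({
    "Month": ["Nov", "Sep", "Dec", "May", "Nov"],
    "Purchase": [1.0, 0.0, 0.0, 1.0, 0.0],
})

# Boolean OR of two comparisons vs. a single membership test
or_filter = toy[(toy["Month"] == "Nov") | (toy["Month"] == "Dec")]
isin_filter = toy[toy["Month"].isin(["Nov", "Dec"])]

# equals() checks that both frames hold the same rows, values, and dtypes
print(or_filter.equals(isin_filter))  # → True
```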


```python
# Number of Nov/Dec sessions for each customer type
print(shopping_data_nov_dec[shopping_data_nov_dec["CustomerType"] == "New_Customer"].shape[0])
print(shopping_data_nov_dec[shopping_data_nov_dec["CustomerType"] == "Returning_Customer"].shape[0])
```

```
728
3722
```
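The two `print` calls can be replaced by a single `value_counts()` call on the `CustomerType` column, which counts every customer type at once and would also surface any type other than the two filtered for. A toy illustration with made-up rows:

```python
import pandas as pd

# Made-up stand-in for shopping_data_nov_dec
toy_nov_dec = pd.DataFrame({
    "CustomerType": ["New_Customer", "Returning_Customer", "Returning_Customer", "Returning_Customer"],
})

# One call returns the number of sessions for each customer type
session_counts = toy_nov_dec["CustomerType"].value_counts()
print(session_counts)
```

On the real `shopping_data_nov_dec`, the same call reports the per-type session counts that the two filters print separately.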


```python
# Total purchases per customer type in Nov/Dec (Purchase is 1.0 for a purchase, 0.0 otherwise)
customers = shopping_data_nov_dec.groupby("CustomerType")["Purchase"].sum().reset_index()

# Equivalent check for a single customer type, e.g. new customers:
# shopping_data_nov_dec.loc[shopping_data_nov_dec["CustomerType"] == "New_Customer", "Purchase"].sum()

customers["Purchase"]
```

```
0    199.0
1    728.0
Name: Purchase, dtype: float64
```
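The output above gives purchase *counts* (199 for new customers, 728 for returning customers), while question 1 asks for *rates*. Dividing each count by the matching session count printed earlier (728 new and 3722 returning sessions) gives the rates; on the DataFrame itself, `groupby("CustomerType")["Purchase"].mean()` yields the same numbers because `Purchase` is a 0/1 label. The second half is a sketch of question 3 under an assumption the question leaves open: that "boost the purchase rate by 15%" means a relative increase (rate × 1.15), not 15 percentage points. It also assumes the 500 sessions are independent trials with the same purchase probability, so the number of sales follows a binomial distribution and the answer is its upper tail, P(X ≥ 100). `binomial_tail` is a hypothetical helper.

```python
from math import comb

# Counts taken from the notebook outputs above (Nov/Dec sessions only)
purchases = {"New_Customer": 199, "Returning_Customer": 728}
sessions = {"New_Customer": 728, "Returning_Customer": 3722}

# Question 1: purchase rate = purchases / sessions for each customer type
purchase_rates = {ctype: purchases[ctype] / sessions[ctype] for ctype in purchases}
for ctype, rate in purchase_rates.items():
    print(f"{ctype}: {rate:.4f}")


def binomial_tail(k_min: int, n: int, p: float) -> float:
    """Hypothetical helper: P(X >= k_min) for X ~ Binomial(n, p), summed from the PMF."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Question 3 (assumption: "+15%" is a relative boost, i.e. rate * 1.15)
boosted_rate = purchase_rates["Returning_Customer"] * 1.15
prob_at_least_100 = binomial_tail(100, 500, boosted_rate)
print(f"Boosted returning-customer rate: {boosted_rate:.4f}")
print(f"P(at least 100 sales in 500 sessions): {prob_at_least_100:.4f}")
```

Since the notebook already imports `scipy.stats`, `stats.binom.sf(99, 500, boosted_rate)` computes the same tail: `sf(k)` is P(X > k), so `k = 99` gives P(X ≥ 100). Under the percentage-point reading instead, only the `boosted_rate` line would change, to `purchase_rates["Returning_Customer"] + 0.15`.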