# DSE Course 1, Session 6: Modeling Product Margin

**Instructor**: Wesley Beckner

**Contact**: wesleybeckner@gmail.com

<br>

---

<br>

In this session we will look at how EDA and statistical analysis allow us to ask "what if" questions about a manufacturing product portfolio.

EDA objectives:

* product elimination impact on annual margin
* evaluating statistical significance of product margin

<br>

---


Load the libraries needed in this notebook.



In [None]:
# Pandas library for the pandas dataframes
import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import plotly.express as px
import random
import scipy.stats

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/wesleybeckner/'\
                 'ds_for_engineers/main/data/truffle_margin/margin_data.csv')
df['Width'] = df['Width'].apply(str)
df['Height'] = df['Height'].apply(str)

In [None]:
descriptors = df.columns[:-3]

## 6.1 Evaluate statistical significance of product margin

### Mood's Median

Mood’s median test is a nonparametric test that compares the medians of two or more independent samples; the null hypothesis is that the samples come from populations with equal medians. It is therefore a nonparametric alternative to one-way ANOVA. The test applies when the dependent variable is continuous or a discrete count, and the independent variable is categorical with two or more levels.

Mood’s median test can be viewed as a crude two-sample version of the sign test. It extends to more than two samples, but it is less powerful than the Kruskal-Wallis test.
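The power difference can be explored with a small simulation. The sketch below is only illustrative: the group offsets, sample size, number of trials, and significance level are all assumptions chosen for the example, and the rejection rates it prints will vary with the seed and these settings.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)

# Assumed settings for illustration only (not taken from the course data)
n_trials = 200
alpha = 0.05
shifts = (0.0, 0.5, 1.0)  # true location offsets of the three groups
group_size = 20

mood_rejections = 0
kw_rejections = 0
for _ in range(n_trials):
    # draw three independent samples whose medians genuinely differ
    groups = [rng.normal(loc=s, scale=1.0, size=group_size) for s in shifts]

    # Mood's median test accepts two or more samples
    _, mood_p, _, _ = scipy.stats.median_test(*groups)
    # Kruskal-Wallis H test on the same samples
    _, kw_p = scipy.stats.kruskal(*groups)

    mood_rejections += int(mood_p < alpha)
    kw_rejections += int(kw_p < alpha)

# fraction of trials in which each test rejected the null hypothesis
mood_power = mood_rejections / n_trials
kw_power = kw_rejections / n_trials
print(f"Mood's median rejection rate: {mood_power:.2f}")
print(f"Kruskal-Wallis rejection rate: {kw_power:.2f}")
```

Because the groups really do differ, a higher rejection rate indicates a more powerful test under these particular assumptions.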

Mood’s median test is most useful for smaller samples and for data that contain a few outliers, because it considers only whether each value falls above or below the median rather than the value’s rank.
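A minimal sketch of this insensitivity to outliers, using two small made-up samples (the values below are assumptions for illustration, not drawn from the margin data): inflating the largest value of one sample changes its mean drastically, yet the median test result stays the same because the counts above and below the grand median do not change.

```python
import numpy as np
import scipy.stats

# Hypothetical samples, invented for this example
a = np.array([1.2, 2.5, 3.1, 4.8, 5.0, 6.3, 7.7, 9.9])
b = np.array([4.1, 5.5, 6.8, 7.2, 8.4, 9.0, 10.6, 11.3])

# Same sample, but with its largest value replaced by an extreme outlier
a_outlier = a.copy()
a_outlier[np.argmax(a_outlier)] = 9900.0

stat_clean, p_clean, med_clean, table_clean = scipy.stats.median_test(a, b)
stat_out, p_out, med_out, table_out = scipy.stats.median_test(a_outlier, b)

# The mean is dragged far upward by the outlier ...
print("mean of a:", a.mean(), "-> with outlier:", a_outlier.mean())
# ... but the grand median and the above/below counts are unchanged
print("grand median:", med_clean, "-> with outlier:", med_out)
print("contingency table (row 0 = above, row 1 = at or below):")
print(table_out)
```

The outlier still sits above the grand median, so it is counted exactly as before; only its side of the median matters, not its magnitude.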

Researchers usually prefer the Wilcoxon rank-sum test (equivalently, the Mann-Whitney U test), which generally has greater statistical power than Mood’s median test.

Mood’s median test assesses whether k independent samples were drawn from the same population or from populations with equal medians.
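As a rough sketch of the mechanics, the test can be reproduced by hand: pool the samples, find the grand median, count how many values in each sample fall above it versus at or below it, and run a chi-square test on the resulting contingency table. The synthetic samples `a` and `b` below are assumptions made up for illustration; they are not part of the truffle dataset. The manual result should agree with `scipy.stats.median_test` under its default settings (ties counted as below, continuity correction applied to a 2×2 table).

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)

# Hypothetical samples for illustration only
a = rng.normal(loc=0.0, scale=1.0, size=40)
b = rng.normal(loc=1.0, scale=1.0, size=50)

# Step 1: grand median of the pooled data
grand_median = np.median(np.concatenate([a, b]))

# Step 2: contingency table of counts above / at-or-below the grand median
manual_table = np.array([
    [(a > grand_median).sum(), (b > grand_median).sum()],    # above
    [(a <= grand_median).sum(), (b <= grand_median).sum()],  # at or below
])

# Step 3: Pearson chi-square test on the table (Yates correction for 2x2)
manual_stat, manual_p, dof, expected = scipy.stats.chi2_contingency(
    manual_table, correction=True)

# Reference: SciPy's built-in implementation
stat, p, m, table = scipy.stats.median_test(a, b)

print("manual:", manual_stat, manual_p)
print("scipy: ", stat, p)
```

The four values unpacked from `median_test` are the same ones the notebook stores as `pearsons_chi_square`, `p_value`, `grand_median`, and `table`.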

In [None]:
delimiters = df.columns[:-3]
rows = []
# reference population: EBITDA of every record (this includes each group's
# own values)
pop = list(df['EBITDA'])
# pop = np.random.choice(pop, size=int(1e5))
for delimiter in delimiters:
    # list of EBITDA values for every group within this descriptor
    group_with_values = df.groupby(delimiter)['EBITDA'].apply(list)

    # bootstrap population of values based on groups
    # pop = np.random.choice(np.concatenate(group_with_values),
    #                        size=int(1e4))

    for group_name, group in group_with_values.items():
        # Mood's median test of this group against the reference population
        stat, p, m, table = scipy.stats.median_test(group, pop)
        rows.append([delimiter, group_name, stat, p, m,
                     np.mean(group), np.median(group), len(group), table])

# build the dataframe once instead of concatenating inside the loop
moodsdf = pd.DataFrame(rows,
                       columns=['descriptor', 'group', 'pearsons_chi_square',
                                'p_value', 'grand_median', 'group_mean',
                                'group_median', 'size', 'table'])


In [None]:
moodsdf = moodsdf.loc[moodsdf['p_value'] < 1e-3]
moodsdf = moodsdf.sort_values('group_median').reset_index(drop=True)

In [None]:
moodsdf

| | descriptor | group | pearsons_chi_square | p_value | grand_median | group_mean | group_median | size | table |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Secondary Flavor | Cucumber | 12.5898 | 0.000387861 | 22.05 | -18454.5 | -7756.69 | 18 | [[1, 1261], [17, 1245]] |
| 1 | Primary Flavor | Orange | 12.5898 | 0.000387861 | 22.05 | -18454.5 | -7756.69 | 18 | [[1, 1261], [17, 1245]] |
| 2 | Truffle Type | Jelly Filled | 12.5898 | 0.000387861 | 22.05 | -18454.5 | -7756.69 | 18 | [[1, 1261], [17, 1245]] |
| 3 | Primary Flavor | Creme de Menthe | 17.535 | 2.82072e-05 | 18.49 | -9320.5 | -5945.59 | 23 | [[1, 1263], [22, 1243]] |
| 4 | Secondary Flavor | Papaya | 90.8672 | 1.53648e-21 | -27.62 | -3790.46 | -1683.78 | 115 | [[7, 1303], [108, 1203]] |
| 5 | Primary Flavor | Orange Pineapple\tP | 90.8672 | 1.53648e-21 | -27.62 | -3790.46 | -1683.78 | 115 | [[7, 1303], [108, 1203]] |
| 6 | Secondary Flavor | Peppermint | 103.973 | 2.05086e-24 | -46.96 | -4890.25 | -1580.45 | 157 | [[16, 1315], [141, 1191]] |
| 7 | Primary Flavor | Cream Soda | 103.973 | 2.05086e-24 | -46.96 | -4890.25 | -1580.45 | 157 | [[16, 1315], [141, 1191]] |
| 8 | Secondary Flavor | Wild Cherry Cream | 17.1878 | 3.38605e-05 | 8.0 | -5155.42 | -1434.09 | 69 | [[17, 1270], [52, 1236]] |
| 9 | Primary Flavor | Lemon Bar | 17.1878 | 3.38605e-05 | 8.0 | -5155.42 | -1434.09 | 69 | [[17, 1270], [52, 1236]] |


## 6.2 Product elimination impact on annual margin