<div class="alert alert-block alert-warning">
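An independent Chi-Square (the Chi-Square test of independence) is used when you want to determine whether two categorical variables are associated, i.e. whether the distribution of one variable differs across the levels of the other.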
<br>
<b>Requires:</b><br>
    - 2 <i>categorical</i> variables measured on independent observations<br>
</div>
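
For reference, the test compares the observed count $O_{ij}$ in each cell of an $r \times c$ contingency table with the count $E_{ij}$ expected if the two variables were independent. Here $R_i$ and $C_j$ are the row and column totals and $n$ is the total number of observations:

$$
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i \, C_j}{n},
\qquad \text{df} = (r-1)(c-1)
$$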

## QUESTION:
Are lipstick `shade` and lipstick price category (`priceCatgry`) related?

> <b>Activity Notes</b>
> 1. Create a contingency table
> 2. Test for the assumption of 5 per cell in the expected contingency table
> 3. Compute an independent Chi-Square

## Import Packages

```python
import pandas as pd
from scipy import stats
```

## Load Data

```python
lead_lipstick = pd.read_csv('./assets/lead_lipstick.csv')
```

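```python
lead_lipstick.head()
```

| | JRC_code | purchCntry | prodCntry | Pb | sdPb | shade | prodType | priceCatgry |
|---|---|---|---|---|---|---|---|---|
| 0 | C135 | NL | NL | 3.75 | 0.24 | Red | LP | 2 |
| 1 | C18 | FI | FI | 2.29 | 0.07 | Red | LP | 2 |
| 2 | C20 | FI | IT | 1.27 | 0.06 | Red | LP | 2 |
| 3 | C164 | DE | FR | 1.21 | 0.06 | Red | LP | 2 |
| 4 | C71 | MT | UK | 0.85 | 0.04 | Red | LP | 2 |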


## Find Categorical Variable Levels

```python
lead_lipstick.shade.unique()
```

```
array(['Red', 'Purple', 'Pink', 'Brown'], dtype=object)
```

```python
lead_lipstick.priceCatgry.unique()
```

```
array([2, 3, 1])
```

<b>Product shade (`shade`)</b> has 4 levels:<br>
   - 'Red'
   - 'Purple'
   - 'Pink'
   - 'Brown'

<b>Price category (`priceCatgry`)</b>  has 3 levels:
   - <b>1</b> : < 5 euros
   - <b>2</b> : 5 - 15 euros
   - <b>3</b> : > 15 euros
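
If you prefer readable labels, one option is to map the numeric price codes onto the ranges listed above with `Series.map`. This is a minimal sketch on a small, hypothetical DataFrame (`df`, with made-up rows); on the real data you would apply the same `map` to `lead_lipstick['priceCatgry']`.

```python
import pandas as pd

# Hypothetical stand-in for lead_lipstick: only the two columns used here,
# with made-up rows (the real data come from ./assets/lead_lipstick.csv).
df = pd.DataFrame({
    "shade": ["Red", "Pink", "Brown", "Purple", "Red"],
    "priceCatgry": [2, 3, 1, 2, 1],
})

# Labels taken from the price-category description above.
price_labels = {1: "< 5 euros", 2: "5 - 15 euros", 3: "> 15 euros"}

# Add a readable label column next to the numeric code.
df["price_label"] = df["priceCatgry"].map(price_labels)
print(df)
```

Note that `crosstab()` would then order the columns by string sort order of the labels rather than by price; converting the labels to an ordered `pd.Categorical` (with `categories` listed in price order) should keep the original order.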

<div class="alert alert-block alert-danger">
<b>Phrasing?<br></b> The activity instructions specified <b>lipstick</b>.<br>However, the term <i>lipstick</i> seems to be used here as an umbrella term covering both <i>lipstick</i> and <i>lipgloss</i>.
</div>
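
One way to see which product types are present, and to restrict the analysis to lipstick alone if that stricter reading is preferred, is to inspect and filter the `prodType` column. This is a sketch on a small hypothetical DataFrame: `LP` is the code visible in the `head()` output above, but `LG` as the lip gloss code is an assumption; confirm the real codes with `lead_lipstick.prodType.unique()`.

```python
import pandas as pd

# Hypothetical rows. 'LP' appears in the real data; 'LG' for lip gloss
# is an assumption made only for this illustration.
df = pd.DataFrame({
    "shade": ["Red", "Pink", "Red", "Brown"],
    "prodType": ["LP", "LG", "LP", "LG"],
    "priceCatgry": [2, 1, 3, 2],
})

# How many rows of each product type are present?
print(df["prodType"].value_counts())

# Keep only lipsticks if the question should be read strictly.
lipstick_only = df[df["prodType"] == "LP"]
print(len(lipstick_only))
```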

## Test Assumptions
> 1. Create a contingency table
> 2. Test for the assumption of 5 per cell in the expected contingency table
> 3. Compute an independent Chi-Square

### Create a Contingency Table (aka `crosstab`)

Create a contingency table, sometimes called a crosstab, which counts how many observations fall into each combination of levels of the two variables.<br>Use the `pandas` function `crosstab()`:

```python
lipstick_crosstab = pd.crosstab(lead_lipstick['shade'], lead_lipstick['priceCatgry'])
lipstick_crosstab
```

| shade \ priceCatgry | 1 | 2 | 3 |
|---|---|---|---|
| Brown | 20 | 30 | 10 |
| Pink | 20 | 49 | 12 |
| Purple | 8 | 23 | 6 |
| Red | 5 | 33 | 7 |


The three price categories run across the top, and the four shades run down the side.<br>Each cell shows how many products fall into both that shade and that price category.
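
To see the row and column totals, and to confirm that each cell is a simple joint count, you can pass `margins=True` to `pd.crosstab`. A minimal sketch on a hypothetical DataFrame with made-up rows:

```python
import pandas as pd

# Hypothetical rows standing in for lead_lipstick.
df = pd.DataFrame({
    "shade": ["Red", "Red", "Pink", "Brown", "Pink", "Red"],
    "priceCatgry": [1, 2, 2, 3, 2, 2],
})

# margins=True appends row and column totals labelled 'All'.
table = pd.crosstab(df["shade"], df["priceCatgry"], margins=True)
print(table)

# Each cell is the number of rows with that shade AND that price category.
n_red_2 = ((df["shade"] == "Red") & (df["priceCatgry"] == 2)).sum()
assert table.loc["Red", 2] == n_red_2
```

The totals are only for inspection: pass the table *without* margins to the Chi-Square function, otherwise the `All` row and column would be treated as an extra category.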

### Running the Independent Chi-Square

Once you have the contingency table, pass it to the function `stats.chi2_contingency`:

```python
stats.chi2_contingency(lipstick_crosstab)
```

```
(7.860569553614045,
 0.2484973879479863,
 6,
 array([[14.26008969, 36.32286996,  9.41704036],
        [19.25112108, 49.03587444, 12.71300448],
        [ 8.79372197, 22.39910314,  5.80717489],
        [10.69506726, 27.24215247,  7.06278027]]))
```

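- <b>First value</b> (`7.86`) is the Chi-Square statistic.
- <b>Second value</b> (`0.248`) is the `p` value associated with the Chi-Square statistic.
- <b>Third value</b> (`6`) is the degrees of freedom, (4 - 1) x (3 - 1).
- <b>Fourth value</b> is the array of expected counts.
- Because <b>`p > .05`</b>, the result is *not* significant.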
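
As a check on what `chi2_contingency` returns, the sketch below recomputes the statistic, degrees of freedom, and `p` value by hand from the observed counts in the crosstab above (copied into a NumPy array, so it runs without the CSV file):

```python
import numpy as np
from scipy import stats

# Observed counts copied from the crosstab above
# (rows: Brown, Pink, Purple, Red; columns: price categories 1, 2, 3).
observed = np.array([
    [20, 30, 10],
    [20, 49, 12],
    [ 8, 23,  6],
    [ 5, 33,  7],
])

chi2, p, dof, expected = stats.chi2_contingency(observed)

# Recompute the statistic by hand: sum of (O - E)^2 / E over all cells.
manual_chi2 = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1) = 3 * 2 = 6.
manual_dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p value: upper-tail probability of a chi-square distribution with manual_dof df.
manual_p = stats.chi2.sf(manual_chi2, manual_dof)

print(chi2, manual_chi2)
print(p, manual_p)
print(dof, manual_dof)
```

No continuity correction is involved here: `chi2_contingency` only applies its Yates correction when the degrees of freedom equal 1.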

<mark>There is <i>not</i> a significant relationship between product shade and product price.</mark>

### Test the Assumption of 5 Cases per Expected Cell

- <b>`array`</b> (the fourth value) is the table of expected counts if there were <i>no relationship</i> between the two variables; its rows and columns follow the same order as the contingency table.
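
The expected counts can be reproduced from the row and column totals, which also makes the 5-per-cell check explicit. A minimal sketch using the observed counts from the crosstab above:

```python
import numpy as np

# Observed counts from the crosstab above (Brown, Pink, Purple, Red x 1, 2, 3).
observed = np.array([
    [20, 30, 10],
    [20, 49, 12],
    [ 8, 23,  6],
    [ 5, 33,  7],
])

row_totals = observed.sum(axis=1)   # products per shade
col_totals = observed.sum(axis=0)   # products per price category
n = observed.sum()                  # all products

# Expected count under independence: E_ij = row_i * col_j / n.
expected = np.outer(row_totals, col_totals) / n

# Assumption check: every expected count should be at least 5.
print(expected.round(2))
print("smallest expected count:", expected.min().round(2))
print("assumption met:", bool((expected >= 5).all()))
```

SciPy also exposes this calculation as `scipy.stats.contingency.expected_freq`, which takes the observed table directly.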

#### <mark>Assumption <i>has been met</i>: every expected count is greater than 5 (the smallest is 5.81).</mark>