In [1]:
#importing the necessary libraries
import pandas as pd 
import numpy as np
from scipy.stats import chi2

## **Hypotheses**

### **Null Hypothesis (H_0)**

- There is no significant association between the type of device purchased (Smart Thermostats vs Smart Lights) and the customer satisfaction level.

### **Alternative Hypothesis (H_1)**
- There is a significant association between the type of device purchased and the customer satisfaction level.

In [4]:
#Observed counts per satisfaction level, ordered as [smart thermostat, smart light]
data = {'very satisfied':[50,70],
        'satisfied':[80,100],
        'neutral':[60,90],
        'unsatisfied':[30,50],
        'very unsatisfied':[20,50]}
print(data)

{'very satisfied': [50, 70], 'satisfied': [80, 100], 'neutral': [60, 90], 'unsatisfied': [30, 50], 'very unsatisfied': [20, 50]}


In [7]:
#Building the contingency table of customer reviews, with one row per device type
reviews_df =pd.DataFrame(data,index=['smart thermostat','smart light'])
print("customer reviews:")
print(reviews_df)

customer reviews:
                  very satisfied  satisfied  neutral  unsatisfied  \
smart thermostat              50         80       60           30   
smart light                   70        100       90           50   

                  very unsatisfied  
smart thermostat                20  
smart light                     50  


### **Calculating the Chi-Square statistic with the formula:**

$$ \chi^2 = \sum \frac{(O - E)^2}{E} $$

Where:

- $O$ = observed frequency
- $E$ = expected frequency

The expected frequency for each cell can be calculated using:

$$ E = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}} $$
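
As a quick worked instance of the expected-frequency formula, the sketch below computes the expected count for a single cell (smart thermostat, very satisfied) directly from the counts in the customer review table. The variable names are illustrative and are not used elsewhere in this notebook.

```python
# Observed counts copied from the review table
# (rows: smart thermostat, smart light; columns: very satisfied ... very unsatisfied)
observed = [[50, 80, 60, 30, 20],
            [70, 100, 90, 50, 50]]

row_total = sum(observed[0])                     # all smart thermostat responses
column_total = observed[0][0] + observed[1][0]   # all "very satisfied" responses
grand_total = sum(sum(row) for row in observed)  # every response in the table

# E = (row total) * (column total) / (grand total)
expected_cell = row_total * column_total / grand_total
print(expected_cell)
```

The same calculation applies to every other cell, with that cell's own row and column totals.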

In [9]:
#Calculating expected frequencies
total = reviews_df.sum().sum()
row_totals=reviews_df.sum(axis=1)
column_totals=reviews_df.sum(axis=0)

expected_freq = np.outer(row_totals,column_totals) / total 
print ('\nExpected frequencies:')
print(expected_freq)


Expected frequencies:
[[ 48.  72.  60.  32.  28.]
 [ 72. 108.  90.  48.  42.]]


In [15]:
#Calculating chi-square stats
observed =reviews_df.values
chi_square_stats = ((observed - expected_freq)**2/ expected_freq).sum()
print('\nChi square stats:' ,chi_square_stats)


Chi square stats: 5.638227513227513
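
To see where the statistic comes from, it can help to look at each cell's contribution $(O - E)^2 / E$ before summing. The following is a minimal, self-contained sketch that re-enters the observed counts rather than reusing the notebook's variables; the name `contributions` is an assumption made for illustration.

```python
import numpy as np

# Observed counts (rows: smart thermostat, smart light)
observed = np.array([[50, 80, 60, 30, 20],
                     [70, 100, 90, 50, 50]])

# Expected counts under independence: outer product of the margins over the grand total
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Per-cell contribution to the chi-square statistic
contributions = (observed - expected) ** 2 / expected
print(contributions.round(3))

# Summing the contributions gives the overall statistic
print(contributions.sum())
```

In this table the largest contributions come from the last column (very unsatisfied), while the neutral column contributes nothing because its observed and expected counts coincide.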


### **Determining the critical value:**
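The critical value can be read from a chi-square distribution table or computed with `chi2.ppf`. The degrees of freedom (df) can be calculated as:

$$ df = (\text{number of rows} - 1) \times (\text{number of columns} - 1) $$

For this table with 2 rows and 5 columns, $df = (2 - 1) \times (5 - 1) = 4$.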

In [18]:
#Determining the critical value
alpha = 0.05
df_chi_square = (len(row_totals)-1)*(len(column_totals)-1)
critical_value = chi2.ppf(1-alpha,df_chi_square)
print('critical value at alpha = 0.05:',critical_value)

critical value at alpha = 0.05: 9.487729036781154


### **Making a decision**
By comparing the chi-square statistic with the critical value, we decide whether or not to reject the null hypothesis.

In [21]:
#Making a decision: reject H_0 only if the statistic exceeds the critical value
if chi_square_stats>critical_value:
    print("rejecting the null hypothesis: there's a significant association") #Between the device type and customer satisfaction
else:
    print("failed to reject the null hypothesis: there's no significant association")

failed to reject the null hypothesis: there's no significant association
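
An equivalent way to reach the same decision is to compare a p-value with alpha instead of comparing the statistic with the critical value. The sketch below assumes the statistic and degrees of freedom reported earlier in the notebook and hard-codes them so that it runs on its own; `p_value` and `reject_null` are illustrative names.

```python
from scipy.stats import chi2

chi_square_stat = 5.638227513227513  # statistic reported earlier in the notebook
dof = 4                              # (2 - 1) * (5 - 1)
alpha = 0.05

# Survival function: probability, under H_0, of a statistic at least this large
p_value = chi2.sf(chi_square_stat, dof)
print("p-value:", p_value)

# Rejecting when p < alpha is equivalent to rejecting when the statistic exceeds the critical value
reject_null = p_value < alpha
print("reject H_0:", reject_null)
```

The p-value here is roughly 0.23, well above 0.05, which agrees with the critical-value comparison.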


## **Conclusion**
Since the chi-square statistic (about 5.64) is smaller than the critical value (about 9.49) at alpha = 0.05 with 4 degrees of freedom, we fail to reject the null hypothesis: the data do not show a significant association between the type of device purchased and customer satisfaction.
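
As an optional cross-check on the manual calculation, SciPy's `chi2_contingency` computes the statistic, p-value, degrees of freedom, and expected frequencies in a single call. This is a sketch that re-enters the observed counts; it is not part of the original workflow.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts (rows: smart thermostat, smart light)
observed = np.array([[50, 80, 60, 30, 20],
                     [70, 100, 90, 50, 50]])

# Yates' continuity correction is only applied when dof == 1,
# so the default settings leave this 2 x 5 table unchanged.
stat, p_value, dof, expected = chi2_contingency(observed)

print("chi-square statistic:", stat)
print("p-value:", p_value)
print("degrees of freedom:", dof)
print("expected frequencies:")
print(expected)
```

The statistic, degrees of freedom, and expected frequencies should match the values computed step by step above.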