# Non-Parametric Tests

# 1. Assumptions in hypothesis testing

<b>1.1 Common assumptions of hypothesis tests</b>

Hypothesis tests make assumptions about the dataset that they are testing, and the conclusions you draw from the test results are only valid if those assumptions hold. While some assumptions differ between types of test, others are common to all hypothesis tests.

Which of the following statements is a common assumption of hypothesis tests?

Possible Answers

- Sample observations are collected deterministically from the population.

- Sample observations are correlated with each other.

- <b><font color ='green'>Sample observations have no direct relationship with each other.</font></b>

- Sample sizes are greater than thirty observations.

All hypothesis tests assume that the data are collected at random from the population, that each row is independent of the others, and that the sample size is "big enough".

<b>1.2 Testing sample size</b>

In order to conduct a hypothesis test and be sure that the result is fair, a sample must meet three requirements: it is a random sample of the population, the observations are independent, and there are enough observations. Of these, only the last condition is easily testable with code.

The minimum sample size depends on the type of hypothesis test you want to perform. You'll now test some scenarios on the late_shipments dataset.

Note that the .all() method from pandas checks whether all elements are true. For example, given a DataFrame df with numeric entries, (df < 5).all() checks, column by column, whether every element is less than 5. On a Series, such as the output of .value_counts(), it returns a single True or False.
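As a small illustration of how .all() behaves, the sketch below uses a made-up DataFrame (the names df, a, and b are hypothetical and not part of the course data). It assumes only that pandas is installed.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 6, 2]})

# Element-wise comparison gives a DataFrame of booleans
below_five = df < 5

# .all() on a DataFrame checks each column separately
per_column = below_five.all()
print(per_column)

# Chain a second .all() to get one boolean for the whole DataFrame
whole_frame = bool(per_column.all())
print(whole_frame)
```

Column b contains a 6, so its check fails and the whole-frame result is False as well.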

In [34]:
import pandas as pd
late_shipments = pd.read_feather("C:\\Users\\yazan\\Desktop\\Data_Analytics\\9-Introduction to Hypothesis Testing\\Datasets\\late_shipments.feather")

In [35]:
'''Get the count of each value in the freight_cost_groups column of late_shipments.
Insert a suitable number to inspect whether the counts are "big enough" for a two sample t-test.'''

# Count the freight_cost_groups values
counts = late_shipments['freight_cost_groups'].value_counts()

# Print the result
print(counts)

# Inspect whether the counts are big enough
print((counts >= 30).all())

expensive     531
reasonable    455
Name: freight_cost_groups, dtype: int64
True


In [36]:
'''Get the count of each value in the late column of late_shipments.
Insert a suitable number to inspect whether the counts are "big enough" for a one sample proportion test.'''

# Count the late values
counts = late_shipments['late'].value_counts()

# Print the result
print(counts)

# Inspect whether the counts are big enough
print((counts >= 10).all())

No     939
Yes     61
Name: late, dtype: int64
True


In [37]:
'''Get the count of each value in the freight_cost_groups column of late_shipments grouped by vendor_inco_term.
Insert a suitable number to inspect whether the counts are "big enough" for a chi-square independence test.'''
# Count the values of freight_cost_groups grouped by vendor_inco_term
counts = late_shipments.groupby('vendor_inco_term')['freight_cost_groups'].value_counts()

# Print the result
print(counts)

# Inspect whether the counts are big enough
print((counts >= 5).all())

vendor_inco_term  freight_cost_groups
CIP               reasonable              34
                  expensive               16
DDP               expensive               55
                  reasonable              45
DDU               reasonable               1
EXW               expensive              423
                  reasonable             302
FCA               reasonable              73
                  expensive               37
Name: freight_cost_groups, dtype: int64
False


In [38]:
'''Get the count of each value in the shipment_mode column of late_shipments.
Insert a suitable number to inspect whether the counts are "big enough" for an ANOVA test.'''

# Count the shipment_mode values
counts = late_shipments['shipment_mode'].value_counts()

# Print the result
print(counts)

# Inspect whether the counts are big enough
print((counts >= 30).all())

Air            906
Ocean           88
Air Charter      6
Name: shipment_mode, dtype: int64
False


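While randomness and independence of observations can't easily be tested programmatically, you can test that your sample sizes are big enough to make a hypothesis test appropriate. Two of the four checks failed: the chi-square independence check, because DDU has a single observation, and the ANOVA check, because Air Charter has only 6. Based on the last result, we should be a little cautious of the ANOVA test results given the small sample size for Air Charter.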
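The four checks above repeat the same pattern with different thresholds, so they can be collected in one place. The sketch below is an illustration rather than part of the course code: the helper name counts_big_enough and the MIN_COUNTS dictionary are hypothetical, the thresholds are the ones used in the cells above, and the toy Series reuses the shipment_mode counts printed earlier.

```python
import pandas as pd

# Minimum group counts used in the cells above (rules of thumb)
MIN_COUNTS = {
    "two_sample_t_test": 30,
    "one_sample_proportion_test": 10,
    "chi_square_independence_test": 5,
    "anova": 30,
}


def counts_big_enough(counts: pd.Series, test: str) -> bool:
    """Return True if every group count meets the threshold for the given test."""
    return bool((counts >= MIN_COUNTS[test]).all())


# Counts copied from the shipment_mode output above
shipment_mode_counts = pd.Series({"Air": 906, "Ocean": 88, "Air Charter": 6})

# Air Charter has only 6 rows, so the ANOVA threshold of 30 is not met
anova_ok = counts_big_enough(shipment_mode_counts, "anova")
print(anova_ok)

# The same counts would pass the lower chi-square threshold of 5
chi_square_ok = counts_big_enough(shipment_mode_counts, "chi_square_independence_test")
print(chi_square_ok)
```

Keeping the thresholds in one dictionary makes it harder to apply the wrong rule of thumb to a given test.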

# 2. Non-parametric tests

<b>2.1 Which parametric test?</b>

Which test is a parametric equivalent to the Wilcoxon signed-rank test?

Possible Answers:

- z-test for a difference in proportions

- Chi-square goodness of fit test

- <b><font color = 'green'>Paired t-test</font></b>

- ANOVA


The Wilcoxon signed-rank test works well when the assumptions of a paired t-test aren't met.

<b>2.2 Wilcoxon signed-rank test</b>

You'll explore the difference between the percentage of county-level votes for the Democratic candidate in 2012 and 2016 to determine whether the difference is significant.

In [39]:
import pingouin
sample_dem_data = pd.read_feather("C:\\Users\\yazan\\Desktop\\Data_Analytics\\9-Introduction to Hypothesis Testing\\Datasets\\dem_votes_potus_12_16.feather")

# Conduct a paired t-test on dem_percent_16 (x) and dem_percent_12 (y)
paired_test_results = pingouin.ttest(x=sample_dem_data['dem_percent_16'],
                                     y=sample_dem_data['dem_percent_12'],
                                     paired=True,
                                     alternative="two-sided")


# Print paired t-test results
print(paired_test_results)

                T  dof alternative          p-val           CI95%   cohen-d  \
T-test -30.298384  499   two-sided  3.600634e-115  [-7.27, -6.39]  0.454202   

              BF10  power  
T-test  2.246e+111    1.0  
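A paired t-test is equivalent to a one-sample t-test on the row-wise differences, which is why the degrees of freedom above are 499 for 500 pairs. A minimal sketch on made-up numbers (the Series x and y are hypothetical, not the election data):

```python
import math

import pandas as pd

# Hypothetical paired measurements, for illustration only
x = pd.Series([11.0, 12.0, 13.0, 14.0, 15.0])
y = pd.Series([10.0, 10.0, 10.0, 10.0, 10.0])

# Work with the row-wise differences
diff = x - y
n = len(diff)

# t = mean difference divided by its standard error
std_error = diff.std(ddof=1) / math.sqrt(n)
t_stat = diff.mean() / std_error

# Degrees of freedom: number of pairs minus one
dof = n - 1

print(t_stat, dof)
```

Swapping x and y flips the sign of the statistic but leaves a two-sided p-value unchanged.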


In [40]:
# Conduct a Wilcoxon signed-rank test on dem_percent_12 and dem_percent_16
wilcoxon_test_results = pingouin.wilcoxon(x=sample_dem_data['dem_percent_12'],
                                          y=sample_dem_data['dem_percent_16'],
                                          alternative="two-sided")

# Print Wilcoxon test results
print(wilcoxon_test_results)

           W-val alternative         p-val       RBC      CLES
Wilcoxon  2401.0   two-sided  1.780396e-77  0.961661  0.644816


Given the large sample size (500), the parametric paired t-test and the non-parametric Wilcoxon signed-rank test lead to the same conclusion: both p-values are far below any common significance level, so the difference between 2012 and 2016 is significant.
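To make the W-val column less opaque, here is a minimal sketch of how a signed-rank statistic can be computed by hand for a two-sided test: rank the absolute differences, sum the ranks separately for positive and negative differences, and take the smaller sum. The data are made up and chosen to have no ties and no zero differences, which real implementations handle with extra rules.

```python
import pandas as pd

# Hypothetical paired data with no ties and no zero differences
x = pd.Series([10.0, 12.0, 9.0, 15.0, 11.0])
y = pd.Series([8.0, 13.0, 5.0, 9.0, 11.5])

# Row-wise differences: 2, -1, 4, 6, -0.5
diff = x - y

# Rank the absolute differences (rank 1 = smallest absolute difference)
abs_ranks = diff.abs().rank()

# Sum the ranks of positive and negative differences separately
rank_sum_pos = abs_ranks[diff > 0].sum()
rank_sum_neg = abs_ranks[diff < 0].sum()

# Two-sided statistic: the smaller of the two rank sums
w_stat = min(rank_sum_pos, rank_sum_neg)
print(rank_sum_pos, rank_sum_neg, w_stat)
```

A statistic near zero means almost all of the rank weight sits on one sign, which is evidence against "no difference".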

# 3. Non-parametric ANOVA and unpaired t-tests

<b>3.1 Wilcoxon-Mann-Whitney</b>

Another class of non-parametric hypothesis tests is the rank sum tests. Ranks are the positions of numeric values from smallest to largest. Think of them as positions in running events: whoever has the fastest (smallest) time is rank 1, second fastest is rank 2, and so on.

By calculating on the ranks of the data instead of the actual values, you can avoid making assumptions about the distribution of the test statistic. Rank-based tests are more robust to outliers, in the same way that a median is more robust than a mean.
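The running-event analogy and the robustness claim can both be seen in a few lines. The finishing times below are made up for illustration; the sketch relies only on the pandas Series.rank() method.

```python
import pandas as pd

# Hypothetical finishing times in seconds
times = pd.Series([11.2, 9.8, 10.5, 12.0])

# Rank 1 goes to the smallest (fastest) time
ranks = times.rank()
print(ranks.tolist())

# Replace the slowest time with an extreme outlier
times_outlier = times.copy()
times_outlier.iloc[3] = 120.0

# The mean shifts a lot, but the ranks do not change at all
print(times.mean(), times_outlier.mean())
ranks_outlier = times_outlier.rank()
print(ranks_outlier.tolist())
```

Because the slowest runner is still the slowest, the ranks are identical before and after the change, so any statistic built on them is unaffected.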

One common rank-based test is the Wilcoxon-Mann-Whitney test, which is the non-parametric counterpart of the unpaired t-test.

In [41]:
# Select the weight_kilograms and late columns
weight_vs_late = late_shipments[['weight_kilograms', 'late']]

# Convert weight_vs_late into wide format
weight_vs_late_wide = weight_vs_late.pivot(columns='late', values='weight_kilograms')


# Run a two-sided Wilcoxon-Mann-Whitney test on weight_kilograms vs. late
wmw_test = pingouin.mwu(x=weight_vs_late_wide['Yes'],
                        y=weight_vs_late_wide['No'],
                        alternative='two-sided')

# Print the test results
print(wmw_test)

       U-val alternative     p-val       RBC      CLES
MWU  38145.0   two-sided  0.000014 -0.331902  0.665951
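The idea behind the U-val column can be sketched in a few lines: pool both groups, rank the pooled values, and compare the rank sum of one group with the smallest rank sum it could possibly have. The numbers below are made up and contain no ties (x_vals and y_vals are hypothetical); which of the two U values a library reports can vary, so treat this as an illustration rather than a description of pingouin's internals.

```python
import pandas as pd

# Hypothetical unpaired samples with no tied values
x_vals = pd.Series([1.0, 4.0, 6.0])
y_vals = pd.Series([2.0, 3.0, 5.0, 7.0])
n_x, n_y = len(x_vals), len(y_vals)

# Rank all observations together
pooled_ranks = pd.concat([x_vals, y_vals], ignore_index=True).rank()

# Rank sum of the first group (its rows come first in the pooled Series)
rank_sum_x = pooled_ranks.iloc[:n_x].sum()

# U for x: rank sum minus the minimum possible rank sum for n_x values
u_x = rank_sum_x - n_x * (n_x + 1) / 2

# The two U values always add up to n_x * n_y
u_y = n_x * n_y - u_x
print(u_x, u_y)
```

With no ties, u_x equals the number of (x, y) pairs in which the x value is the larger one, which is 5 of the 12 pairs here.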
