To practice statistical inference in Python, we begin by importing a 2015 dataset containing information on roller coasters at amusement parks around the world. Take a glimpse at the data with the *head()*, *tail()*, or *describe()* methods to familiarize yourself with the variables.

In [16]:
# Add libraries
import pandas as pd

# Import data 
Coasters = pd.read_csv('Coasters_2015.csv')

# Glimpse data
Coasters.describe()

|       | Speed | Height | Drop | Length | Duration | Inversions |
|-------|------:|-------:|-----:|-------:|---------:|-----------:|
| count | 241.0 | 241.0 | 118.0 | 241.0 | 157.0 | 241.0 |
| mean | 55.350622 | 125.096598 | 148.613475 | 2570.072365 | 123.019108 | 0.556017 |
| std | 18.664831 | 70.264668 | 69.136487 | 1614.980056 | 45.087447 | 0.497886 |
| min | 4.5 | 8.0 | 47.0 | 197.5 | 28.0 | 0.0 |
| 25% | 45.0 | 75.0 | 95.0 | 1213.9 | 90.0 | 0.0 |
| 50% | 55.0 | 115.0 | 141.0 | 2423.0 | 120.0 | 1.0 |
| 75% | 65.3 | 155.0 | 193.525 | 3582.7 | 150.0 | 1.0 |
| max | 149.1 | 420.0 | 400.0 | 8133.1 | 240.0 | 1.0 |
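The count row is worth a second look: Drop (118) and Duration (157) have fewer non-missing values than Speed (241), so, assuming the file has 241 rows, about 123 Drop values and 84 Duration values are missing. Below is a minimal sketch of counting missing values per column with pandas' `isna().sum()`. It runs on a small hypothetical DataFrame named `toy`, whose values are invented for illustration and are not taken from Coasters_2015.csv:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Coasters DataFrame (invented values)
toy = pd.DataFrame({
    "Speed": [55.0, 60.0, 45.0, 70.0],
    "Drop": [140.0, np.nan, 95.0, np.nan],
    "Duration": [120.0, 90.0, np.nan, 150.0],
})

# describe() counts only the non-missing values in each column
non_missing = toy.describe().loc["count"]

# isna().sum() counts the missing values directly
missing = toy.isna().sum()

print(non_missing)
print(missing)
```

On the real data, `Coasters.isna().sum()` gives the same information directly. Missing values matter later, because SciPy's t-test functions return `nan` by default when the input contains `NaN`.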


## One-sample t-test

We will begin by performing a one-sample t-test. Recall that the purpose of this parametric test is to determine whether a sample mean differs significantly from a hypothesized population mean.
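For reference, the test compares the sample mean $\bar{x}$ with the hypothesized mean $\mu_0$, scaled by the estimated standard error. Here $s$ is the sample standard deviation and $n$ the sample size. Under the null hypothesis, $t$ follows a t-distribution with $n - 1$ degrees of freedom:

$$
H_0: \mu = \mu_0 \qquad H_1: \mu \neq \mu_0 \qquad t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \quad \text{df} = n - 1
$$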

Let's say we are interested in the true average speed of roller coasters, using this dataset as our sample. Let's begin by finding the sample mean!

In [3]:
# Add Libraries
import numpy as np
import pandas as pd
from scipy.stats import ttest_1samp

# Find mean
mean_value = np.mean(Coasters['Speed'])

print("Mean:", mean_value)

Mean: 55.350622406639026


Roller coaster expert Biddy Martin has previously stated that roller coasters travel at an average speed of 52 mph. Let's put her claim to the test with a one-sample t-test, which tells us whether our sample provides significant evidence that the true average speed differs from 52 mph.

In [17]:
# Perform a one-sample t-test using 'ttest_1samp'

# Assign the results of our test to two variables
t_test_value, p_value = ttest_1samp(Coasters['Speed'], 52)

# View the results
print("P Value:", p_value)
print("t-test Value:", t_test_value)

P Value: 0.005748417140227706
t-test Value: 2.7868265618258343
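As a sanity check, here is a minimal sketch that reproduces these numbers by hand from the summary statistics in the `describe()` table (Speed: mean 55.350622, std 18.664831, n = 241). It uses $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ and a two-sided p-value from `scipy.stats.t`. Because the inputs are rounded, expect close but not bit-for-bit agreement with `ttest_1samp`:

```python
import math
from scipy import stats

# Summary statistics for Speed, copied from the describe() output
sample_mean = 55.350622
sample_std = 18.664831   # pandas' std uses ddof=1, matching the t-test
n = 241
mu_0 = 52                # hypothesized mean (Biddy's claim)

# Standard error of the mean and the t statistic
standard_error = sample_std / math.sqrt(n)
t_stat = (sample_mean - mu_0) / standard_error

# Two-sided p-value: probability of |t| at least this large under H0
df = n - 1
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

print("t:", t_stat)
print("p:", p_two_sided)
```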


Great work! We have just performed our first one-sample t-test. Now let's have Python check whether the p-value is small enough, at a significance level (alpha) of 0.05, to reject the null hypothesis that the average speed is 52 mph.

In [20]:
# Significance level (alpha): 0.05, or 5%
alpha = 0.05

# Reject H0 when the p-value falls below alpha
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

Reject the null hypothesis


Unfortunately, it appears that Biddy may have been wrong: at the 5% significance level we reject the null hypothesis that the average speed of a roller coaster is 52 mph, and our sample mean of about 55.4 mph suggests the true average is higher. What a shocker! Biddy usually gets this stuff right.
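To see which average speeds are consistent with the data, here is a minimal sketch of a 95% confidence interval for the mean. It is again built from the rounded `describe()` summary statistics, as mean ± t critical value × standard error. For a two-sided test at the 5% level, 52 mph falling outside this interval is equivalent to rejecting the null hypothesis:

```python
import math
from scipy import stats

# Summary statistics for Speed, copied from the describe() output
sample_mean = 55.350622
sample_std = 18.664831
n = 241

standard_error = sample_std / math.sqrt(n)

# Critical value leaving 2.5% in each tail of a t-distribution with n - 1 df
t_crit = stats.t.ppf(0.975, df=n - 1)

lower = sample_mean - t_crit * standard_error
upper = sample_mean + t_crit * standard_error
print(f"95% CI for mean speed: ({lower:.2f}, {upper:.2f}) mph")
```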

In [19]:
# Note: we do not have to unpack the output of 'ttest_1samp' into two variables.
# As the last expression in a cell, its result (the t statistic and p-value) is displayed automatically.
# However, it is good practice to store these values so we can reuse them later.

ttest_1samp(Coasters['Speed'], 52)

Ttest_1sampResult(statistic=2.7868265618258343, pvalue=0.005748417140227706)
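The returned object behaves like a named tuple, so its fields can be read by name as well as unpacked. The sketch below runs on a small hypothetical array named `speeds`, whose values are invented for illustration and are not from Coasters_2015.csv. It also shows `nan_policy="omit"`, which helps with columns that have missing values, such as Drop or Duration:

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical speeds in mph, including one missing value (invented data)
speeds = np.array([48.0, 55.0, 61.0, np.nan, 52.5, 58.0, 66.0, 50.0])

# Default nan_policy='propagate': the NaN makes the statistic and p-value nan
propagated = ttest_1samp(speeds, 52)

# nan_policy='omit' drops missing values before running the test
result = ttest_1samp(speeds, 52, nan_policy="omit")

# Fields can be accessed by name instead of unpacking
print("t statistic:", result.statistic)
print("p-value:", result.pvalue)

# Unpacking still works and yields the same two values
t_stat, p_val = result
```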

## Two-sample t-test

Recall that a two-sample t-test is used to determine if there is a significant difference between the sample 