1.  Suppose that after n = 100 flips we observe h = 61 heads. At a significance level of 0.05, is the coin fair? Our null hypothesis is that the coin is fair (q = 1/2). We set these variables:

In [2]:
import numpy as np
import scipy.stats as st
import scipy.special as sp
n = 100  # number of coin flips
h = 61  # number of heads
q = 0.5  # null-hypothesis of fair coin

2. Let's compute the z-score, defined as $z = (\bar{x} - q)\sqrt{n / (q(1 - q))}$, where $\bar{x}$ (`xbar` in the code) is the observed proportion of heads. We will explain this formula in the next section, How it works...

In [5]:
xbar = float(h) / n
z = (xbar - q) * np.sqrt(n / (q * (1 - q)))
# Print the z-score at full precision (round(z, 4) would limit it to 4 decimals).
print(z)

2.1999999999999997


3. Now, from the z-score, we can compute the p-value as follows:

In [4]:
# Two-sided p-value; abs(z) keeps the result valid when z is negative.
pval = 2 * (1 - st.norm.cdf(abs(z)))
print(pval)

0.02780689502699718


4. This p-value is less than 0.05, so we reject the null hypothesis and conclude that the coin is probably not fair.
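
The four steps above can be collected into one helper. The following is a minimal sketch rather than part of the original recipe: the function name `z_test_proportion` is hypothetical, and it uses the standard library's `statistics.NormalDist` in place of `st.norm` so that it runs without SciPy.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(h, n, q=0.5):
    """Two-sided z-test of H0: P(heads) = q (hypothetical helper)."""
    xbar = h / n  # observed proportion of heads
    z = (xbar - q) * sqrt(n / (q * (1 - q)))
    # abs(z) makes the p-value symmetric for too few or too many heads
    pval = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, pval


z, pval = z_test_proportion(61, 100)
print(round(z, 4), round(pval, 4))
```

With 61 heads in 100 flips this reproduces the z-score of about 2.2 and the p-value of about 0.028 computed above.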

In [6]:
import scipy.stats as stats
import pandas as pd


In [7]:
penguins = pd.read_csv("penguins_wrangled.csv")
penguins.head()

Unnamed: 0.1,Unnamed: 0,studyName,Sample Number,Species,Region,Island,Stage,Individual ID,Clutch Completion,Date Egg,Culmen Length (mm),Culmen Depth (mm),Flipper Length (mm),Body Mass (g),Sex,Delta 15 N (o/oo),Delta 13 C (o/oo),Comments
0,0,PAL0708,1,Adelie,Anvers,Torgersen,"Adult, 1 Egg Stage",N1A1,Yes,11/11/07,39.1,18.7,181.0,3750.0,MALE,,,Not enough blood for isotopes.
1,1,PAL0708,2,Adelie,Anvers,Torgersen,"Adult, 1 Egg Stage",N1A2,Yes,11/11/07,39.5,17.4,186.0,3800.0,FEMALE,8.94956,-24.69454,
2,2,PAL0708,3,Adelie,Anvers,Torgersen,"Adult, 1 Egg Stage",N2A1,Yes,11/16/07,40.3,18.0,195.0,3250.0,FEMALE,8.36821,-25.33302,
3,4,PAL0708,5,Adelie,Anvers,Torgersen,"Adult, 1 Egg Stage",N3A1,Yes,11/16/07,36.7,19.3,193.0,3450.0,FEMALE,8.76651,-25.32426,
4,5,PAL0708,6,Adelie,Anvers,Torgersen,"Adult, 1 Egg Stage",N3A2,Yes,11/16/07,39.3,20.6,190.0,3650.0,MALE,8.66496,-25.29805,


* H0: µ = 45 -> the mean culmen length is 45 mm
* Ha: µ ≠ 45 -> the mean culmen length is not 45 mm

In [18]:
t_test, p_value = stats.ttest_1samp(penguins["Culmen Length (mm)"],45)

In [19]:
print(t_test)

-3.3609291383431597


In [20]:
print(p_value)

0.0008673540360749943


Because the p-value (about 0.0009) is less than 0.05, we have enough evidence to reject the null hypothesis that the mean culmen length is 45 mm.
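
To see what `stats.ttest_1samp` computes, the t statistic can be rebuilt from summary statistics. This is a minimal sketch: the function name `one_sample_t` is hypothetical, and the mean, standard deviation, and count are copied by hand from the `penguins.describe()` output in this notebook.

```python
from math import sqrt


def one_sample_t(mean, std, n, mu0):
    """t statistic for H0: population mean = mu0 (hypothetical helper)."""
    standard_error = std / sqrt(n)  # std is the sample standard deviation
    return (mean - mu0) / standard_error


# Summary statistics for "Culmen Length (mm)", copied from penguins.describe()
t_stat = one_sample_t(mean=43.992793, std=5.468668, n=333, mu0=45)
print(round(t_stat, 3))
```

The result agrees with the `t_test` value of about -3.361 printed above. The two-sided p-value then comes from a t distribution with n - 1 = 332 degrees of freedom.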

In [9]:
print(penguins.mean(numeric_only = True))

Unnamed: 0              172.303303
Sample Number            63.645646
Culmen Length (mm)       43.992793
Culmen Depth (mm)        17.164865
Flipper Length (mm)     200.966967
Body Mass (g)          4207.057057
Delta 15 N (o/oo)         8.739944
Delta 13 C (o/oo)       -25.682842
dtype: float64


In [24]:
std_penguins = penguins["Culmen Length (mm)"].std()

In [33]:
penguins.describe()

Unnamed: 0.1,Unnamed: 0,Sample Number,Culmen Length (mm),Culmen Depth (mm),Flipper Length (mm),Body Mass (g),Delta 15 N (o/oo),Delta 13 C (o/oo)
count,333.0,333.0,333.0,333.0,333.0,333.0,324.0,325.0
mean,172.303303,63.645646,43.992793,17.164865,200.966967,4207.057057,8.739944,-25.682842
std,97.346548,40.201308,5.468668,1.969235,14.015765,805.215802,0.552073,0.796629
min,0.0,1.0,32.1,13.1,172.0,2700.0,7.6322,-27.01854
25%,89.0,30.0,39.5,15.6,190.0,3550.0,8.30444,-26.32601
50%,172.0,58.0,44.5,17.3,197.0,4050.0,8.658585,-25.83352
75%,256.0,95.0,48.6,18.7,213.0,4775.0,9.181477,-25.04169
max,343.0,152.0,59.6,21.5,231.0,6300.0,10.02544,-23.78767


### Confidence Interval for Means

In [39]:
import numpy as np
# 95% CI for the mean: loc is the sample mean (rounded to 44), scale is the standard error.
ci_one = stats.norm.interval(0.95, loc=44, scale=std_penguins / np.sqrt(333))
print(ci_one)

(43.412635682027954, 44.587364317972046)
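
The interval above is the sample mean plus or minus a critical value times the standard error. A minimal sketch of that arithmetic, assuming the normal approximation used in the cell above: the function name `mean_confidence_interval` is hypothetical, the standard deviation is copied from `penguins.describe()`, and `statistics.NormalDist` stands in for `stats.norm`.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(mean, std, n, confidence=0.95):
    """Normal-approximation CI for a mean (hypothetical helper)."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z_crit * std / sqrt(n)  # critical value times standard error
    return mean - margin, mean + margin


# mean=44 and n=333 as in the cell above; std copied from penguins.describe()
low, high = mean_confidence_interval(mean=44, std=5.468668, n=333)
print(low, high)
```

This gives roughly 43.41 to 44.59, matching `ci_one`.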


In [38]:
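# This form is commonly used, but with scale=std_penguins (no division by sqrt(n))
# it gives the range covering about 95% of individual values, not a CI for the mean.
ci_one = stats.norm.interval(0.95, loc=44, scale=std_penguins)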
print(ci_one)

(33.28160700501643, 54.71839299498357)


In [41]:
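# Same interval via the `st` alias; the confidence level is passed positionally
# because newer SciPy releases no longer accept the alpha= keyword.
st.norm.interval(0.95, loc=44, scale=std_penguins)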

(33.28160700501643, 54.71839299498357)

### The t distribution

In [46]:
df = np.array([10, 20, 30, 40])

In [47]:
# 0.975 quantiles of the t distribution for the specified degrees of freedom
stats.t.ppf(0.975, df)

array([2.22813885, 2.08596345, 2.04227246, 2.02107538])

In [49]:
# cumulative probability at t = 2.02107538 with 40 degrees of freedom
print(stats.t.cdf(2.02107538, 40))

0.974999999443306
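
To make the link between `ppf` and `cdf` concrete, the sketch below approximates the t distribution's cumulative probability by integrating its density with Simpson's rule. It is an illustration only, not how SciPy implements `stats.t.cdf`; the names `t_pdf` and `t_cdf` are hypothetical.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm_const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return norm_const * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) with Simpson's rule on [0, t] (hypothetical helper)."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric about 0, so P(T <= 0) = 0.5
    return 0.5 + total * h / 3


print(t_cdf(2.02107538, 40))
```

Evaluating at the 0.975 quantile for 40 degrees of freedom returns a value very close to 0.975, which is the round trip shown in the two cells above.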


### Confidence Interval for proportions

In [None]:
import statsmodels.api as sm

In [None]:
statsprop
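
One common way to build such an interval is the normal-approximation (Wald) formula: the sample proportion plus or minus a critical value times sqrt(p(1 - p) / n). The sketch below is a hedged illustration using only the standard library, not the statsmodels call this section was heading toward. The function name `proportion_confidence_interval` is hypothetical, and the 61 heads in 100 flips are reused from the coin example.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(count, nobs, confidence=0.95):
    """Wald (normal-approximation) CI for a proportion (hypothetical helper)."""
    p_hat = count / nobs  # sample proportion
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / nobs)
    return p_hat - margin, p_hat + margin


low, high = proportion_confidence_interval(61, 100)
print(round(low, 3), round(high, 3))
```

The interval runs from roughly 0.51 to 0.71 and excludes 0.5, which is consistent with rejecting the fair-coin hypothesis in the z-test. The Wald interval can be inaccurate for small samples or proportions near 0 or 1.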