# One Sample Z Test

Used to test a hypothesis about a population mean when the population standard deviation is known.

## Example-1

- Suppose that a beach is safe to swim if the mean level of lead in the water is 10.0 (μ0) parts/million.  
- We assume Xi ~ N(μ, σ = 1.5)
- Water safety is going to be determined by taking 40 water samples and using the test statistic.
- Sample mean = 10.5
- α = 0.05
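
The setup above can be written as a right-tailed test. The one-sided alternative is an assumption here: the source does not state the hypotheses, and it is inferred from the one-tailed p-value computed later in this example.

```latex
\[
H_0:\ \mu = \mu_0 = 10.0
\qquad
H_a:\ \mu > 10.0
\]
\[
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
  = \frac{10.5 - 10.0}{1.5 / \sqrt{40}}
  \approx 2.108
\]
```

The null hypothesis is rejected when the p-value $P(Z > z)$ falls below α = 0.05.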

In [1]:
import scipy.stats as stats
from math import sqrt

In [2]:
x_bar = 10.5 # sample mean
n = 40 # number of samples
sigma = 1.5 # sd of population
mu = 10 # hypothesized population mean (mu_0)

Calculate the test statistic

In [3]:
z = ((x_bar-mu)/(sigma/sqrt(n)))
z

2.1081851067789197

Calculate the p-value

In [None]:
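# right-tailed p-value, P(Z > z)
# (the output shown below matches z truncated to 2.10; the unrounded z gives about 0.0175)
p_value = 1 - stats.norm.cdf(z)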
p_value

0.017864420562816563

In [None]:
alpha = 0.05

if p_value<alpha:
    print('At {} level of significance, we can reject the null hypothesis in favor of alternative hypothesis.'.format(alpha))
else:
    print('At {} level of significance, we fail to reject the null hypothesis.'.format(alpha))

At 0.05 level of significance, we can reject the null hypothesis in favor of alternative hypothesis.
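
The whole z-test workflow can also be wrapped in one function. The following is a minimal sketch, not a library API: `one_sample_z_test` and its `alternative` argument are hypothetical names chosen for illustration, and it uses `statistics.NormalDist` from the standard library in place of `scipy.stats.norm`.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(x_bar, mu0, sigma, n, alternative="greater"):
    """Return (z, p_value) for a one-sample z test from summary statistics.

    Hypothetical helper for illustration; assumes sigma is the known
    population standard deviation.
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)  # P(Z <= z) under the standard normal
    if alternative == "greater":
        p_value = 1 - cdf
    elif alternative == "less":
        p_value = cdf
    elif alternative == "two-sided":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


# Values from the beach example: 40 samples, mean 10.5, sigma 1.5, mu_0 = 10
z, p_value = one_sample_z_test(x_bar=10.5, mu0=10, sigma=1.5, n=40)
print(z, p_value)
```

With the unrounded z the right-tailed p-value is about 0.0175, which is below α = 0.05 and leads to the same conclusion as above.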


# One Sample t Test

## Example-1

- Bon Air ELEM has 1000 students. The principal of the school thinks that the average IQ of students at Bon Air is at least 110. To prove her point, she administers an IQ test to 20 randomly selected students.
- Among the sampled students, the average IQ is 108 with a standard deviation of 10.
- Based on these results, should the principal accept or reject her original hypothesis? α = 0.01

In [None]:
x_bar = 108 # sample mean
n = 20 # number of students
s = 10 # sd of sample
mu = 110 # Population mean
alpha = 0.01

Calculate the test statistic

In [None]:
t = (x_bar - mu) / (s / sqrt(n))
t

-0.8944271909999159

In [None]:
p_value = stats.t.cdf(t, df=n-1)  # left-tailed p-value (Ha: mu < 110), n - 1 degrees of freedom
p_value

0.1911420676837155

In [None]:
if p_value<alpha:
    print('At {} level of significance, we can reject the null hypothesis in favor of alternative hypothesis.'.format(alpha))
else:
    print('At {} level of significance, we fail to reject the null hypothesis.'.format(alpha))

At 0.01 level of significance, we fail to reject the null hypothesis.
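
When only summary statistics are available, as in this example, the same calculation can be packaged as a small helper. This is a sketch under stated assumptions: `one_sample_t_from_summary` is a hypothetical name, not a scipy function, and the left-tailed default reflects the reading H0: μ ≥ 110 against Ha: μ < 110.

```python
from math import sqrt

from scipy import stats


def one_sample_t_from_summary(x_bar, mu0, s, n, alternative="less"):
    """Hypothetical helper: one-sample t test from summary statistics.

    s is the sample standard deviation; the test uses n - 1 degrees of freedom.
    """
    t_stat = (x_bar - mu0) / (s / sqrt(n))
    dof = n - 1
    if alternative == "less":
        p_value = stats.t.cdf(t_stat, df=dof)          # P(T <= t)
    elif alternative == "greater":
        p_value = stats.t.sf(t_stat, df=dof)           # P(T > t)
    elif alternative == "two-sided":
        p_value = 2 * stats.t.sf(abs(t_stat), df=dof)  # both tails
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return t_stat, p_value


# Values from the IQ example: n = 20, sample mean 108, sample sd 10, mu_0 = 110
t_stat, p_value = one_sample_t_from_summary(x_bar=108, mu0=110, s=10, n=20)
print(t_stat, p_value)
```

The result should agree with the t statistic and p-value shown in the cells above.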


## Example-2

### scipy.stats

In [None]:
pip install statsmodels



In [None]:
import statsmodels.api as sm

In [None]:
df = sm.datasets.get_rdataset(dataname = "Pima.tr", package = "MASS")
df.keys()

dict_keys(['data', '__doc__', 'package', 'title', 'from_cache'])

In [None]:
print(df.__doc__)

.. container::

   Pima.tr R Documentation

   .. rubric:: Diabetes in Pima Indian Women
      :name: Pima.tr

   .. rubric:: Description
      :name: description

   A population of women who were at least 21 years old, of Pima Indian
   heritage and living near Phoenix, Arizona, was tested for diabetes
   according to World Health Organization criteria. The data were
   collected by the US National Institute of Diabetes and Digestive and
   Kidney Diseases. We used the 532 complete records after dropping the
   (mainly missing) data on serum insulin.

   .. rubric:: Usage
      :name: usage

   ::

      Pima.tr
      Pima.tr2
      Pima.te

   .. rubric:: Format
      :name: format

   These data frames contains the following columns:

   ``npreg``
      number of pregnancies.

   ``glu``
      plasma glucose concentration in an oral glucose tolerance test.

   ``bp``
      diastolic blood pressure (mm Hg).

   ``skin``
      triceps skin fold thickness (mm).

   ``bmi``
      body 

In [None]:
df.data

Unnamed: 0,npreg,glu,bp,skin,bmi,ped,age,type
0,5,86,68,28,30.2,0.364,24,No
1,7,195,70,33,25.1,0.163,55,Yes
2,5,77,82,41,35.8,0.156,35,No
3,0,165,76,43,47.9,0.259,26,No
4,0,107,60,25,26.4,0.133,23,No
...,...,...,...,...,...,...,...,...
195,2,141,58,34,25.4,0.699,24,No
196,7,129,68,49,38.5,0.439,43,Yes
197,0,106,70,37,39.4,0.605,22,No
198,1,118,58,36,33.3,0.261,23,No


In [None]:
df = df.data

In [None]:
df.head()

Unnamed: 0,npreg,glu,bp,skin,bmi,ped,age,type
0,5,86,68,28,30.2,0.364,24,No
1,7,195,70,33,25.1,0.163,55,Yes
2,5,77,82,41,35.8,0.156,35,No
3,0,165,76,43,47.9,0.259,26,No
4,0,107,60,25,26.4,0.133,23,No


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 8 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   npreg   200 non-null    int64  
 1   glu     200 non-null    int64  
 2   bp      200 non-null    int64  
 3   skin    200 non-null    int64  
 4   bmi     200 non-null    float64
 5   ped     200 non-null    float64
 6   age     200 non-null    int64  
 7   type    200 non-null    object 
dtypes: float64(2), int64(5), object(1)
memory usage: 12.6+ KB


In [None]:
df.describe()

Unnamed: 0,npreg,glu,bp,skin,bmi,ped,age
count,200.0,200.0,200.0,200.0,200.0,200.0,200.0
mean,3.57,123.97,71.26,29.215,32.31,0.460765,32.11
std,3.366268,31.667225,11.479604,11.724594,6.130212,0.307225,10.975436
min,0.0,56.0,38.0,7.0,18.2,0.085,21.0
25%,1.0,100.0,64.0,20.75,27.575,0.2535,23.0
50%,2.0,120.5,70.0,29.0,32.8,0.3725,28.0
75%,6.0,144.0,78.0,36.0,36.5,0.616,39.25
max,14.0,199.0,110.0,99.0,47.9,2.288,63.0


In [None]:
# Suppose we hypothesize that the population mean of bmi among Pima Indian women is above 30,
# because the sample mean is x_bar = 32.31.

In [None]:
# bmi mean:
# Ho: mu = 30
# Ha: mu > 30

In [None]:
df.bmi.mean()

32.31

In [None]:
# sample size = 200
# sample std = 6.13
# sample mean = 32.31

In [None]:
onesample = stats.ttest_1samp(df.bmi, popmean=30)
onesample

TtestResult(statistic=5.329070841262502, pvalue=2.6614410307455736e-07, df=199)

In [None]:
#help(stats.ttest_1samp)

In [None]:
onesample.pvalue/2        # ttest_1samp is two-sided by default; for a one-sided test we halve the p-value (valid here because the statistic is positive, in the direction of the alternative)

1.3307205153727868e-07

In [None]:
alpha = 0.05
if onesample.pvalue/2<alpha:
    print('At {} level of significance, we can reject the null hypothesis in favor of alternative hypothesis.'.format(alpha))
else:
    print('At {} level of significance, we fail to reject the null hypothesis.'.format(alpha))

At 0.05 level of significance, we can reject the null hypothesis in favor of alternative hypothesis.


In [None]:
# rather than dividing the p-value by 2, we can pass alternative="greater"
onesample = stats.ttest_1samp(df.bmi, popmean=30, alternative="greater")
onesample

TtestResult(statistic=5.329070841262502, pvalue=1.3307205153727868e-07, df=199)
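
Halving the two-sided p-value is only correct when the statistic points in the direction of the alternative. The sketch below illustrates this with a small made-up sample (the values are hypothetical, not taken from the Pima data) and assumes a scipy version that supports the `alternative` argument of `ttest_1samp` (1.6 or later).

```python
from scipy import stats

# Hypothetical sample values, made up for illustration only.
sample = [31.2, 29.8, 33.5, 30.9, 34.1, 28.7, 32.6, 35.0]
popmean = 30

two_sided = stats.ttest_1samp(sample, popmean)
greater = stats.ttest_1samp(sample, popmean, alternative="greater")
less = stats.ttest_1samp(sample, popmean, alternative="less")

# The sample mean is above popmean, so the statistic is positive.
# Ha: mu > popmean  ->  one-sided p equals half the two-sided p.
print(greater.pvalue, two_sided.pvalue / 2)
# Ha: mu < popmean  ->  one-sided p equals 1 minus half the two-sided p.
print(less.pvalue, 1 - two_sided.pvalue / 2)
```

Passing `alternative` explicitly avoids having to check the sign of the statistic by hand.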

### Classical method for Example-2

In [None]:
sem = df.bmi.std() / sqrt(len(df.bmi))    # sigma_x_bar = sem
sem

0.43347143785627434

In [None]:
t = (df.bmi.mean() - 30) / sem
t

5.329070841262502

In [None]:
p_value = 1 - stats.t.cdf(t, df=len(df.bmi) - 1)  # right-tailed p-value, n - 1 = 199 degrees of freedom
p_value

1.3307205160018043e-07
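
For reference, the three cells of the classical method correspond to the following formulas, with the numbers taken from the outputs above (s ≈ 6.130, n = 200, μ0 = 30). The symbol $T_{n-1}$ is notation introduced here for a t-distributed variable with n − 1 degrees of freedom.

```latex
\[
\mathrm{SEM} = \frac{s}{\sqrt{n}} = \frac{6.130}{\sqrt{200}} \approx 0.4335
\]
\[
t = \frac{\bar{x} - \mu_0}{\mathrm{SEM}} = \frac{32.31 - 30}{0.4335} \approx 5.329
\]
\[
p = P\left(T_{n-1} > t\right) = 1 - F_{t,\,199}(5.329) \approx 1.33 \times 10^{-7}
\]
```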

## Example-3

In [None]:
import seaborn as sns
sns.get_dataset_names()

['anagrams',
 'anscombe',
 'attention',
 'brain_networks',
 'car_crashes',
 'diamonds',
 'dots',
 'dowjones',
 'exercise',
 'flights',
 'fmri',
 'geyser',
 'glue',
 'healthexp',
 'iris',
 'mpg',
 'penguins',
 'planets',
 'seaice',
 'taxis',
 'tips',
 'titanic']

In [None]:
df = sns.load_dataset("penguins")
df.head()

Unnamed: 0,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,Male
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,Female
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,Female
3,Adelie,Torgersen,,,,,
4,Adelie,Torgersen,36.7,19.3,193.0,3450.0,Female


In [None]:
df = df.dropna()
df.shape

(333, 7)

In [None]:
df[df.sex=="Male"].describe()

Unnamed: 0,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g
count,168.0,168.0,168.0,168.0
mean,45.854762,17.891071,204.505952,4545.684524
std,5.366896,1.863351,14.547876,787.628884
min,34.6,14.1,178.0,3250.0
25%,40.975,16.075,193.0,3900.0
50%,46.8,18.45,200.5,4300.0
75%,50.325,19.25,219.0,5312.5
max,59.6,21.5,231.0,6300.0


In [None]:
df[df.sex=="Male"].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 168 entries, 0 to 343
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   species            168 non-null    object 
 1   island             168 non-null    object 
 2   bill_length_mm     168 non-null    float64
 3   bill_depth_mm      168 non-null    float64
 4   flipper_length_mm  168 non-null    float64
 5   body_mass_g        168 non-null    float64
 6   sex                168 non-null    object 
dtypes: float64(4), object(3)
memory usage: 10.5+ KB


In [None]:
# avg body mass for male penguins:
# Ho: mu = 4500
# Ha: mu > 4500

In [None]:
onesample = stats.ttest_1samp(df[df.sex=="Male"].body_mass_g, popmean=4500, alternative="greater")
onesample

TtestResult(statistic=0.7517996322753101, pvalue=0.22661489865701812, df=167)

In [None]:
# df: the number of degrees of freedom used in the calculation of the t-statistic (n - 1 = 167 here)

In [None]:
#help(stats.ttest_1samp)

In [None]:
alpha = 0.05
if onesample.pvalue<alpha:
    print('At {} level of significance, we can reject the null hypothesis in favor of alternative hypothesis.'.format(alpha))
else:
    print('At {} level of significance, we fail to reject the null hypothesis.'.format(alpha))

At 0.05 level of significance, we fail to reject the null hypothesis.


## Example-4

In [None]:
df = sns.load_dataset("mpg")
df.head()

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model_year,origin,name
0,18.0,8,307.0,130.0,3504,12.0,70,usa,chevrolet chevelle malibu
1,15.0,8,350.0,165.0,3693,11.5,70,usa,buick skylark 320
2,18.0,8,318.0,150.0,3436,11.0,70,usa,plymouth satellite
3,16.0,8,304.0,150.0,3433,12.0,70,usa,amc rebel sst
4,17.0,8,302.0,140.0,3449,10.5,70,usa,ford torino


In [None]:
df = df.dropna()
df.shape

(392, 9)

In [None]:
df[df.origin=="usa"].describe()

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model_year
count,245.0,245.0,245.0,245.0,245.0,245.0,245.0
mean,20.033469,6.277551,247.512245,119.04898,3372.489796,14.990204,75.591837
std,6.440384,1.655996,98.376347,39.89779,795.34669,2.73602,3.660368
min,9.0,4.0,85.0,52.0,1800.0,8.0,70.0
25%,15.0,4.0,151.0,88.0,2720.0,13.0,73.0
50%,18.5,6.0,250.0,105.0,3381.0,15.0,76.0
75%,24.0,8.0,318.0,150.0,4055.0,16.7,78.0
max,39.0,8.0,455.0,230.0,5140.0,22.2,82.0


In [None]:
# avg mpg for usa cars:
# Ho: mu = 20
# Ha: mu > 20

In [None]:
onesample = stats.ttest_1samp(df[df.origin=="usa"].mpg, popmean=20, alternative="greater")
onesample

TtestResult(statistic=0.08134278565721932, pvalue=0.46761801558957883, df=244)

In [None]:
alpha = 0.05
if onesample.pvalue<alpha:
    print('At {} level of significance, we can reject the null hypothesis in favor of alternative hypothesis.'.format(alpha))
else:
    print('At {} level of significance, we fail to reject the null hypothesis.'.format(alpha))

At 0.05 level of significance, we fail to reject the null hypothesis.
