## IS6 in Python: Comparing Groups (Chapter 17)

### Introduction and background

This document is intended to help students work through the examples in the Sixth Edition of Intro Stats (2022) by De Veaux, Velleman, and Bock. This PDF file, along with the associated ipynb source file used to create it, can be found at (INSERT WEBSITE LINK HERE).

#### Chapter 17: Comparing Groups

In [21]:
# Read in libraries
import pandas as pd
import numpy as np
from scipy.stats import norm

#### Section 17.1: A Confidence Interval for the Difference Between Two Proportions

In [22]:
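# Create a DataFrame for the seatbelt data, one row per passenger
# Note: np.array coerces the mixed str/bool list to strings, so "belted" holds "True"/"False" text
seatbelts = np.array(
    ["F", True] * 2777 + ["F", False] * (4208 - 2777)
    + ["M", True] * 1363 + ["M", False] * (2763 - 1363)
).reshape(-1, 2)
seatbelts = pd.DataFrame(seatbelts, columns=["passenger", "belted"])
seatbelts.head()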

  passenger belted
0         F   True
1         F   True
2         F   True
3         F   True
4         F   True


Question:
- What is the Python equivalent of diffmean()?
- What is the Python equivalent of resample()? Note that pandas DataFrame.resample() works only with time-series data.
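
One possible answer to the questions above, offered as a sketch rather than a definitive equivalent. It assumes `diffmean()` and `resample()` refer to the functions of those names in R's mosaic package. Under that assumption, a difference between group means can be computed with `groupby(...).mean()`, and a bootstrap resample can be drawn with `DataFrame.sample(frac=1, replace=True)`. The helper name `diff_in_means`, the frame name `seatbelts_num`, the seeds, and the 0/1 recoding of `belted` are all choices made here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the seatbelt counts from above, storing "belted" as 0/1 so that a
# group mean is a proportion (an illustrative recoding, not the original layout)
seatbelts_num = pd.DataFrame({
    "passenger": ["F"] * 4208 + ["M"] * 2763,
    "belted": [1] * 2777 + [0] * (4208 - 2777) + [1] * 1363 + [0] * (2763 - 1363),
})

def diff_in_means(df, value, group, first, second):
    """Hypothetical helper: mean of `value` in group `first` minus group `second`."""
    means = df.groupby(group)[value].mean()
    return means[first] - means[second]

observed = diff_in_means(seatbelts_num, "belted", "passenger", "F", "M")
print(f"observed difference (F - M): {observed:.4f}")

# Bootstrap: resample all rows with replacement and recompute the difference.
# The loop index is used as the seed purely to make the run reproducible.
boot = np.array([
    diff_in_means(
        seatbelts_num.sample(frac=1, replace=True, random_state=i),
        "belted", "passenger", "F", "M",
    )
    for i in range(200)
])
print(f"bootstrap SE estimate: {boot.std(ddof=1):.4f}")
```

The standard deviation of the resampled differences gives a simulation-based estimate of the standard error, which can be compared with the formula-based value computed later in this chapter.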

#### Example 17.1:  Finding the Standard Error of a Difference in Proportions

In [23]:
# Create a DataFrame for the online profile data, one row per respondent
online = np.array(["M", True] * 141 + ["M", False] * 107 + ["F", True] * 179 + ["F", False] * 77).reshape(-1,2)
online = pd.DataFrame(online, columns = ["gender", "profile"])
online.head()

  gender profile
0      M    True
1      M    True
2      M    True
3      M    True
4      M    True


In [24]:
print(online.groupby("gender").count())

        profile
gender         
F           256
M           248
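
The counts needed for the calculations in this example can also be read off a single contingency table. As a sketch, `pd.crosstab` tabulates gender against profile. The frame is rebuilt here from the same counts so that the block runs on its own, and `online_demo` is a name chosen for illustration.

```python
import pandas as pd

# Same counts as the `online` data above, built directly as string columns
online_demo = pd.DataFrame({
    "gender": ["M"] * 248 + ["F"] * 256,
    "profile": ["True"] * 141 + ["False"] * 107 + ["True"] * 179 + ["False"] * 77,
})

# Rows: gender; columns: profile; margins=True adds row and column totals ("All")
counts = pd.crosstab(online_demo["gender"], online_demo["profile"], margins=True)
print(counts)

# normalize="index" turns each row into within-gender proportions
props = pd.crosstab(online_demo["gender"], online_demo["profile"], normalize="index")
print(props)
```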


In [25]:
male = online[online["gender"] == "M"]
num_male = male["gender"].count()
print(f"number of males: {num_male}")

number of males: 248


In [26]:
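# "profile" holds the strings "True"/"False", so compare against the string "True"
num_yes_m = male[male["profile"] == "True"]
num_yes_m = num_yes_m["profile"].count()
prop_yes_m = num_yes_m / num_male  # use the computed group size, not a hard-coded 248
print(f"proportion of yes male profile: {prop_yes_m}")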

proportion of yes male profile: 0.5685483870967742


In [27]:
se_male = ((prop_yes_m * (1 - prop_yes_m)) / num_male) ** 0.5
print(f"standard error of male: {se_male}")

standard error of male: 0.03145023710270326


In [28]:
female = online[online["gender"] == "F"]
num_female = female["gender"].count()
print(f"number of females: {num_female}")

number of females: 256


In [29]:
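# "profile" holds the strings "True"/"False", so compare against the string "True"
num_yes_f = female[female["profile"] == "True"]
num_yes_f = num_yes_f["profile"].count()
prop_yes_f = num_yes_f / num_female  # use the computed group size, not a hard-coded 256
print(f"proportion of yes female profile: {prop_yes_f}")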

proportion of yes female profile: 0.69921875


In [30]:
se_female = ((prop_yes_f * (1 - prop_yes_f)) / num_female) ** 0.5
print(f"standard error of female: {se_female}")

standard error of female: 0.028662358921400885


In [31]:
sep = (se_male ** 2 + se_female ** 2) ** 0.5
print(f"overall SE: {sep}")

overall SE: 0.04255171245385386
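
The steps of this example can be collected into one small function. This is a sketch: the name `se_diff_props` and its argument order are hypothetical, and it takes the counts directly rather than a DataFrame.

```python
from math import sqrt

def se_diff_props(yes1, n1, yes2, n2):
    """Unpooled standard error of p1_hat - p2_hat for two independent samples."""
    p1, p2 = yes1 / n1, yes2 / n2
    # The variances of the two sample proportions add; the SE is the square root
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Counts from the online-profile data: 141 of 248 males, 179 of 256 females
se = se_diff_props(141, 248, 179, 256)
print(f"overall SE: {se:.4f}")
```

The result should agree with the value of `sep` computed cell by cell above.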


#### Example 17.2: Finding a Two-Proportion z-Interval 

In [33]:
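# Critical values z* for a 95% interval (about -1.96 and +1.96)
zstats = norm.ppf([.025, .975])
# Interval: (difference in sample proportions) + z* × SE, evaluated at both critical values
print((prop_yes_f - prop_yes_m) + zstats * sep)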

[0.04727054 0.21407019]
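
The same calculation can be wrapped so that the confidence level becomes a parameter. This is a minimal sketch, assuming the unpooled standard error from Example 17.1; `two_prop_z_interval` is a hypothetical helper name, not a SciPy function.

```python
from math import sqrt
from scipy.stats import norm

def two_prop_z_interval(yes1, n1, yes2, n2, level=0.95):
    """Two-proportion z-interval for p1 - p2 (hypothetical helper)."""
    p1, p2 = yes1 / n1, yes2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Upper critical value, e.g. about 1.96 when level is 0.95
    z_star = norm.ppf(1 - (1 - level) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Females first, so the interval is for p_F - p_M as in the cell above
low, high = two_prop_z_interval(179, 256, 141, 248)
print(f"95% CI for p_F - p_M: ({low:.4f}, {high:.4f})")
```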


Question: What is the Python equivalent of prop.test()?
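
One candidate answer, offered with caution: for two groups, R's `prop.test()` reports a chi-squared test of equal proportions and applies Yates' continuity correction by default. `scipy.stats.chi2_contingency` runs the same kind of test on a 2x2 table of counts. The sketch below applies it to the online-profile counts. It does not reproduce the confidence interval that `prop.test()` also prints. The statsmodels package additionally provides `proportions_ztest` in `statsmodels.stats.proportion`, which is not used here.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: F, M; columns: has profile, no profile (counts from Example 17.1)
table = np.array([[179, 77],
                  [141, 107]])

# correction=True applies Yates' continuity correction to the 2x2 table
chi2, p_value, dof, expected = chi2_contingency(table, correction=True)
print(f"X-squared = {chi2:.3f}, df = {dof}, p-value = {p_value:.4f}")

# Without the correction, the statistic equals the square of the
# pooled two-proportion z statistic
chi2_nc, p_nc, _, _ = chi2_contingency(table, correction=False)
print(f"uncorrected X-squared = {chi2_nc:.3f}, p-value = {p_nc:.4f}")
```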

#### Section 17.2: Assumptions and Conditions for Comparing Proportions
#### Section 17.3: The Two-Sample z-Test: Testing for the Difference Between Proportions
#### Step-By-Step Example: A Two-Proportion z-Test