1. How do you control for biases?

In [1]:
#1. Randomization: Assign subjects to different groups randomly to balance known and unknown confounding variables.
#2. Blinding: Keep participants, experimenters, and analysts unaware of group assignments to minimize expectation biases.
#3. Matching: Pair participants in experimental and control groups based on specific characteristics like age or gender.
#4. Statistical Control: Use statistical methods such as regression to adjust for the influence of confounding variables.
#5. Replication: Repeat studies to confirm the reliability of results and reduce the impact of random errors.
#6. Pre-registration: Define and publicly register the study design, hypotheses, and analysis plan before collecting data to prevent selective reporting and data dredging (p-hacking).
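
As an illustration of point 1, random assignment can be sketched in a few lines. The subject IDs, group sizes, and seed below are hypothetical and chosen only to make the example reproducible; this is a minimal sketch, not a full randomization protocol.

```python
import numpy as np

# Fixed seed only so the sketch is reproducible (an assumption for illustration)
rng = np.random.default_rng(seed=0)

# Hypothetical subject identifiers 0..19
subject_ids = np.arange(20)

# Shuffle the subjects, then split the shuffled order in half
shuffled = rng.permutation(subject_ids)
control_group = np.sort(shuffled[:10])
treatment_group = np.sort(shuffled[10:])

print("Control:  ", control_group)
print("Treatment:", treatment_group)
```

Because the split depends only on the random shuffle, no subject characteristic (measured or unmeasured) can systematically influence which group a subject lands in.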

2. What are confounding variables?

In [2]:
#Confounding variables are factors other than the independent variable that are related to both the independent variable and the dependent variable. Because they influence both, they can distort the apparent relationship between the two, producing spurious correlations and incorrect conclusions about causality. Identifying and controlling for confounding variables is crucial for accurate research findings.
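
A small simulation can make this concrete. In the sketch below, all variables are synthetic and the coefficients are arbitrary assumptions: `z` drives both `x` and `y`, while `x` has no effect on `y` at all. The raw correlation between `x` and `y` is still strong, and it largely disappears once `z` is included in a regression.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily for reproducibility
n = 5000

z = rng.normal(size=n)                # confounder
x = 2.0 * z + rng.normal(size=n)      # "independent" variable, driven by z
y = 3.0 * z + rng.normal(size=n)      # outcome, driven by z only (x has no effect)

# Naive view: x and y look strongly related
naive_corr = np.corrcoef(x, y)[0, 1]

# Adjusted view: regress y on an intercept, x, and z using least squares
design = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted_slope_x = coef[1]

print("Correlation between x and y:", naive_corr)
print("Slope of x after adjusting for z:", adjusted_slope_x)
```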

3. What is A/B testing?

In [3]:
#A/B testing is a method for comparing two versions of something (for example a web page, an email, or a product feature) to determine which one performs better. Typically used in marketing, web design, and product development, it involves randomly assigning users to either the control group (A) or the experimental group (B) and then comparing the groups on a specific metric or outcome. The goal is to determine whether the change in version B leads to a statistically significant improvement.
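
One common way to analyze an A/B test on conversion rates is a two-proportion z-test. The sketch below uses made-up visitor and conversion counts and a two-sided test; other metrics or designs may call for a different test.

```python
import math
from scipy.stats import norm

# Hypothetical results: visitors and conversions per variant
visitors_a, conversions_a = 2000, 200   # control (A)
visitors_b, conversions_b = 2000, 250   # experimental (B)

rate_a = conversions_a / visitors_a
rate_b = conversions_b / visitors_b

# Pooled conversion rate under the null hypothesis of no difference
pooled_rate = (conversions_a + conversions_b) / (visitors_a + visitors_b)
standard_error = math.sqrt(
    pooled_rate * (1 - pooled_rate) * (1 / visitors_a + 1 / visitors_b)
)

z_statistic = (rate_b - rate_a) / standard_error
p_value_ab = 2 * norm.sf(abs(z_statistic))  # two-sided p-value

print("Conversion rate A:", rate_a)
print("Conversion rate B:", rate_b)
print("z-statistic:", z_statistic)
print("p-value:", p_value_ab)
```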

4. When would you use Welch's t-test?

In [4]:
#The Welch t-test is used when comparing the means of two groups that may have unequal variances and possibly unequal sample sizes. It is particularly useful when the assumption of equal variances, required by the standard Student's t-test, does not hold. This makes the Welch t-test a more robust option for testing differences between two independent samples under these conditions.
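
To see the difference in practice, the sketch below runs both tests on synthetic data in which a small, high-variance sample is compared with a large, low-variance one. The sample sizes, spreads, and seed are assumptions for illustration. In `scipy.stats.ttest_ind`, `equal_var=False` selects the Welch test.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(seed=1)  # arbitrary seed for reproducibility

# Same true mean, very different variances and sample sizes
small_noisy = rng.normal(loc=10.0, scale=5.0, size=15)
large_tight = rng.normal(loc=10.0, scale=1.0, size=150)

student = ttest_ind(small_noisy, large_tight, equal_var=True)   # pooled variance
welch = ttest_ind(small_noisy, large_tight, equal_var=False)    # Welch's t-test

print("Student t:", student.statistic, "p-value:", student.pvalue)
print("Welch t:  ", welch.statistic, "p-value:", welch.pvalue)
```

In this setup the pooled variance is dominated by the large, low-variance sample, so the Student test understates the standard error; the Welch statistic is smaller in magnitude.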

5. A company claims that the average time its customer service representatives spend on the phone per call is 6 
minutes. You believe that the average time is actually higher. You collect a random sample of 50 calls and find 
that the average time spent on the phone per call in your sample is 6.5 minutes, with a standard deviation of 1.2 
minutes. Test whether there is sufficient evidence to support your claim at a significance level of 0.05.

In [5]:
import numpy as np
from scipy.stats import t

# Summary statistics from the problem statement
sample_mean = 6.5
population_mean = 6   # claimed mean under the null hypothesis
sample_std = 1.2
sample_size = 50
alpha = 0.05

# One-sample t-statistic: H0: mean = 6, H1: mean > 6 (right-tailed)
t_statistic = (sample_mean - population_mean) / (sample_std / np.sqrt(sample_size))
degrees_of_freedom = sample_size - 1

# Right-tailed critical value and p-value
critical_t_value = t.ppf(1 - alpha, df=degrees_of_freedom)
p_value = 1 - t.cdf(t_statistic, df=degrees_of_freedom)

print("t-statistic:", t_statistic)
print("Critical t-value:", critical_t_value)
print("p-value:", p_value)

t-statistic: 2.946278254943948
Critical t-value: 1.6765508919142629
p-value: 0.0024555744280253533


In [6]:
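# The t-statistic (2.946) exceeds the critical value (1.677) and the p-value (0.0025) is below 0.05, so we reject the null hypothesis that the mean call time is 6 minutes. The sample provides sufficient evidence that the average call time is higher than 6 minutes.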

6. A researcher wants to determine whether there is a difference in the mean scores of two groups of students on a 
math test. Group A consists of 25 students who received traditional teaching methods, while Group B consists of 30 
students who received a new teaching method. The average score for Group A is 75, with a standard deviation of 8, 
and the average score for Group B is 78, with a standard deviation of 7. Test whether there is a significant 
difference in the mean scores of the two groups at a significance level of 0.05.

In [7]:
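from scipy.stats import ttest_ind_from_stats

# Group A: traditional teaching method
mean_a = 75
std_a = 8
n_a = 25

# Group B: new teaching method
mean_b = 78
std_b = 7
n_b = 30

# Two-sided Welch t-test from summary statistics
# (equal_var=False does not assume equal variances)
t_stat, p_value = ttest_ind_from_stats(mean1=mean_a, std1=std_a, nobs1=n_a,
                                       mean2=mean_b, std2=std_b, nobs2=n_b,
                                       equal_var=False)
t_stat, p_value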

(-1.4650132801342768, 0.14941450596390296)

In [8]:
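#The p-value (0.149) is greater than 0.05, so we fail to reject the null hypothesis. The data do not show a statistically significant difference between the mean scores of the two groups.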
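
As a cross-check, the same Welch test can be computed by hand from the summary statistics in the question. This is a sketch for verification only; the variable names are illustrative, and the degrees of freedom come from the Welch-Satterthwaite approximation.

```python
import math
from scipy.stats import t

# Summary statistics from question 6
mean_a, std_a, n_a = 75, 8, 25
mean_b, std_b, n_b = 78, 7, 30

# Per-group variance of the sample mean
var_mean_a = std_a**2 / n_a
var_mean_b = std_b**2 / n_b

# Welch t-statistic (no pooled variance)
t_manual = (mean_a - mean_b) / math.sqrt(var_mean_a + var_mean_b)

# Welch-Satterthwaite degrees of freedom
df_welch = (var_mean_a + var_mean_b) ** 2 / (
    var_mean_a**2 / (n_a - 1) + var_mean_b**2 / (n_b - 1)
)

# Two-sided p-value
p_manual = 2 * t.sf(abs(t_manual), df=df_welch)

print("t-statistic:", t_manual)
print("Degrees of freedom:", df_welch)
print("p-value:", p_manual)
```

The t-statistic and p-value should agree with the output of `ttest_ind_from_stats` above up to floating-point rounding.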