### Q1: What is the difference between a t-test and a z-test? Provide an example scenario where you would use each type of test.

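Ans - A z-test is used to test a null hypothesis when the population variance is known, or when the population variance is unknown but the sample is large (typically n ≥ 30), so that the sample standard deviation is a reliable stand-in for it. A t-test is used when the population variance is unknown and the sample is small (n < 30). It relies on Student's t-distribution, whose heavier tails account for the extra uncertainty of estimating the standard deviation from the sample.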

* Example scenario for using a t-test: Suppose you want to compare the mean scores of two groups of students who have been taught using different teaching methods. You collect a small sample from each group (e.g., n1 = 20, n2 = 25) and measure their scores. Since the population standard deviations are unknown, you would use a t-test to assess whether there is a significant difference in the mean scores between the two groups.

* Example scenario for using a z-test: A z-test is used when the population standard deviation is known or when the sample size is large (typically greater than or equal to 30). It is based on the standard normal distribution (z-distribution), which does not account for the uncertainty associated with estimating the population standard deviation. For example, suppose a factory knows from long-run records that the fill weight of its packets has a standard deviation of 2 g, and you weigh a sample of 100 packets to check whether the mean fill weight differs from the 500 g target. Since the population standard deviation is known and the sample is large, you would use a z-test.
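
To see why the choice of test matters for small samples, the sketch below compares two-sided 5% critical values from the standard normal distribution and from the t-distribution. It assumes SciPy is installed; the significance level and sample sizes are illustrative choices, not values from the answer above.

```python
from scipy import stats

alpha = 0.05  # two-sided significance level (illustrative choice)

# Critical value used by a z-test: depends only on alpha
z_crit = stats.norm.ppf(1 - alpha / 2)

# Critical values used by a t-test: depend on the degrees of freedom (n - 1)
t_crit = {n: stats.t.ppf(1 - alpha / 2, df=n - 1) for n in (10, 20, 30, 100)}

print(f"z critical value: {z_crit:.3f}")
for n, t in t_crit.items():
    print(f"n = {n:>3}: t critical value = {t:.3f}")
```

The t critical value is always larger than the z critical value and approaches it as n grows, which is why the two tests give nearly the same answer for large samples.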

### Q2: Differentiate between one-tailed and two-tailed tests.


| Basis of comparison | One-tailed test | Two-tailed test |
|------|-----------|-------------|
| Meaning | A hypothesis test in which the alternative hypothesis specifies a single direction. | A hypothesis test in which the alternative hypothesis allows for either direction. |
| Hypothesis | Directional | Non-directional |
| Region of rejection | Either the left or the right tail | Both the left and the right tails |
| Determines | Whether there is a relationship between variables in a single direction. | Whether there is a relationship between variables in either direction. |
| Result | The parameter is greater than, or less than, a certain value. | The parameter differs from a certain value, or lies outside a certain range of values. |
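
The practical difference shows up in the p-value. The sketch below computes one-tailed and two-tailed p-values for the same z statistic. It assumes SciPy is installed, and the value `z = 1.8` is a hypothetical test statistic chosen for illustration.

```python
from scipy import stats

z = 1.8  # hypothetical observed test statistic

# One-tailed (right-tailed) test: area in the upper tail only
p_one_tailed = stats.norm.sf(z)

# Two-tailed test: area in both tails beyond |z|
p_two_tailed = 2 * stats.norm.sf(abs(z))

print(f"one-tailed p = {p_one_tailed:.4f}")
print(f"two-tailed p = {p_two_tailed:.4f}")
```

With this statistic the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, so the same data can lead to different decisions depending on how the alternative hypothesis is stated.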

### Q3: Explain the concept of Type 1 and Type 2 errors in hypothesis testing. Provide an example scenario for each type of error.


#### Type I error

A Type I error means rejecting the null hypothesis when it’s actually true. It means concluding that results are statistically significant when, in reality, they came about purely by chance or because of unrelated factors.

**Example: Statistical significance and Type I error**

In your clinical study, you compare the symptoms of patients who received either the new drug intervention or a control treatment. Using a t test, you obtain a p value of .035. This p value is lower than your alpha of .05, so you consider your results statistically significant and reject the null hypothesis.
However, the p value means that, if the null hypothesis is true, there is a 3.5% chance of obtaining results at least as extreme as yours. Therefore, there is still a risk of making a Type I error.

#### Type II error

A Type II error means not rejecting the null hypothesis when it’s actually false. This is not quite the same as “accepting” the null hypothesis, because hypothesis testing can only tell you whether to reject the null hypothesis.

**Example: Statistical power and Type II error**
When preparing your clinical study, you complete a power analysis and determine that with your sample size, you have an 80% chance of detecting an effect size of 20% or greater. An effect size of 20% means that the drug intervention reduces symptoms by 20% more than the control treatment.
However, a Type II error may occur if the true effect is smaller than this size. A smaller effect is unlikely to be detected in your study because the statistical power is inadequate for it.
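
Both error rates can be estimated by simulation. The sketch below is a minimal illustration, assuming NumPy and SciPy are installed; the sample size, the true effect of 0.3 standard deviations, and the number of simulated studies are hypothetical settings, not values from the clinical example above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha, n, n_sims = 0.05, 30, 5000  # illustrative settings

# Type I: the null (mean = 0) is true, so every rejection is a false positive
null_data = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
p_null = stats.ttest_1samp(null_data, popmean=0.0, axis=1).pvalue
type_1_rate = np.mean(p_null < alpha)

# Type II: the null is false (true mean = 0.3), so every non-rejection is a miss
alt_data = rng.normal(loc=0.3, scale=1.0, size=(n_sims, n))
p_alt = stats.ttest_1samp(alt_data, popmean=0.0, axis=1).pvalue
type_2_rate = np.mean(p_alt >= alpha)

print(f"Type I error rate  ~ {type_1_rate:.3f}")
print(f"Type II error rate ~ {type_2_rate:.3f}")
```

The Type I rate should land close to alpha, while the Type II rate depends on the effect size and sample size. Increasing `n` in this sketch lowers the Type II rate, which is the same as raising the power.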

### Q4: Explain Bayes's theorem with an example.


Ans - Bayes' theorem, in simple words, determines the conditional probability of event A given that event B has already occurred, based on the following:

* Probability of B given A
* Probability of A
* Probability of B

Bayes' theorem is a way to update the probability of an event using what is already known about related events. It states that the conditional probability of an event A, given the occurrence of another event B, is equal to the likelihood of B given A multiplied by the probability of A, divided by the probability of B. It is given as:

**P(A|B) = P(B|A) P(A)/P(B)**

Example - finding out a patient’s probability of having liver disease if they are an alcoholic. “Being an alcoholic” is the test (kind of like a litmus test) for liver disease.

* A could mean the event “Patient has liver disease.” Past data tells you that 10% of patients entering your clinic have liver disease. P(A) = 0.10.
* B could mean the litmus test that “Patient is an alcoholic.” Five percent of the clinic’s patients are alcoholics. P(B) = 0.05.
* You might also know that among those patients diagnosed with liver disease, 7% are alcoholics. This is your B|A: the probability that a patient is alcoholic, given that they have liver disease, is 7%.

Bayes’ theorem tells you:

P(A|B) = (0.07 * 0.1)/0.05 = 0.14

In other words, if the patient is an alcoholic, their chance of having liver disease is 0.14 (14%). This is a noticeable increase from the 10% suggested by past data, but it is still unlikely that any particular patient has liver disease.
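
The same calculation can be sketched in a few lines of Python. The function and variable names below are illustrative; the probabilities are the ones given in the example.

```python
def bayes(p_a: float, p_b: float, p_b_given_a: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_liver = 0.10                      # P(A): patient has liver disease
p_alcoholic = 0.05                  # P(B): patient is an alcoholic
p_alcoholic_given_liver = 0.07      # P(B|A)

p_liver_given_alcoholic = bayes(p_liver, p_alcoholic, p_alcoholic_given_liver)
print(f"P(liver disease | alcoholic) = {p_liver_given_alcoholic:.2f}")
```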

### Q5: What is a confidence interval? How to calculate the confidence interval, explain with an example.


Ans - A confidence interval, in statistics, is a range of values, computed from sample data, that is expected to contain a population parameter in a stated proportion of repeated samples. Analysts often use confidence levels of 95% or 99%. Thus, if a statistical model produces a point estimate of 10.00 with a 95% confidence interval of 9.50 to 10.50, the interval comes from a procedure that captures the true value in 95% of repeated samples. Strictly speaking, this is a statement about the procedure rather than a 95% probability that the true value lies in this particular interval.

If you want to calculate a confidence interval on your own, you need to know:

1. The point estimate you are constructing the confidence interval for
2. The critical values for the test statistic
3. The standard deviation of the sample
4. The sample size

Once you know each of these components, you can calculate the confidence interval for your estimate by plugging them into the confidence interval formula that corresponds to your data.

Example: A random sample of 30 apples was taken from a large population. The mean diameter of the sample was 91 millimeters with a standard deviation of 8 mm. Calculate the 85% confidence limits for the mean diameter of the whole population of apples.

![Screenshot 2023-05-25 201538.png](attachment:d78d04ef-ad1c-4a1f-8616-0d34edfc85ae.png)

**Answer: The 85% confidence limits are 91 ± 2.1 mm, i.e. approximately 88.9 mm to 93.1 mm.**
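
The arithmetic behind this answer can be reproduced as follows. This is a minimal sketch assuming SciPy is installed and using the normal (z) critical value; the variable names are illustrative.

```python
import math
from scipy import stats

n, sample_mean, sample_std = 30, 91.0, 8.0  # values from the example
confidence = 0.85

# Two-sided critical value: leaves (1 - confidence) / 2 in each tail
z_crit = stats.norm.ppf(1 - (1 - confidence) / 2)

standard_error = sample_std / math.sqrt(n)
margin = z_crit * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"z critical value: {z_crit:.2f}")
print(f"85% CI: {sample_mean} +/- {margin:.1f} mm -> ({lower:.1f}, {upper:.1f})")
```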

### Q6. Use Bayes' Theorem to calculate the probability of an event occurring given prior knowledge of the event's probability and new evidence. Provide a sample problem and solution.


Ans - Suppose the probability of the weather being cloudy is 40%.

Also suppose the probability of rain on a given day is 20%.

Also suppose the probability of clouds on a rainy day is 85%. 

If it’s cloudy outside on a given day, what is the probability that it will rain that day?

In [1]:
# Function for Bayes' theorem: returns P(A|B) from P(A), P(B) and P(B|A)

def bayes(pA, pB, pBA):
    return pA * pBA / pB

# Defining the probabilities

pRain = 0.2          # P(rain)
pCloudy = 0.4        # P(cloudy)
pCloudyRain = 0.85   # P(cloudy | rain)

# Using the function to calculate P(rain | cloudy)

bayes(pRain, pCloudy, pCloudyRain)

0.425

**Result - This tells us that if it’s cloudy outside on a given day, the probability that it will rain that day is 0.425 or 42.5%.**

### Q7. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5. Interpret the results.


In [1]:
import scipy.stats as stats

mean = 50
std_dev = 5
sample_size = 100  # Assumed: the question does not state a sample size
confidence_level = 0.95

# Calculate the standard error
std_error = std_dev / (sample_size ** 0.5)

# Calculate the margin of error
margin_of_error = stats.norm.ppf((1 + confidence_level) / 2) * std_error

# Calculate the lower and upper bounds of the confidence interval
lower_bound = mean - margin_of_error
upper_bound = mean + margin_of_error

# Print the confidence interval
print(f"Confidence Interval: [{lower_bound}, {upper_bound}]")


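Confidence Interval: [49.02001800772997, 50.97998199227003]

**Result - Assuming a sample of 100 observations, we are 95% confident that the population mean lies between about 49.02 and 50.98.**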


### Q8. What is the margin of error in a confidence interval? How does sample size affect the margin of error? Provide an example of a scenario where a larger sample size would result in a smaller margin of error.


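Ans - The margin of error is the half-width of a confidence interval: the largest amount by which the sample estimate is expected to differ from the true population value at the chosen confidence level. It equals the critical value multiplied by the standard error, so it shrinks in proportion to 1/√n. The larger the sample, the smaller the margin of error.

Example: 900 students were surveyed and had an average GPA of 2.7 with a standard deviation of 0.4. Calculate the margin of error for a 90% confidence level:

The critical value is 1.645.

The standard deviation is 0.4 (from the question), but as this is a sample, we need the standard error of the mean. The formula for the SE of the mean is standard deviation / √(sample size), so: 0.4 / √(900) = 0.0133.

Margin of error = 1.645 * 0.0133 ≈ 0.022

Had only 100 students been surveyed, the standard error would be 0.4 / √(100) = 0.04 and the margin of error 1.645 * 0.04 ≈ 0.066, which is three times larger.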
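
The effect of sample size can be checked numerically. The sketch below assumes SciPy is installed; `margin_of_error` is an illustrative helper, and the sample sizes other than 900 are hypothetical.

```python
import math
from scipy import stats

def margin_of_error(std_dev: float, n: int, confidence: float) -> float:
    """Critical value times the standard error of the mean."""
    z_crit = stats.norm.ppf(1 - (1 - confidence) / 2)
    return z_crit * std_dev / math.sqrt(n)

# GPA example: standard deviation 0.4 at a 90% confidence level
margins = {n: margin_of_error(0.4, n, 0.90) for n in (100, 400, 900, 3600)}

for n, m in margins.items():
    print(f"n = {n:>4}: margin of error = {m:.4f}")
```

Quadrupling the sample size halves the margin of error, because the standard error depends on the square root of n.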


### Q9. Calculate the z-score for a data point with a value of 75, a population mean of 70, and a population standard deviation of 5. Interpret the results.


Ans - mean (μ) = 70, std (σ) = 5, X = 75

z-score (z) = (X – μ) / σ


In [3]:
z = (75 - 70) / 5

# From the z-table, the cumulative probability at z = 1 is 0.8413
z

1.0

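**Result - The data point is 1 standard deviation above the mean. From the z-table, about 84.13% of values in a normally distributed population fall below it.**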
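
The z-table lookup can also be done in code. This is a small sketch assuming SciPy is installed and that the population is normally distributed.

```python
from scipy import stats

x, mu, sigma = 75, 70, 5  # values from the question

z = (x - mu) / sigma

# Cumulative probability below z under the standard normal distribution
percentile = stats.norm.cdf(z)

print(f"z = {z}")
print(f"Proportion of values below x: {percentile:.4f}")
```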

### Q10. In a study of the effectiveness of a new weight loss drug, a sample of 50 participants lost an average of 6 pounds with a standard deviation of 2.5 pounds. Conduct a hypothesis test to determine if the drug is significantly effective at a 95% confidence level using a t-test.

In [2]:
import math

sample_size = 50
sample_mean = 6
std = 2.5
alpha_value = 0.05
hypothesis_mean = 0  # Under the null hypothesis the drug produces no weight loss


# Step 01: Formulating the hypotheses (right-tailed test, since "effective" means weight loss > 0).
print("""Null: The drug is not effective, i.e. the population mean weight loss U = 0,
Alternative: The drug is effective, i.e. the population mean weight loss U > 0.""")

# Step 02: Calculating the test statistic.
degree_of_freedom = sample_size - 1

# One-tailed t-critical value from the t-table for DOF = 49 and alpha = 0.05:
t_critical_value = 1.6766
print('\nT-critical-value is: ', t_critical_value)

# t-test formula: t-test = (sample_mean - hypothesis_mean) / (std / √ sample_size)

t_test = (sample_mean - hypothesis_mean) / (std / math.sqrt(sample_size))

print('\nt-test value is: ', t_test)

# Step 03: Comparison and conclusion.

print('\nt-test value > t-critical-value')
print('''\nOn comparing the t-value and the t-critical value we find that the t-value is greater than the t-critical value, therefore we reject the null hypothesis and conclude that the drug is significantly effective.''')


Null: The drug is not effective, i.e. the population mean weight loss U = 0,
Alternative: The drug is effective, i.e. the population mean weight loss U > 0.

T-critical-value is:  1.6766

t-test value is:  16.970562748477143

t-test value > t-critical-value

On comparing the t-value and the t-critical value we find that the t-value is greater than the t-critical value, therefore we reject the null hypothesis and conclude that the drug is significantly effective.
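
Instead of reading the critical value from a table, it can be computed. The sketch below assumes SciPy is installed, treats the test as right-tailed, and assumes a null mean of 0 (no weight loss without an effect); the variable names are illustrative.

```python
import math
from scipy import stats

n, sample_mean, sample_std, alpha = 50, 6.0, 2.5, 0.05
null_mean = 0.0  # assumption: no drug effect means zero average weight loss

t_stat = (sample_mean - null_mean) / (sample_std / math.sqrt(n))
dof = n - 1

t_crit = stats.t.ppf(1 - alpha, dof)  # right-tailed critical value
p_value = stats.t.sf(t_stat, dof)     # right-tailed p-value

reject_null = t_stat > t_crit
print(f"t = {t_stat:.4f}, critical value = {t_crit:.4f}, p = {p_value:.3g}")
print("Reject the null hypothesis" if reject_null else "Fail to reject the null hypothesis")
```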


### Q11. In a survey of 500 people, 65% reported being satisfied with their current job. Calculate the 95% confidence interval for the true proportion of people who are satisfied with their job.


In [3]:
import statsmodels.api as sm
import numpy as np

# Sample information
sample_size = 500
sample_proportion = 0.65

# Calculate the standard error
standard_error = np.sqrt((sample_proportion * (1 - sample_proportion)) / sample_size)

# Calculate the confidence interval
confidence_interval = sm.stats.proportion_confint(sample_proportion * sample_size, sample_size, alpha=0.05, method='normal')

# Print the confidence interval
print("95% Confidence Interval:", confidence_interval)


95% Confidence Interval: (0.6081925393809212, 0.6918074606190788)
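
The normal-approximation interval can also be computed by hand as p̂ ± z · √(p̂(1 − p̂)/n). The sketch below assumes SciPy is installed and uses illustrative variable names.

```python
import math
from scipy import stats

n, p_hat, confidence = 500, 0.65, 0.95  # values from the question

z_crit = stats.norm.ppf(1 - (1 - confidence) / 2)
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)

lower = p_hat - z_crit * standard_error
upper = p_hat + z_crit * standard_error

print(f"95% CI for the proportion: ({lower:.4f}, {upper:.4f})")
```

In words: we are 95% confident that between about 60.8% and 69.2% of people are satisfied with their job.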


### Q12. A researcher is testing the effectiveness of two different teaching methods on student performance. Sample A has a mean score of 85 with a standard deviation of 6, while sample B has a mean score of 82 with a standard deviation of 5. Conduct a hypothesis test to determine if the two teaching methods have a significant difference in student performance using a t-test with a significance level of 0.01.

In [4]:
import scipy.stats as stats
import statistics as s

s1_m = 85
s2_m = 82
std1 = 6
std2 = 5
alpha_value = 0.01  # Significance level given in the question

# Step 01: Formulating the hypothesis.
print("""Null: There is no difference between sample01 mean and sample02 mean i.e Ud(mean difference) = 0,
Alternative: There is a difference between sample01 mean and sample02 mean i.e Ud(mean difference) != 0.\n""")

# Step 02: Creating the samples using the mean and std given in the question. These samples will be used in the t-test.
# Assumption: the question gives no sample sizes, so 20 simulated observations per group are used.
# Note: both draws reuse seed = 42, so the two samples are the same random draws rescaled.

a = s.NormalDist(mu=s1_m, sigma=std1)
samples_1 = a.samples(20, seed=42)
print(samples_1, '\n')

b = s.NormalDist(mu=s2_m, sigma=std2)
samples_2 = b.samples(20, seed=42)
print(samples_2)

# Step 03: Conducting the test.
t_test, p_value = stats.ttest_ind(a=samples_1, b=samples_2)

print(t_test, p_value)

# Step 04: Comparing the results and concluding.

if p_value < alpha_value:
    print('\nWe reject the null hypothesis.')
    print('We conclude that there is a difference between the teaching methods.')
else:
    print('\nWe fail to reject the null hypothesis.')
    print('We cannot conclude that there is a difference between the teaching methods.')

Null: There is no difference between sample01 mean and sample02 mean i.e Ud(mean difference) = 0,
Alternative: There is a difference between sample01 mean and sample02 mean i.e Ud(mean difference) != 0.

[84.13545802253243, 83.96257839801089, 84.33210483059402, 89.21190235059318, 84.23447029730268, 76.01587951395426, 86.9939100644063, 83.39597512901699, 83.69824789512883, 85.69530872020513, 86.39378642144032, 91.98135211959486, 88.93981904079202, 85.66304306466299, 80.57007038593107, 78.9120257950737, 86.4780531712672, 92.8664849632243, 85.2499411834203, 84.36206023737529] 

[81.27954835211035, 81.1354819983424, 81.44342069216168, 85.50991862549432, 81.36205858108556, 74.51323292829521, 83.66159172033858, 80.66331260751416, 80.91520657927403, 82.57942393350427, 83.16148868453361, 87.81779343299571, 85.28318253399334, 82.55253588721916, 78.3083919882759, 76.92668816256142, 83.23171097605601, 88.55540413602024, 82.20828431951692, 81.46838353114607]
2.627391423085882 0.012340270569595293

We fail to reject the null hypothesis.
We cannot conclude that there is a difference between the teaching methods.
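
An alternative that avoids simulating data is to run the test directly on the summary statistics with `scipy.stats.ttest_ind_from_stats`. The sketch below assumes SciPy is installed. The group size of 20 per sample is an assumption, since the question does not state sample sizes, and equal variances are assumed as in the default `ttest_ind`.

```python
from scipy import stats

alpha = 0.01      # significance level from the question
n_per_group = 20  # assumption: the question does not give sample sizes

# Two-sample t-test computed from means, standard deviations and sizes only
result = stats.ttest_ind_from_stats(
    mean1=85, std1=6, nobs1=n_per_group,
    mean2=82, std2=5, nobs2=n_per_group,
    equal_var=True,
)

print(f"t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
if result.pvalue < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

Because the result depends only on the stated means and standard deviations, it does not vary with a random seed. The conclusion would still change with the assumed sample size.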


### Q13. A population has a mean of 60 and a standard deviation of 8. A sample of 50 observations has a mean of 65. Calculate the 90% confidence interval for the true population mean.


In [6]:
import numpy as np
from scipy import stats

# Population information
population_mean = 60
population_std = 8


# Sample information
sample_mean = 65
sample_size = 50

# Calculate the standard error
standard_error = population_std / np.sqrt(sample_size)

# Calculate the margin of error
margin_of_error = stats.norm.ppf(0.95) * standard_error

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

# Print the confidence interval
print("90% Confidence Interval:", confidence_interval)


90% Confidence Interval: (63.13906055411732, 66.86093944588268)


### Q14. In a study of the effects of caffeine on reaction time, a sample of 30 participants had an average reaction time of 0.25 seconds with a standard deviation of 0.05 seconds. Conduct a hypothesis test to determine if the caffeine has a significant effect on reaction time at a 90% confidence level using a t-test.

In [8]:
import math

sample_size = 30
sample_mean = 0.25
std = 0.05
alpha_value = 0.1
hypothesis_mean = 0  # Assumed baseline: the question does not give a reference reaction time

# Step 01: Formulating the hypothesis.

print("""Null: There is no significant effect of caffeine on reaction time.
Alternative: There is a significant effect of caffeine on reaction time.\n""")

# Step 02: Calculating the test statistic.
dof = sample_size - 1
# Two-tailed t-critical value from the t-table for DOF = 29 and alpha = 0.1 (0.05 in each tail):
critical_value = 1.699

t_test = (sample_mean - hypothesis_mean) / (std / math.sqrt(sample_size))

print('T-test score is:', t_test)
print('\nCritical_value is: ', critical_value)

# Step 03: Comparing results.

print("""\nOn comparing the t-test score and the critical value we observe that the t-test score is greater than the critical value, which means we reject the null hypothesis.
Therefore, at the 90% confidence level, we reject the null hypothesis and conclude that caffeine has a significant effect on reaction time.""")


Null: There is no significant effect of caffeine on reaction time.
Alternative: There is a significant effect of caffeine on reaction time.

T-test score is: 27.386127875258307

Critical_value is:  1.699

On comparing the t-test score and the critical value we observe that the t-test score is greater than the critical value, which means we reject the null hypothesis.
Therefore, at the 90% confidence level, we reject the null hypothesis and conclude that caffeine has a significant effect on reaction time.
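
For a non-directional question such as "does caffeine have an effect", a two-tailed test is the usual choice. The sketch below computes the two-tailed critical value and p-value, assuming SciPy is installed. The null mean of 0 is an assumption carried over from the cell above, since the question gives no baseline reaction time.

```python
import math
from scipy import stats

n, sample_mean, sample_std, alpha = 30, 0.25, 0.05, 0.10
null_mean = 0.0  # assumption: the question does not give a baseline reaction time

t_stat = (sample_mean - null_mean) / (sample_std / math.sqrt(n))
dof = n - 1

t_crit = stats.t.ppf(1 - alpha / 2, dof)     # two-tailed critical value
p_value = 2 * stats.t.sf(abs(t_stat), dof)   # two-tailed p-value

print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, p = {p_value:.3g}")
print("Reject the null hypothesis" if abs(t_stat) > t_crit else "Fail to reject the null hypothesis")
```

In a real study the null mean would be the known reaction time without caffeine, and the test statistic would be far smaller than the one obtained against a baseline of 0.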
