In [1]:
#answer:-1

"""
ANOVA (Analysis of Variance) rests on three key assumptions:

1---Independence of observations: Each observation should be independent of the others. A violation occurs when data points are related, as in repeated measurements on the same subject.
2---Normality: The residuals (errors) within each group should be approximately normally distributed. Strongly skewed or heavy-tailed data violate this and can distort the p-value of the F-test, particularly in small samples.
3---Homogeneity of variances (homoscedasticity): All groups should have similar variances. When this is violated (heteroscedasticity), the F-test becomes unreliable, especially with unequal sample sizes."""
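
The normality and equal-variance assumptions listed above can be checked numerically. The sketch below is one possible way to do it, not the only one: it assumes SciPy is available, uses Shapiro-Wilk and Levene's test as the diagnostics, and runs on simulated data with a seed chosen only for reproducibility. Independence is a property of the study design and is not tested here.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Simulated example data (assumption: three groups of 10 observations each).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Group": ["A"] * 10 + ["B"] * 10 + ["C"] * 10,
    "Values": rng.normal(size=30),
})

# Residuals of a one-way model are the deviations from each group's mean.
residuals = df["Values"] - df.groupby("Group")["Values"].transform("mean")

# Normality: Shapiro-Wilk test on the pooled residuals.
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Homogeneity of variances: Levene's test across the groups.
samples = [g["Values"].to_numpy() for _, g in df.groupby("Group")]
levene_stat, levene_p = stats.levene(*samples)

print(f"Shapiro-Wilk p = {shapiro_p:.3f}")
print(f"Levene p = {levene_p:.3f}")
```

A small p-value in either test suggests the corresponding assumption may not hold; a large p-value does not prove that it does.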

In [2]:
#answer:-2

"""
1---One-way ANOVA: Used when comparing the means of three or more groups based on one factor (e.g., comparing the effectiveness of three diets on weight loss).
2---Two-way ANOVA: Applied when analyzing the effect of two independent factors (e.g., comparing the effect of diet and exercise type on weight loss).
3---Repeated Measures ANOVA: Used when the same subjects are measured multiple times under different conditions (e.g., measuring the impact of medication on patients over several months)."""


In [3]:
#answer:-3

"""
In ANOVA, the total variability in the data is partitioned into two components:

1---Between-group variance: Variability due to differences between the group means.
2---Within-group variance: Variability of the observations around their own group mean.

The F-statistic is the ratio of the between-group to the within-group mean square. Comparing the two shows whether the differences between group means exceed the natural variability within the groups, which is what determines the significance of the results."""
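
The partition described above can be worked through by hand. The following sketch uses three small made-up groups (the numbers are purely illustrative) to compute the between-group and within-group sums of squares, form the F-statistic, and compare it with `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Illustrative data only: three groups of three observations.
groups = [
    np.array([4.0, 5.0, 6.0]),
    np.array([6.0, 7.0, 8.0]),
    np.array([8.0, 9.0, 10.0]),
]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group sum of squares: group means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
# Total sum of squares: observations around the grand mean.
ss_total = ((all_values - grand_mean) ** 2).sum()

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F is the ratio of the two mean squares.
f_manual = (ss_between / df_between) / (ss_within / df_within)
f_scipy, p_scipy = stats.f_oneway(*groups)

print(f"SS between = {ss_between}, SS within = {ss_within}, SS total = {ss_total}")
print(f"F (manual) = {f_manual:.2f}, F (scipy) = {f_scipy:.2f}")
```

The two sums of squares add up to the total, which is the sense in which the variability is "partitioned".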

In [4]:
#answer:-4

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data
df = pd.DataFrame({'Group': ['A']*10 + ['B']*10 + ['C']*10,
                   'Values': np.random.randn(30)})

model = ols('Values ~ C(Group)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)


             sum_sq    df         F   PR(>F)
C(Group)   3.614363   2.0  2.075719  0.14503
Residual  23.506987  27.0       NaN      NaN


In [5]:
#answer:-5

# Example: Two-way ANOVA with interaction
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({'Factor1': np.random.choice(['A', 'B'], 30),
                     'Factor2': np.random.choice(['X', 'Y'], 30),
                     'Values': np.random.randn(30)})

model = ols('Values ~ C(Factor1) * C(Factor2)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)


                          sum_sq    df         F    PR(>F)
C(Factor1)              0.002283   1.0  0.002599  0.959727
C(Factor2)              0.699469   1.0  0.796392  0.380360
C(Factor1):C(Factor2)   0.261879   1.0  0.298167  0.589689
Residual               22.835710  26.0       NaN       NaN


In [6]:
#answer:-6
"""An F-statistic of 5.23 with a p-value of 0.02 indicates that there is a statistically significant difference between the group means at a 5% significance level.
Since the p-value is less than 0.05, we reject the null hypothesis that all group means are equal, meaning at least one group mean differs from the others. The F-test alone does not identify which groups differ; a post-hoc test is needed for that."""
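
The decision rule above can be sketched in code. The answer does not state the degrees of freedom, so the values below (3 groups, 15 observations in total) are assumptions picked only so that the p-value lands near the quoted 0.02.

```python
from scipy import stats

f_stat = 5.23      # F-statistic quoted in the answer above
df_between = 2     # assumption: 3 groups -> 3 - 1
df_within = 12     # assumption: 15 observations -> 15 - 3
alpha = 0.05       # 5% significance level

# The p-value is the upper-tail probability of the F distribution.
p_value = stats.f.sf(f_stat, df_between, df_within)
reject_null = p_value < alpha

print(f"p = {p_value:.3f}, reject H0: {reject_null}")
```

With different degrees of freedom the same F-statistic gives a different p-value, which is why both are needed to interpret a result.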

In [7]:
#answer:-7
"""Missing data in repeated measures ANOVA can be handled using methods like listwise deletion (removing cases with missing data), mean imputation (replacing missing values with the mean), or multiple imputation.
The choice affects the accuracy and reliability of the results: listwise deletion reduces the sample size and therefore statistical power, mean imputation understates the variability in the data, and any imputation method can introduce bias if applied carelessly."""
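
To make the trade-off concrete, here is a small sketch of listwise deletion and mean imputation with pandas. The data frame, its column names (`t1`, `t2`, `t3`) and the subject labels are hypothetical; multiple imputation is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per time point.
scores = pd.DataFrame(
    {
        "t1": [5.0, 6.0, 7.0, 8.0],
        "t2": [6.0, np.nan, 8.0, 9.0],
        "t3": [7.0, 8.0, np.nan, 10.0],
    },
    index=pd.Index(["s1", "s2", "s3", "s4"], name="subject"),
)

# Listwise deletion: keep only subjects with every time point observed.
complete_cases = scores.dropna()

# Mean imputation: fill each gap with the observed mean of that time point.
mean_imputed = scores.fillna(scores.mean())

print(f"Subjects kept after listwise deletion: {len(complete_cases)} of {len(scores)}")
print(mean_imputed)
print(f"Variance of t2 before/after imputation: "
      f"{scores['t2'].var():.3f} / {mean_imputed['t2'].var():.3f}")
```

Half of the subjects are lost to listwise deletion here, while mean imputation keeps all of them but shrinks the variance of the imputed columns.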

In [8]:
#answer:-8
"""Common post-hoc tests include:

1---Tukey's HSD: Used to compare all possible pairs of group means while controlling the family-wise error rate.
2---Bonferroni correction: Divides the significance level by the number of comparisons. It is simple but conservative, and becomes more so as the number of comparisons grows.
3---Scheffé's test: The most flexible option, suited to complex comparisons (contrasts involving more than two group means)."""

In [9]:
#answer:-9
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data
data = pd.DataFrame({'Diet': ['A']*50 + ['B']*50 + ['C']*50,
                     'WeightLoss': np.random.randn(150)})

model = ols('WeightLoss ~ C(Diet)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)


              sum_sq     df         F    PR(>F)
C(Diet)     0.145480    2.0  0.057508  0.944135
Residual  185.935715  147.0       NaN       NaN
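
A one-way ANOVA like the one above only indicates whether some diet differs from the others. As a follow-up, the sketch below applies Tukey's HSD (one of the post-hoc tests from answer 8) using `pairwise_tukeyhsd` from statsmodels. The data are simulated, and the assumption that diet C has a higher mean weight loss is built in purely so that the test has something to detect.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Simulated data (assumption: diet C shifts the mean by 2 units).
rng = np.random.default_rng(42)
data = pd.DataFrame({
    "Diet": ["A"] * 50 + ["B"] * 50 + ["C"] * 50,
    "WeightLoss": np.concatenate([
        rng.normal(loc=2.0, scale=1.0, size=50),
        rng.normal(loc=2.0, scale=1.0, size=50),
        rng.normal(loc=4.0, scale=1.0, size=50),
    ]),
})

# Compare every pair of diets while controlling the family-wise error rate.
tukey = pairwise_tukeyhsd(endog=data["WeightLoss"], groups=data["Diet"], alpha=0.05)
print(tukey.summary())
```

The summary table has one row per pair of diets; the `reject` column marks the pairs whose means differ at the chosen significance level.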


In [10]:
#answer:-10
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data
data = pd.DataFrame({'Program': ['A', 'B', 'C']*30,
                     'Experience': ['Novice', 'Experienced']*45,
                     'Time': np.random.randn(90)})

model = ols('Time ~ C(Program) * C(Experience)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)


                             sum_sq    df         F    PR(>F)
C(Program)                 2.519523   2.0  1.181890  0.311746
C(Experience)              1.013424   1.0  0.950780  0.332321
C(Program):C(Experience)   1.009852   2.0  0.473714  0.624338
Residual                  89.534523  84.0       NaN       NaN


In [11]:
#answer:-11
import scipy.stats as stats

control_group = [72, 68, 75, 70, 80]  # Example data
experimental_group = [78, 82, 85, 80, 88]

t_stat, p_value = stats.ttest_ind(control_group, experimental_group)
print(f"T-statistic: {t_stat}, P-value: {p_value}")


T-statistic: -3.491486243775876, P-value: 0.008180565909760731


In [12]:
#answer:-12
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Example data
df = pd.DataFrame({
    'Store': ['A']*30 + ['B']*30 + ['C']*30,
    'Day': list(range(30))*3,
    'Sales': np.random.randn(90)
})

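# Conducting repeated measures ANOVA: each Day is the repeated unit (passed as
# the subject), observed once at every Store (the within factor)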
anova_model = AnovaRM(df, 'Sales', 'Day', within=['Store']).fit()
print(anova_model)


               Anova
      F Value Num DF  Den DF Pr > F
-----------------------------------
Store  1.0999 2.0000 58.0000 0.3397

