Q1. Assumptions Required for ANOVA and Examples of Violations
Assumptions:
1.	Independence: Observations must be independent of each other.
2.	Normality: The residuals (errors) of the model should be approximately normally distributed.
3.	Homogeneity of Variances: The variances among the groups should be approximately equal.
Examples of Violations:
•	Independence: If data points are collected from related subjects (e.g., repeated measurements on the same subjects), the independence assumption is violated.
•	Normality: If residuals are heavily skewed or have outliers, the normality assumption is violated.
•	Homogeneity of Variances: If one group has much larger variance compared to others, the assumption is violated. This can be tested using Levene's test.
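The normality and equal-variance assumptions can be checked numerically. The following is a minimal sketch on simulated data (the group means, spread, and sample sizes are made up for illustration); it runs a Shapiro-Wilk test on the pooled residuals and Levene's test across groups. Independence generally cannot be tested this way and has to be argued from the study design.

```python
import numpy as np
from scipy import stats

# Hypothetical example data: three groups of 20 observations each
rng = np.random.default_rng(0)
groups = [rng.normal(loc=mu, scale=2.0, size=20) for mu in (10.0, 11.0, 12.0)]

# Residuals of a one-way ANOVA model are deviations from each group's own mean
residuals = np.concatenate([g - g.mean() for g in groups])

# Shapiro-Wilk test on the pooled residuals (H0: residuals are normally distributed)
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Levene's test across the groups (H0: all group variances are equal)
levene_stat, levene_p = stats.levene(*groups)

print(f"Shapiro-Wilk: W = {shapiro_stat:.3f}, p = {shapiro_p:.3f}")
print(f"Levene: W = {levene_stat:.3f}, p = {levene_p:.3f}")
```

A small p-value in either test suggests the corresponding assumption may not hold; a large p-value does not prove that it does.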
Q2. Types of ANOVA
1.	One-Way ANOVA: Used when comparing means across one factor with multiple levels. Example: Comparing average test scores among different teaching methods.
2.	Two-Way ANOVA: Used when comparing means across two factors simultaneously. Example: Comparing the effects of different diets and exercise levels on weight loss.
3.	Repeated Measures ANOVA: Used when the same subjects are measured multiple times or under different conditions. Example: Measuring the same students' performance before, midway through, and after an intervention (with only two time points, the analysis reduces to a paired t-test).
Q3. Partitioning of Variance in ANOVA
Partitioning of Variance:
•	Total Sum of Squares (SST): Measures the total variation in the data.
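•	Explained Sum of Squares (SSE): Measures the variation explained by the model (between groups).
•	Residual Sum of Squares (SSR): Measures the variation not explained by the model (within groups).
Note: Many textbooks use the opposite abbreviations (SSE for the error sum of squares, SSR for the regression sum of squares); the labels here follow the explained/residual naming.
Importance: The partition SST = SSE + SSR shows how much of the total variability in the data is explained by differences between groups versus variability within groups. The F-statistic is built from this split: it is the ratio of the between-group mean square, SSE / (k - 1), to the within-group mean square, SSR / (N - k), for k groups and N observations.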
Q4. Calculating SST, SSE, and SSR in Python
import numpy as np
import pandas as pd
from scipy import stats

# Example data
data = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'value': [23, 21, 25, 30, 32, 28, 22, 25, 27]
})

# Calculate SST (Total Sum of Squares)
mean_total = data['value'].mean()
sst = ((data['value'] - mean_total) ** 2).sum()

# Calculate SSE (Explained Sum of Squares)
group_means = data.groupby('group')['value'].mean()
sse = ((group_means - mean_total) ** 2 * data['group'].value_counts()).sum()

# Calculate SSR (Residual Sum of Squares)
residuals = data['value'] - data.groupby('group')['value'].transform('mean')
ssr = (residuals ** 2).sum()

print(f"SST: {sst}, SSE: {sse}, SSR: {ssr}")
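As a sanity check on the calculation above, the three sums of squares should satisfy SST = SSE + SSR, and the F-statistic built from them should match `scipy.stats.f_oneway`. The sketch below reuses the same example data and this document's naming (SSE = explained, SSR = residual); the variable names are illustrative.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Same example data as in the calculation above
data = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'value': [23, 21, 25, 30, 32, 28, 22, 25, 27]
})

grand_mean = data['value'].mean()
# Each row's own group mean, aligned with the original rows
group_mean_per_row = data.groupby('group')['value'].transform('mean')

sst = ((data['value'] - grand_mean) ** 2).sum()          # total
sse = ((group_mean_per_row - grand_mean) ** 2).sum()     # explained (between groups)
ssr = ((data['value'] - group_mean_per_row) ** 2).sum()  # residual (within groups)

# F = (between-group mean square) / (within-group mean square)
k = data['group'].nunique()  # number of groups
n = len(data)                # number of observations
f_manual = (sse / (k - 1)) / (ssr / (n - k))

# Reference value from SciPy
samples = [values.to_numpy() for _, values in data.groupby('group')['value']]
f_scipy, p_scipy = stats.f_oneway(*samples)

print(f"SST = {sst:.3f}, SSE + SSR = {sse + ssr:.3f}")
print(f"F (manual) = {f_manual:.3f}, F (scipy) = {f_scipy:.3f}, p = {p_scipy:.4f}")
```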
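Q5. Main Effects and Interaction Effects in Two-Way ANOVA
•	Main Effect: The effect of one factor on the outcome, averaged over the levels of the other factor.
•	Interaction Effect: Present when the effect of one factor depends on the level of the other factor.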
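import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data: 3 software programs x 2 experience levels, 5 observations per cell
# (all three columns must have the same length, here 30 rows)
np.random.seed(0)
data = pd.DataFrame({
    'software': np.repeat(['A', 'B', 'C'], 10),
    'experience': ['novice', 'experienced'] * 15,
    'time': np.random.rand(30) * 10
})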

# Fit the model
model = ols('time ~ C(software) * C(experience)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)
Q6. Interpretation of ANOVA Results
With an F-statistic of 5.23 and a p-value of 0.02:
•	Conclusion: Because the p-value is below the typical alpha level of 0.05, the null hypothesis that all group means are equal is rejected. At least one group mean differs from the others.
•	Interpretation: The F-test does not say which groups differ. A post-hoc test is needed to determine which specific groups differ significantly from each other.
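The link between an F-statistic and its p-value depends on the degrees of freedom, which are not given here. The sketch below therefore assumes a hypothetical design (3 groups, 30 observations) purely to illustrate how the decision is made with `scipy.stats.f`; it is not meant to reproduce the p-value of 0.02 quoted above.

```python
from scipy import stats

f_stat = 5.23
alpha = 0.05

# Hypothetical degrees of freedom (assumption): 3 groups and 30 observations
df_between = 3 - 1
df_within = 30 - 3

# p-value: probability of an F at least this large if all group means are equal
p_value = stats.f.sf(f_stat, df_between, df_within)

# Critical value: the smallest F that would be significant at the chosen alpha
f_critical = stats.f.ppf(1 - alpha, df_between, df_within)

reject_null = p_value < alpha
print(f"p-value = {p_value:.4f}, critical F = {f_critical:.3f}, reject H0: {reject_null}")
```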
Q7. Handling Missing Data in Repeated Measures ANOVA
Handling Methods:
1.	Listwise Deletion: Exclude all cases with missing data. Can reduce sample size and power.
2.	Imputation: Replace missing values with estimated values. Can introduce bias if the imputation method is not suitable.
3.	Mixed-Effects Models: Account for missing data within the model. Suitable for complex cases but requires careful implementation.
Consequences:
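•	Listwise Deletion: Reduces sample size and power, and biases estimates unless the data are missing completely at random (MCAR).
•	Imputation: Can bias estimates and understate variability, particularly with single imputation such as mean substitution.
•	Mixed-Effects Models: Use all available observations and remain valid when data are missing at random (MAR), at the cost of a more complex model. Often the most robust approach.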
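The difference between listwise deletion and a mixed-effects model can be seen on a small simulated data set. This is a sketch under stated assumptions: the subjects, scores, and missingness pattern are invented, and a random-intercept model fitted with `statsmodels` `mixedlm` stands in for "a mixed-effects model".

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: 20 subjects measured at 3 time points (wide format)
rng = np.random.default_rng(0)
n_subjects, n_times = 20, 3
subject_effect = rng.normal(0, 5, n_subjects)  # per-subject baseline shift
time_effect = np.array([0.0, 2.0, 4.0])        # assumed true effect of time
scores = (50 + subject_effect[:, None] + time_effect
          + rng.normal(0, 2, (n_subjects, n_times)))
wide = pd.DataFrame(scores, columns=['t1', 't2', 't3'])
wide.insert(0, 'subject', np.arange(n_subjects))

# Simulate missing measurements for a few subjects
wide.loc[[1, 4, 7], 't2'] = np.nan
wide.loc[[4, 10], 't3'] = np.nan

# Listwise deletion: any subject with a missing value is dropped entirely
complete_cases = wide.dropna()

# Long format: only the individual missing measurements are dropped
long_df = wide.melt(id_vars='subject', var_name='time', value_name='score').dropna()

# Random-intercept mixed model fitted on all available observations
mixed = smf.mixedlm('score ~ C(time)', data=long_df, groups=long_df['subject']).fit()

print(f"Subjects kept by listwise deletion: {len(complete_cases)} of {n_subjects}")
print(f"Observations used by the mixed model: {len(long_df)} of {n_subjects * n_times}")
print(mixed.params)
```

Listwise deletion discards every measurement from a subject with any gap, while the long-format model keeps the measurements that were actually observed.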
Q8. Common Post-Hoc Tests
1.	Tukey's Honestly Significant Difference (HSD): Compares all pairs of groups while controlling the family-wise error rate.
2.	Bonferroni Correction: Adjusts p-values to account for multiple comparisons.
3.	Scheffé's Test: Useful for testing all possible contrasts, not just pairwise comparisons.
Example: After finding a significant difference in a one-way ANOVA comparing the effects of different treatments, Tukey's HSD can identify which specific treatments differ from each other.
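A minimal sketch of that workflow, using `pairwise_tukeyhsd` from `statsmodels` on simulated data. The treatment labels, means, and sample sizes are assumptions chosen so that treatment C differs clearly from A and B.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical data: treatments A and B share a mean, treatment C is higher
rng = np.random.default_rng(1)
data = pd.DataFrame({
    'treatment': np.repeat(['A', 'B', 'C'], 20),
    'response': np.concatenate([
        rng.normal(10, 2, 20),
        rng.normal(10, 2, 20),
        rng.normal(15, 2, 20),
    ]),
})

# Tukey's HSD compares every pair of treatments at a family-wise alpha of 0.05
tukey = pairwise_tukeyhsd(endog=data['response'], groups=data['treatment'], alpha=0.05)

# One row per pair, with the mean difference, adjusted p-value, and reject flag
print(tukey.summary())
```

The `reject` column of the summary marks the pairs whose difference is significant after the family-wise correction.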
Q9. One-Way ANOVA for Weight Loss
import numpy as np
import pandas as pd
from scipy import stats

# Example data
data = pd.DataFrame({
    'diet': ['A'] * 17 + ['B'] * 17 + ['C'] * 16,
    'weight_loss': np.random.rand(50) * 10
})

# Perform one-way ANOVA
f_stat, p_value = stats.f_oneway(
    data.loc[data['diet'] == 'A', 'weight_loss'],
    data.loc[data['diet'] == 'B', 'weight_loss'],
    data.loc[data['diet'] == 'C', 'weight_loss']
)

print(f"F-statistic: {f_stat}, p-value: {p_value}")
Interpretation: If the p-value is less than 0.05, there are significant differences between at least two of the diets. Post-hoc tests would be needed to identify which specific diets differ.
Q10. Two-Way ANOVA for Software Programs and Experience
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data
data = pd.DataFrame({
    'program': ['A', 'B', 'C'] * 10,
    'experience': ['novice'] * 15 + ['experienced'] * 15,
    'time': np.random.rand(30) * 10
})

# Fit the model
model = ols('time ~ C(program) * C(experience)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)
Interpretation: The ANOVA table will show whether there are significant main effects of the software programs and experience levels, and if there is an interaction effect between them.
Q11. Two-Sample T-Test for Teaching Method
import numpy as np
from scipy import stats

# Example data
control_group = np.random.rand(50) * 10
experimental_group = np.random.rand(50) * 10 + 1

# Perform t-test
t_stat, p_value = stats.ttest_ind(control_group, experimental_group)

print(f"T-statistic: {t_stat}, p-value: {p_value}")
Post-Hoc Test: A two-sample t-test compares only two groups, so a significant result already identifies which pair differs and no post-hoc test is needed. Additional analyses could explore other factors or comparisons if necessary.
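One way to see why no post-hoc step is required: with exactly two groups, a one-way ANOVA and the pooled-variance t-test are the same test, with F = t² and identical p-values. The sketch below checks this on simulated scores (the group means and spread are assumptions).

```python
import numpy as np
from scipy import stats

# Hypothetical test scores for two teaching methods
rng = np.random.default_rng(0)
control = rng.normal(70, 10, 50)
experimental = rng.normal(75, 10, 50)

# Pooled-variance two-sample t-test (equal_var=True is the SciPy default)
t_stat, p_t = stats.ttest_ind(control, experimental)

# One-way ANOVA on the same two groups
f_stat, p_f = stats.f_oneway(control, experimental)

print(f"t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
print(f"p (t-test) = {p_t:.6f}, p (ANOVA) = {p_f:.6f}")
```

The equivalence holds only for the equal-variance form of the t-test; Welch's version (`equal_var=False`) does not match the ANOVA F-test exactly.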
Q12. Repeated Measures ANOVA for Retail Stores
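import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Example data: sales for 3 stores recorded on the same 10 days
np.random.seed(0)
data = pd.DataFrame({
    'store': ['A'] * 10 + ['B'] * 10 + ['C'] * 10,
    'day': np.tile(np.arange(10), 3),
    'sales': np.random.rand(30) * 100
})

# Fit the repeated measures model. Assumed design: every day is observed for
# all three stores, so 'day' is the subject and 'store' the within-subject factor.
# A plain ols('sales ~ C(store)') model would ignore this repeated structure
# and treat all 30 values as independent observations.
results = AnovaRM(data, depvar='sales', subject='day', within=['store']).fit()

print(results.anova_table)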

