In [None]:
Ans 1 
ANOVA (Analysis of Variance) is a statistical test used to compare the means of three or more groups to determine
if there are significant differences between them. To use ANOVA, certain assumptions must be met for the results to be valid.
These assumptions are as follows:

1. Independence: The observations must be independent of each other, both within each group and across groups. In other words, the value
of one observation should not be influenced by the value of any other observation.

2. Normality: The data within each group should be approximately normally distributed. This means that the distribution of values
within each group should resemble a bell-shaped curve.

3. Homogeneity of variances: The variances of the groups being compared should be roughly equal. This assumption is known as homoscedasticity.
If the variances differ significantly, it can affect the validity of the ANOVA results.

Violations of these assumptions can impact the validity of the ANOVA results. Here are examples of violations and their potential impacts:

1. Violation of independence: If the observations are not independent, for example when values within a group or across groups are
correlated, the standard errors are misestimated and the Type I error rate can be inflated. For example, if the same individuals are
measured in multiple groups, the independence assumption is violated and a repeated measures design should be analyzed instead.

2. Violation of normality: If the data within groups are not normally distributed, it can affect the accuracy of the p-values and
confidence intervals calculated by ANOVA. For instance, if the data are heavily skewed or have extreme outliers, the assumption of normality 
may be violated. In such cases, transformations or non-parametric tests may be more appropriate.

3. Violation of homogeneity of variances: When the variances of the groups being compared are substantially different, the F-test can
have an inflated Type I error rate or reduced statistical power, especially when the group sizes are unequal. If the assumption of
homogeneity of variances is violated, it is advisable to use alternatives such as Welch's ANOVA or a non-parametric test like Kruskal-Wallis.

It is important to check these assumptions before applying ANOVA and, if violated, consider alternative methods or transformations
to ensure the validity of the statistical analysis.
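
The normality and equal-variance assumptions can be checked before running ANOVA. The following is a minimal sketch, assuming three small
made-up groups (the numbers are illustrative only). It uses scipy.stats.shapiro for normality and scipy.stats.levene for homogeneity of
variances; independence has to be judged from the study design rather than from a test.

```python
from scipy.stats import shapiro, levene

# Hypothetical example data (not from any real study)
groups = {
    "Group 1": [23.1, 25.4, 24.8, 26.0, 22.7, 25.1, 24.3, 23.9],
    "Group 2": [27.2, 28.9, 26.5, 29.1, 27.8, 28.0, 26.9, 28.4],
    "Group 3": [21.0, 22.6, 20.4, 23.1, 21.9, 22.2, 20.8, 21.5],
}

# Shapiro-Wilk test per group: a small p-value suggests the data are not normal
normality_p = {}
for name, values in groups.items():
    stat, p = shapiro(values)
    normality_p[name] = p
    print(f"{name}: Shapiro-Wilk W = {stat:.3f}, p = {p:.3f}")

# Levene's test across groups: a small p-value suggests unequal variances
levene_stat, levene_p = levene(*groups.values())
print(f"Levene's test: W = {levene_stat:.3f}, p = {levene_p:.3f}")
```

With small samples these tests have little power, so it is usually worth pairing them with a visual check such as a Q-Q plot.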

In [None]:
Ans 2
The three types of ANOVA are:

One-Way ANOVA: This type of ANOVA is used when you have one categorical independent variable (factor) and one continuous dependent variable.
It is used to determine if there are significant differences in the means of three or more groups. For example, you might use a one-way ANOVA 
to compare the average test scores of students across different schools.

Two-Way ANOVA: This type of ANOVA is used when you have two categorical independent variables and one continuous dependent variable. 
It is used to examine the main effect of each factor and the interaction between them. For example, you might use a two-way ANOVA to
investigate the effects of both gender and age group on the average income of individuals.

Mixed-Design ANOVA: This type of ANOVA is used when you have at least one between-subjects factor and at least one within-subjects
(repeated measures) factor, and one continuous dependent variable. It is used to analyze the main effects of each factor and the
interaction between them, so within-subjects and between-subjects factors are studied simultaneously.
For example, you might use a mixed-design ANOVA to examine the effects of time point in a training program (within-subjects factor,
e.g. before vs. after training) and gender (between-subjects factor) on the performance scores of participants.

In [None]:
Ans 3
The partitioning of variance in ANOVA refers to the division of the total variability observed in the data into different sources of variation.
In a one-way ANOVA, the total variability is split into two components: between-group variability, which reflects differences among the
group means, and within-group variability, which reflects random variation inside each group.
It is an essential concept in ANOVA as it helps to understand and quantify the contributions of different factors to the overall variability.
By partitioning the total variability into these two components, ANOVA provides a statistical framework to evaluate whether the observed 
differences between groups are statistically significant or merely due to random variation. It helps to determine if the variation among 
the group means (between-group variability) is greater than the expected random variation within the groups (within-group variability).
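
For a one-way design with k groups, n_i observations in group i, and N observations in total, the partition can be written as follows.
The names SSB (between-group, i.e. explained) and SSW (within-group, i.e. residual) are notation assumed here for illustration.

```latex
\begin{align}
SST &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 \\
SSB &= \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 \\
SSW &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 \\
SST &= SSB + SSW \\
F &= \frac{SSB / (k - 1)}{SSW / (N - k)}
\end{align}
```

Here \bar{x} is the grand mean and \bar{x}_i is the mean of group i; the F-statistic compares the two components after dividing each by its degrees of freedom.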

In [None]:
Ans 4 
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Example data
group1 = [1, 2, 3, 4, 5]
group2 = [6, 7, 8, 9, 10]
group3 = [11, 12, 13, 14, 15]

values = np.concatenate([group1, group2, group3])
groups = np.array(["Group 1"] * len(group1) + ["Group 2"] * len(group2) + ["Group 3"] * len(group3))
df = pd.DataFrame({"value": values, "group": groups})

# Fit one-way ANOVA model (the formula interface needs a DataFrame, not a bare array)
model = ols('value ~ C(group)', data=df).fit()

# Calculate total sum of squares (SST)
mean_total = df["value"].mean()
SST = np.sum((df["value"] - mean_total) ** 2)

# Calculate explained (between-group) sum of squares (SSE)
# model.params holds differences from the reference group, not group means,
# so use the fitted values, which equal each observation's group mean
SSE = np.sum((model.fittedvalues - mean_total) ** 2)

# Calculate residual (within-group) sum of squares (SSR)
SSR = np.sum(model.resid ** 2)

print("Total Sum of Squares (SST):", SST)
print("Explained Sum of Squares (SSE):", SSE)
print("Residual Sum of Squares (SSR):", SSR)


In [None]:
Ans 5 
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data
group1 = [1, 2, 3, 4, 5]
group2 = [6, 7, 8, 9, 10]
group3 = [11, 12, 13, 14, 15]
group4 = [16, 17, 18, 19, 20]

data = np.concatenate([group1, group2, group3, group4])
factor1 = np.array(["A"] * len(group1) + ["A"] * len(group2) + ["B"] * len(group3) + ["B"] * len(group4))
factor2 = np.array(["X"] * len(group1) + ["Y"] * len(group2) + ["X"] * len(group3) + ["Y"] * len(group4))

df = pd.DataFrame({"data": data, "factor1": factor1, "factor2": factor2})

# Fit two-way ANOVA model
model = ols('data ~ factor1 + factor2 + factor1:factor2', data=df).fit()

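# Extract model coefficients (treatment coding, reference levels "A" and "X").
# With the interaction term in the model, each factor coefficient is the effect of that
# factor at the reference level of the other factor.
main_effect_factor1 = model.params["factor1[T.B]"]
main_effect_factor2 = model.params["factor2[T.Y]"]
interaction_effect = model.params["factor1[T.B]:factor2[T.Y]"]

print("Effect of Factor 1 (B vs A):", main_effect_factor1)
print("Effect of Factor 2 (Y vs X):", main_effect_factor2)
print("Interaction effect:", interaction_effect)

# ANOVA table with F-tests for both main effects and the interaction
print(sm.stats.anova_lm(model, typ=2))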



In [None]:
Ans 6
In the given scenario, you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02. 
Based on these results, you can draw the following conclusions:

1. Differences between groups: The F-statistic is a ratio of the between-group variability to the within-group variability, so a value of 5.23
means the variation between the group means is about five times the random variation within the groups. A larger F-statistic is stronger
evidence against equal means, but whether it is significant depends on the degrees of freedom, which the p-value takes into account.

2. Statistical significance: The p-value of 0.02 indicates that the probability of observing such a large F-statistic by chance, 
assuming no true differences between the groups (null hypothesis), is 0.02 or 2%. Typically, if the p-value is below a pre-determined 
significance level (e.g., 0.05), we reject the null hypothesis. In this case, since the p-value (0.02) is less than the significance level, 
we reject the null hypothesis and conclude that there are significant differences between the groups.

3. Interpretation: The significant results suggest that at least one group mean is different from the others. However, the ANOVA itself does not 
indicate which specific groups differ from each other. To determine the specific group differences, further post-hoc tests, such as 
Tukey's Honestly Significant Difference (HSD) test or pairwise t-tests, can be conducted.

In summary, the F-statistic of 5.23 with a p-value of 0.02 suggests that there are significant differences between the groups. 
It indicates that the observed differences among the group means are unlikely to be due to random chance alone. However, additional post-hoc 
tests are required to identify the specific group differences.
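
The link between an F-statistic and its p-value can be sketched as follows. The scenario does not state the degrees of freedom, so the
values below (3 groups, 30 observations) are hypothetical, and the resulting p-value is not expected to match the 0.02 given above.

```python
from scipy.stats import f

f_statistic = 5.23

# Hypothetical design: 3 groups and 30 observations in total
df_between = 3 - 1
df_within = 30 - 3

# Survival function: probability of an F at least this large under the null hypothesis
p_value = f.sf(f_statistic, df_between, df_within)

# Critical value of F at the 0.05 significance level
critical_value = f.ppf(0.95, df_between, df_within)

print("p-value:", p_value)
print("Critical F at alpha = 0.05:", critical_value)
print("Reject null hypothesis:", f_statistic > critical_value)
```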

In [None]:
Ans 7 
Handling missing data in a repeated measures ANOVA requires careful consideration to ensure unbiased and valid results. 
Here are some common methods for dealing with missing data in this context:

Complete Case Analysis (CCA): This approach involves excluding participants with missing data from the analysis.
It only uses data from subjects with complete observations across all time points or conditions. The consequences of using CCA are 
a potential loss of statistical power and potential bias if the missingness is related to the outcome or other variables of interest.

Pairwise Deletion: With this method, each comparison uses every participant who has data for the time points involved, so a participant
missing one time point is dropped only from the comparisons that need that time point. It retains more data and power than CCA, but
different comparisons are based on different subsets of participants, which can make results inconsistent and biased if the data are
not missing completely at random.

Mean Imputation: This method replaces missing values with the mean of the observed values for that variable. However, mean imputation can 
lead to an underestimation of standard errors, biased estimates, and artificially reduced variance.
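
The effect of complete case analysis and mean imputation can be seen on a small example. This is a minimal sketch, assuming made-up
scores for six participants at three time points; the column names t1, t2, and t3 are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical scores for 6 participants at 3 time points (NaN = missing)
scores = pd.DataFrame({
    "t1": [10.0, 12.0, 11.0, 14.0, 13.0, 9.0],
    "t2": [11.0, np.nan, 12.0, 15.0, 13.5, 10.0],
    "t3": [12.0, 13.0, np.nan, 16.0, 14.0, np.nan],
})

# Complete case analysis: keep only participants with no missing values
complete_cases = scores.dropna()

# Mean imputation: fill each column's gaps with that column's observed mean
imputed = scores.fillna(scores.mean())

print("Participants kept by CCA:", len(complete_cases), "of", len(scores))
print("Variance of t3, observed values only:", scores["t3"].var())
print("Variance of t3, after mean imputation:", imputed["t3"].var())
```

Half of the participants are lost under CCA here, and the imputed column has a smaller variance than the observed values, which is the
artificial variance reduction described above.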

In [None]:
Ans 8
After conducting an ANOVA and finding a significant overall effect, post-hoc tests are often used to determine specific group differences. 
Several common post-hoc tests are available, and the choice depends on the research question and assumptions. Here are some commonly 
used post-hoc tests:

Tukey's Honestly Significant Difference (HSD) test: This test compares all possible pairs of group means and controls the
family-wise Type I error rate. It assumes equal variances and was designed for equal sample sizes (the Tukey-Kramer variant handles
unequal sizes), and it is used when you want to identify which specific group means differ significantly from each other.

Bonferroni correction: This method divides the significance level (alpha) by the number of comparisons to maintain an overall alpha level. 
It is a conservative approach and is most suitable for a small number of planned comparisons, because it loses power quickly as the
number of comparisons grows.

Scheffe's test: This test is the most conservative of the three and allows any number of comparisons, including complex contrasts such as
one group versus the average of two others. It is suitable when the comparisons were not planned in advance. Like ANOVA, it assumes
equal variances, so when that assumption is violated a test such as Games-Howell is a better choice.
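
The Bonferroni adjustment is simple enough to sketch by hand. The p-values below are hypothetical raw results from three pairwise
comparisons and are not taken from any analysis in this notebook.

```python
# Hypothetical raw p-values from three pairwise comparisons
raw_p_values = {"A vs B": 0.004, "A vs C": 0.030, "B vs C": 0.200}

alpha = 0.05
n_comparisons = len(raw_p_values)

# Bonferroni: each comparison is tested at alpha divided by the number of comparisons
adjusted_alpha = alpha / n_comparisons

significant = {pair: p < adjusted_alpha for pair, p in raw_p_values.items()}

print("Adjusted alpha:", adjusted_alpha)
for pair, is_significant in significant.items():
    print(pair, "significant after correction:", is_significant)
```

The A vs C comparison would pass an uncorrected 0.05 threshold but fails the corrected one, which illustrates how conservative the method is.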

In [None]:
Ans 9
import numpy as np
from scipy.stats import f_oneway

# Example data
diet_A = [2, 4, 5, 6, 3, 4, 6, 7, 8, 3, 5, 6, 4, 5, 3, 2, 4, 5, 6, 7, 3, 4, 5, 6, 4, 5, 4, 3, 5, 6, 4, 5, 4, 5, 3, 5, 6, 4, 5, 4, 3, 5, 6, 4, 5, 4, 3, 5, 6]
diet_B = [1, 2, 1, 3, 2, 1, 3, 1, 2, 1, 2, 1, 3, 2, 1, 2, 1, 3, 2, 1, 2, 1, 3, 2, 1, 2, 1, 3, 2, 1, 2, 1, 3, 2, 1, 2, 1, 3, 2, 1, 2, 1, 3, 2, 1, 2, 1, 3]
diet_C = [3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4]

# Perform one-way ANOVA
f_statistic, p_value = f_oneway(diet_A, diet_B, diet_C)

print("F-statistic:", f_statistic)
print("p-value:", p_value)

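After running the code, you will obtain the F-statistic and p-value. The actual output for the sample data above will differ (its group
means are well separated, so the F-statistic is much larger), but for illustration let's assume the results are as follows:

F-statistic: 3.426
p-value: 0.041

Interpreting the results:

The obtained F-statistic is 3.426, and the associated p-value is 0.041. Since the p-value (0.041) is less than 
the commonly used significance level of 0.05, we reject the null hypothesis. This indicates that there is evidence of 
a significant difference in the mean weight loss among the three diets (A, B, and C). A post-hoc test such as Tukey's HSD
would be needed to identify which diets differ from each other.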

In [None]:
Ans 10
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

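# Example data (simulated, so no real program or experience effect is built in)
np.random.seed(0)  # fixed seed so the example is reproducible
programs = np.repeat(["A", "B", "C"], 10)
experience = np.tile(["Novice", "Experienced"], 15)
times = np.random.normal(loc=10, scale=2, size=30)  # Example time data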

df = pd.DataFrame({"Program": programs, "Experience": experience, "Time": times})

# Fit two-way ANOVA model
model = ols('Time ~ Program + Experience + Program:Experience', data=df).fit()
anova_table = sm.stats.anova_lm(model)

# Extract F-statistics and p-values
f_program = anova_table.loc["Program", "F"]
p_program = anova_table.loc["Program", "PR(>F)"]
f_experience = anova_table.loc["Experience", "F"]
p_experience = anova_table.loc["Experience", "PR(>F)"]
f_interaction = anova_table.loc["Program:Experience", "F"]
p_interaction = anova_table.loc["Program:Experience", "PR(>F)"]

print("Program - F-statistic:", f_program)
print("Program - p-value:", p_program)
print("Experience - F-statistic:", f_experience)
print("Experience - p-value:", p_experience)
print("Interaction - F-statistic:", f_interaction)
print("Interaction - p-value:", p_interaction)


In [None]:
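Ans 11
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Example data (simulated)
np.random.seed(0)  # fixed seed so the example is reproducible
control_group = np.random.normal(loc=70, scale=10, size=100)  # Example scores for control group
experimental_group = np.random.normal(loc=75, scale=10, size=100)  # Example scores for experimental group

# Perform two-sample t-test
t_statistic, p_value = ttest_ind(control_group, experimental_group)

print("T-statistic:", t_statistic)
print("p-value:", p_value)

# Perform post-hoc test (Tukey's HSD) if significant differences found.
# scipy.stats has no ttest_posthoc function, so statsmodels' pairwise_tukeyhsd is used instead.
# With only two groups this repeats the single comparison; it matters with three or more groups.
if p_value < 0.05:
    scores = np.concatenate([control_group, experimental_group])
    labels = ["control"] * len(control_group) + ["experimental"] * len(experimental_group)
    posthoc = pairwise_tukeyhsd(scores, labels)
    print("Post-hoc test results:")
    print(posthoc)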


In [None]:
Ans 12
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Example data (simulated)
np.random.seed(0)  # fixed seed so the example is reproducible
store_A = np.random.normal(loc=100, scale=10, size=30)  # Example sales for Store A
store_B = np.random.normal(loc=110, scale=10, size=30)  # Example sales for Store B
store_C = np.random.normal(loc=90, scale=10, size=30)  # Example sales for Store C
days = np.arange(30)  # Days of observation

df = pd.DataFrame({"Store_A": store_A, "Store_B": store_B, "Store_C": store_C, "Day": days})

# Reshape to long format: one row per (day, store) observation
long_df = df.melt(id_vars="Day", value_vars=["Store_A", "Store_B", "Store_C"], var_name="Store", value_name="Sales")

# Perform repeated measures ANOVA: each day is the repeated unit, store is the within factor
result = AnovaRM(long_df, depvar="Sales", subject="Day", within=["Store"]).fit()
print(result.anova_table)

# Extract p-value for the store effect
p_value = result.anova_table.loc["Store", "Pr > F"]

print("p-value:", p_value)

# Perform post-hoc test (e.g., Tukey's HSD) if significant differences found
# Note: Tukey's HSD treats the observations as independent and ignores the pairing by day
if p_value < 0.05:
    posthoc = pairwise_tukeyhsd(long_df["Sales"], long_df["Store"])
    print("Post-hoc test results:")
    print(posthoc)
