# 📊 PRACTICAL 7 — Regression & Correlation Analysis
### Data Science Assignment (Statistics Practical)

Dataset: **Teachers’ Rating Dataset**

This notebook covers:
1. Regression with T-test (Gender vs Evaluation)
2. Regression with ANOVA (Beauty vs Age)
3. Correlation (Evaluation vs Beauty)


In [None]:
# Import required libraries
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from scipy import stats

# Load dataset
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/teacher_ratings.csv")
df.head()

## Q1. Regression with T-test — Does gender affect teaching evaluation rates?

In [None]:
# Separate scores by gender
male_scores = df[df['gender'] == 'male']['eval']
female_scores = df[df['gender'] == 'female']['eval']

# Perform independent two-sample t-test
# (scipy's default assumes equal variances; pass equal_var=False for Welch's test)
t_stat, p_val = stats.ttest_ind(male_scores, female_scores)

print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_val:.6f}")

# Compare against the conventional 5% significance level
if p_val < 0.05:
    print("Conclusion: Mean teaching evaluation scores differ significantly by gender.")
else:
    print("Conclusion: No significant difference in mean teaching evaluation scores by gender.")

## Q2. Regression with ANOVA — Does beauty score for instructors differ by age?

In [None]:
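# Create age groups (bins): (20, 35], (35, 50], (50, 70]
# Note: pd.cut leaves ages outside these bins as NaN, and those rows are dropped from the ANOVA
df['age_group'] = pd.cut(df['age'], bins=[20, 35, 50, 70], labels=['Young', 'Middle', 'Old'])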

# Perform ANOVA
model = ols('beauty ~ C(age_group)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

✅ Expected output format (with `typ=2` the columns are `sum_sq`, `df`, `F`, `PR(>F)`; the values shown are illustrative):
```
                  sum_sq     df          F        PR(>F)
C(age_group)   20.422744    2.0  17.597559  4.322549e-08
Residual      114.312546  197.0        NaN           NaN
```

## Q3. Correlation — Is teaching evaluation score correlated with beauty score?

In [None]:
# Calculate correlation between evaluation and beauty score
corr_value = df['eval'].corr(df['beauty'])
print(f"Correlation between teaching evaluation and beauty score: {corr_value:.4f}")

# Rule-of-thumb cutoff: |r| > 0.5 is treated as a strong linear relationship
if abs(corr_value) > 0.5:
    print("Strong correlation between evaluation score and beauty score.")
else:
    print("Weak or moderate correlation between evaluation score and beauty score.")
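The coefficient alone describes the strength of the linear relationship, but not whether it is statistically distinguishable from zero. One possible way to get both is `scipy.stats.pearsonr`, which returns the coefficient together with a two-sided p-value. The sketch below uses synthetic stand-in data (the `demo` frame and its values are hypothetical, not the real dataset).

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (hypothetical values, not the real dataset)
rng = np.random.default_rng(2)
beauty = rng.normal(size=100)
demo = pd.DataFrame({
    "beauty": beauty,
    "eval": 4.0 + 0.2 * beauty + rng.normal(scale=0.5, size=100),
})

# Pearson correlation coefficient and its two-sided p-value
r, p = stats.pearsonr(demo["eval"], demo["beauty"])

print(f"Pearson r: {r:.4f}")
print(f"P-value:   {p:.6f}")

if p < 0.05:
    print("The correlation is statistically significant at the 5% level.")
else:
    print("The correlation is not statistically significant at the 5% level.")
```

The coefficient returned here is the same Pearson value that `Series.corr` computes by default.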

In [None]:
# Final summary
print("\n===== Summary =====")
print(f"T-test p-value (Gender vs Eval): {p_val:.6f}")
print(f"ANOVA F-value (Beauty vs Age): {anova_table['F'].iloc[0]:.4f}")
print(f"Correlation (Eval vs Beauty): {corr_value:.4f}")