## ✅ Topic 3: Inferential Statistics

Using the **Social Media Addiction vs Relationships** dataset, this notebook demonstrates key inferential statistics concepts through **hypothesis testing**, confidence intervals, and conclusions drawn from sample data.

---

## 🎯 Goals:

* Estimate population parameters using sample data.
* Construct **confidence intervals**.
* Perform **hypothesis testing**:

  * **One-sample t-test**
  * **Two-sample t-test**
  * **Chi-square test**
  * **One-way ANOVA**
  * **F-test for variance**
* Interpret p-values and statistical significance.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols

In [2]:
# Download and extract the dataset (shell commands run through IPython's `!` prefix)
!curl -L -o social-media-addiction-vs-relationships.zip\
  https://www.kaggle.com/api/v1/datasets/download/adilshamim8/social-media-addiction-vs-relationships
!unzip social-media-addiction-vs-relationships.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  7851  100  7851    0     0  20785      0 --:--:-- --:--:-- --:--:-- 20785
Archive:  social-media-addiction-vs-relationships.zip
  inflating: Students Social Media Addiction.csv  


In [3]:
# Load dataset
df = pd.read_csv("Students Social Media Addiction.csv")

In [4]:
# Encode the Yes/No answers as 1/0 (run once; re-running maps the already-encoded values to NaN)
df['Affects_Academic_Performance'] = df['Affects_Academic_Performance'].map({'Yes': 1, 'No': 0})

### 📌 1. Confidence Interval – Average Daily Usage

In [5]:
# Mean and 95% confidence interval
usage = df['Avg_Daily_Usage_Hours'].dropna()
mean = usage.mean()
std_err = stats.sem(usage)
ci = stats.t.interval(0.95, len(usage)-1, loc=mean, scale=std_err)

In [6]:
print(f"📌 95% Confidence Interval for Avg Usage: ({ci[0]:.3f}, {ci[1]:.3f})")

📌 95% Confidence Interval for Avg Usage: (4.826, 5.012)
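
The interval above is the sample mean plus or minus a t critical value times the standard error. As a minimal sketch of that arithmetic, the snippet below rebuilds the interval by hand and compares it with `stats.t.interval`. The `sample` array is made up for illustration and is not taken from the dataset.

```python
import numpy as np
from scipy import stats

# Hypothetical daily usage hours (illustrative values, not the dataset)
sample = np.array([4.2, 5.1, 6.0, 3.8, 4.9, 5.5, 4.4, 5.8, 4.7, 5.2])

n = len(sample)
sample_mean = sample.mean()
std_err = sample.std(ddof=1) / np.sqrt(n)   # same quantity stats.sem returns
t_crit = stats.t.ppf(0.975, df=n - 1)       # two-sided 95% critical value

# mean ± t* × SE
manual_ci = (sample_mean - t_crit * std_err, sample_mean + t_crit * std_err)
scipy_ci = stats.t.interval(0.95, n - 1, loc=sample_mean, scale=std_err)

print(f"manual: ({manual_ci[0]:.3f}, {manual_ci[1]:.3f})")
print(f"scipy:  ({scipy_ci[0]:.3f}, {scipy_ci[1]:.3f})")
```

Both lines should print the same bounds, since the library call performs the same calculation.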


### 📌 2. One-sample t-test

**Q: Is the average daily usage significantly different from 4 hours?**

H₀: μ = 4 hours; H₁: μ ≠ 4 hours (two-sided test).

In [7]:
t_stat, p_val = stats.ttest_1samp(usage, 4)
print(f"t-statistic: {t_stat:.3f}, p-value: {p_val:.3f}")

t-statistic: 19.400, p-value: 0.000


In [8]:
alpha = 0.05
if p_val < alpha:
    print("✅ Reject H₀ – significant difference from 4 hours.")
else:
    print("❌ Fail to reject H₀ – no significant difference.")

✅ Reject H₀ – significant difference from 4 hours.
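
A p-value printed as `0.000` only means it is below 0.0005 after rounding, and with a large sample even a modest shift from 4 hours can be highly significant. The sketch below, on a made-up `sample`, shows how the t-statistic is built and adds Cohen's d as one common effect-size measure. Cohen's d is an addition for illustration and is not computed in the notebook.

```python
import numpy as np
from scipy import stats

# Hypothetical daily usage hours (illustrative values, not the dataset)
sample = np.array([4.2, 5.1, 6.0, 3.8, 4.9, 5.5, 4.4, 5.8, 4.7, 5.2])
mu0 = 4.0  # hypothesized population mean under H0

n = len(sample)
sample_sd = sample.std(ddof=1)

# t = (sample mean - mu0) / standard error
t_manual = (sample.mean() - mu0) / (sample_sd / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)   # two-sided p-value

t_scipy, p_scipy = stats.ttest_1samp(sample, mu0)

# Cohen's d: the mean shift expressed in standard-deviation units
cohens_d = (sample.mean() - mu0) / sample_sd

print(f"manual t = {t_manual:.3f}, scipy t = {t_scipy:.3f}")
print(f"manual p = {p_manual:.4g}, scipy p = {p_scipy:.4g}")
print(f"Cohen's d = {cohens_d:.3f}")
```

Reporting the effect size next to the p-value helps separate "statistically significant" from "large enough to matter".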


### 📌 3. Two-sample t-test – Gender vs Addicted Score

In [9]:
addicted_m = df[df['Gender'] == 'Male']['Addicted_Score']
addicted_f = df[df['Gender'] == 'Female']['Addicted_Score']

In [10]:
t_stat, p_val = stats.ttest_ind(addicted_m, addicted_f, equal_var=False)

print(f"t-statistic: {t_stat:.3f}, p-value: {p_val:.3f}")

t-statistic: -1.319, p-value: 0.187
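
With p = 0.187 above α = 0.05, the test does not reject equal mean Addicted Scores for the two groups. Passing `equal_var=False` selects Welch's t-test, which does not assume equal variances. The sketch below reproduces Welch's statistic and its Welch–Satterthwaite degrees of freedom by hand; `group_a` and `group_b` are hypothetical scores, not the dataset.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for two independent groups (illustrative only)
group_a = np.array([6.0, 7.0, 5.0, 8.0, 6.0, 7.0, 9.0, 5.0])
group_b = np.array([7.0, 8.0, 6.0, 9.0, 7.0, 8.0, 6.0, 7.0, 8.0])

na, nb = len(group_a), len(group_b)
va, vb = group_a.var(ddof=1), group_b.var(ddof=1)

# Standard error of the difference in means, without pooling the variances
se_diff = np.sqrt(va / na + vb / nb)
t_manual = (group_a.mean() - group_b.mean()) / se_diff

# Welch–Satterthwaite approximation for the degrees of freedom
df_welch = (va / na + vb / nb) ** 2 / (
    (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
)
p_manual = 2 * stats.t.sf(abs(t_manual), df_welch)

t_scipy, p_scipy = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"manual: t = {t_manual:.3f}, df = {df_welch:.2f}, p = {p_manual:.3f}")
print(f"scipy:  t = {t_scipy:.3f}, p = {p_scipy:.3f}")
```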


### 📌 4. F-test – Variance Comparison

In [11]:
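var_m = np.var(addicted_m, ddof=1)
var_f = np.var(addicted_f, ddof=1)
# Put the larger variance in the numerator and keep its degrees of freedom with it
if var_m >= var_f:
    f_stat, df1, df2 = var_m / var_f, len(addicted_m) - 1, len(addicted_f) - 1
else:
    f_stat, df1, df2 = var_f / var_m, len(addicted_f) - 1, len(addicted_m) - 1
# Two-sided p-value: f_stat >= 1, so double the upper tail and cap at 1
p_val = min(1.0, 2 * stats.f.sf(f_stat, df1, df2))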

In [12]:
print(f"F-statistic: {f_stat:.3f}, p-value: {p_val:.3f}")

F-statistic: 1.384, p-value: 0.002
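
The reported p = 0.002 suggests the two groups differ in variance, which is consistent with using the unequal-variance t-test in the previous section. The variance-ratio F-test is sensitive to departures from normality, so a common cross-check is Levene's test, which is a different technique from the one used above. The sketch below runs `stats.levene` on simulated data; the group sizes, means, and spreads are assumptions chosen only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated scores for two groups with different spreads (illustrative only)
group_a = rng.normal(loc=6.0, scale=1.5, size=200)
group_b = rng.normal(loc=6.0, scale=2.0, size=200)

# center='median' gives the Brown–Forsythe variant, which is more robust
# to skewed or heavy-tailed data than the mean-centered version
levene_stat, levene_p = stats.levene(group_a, group_b, center='median')

print(f"Levene statistic: {levene_stat:.3f}, p-value: {levene_p:.4f}")
```

If the F-test and Levene's test disagree on real data, that is often a sign that the normality assumption behind the F-test deserves a closer look.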


### 📌 5. Chi-square Test – Academic Performance vs Relationship Status

In [13]:
contingency = pd.crosstab(df['Relationship_Status'], df['Affects_Academic_Performance'])
chi2, p, dof, expected = stats.chi2_contingency(contingency)

In [14]:
print(f"Chi-square statistic: {chi2:.2f}, p-value: {p:.3f}")

Chi-square statistic: 22.71, p-value: 0.000
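
A p-value this small (below 0.0005) indicates an association between the two variables, but it does not say how strong the association is or which variable drives the other. The sketch below shows two follow-up checks that are not part of the notebook: confirming the expected counts are large enough for the chi-square approximation, and computing Cramér's V as an effect size. The `table` counts are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical 3x2 contingency table (rows: status groups, columns: No/Yes)
table = np.array([
    [30, 60],
    [45, 50],
    [10, 25],
])

chi2, p, dof, expected = stats.chi2_contingency(table)

# Rule of thumb: the approximation is usually trusted when all expected counts are >= 5
min_expected = expected.min()

# Cramér's V: chi-square scaled to the range [0, 1]
n_obs = table.sum()
cramers_v = np.sqrt(chi2 / (n_obs * (min(table.shape) - 1)))

print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
print(f"smallest expected count = {min_expected:.2f}")
print(f"Cramér's V = {cramers_v:.3f}")
```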


### 📌 6. One-Way ANOVA – Avg Usage vs Academic Level

In [15]:
groups = [group['Avg_Daily_Usage_Hours'].dropna()
          for name, group in df.groupby('Academic_Level')]

In [16]:
f_stat, p_val = stats.f_oneway(*groups)
print(f"F-statistic: {f_stat:.3f}, p-value: {p_val:.3f}")

F-statistic: 6.265, p-value: 0.002
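
With p = 0.002, the ANOVA suggests that mean usage differs for at least one academic level, though it does not say which one. The F-statistic is the ratio of between-group to within-group variability. The sketch below rebuilds it from sums of squares and adds eta-squared as an effect size, which the notebook does not compute. The three groups are made-up values.

```python
import numpy as np
from scipy import stats

# Hypothetical usage hours for three groups (illustrative only)
groups = [
    np.array([4.1, 4.8, 5.0, 4.5, 5.2]),
    np.array([5.5, 5.9, 6.1, 5.0, 5.7]),
    np.array([4.9, 5.1, 4.6, 5.4, 5.0]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)

f_scipy, p_scipy = stats.f_oneway(*groups)

# Eta-squared: share of total variability explained by group membership
eta_squared = ss_between / (ss_between + ss_within)

print(f"manual F = {f_manual:.3f}, scipy F = {f_scipy:.3f}")
print(f"eta-squared = {eta_squared:.3f}")
```

A post-hoc comparison such as Tukey's HSD would be the usual next step to find which groups differ.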


### ✅ Summary Table of Tests

| Test                | Question Addressed                                       | Output |
| ------------------- | -------------------------------------------------------- | ------ |
| Confidence Interval | What's the likely range for average usage?               | 95% CI |
| One-sample t-test   | Is mean usage ≠ 4 hours?                                 | t, p   |
| Two-sample t-test   | Do males & females differ in mean Addicted Score?        | t, p   |
| F-test              | Do males & females differ in Addicted Score variance?    | F, p   |
| Chi-square test     | Is academic impact associated with relationship status?  | χ², p  |
| ANOVA               | Does mean usage differ by academic level?                | F, p   |

---

### 🧠 Conclusion

* Inferential statistics helps generalize findings from samples to populations.
* Statistical tests indicate whether an observed pattern is **statistically significant** or plausibly due to chance.
* Understanding p-values, confidence levels, and test selection is key to analytics.
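
To make "by chance" concrete, the sketch below simulates many samples for which H₀ is true and counts how often a one-sample t-test still rejects at α = 0.05. The sample size, number of simulations, and normal distribution are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

alpha = 0.05
n_sims = 2000       # number of simulated studies
sample_size = 30    # observations per study

# Every row is a sample drawn from a population whose true mean is 4.0,
# so H0 (mu = 4) holds in every simulated study
samples = rng.normal(loc=4.0, scale=1.0, size=(n_sims, sample_size))

# One t-test per row
_, p_values = stats.ttest_1samp(samples, 4.0, axis=1)

# Fraction of studies that reject a true H0 (Type I error rate)
false_positive_rate = np.mean(p_values < alpha)

print(f"False positive rate: {false_positive_rate:.3f}")
```

The rate should land near α, which is what the significance level means: the share of true null hypotheses that would be rejected in the long run.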