# 1. Descriptive Statistics
**Descriptive statistics summarize the main features of a dataset. Common measures include mean, median, mode, variance, standard deviation, and range.**

In [1]:
import pandas as pd
import numpy as np

# Example dataframe
df = pd.DataFrame({
    'age': [23, 25, 31, 35, 28],
    'salary': [50000, 62000, 75000, 80000, 60000]
})

# Descriptive Statistics
mean = df['salary'].mean()  # Mean
median = df['salary'].median()  # Median
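mode = df['salary'].mode()  # Mode(s): returns a Series; every salary here is unique, so all five values come back
std = df['salary'].std()  # Sample standard deviation (pandas uses ddof=1 by default)
var = df['salary'].var()  # Sample variance (pandas uses ddof=1 by default)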
min_value = df['salary'].min()  # Minimum
max_value = df['salary'].max()  # Maximum
range_value = max_value - min_value  # Range

# Summary statistics
summary = df.describe()

# Correlation and Covariance
corr = df.corr()  # Correlation matrix
cov = df.cov()  # Covariance matrix

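The variance and standard deviation above are sample estimates: pandas divides by n - 1, while NumPy's `np.var` divides by n by default. A minimal sketch of that difference, recomputing the variance of the same salary values by hand (the variable names below are illustrative, not part of the original cell):

```python
import numpy as np
import pandas as pd

salary = pd.Series([50000, 62000, 75000, 80000, 60000])

n = len(salary)
mean = salary.sum() / n
squared_dev = ((salary - mean) ** 2).sum()  # sum of squared deviations from the mean

sample_var = squared_dev / (n - 1)  # divides by n - 1, like pandas .var()
population_var = squared_dev / n    # divides by n, like np.var() with its default ddof=0

print(sample_var, salary.var())
print(population_var, np.var(salary.to_numpy()))
```

The two estimates converge as n grows, but on a five-row sample like this one the gap is noticeable, so it is worth checking which convention a library uses.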

# 2. Probability Distributions
**Probability distributions describe how likely each value of a random variable is. Common distributions include the normal (Gaussian), which is continuous, and the binomial and Poisson, which are discrete.**

In [2]:
from scipy.stats import norm, binom, poisson

# Normal Distribution
mu, sigma = 0, 1  # mean and standard deviation
normal_dist = norm.rvs(size=1000, loc=mu, scale=sigma)

# Binomial Distribution
n, p = 10, 0.5  # number of trials, probability of success
binomial_dist = binom.rvs(n, p, size=1000)

# Poisson Distribution
lambda_ = 3  # rate (lambda) for Poisson
poisson_dist = poisson.rvs(mu=lambda_, size=1000)

# Probability Density Function (PDF) for Normal
pdf_value = norm.pdf(0, loc=mu, scale=sigma)

# Cumulative Distribution Function (CDF)
cdf_value = norm.cdf(0, loc=mu, scale=sigma)

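Besides drawing random samples, the same `scipy.stats` objects can be queried for exact probabilities. The sketch below is one possible illustration: `pmf` gives the probability of a single outcome for a discrete distribution, `cdf` gives a cumulative probability, and `ppf` inverts the CDF. The specific questions asked (5 heads, at most 2 events) are made-up examples.

```python
from scipy.stats import norm, binom, poisson

# Probability of exactly 5 successes in 10 trials with p = 0.5
p_five = binom.pmf(5, n=10, p=0.5)

# Probability of at most 2 events when the Poisson rate is 3
p_at_most_two = poisson.cdf(2, mu=3)

# ppf is the inverse of cdf: the value below which 97.5% of a
# standard normal distribution lies
z_975 = norm.ppf(0.975)
roundtrip = norm.cdf(z_975)  # recovers 0.975 up to floating-point error

print(p_five, p_at_most_two, z_975, roundtrip)
```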

# 3. Confidence Intervals
**Confidence intervals give a range of plausible values for a population parameter. The confidence level (e.g., 95%) is the proportion of such intervals, over repeated sampling, that would contain the true parameter.**

In [3]:
import numpy as np
import scipy.stats as stats

# Sample data
data = np.array([23, 25, 31, 35, 28])
confidence = 0.95

# Mean and standard error of the mean
mean = np.mean(data)
sem = stats.sem(data)

# Confidence interval
confidence_interval = stats.t.interval(confidence, len(data)-1, loc=mean, scale=sem)

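The interval returned by `stats.t.interval` is the sample mean plus or minus a critical t-value times the standard error. A minimal sketch of that calculation done by hand on the same data, to show where the numbers come from (the names `t_crit`, `margin`, and `manual_ci` are illustrative):

```python
import numpy as np
from scipy import stats

data = np.array([23, 25, 31, 35, 28])
confidence = 0.95
n = len(data)

mean = data.mean()
sem = data.std(ddof=1) / np.sqrt(n)  # standard error of the mean

# Two-sided critical value: leave (1 - confidence) / 2 in each tail
t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
margin = t_crit * sem

manual_ci = (mean - margin, mean + margin)
scipy_ci = stats.t.interval(confidence, n - 1, loc=mean, scale=sem)

print(manual_ci)
print(scipy_ci)
```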

# 4. Hypothesis Testing and T-Tests
**Hypothesis testing helps determine whether sample data provide enough evidence against a given hypothesis. The t-test compares means: a sample mean against a reference value, or the means of two groups against each other.**

* **One-sample T-test: Tests whether the mean of the population the sample was drawn from differs from a hypothesized value.**

In [None]:
# One-sample t-test
t_stat, p_value = stats.ttest_1samp(data, popmean=30)  # Test if mean of data = 30

* **Two-sample T-test: Tests if the means of two independent groups are different.**

In [None]:
# Two-sample t-test
group1 = [23, 25, 31, 35, 28]
group2 = [22, 29, 32, 30, 27]
t_stat, p_value = stats.ttest_ind(group1, group2)  # Assumes equal variances by default (equal_var=True)

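`stats.ttest_ind` pools the two group variances unless told otherwise. When equal variances cannot be assumed, Welch's variant is available through `equal_var=False`. The sketch below runs both on the same two groups; treating Welch's test as the safer default is a common recommendation rather than something this notebook states.

```python
import numpy as np
from scipy import stats

group1 = [23, 25, 31, 35, 28]
group2 = [22, 29, 32, 30, 27]

student = stats.ttest_ind(group1, group2)                # pooled variance
welch = stats.ttest_ind(group1, group2, equal_var=False) # Welch's t-test

print(student.statistic, student.pvalue)
print(welch.statistic, welch.pvalue)
```

Because the two groups have the same size, the t-statistics coincide here; only the degrees of freedom, and therefore the p-values, differ.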

* **Paired T-test: Tests if the means of two related groups are different.**

In [None]:
# Paired t-test
before = [80, 75, 85, 90, 70]
after = [82, 78, 85, 95, 72]
t_stat, p_value = stats.ttest_rel(before, after)

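A paired t-test is equivalent to a one-sample t-test on the per-pair differences, tested against a mean of zero. A short sketch checking this on the same `before` and `after` values (the name `diffs` is illustrative):

```python
import numpy as np
from scipy import stats

before = np.array([80, 75, 85, 90, 70])
after = np.array([82, 78, 85, 95, 72])

paired = stats.ttest_rel(before, after)

# Same test, expressed as a one-sample test on the differences
diffs = before - after
one_sample = stats.ttest_1samp(diffs, popmean=0)

print(paired.statistic, paired.pvalue)
print(one_sample.statistic, one_sample.pvalue)
```

This is why the pairing matters: the test looks only at the within-pair changes, so variation between subjects does not inflate the standard error.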

# 5. Chi-Squared Test
**The Chi-Squared test assesses the association between categorical variables in a contingency table (observed vs. expected frequencies).**

In [None]:
from scipy.stats import chi2_contingency

# Contingency table (observed frequencies)
observed = [[10, 20, 30], [6, 9, 17]]

# Chi-Squared test
chi2, p_value, dof, expected = chi2_contingency(observed)

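The `expected` array returned by `chi2_contingency` holds the counts that would be expected if rows and columns were independent: row total times column total, divided by the grand total. A minimal sketch that rebuilds the expected counts and the test statistic by hand for the same table (the `*_manual` names are illustrative):

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[10, 20, 30], [6, 9, 17]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 3)
grand_total = observed.sum()

# Expected counts under independence
expected_manual = row_totals * col_totals / grand_total

# Pearson chi-squared statistic and degrees of freedom
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2_manual, chi2)
print(dof_manual, dof)
```

The manual statistic matches here because SciPy applies Yates' continuity correction only when there is a single degree of freedom (a 2x2 table).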

# 6. General Hypothesis Testing Framework
**In hypothesis testing:**

* **Null Hypothesis (H0): Assumes no effect or difference.**

* **Alternative Hypothesis (H1): Assumes there is an effect or difference.**

* **P-value: The probability, assuming H0 is true, of observing a result at least as extreme as the one obtained. Small values count as evidence against H0.**

* **Reject H0 if p_value < alpha (usually 0.05).**

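The decision rule above can be sketched in a few lines. This example reuses the one-sample test from section 4; the helper `decide` is hypothetical and only wraps the comparison against alpha.

```python
from scipy import stats

def decide(p_value, alpha=0.05):
    """Hypothetical helper: turn a p-value into a decision about H0."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

data = [23, 25, 31, 35, 28]

# H0: the population mean is 30; H1: it is not
t_stat, p_value = stats.ttest_1samp(data, popmean=30)
decision = decide(p_value, alpha=0.05)

print(t_stat, p_value, decision)
```

Note that the outcome is phrased as "fail to reject H0" rather than "accept H0": a large p-value means the data are compatible with H0, not that H0 has been shown to be true.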
# 7. Critical Values for T-Test
**Critical values mark the boundary of the rejection region for the null hypothesis at a given significance level (alpha): H0 is rejected when the test statistic falls beyond the critical value.**

In [None]:
# Critical t-value for a two-tailed test at 95% confidence and df=10
alpha = 0.05
critical_value = stats.t.ppf(1 - alpha/2, df=10)
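
Comparing the test statistic with the critical value leads to the same decision as comparing the p-value with alpha. A small sketch, assuming the one-sample test from section 4 (so the degrees of freedom are `len(data) - 1` rather than the 10 used above):

```python
from scipy import stats

data = [23, 25, 31, 35, 28]
alpha = 0.05
dof = len(data) - 1

t_stat, p_value = stats.ttest_1samp(data, popmean=30)
critical_value = stats.t.ppf(1 - alpha / 2, df=dof)  # two-tailed

# Two equivalent ways to reach the decision
reject_by_critical = abs(t_stat) > critical_value
reject_by_p = p_value < alpha

print(t_stat, critical_value, reject_by_critical, reject_by_p)
```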