### Question 1: What is the difference between descriptive statistics and inferential statistics? Explain with examples.

**Answer:**
Statistics is commonly divided into two branches: **Descriptive Statistics** and **Inferential Statistics**.

**1. Descriptive Statistics**
- Summarizes and describes the features of the dataset at hand, without generalizing beyond it.
- Tools: mean, median, mode, standard deviation, tables, graphs.

*Example:* A teacher calculates the average marks of a class of 50 students and finds a mean of 65. The figure describes only those 50 students.

**2. Inferential Statistics**
- Draws conclusions about a population from sample data, with a stated level of uncertainty.
- Tools: probability theory, confidence intervals, hypothesis testing, regression.

*Example:* A researcher samples 100 of a school's 1000 students and uses the sample mean to estimate that the school-wide average mark is about 70.
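
The contrast can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the population is synthetic (normally distributed marks, an assumption made only for the demo), and the interval uses the normal approximation with z = 1.96 while ignoring the finite-population correction.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: marks of 1000 students (synthetic data).
population = [random.gauss(70, 10) for _ in range(1000)]

# Descriptive: summarize the data we actually hold (a sample of 100).
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inferential: estimate the unknown population mean with an approximate
# 95% confidence interval (normal approximation, z = 1.96).
margin = 1.96 * sample_sd / math.sqrt(len(sample))
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print(f"Sample mean (descriptive): {sample_mean:.2f}")
print(f"Approx. 95% CI for population mean (inferential): ({ci_low:.2f}, {ci_high:.2f})")
```

The sample mean alone is a descriptive summary; attaching an interval that makes a claim about all 1000 students is what turns it into inference.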

### Question 2: What is sampling in statistics? Explain random vs stratified sampling.

**Answer:**
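- **Sampling**: Selecting a subset of a population so that conclusions about the whole population can be drawn from it.

**Random Sampling**:
- Every member of the population has an equal chance of being selected.
- Example: Randomly choose 100 students from a school of 1000.

**Stratified Sampling**:
- Divide the population into non-overlapping groups (strata), then sample randomly within each stratum, usually in proportion to its size.
- Example: If the population is 60% boys and 40% girls, a sample of 100 would contain 60 boys and 40 girls.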
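
The two schemes can be compared with the standard-library `random` module. The sketch below assumes a hypothetical school of 600 boys and 400 girls; the tuple layout and the `count` helper are illustrative names, not part of any library.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 1000 students: 600 boys and 400 girls.
population = [("boy", i) for i in range(600)] + [("girl", i) for i in range(400)]

# Simple random sampling: every student has the same chance of selection,
# so the boy/girl split of the sample varies from draw to draw.
simple_sample = random.sample(population, 100)

# Proportional stratified sampling: sample each stratum separately,
# in proportion to its share of the population.
sample_size = 100
stratified_sample = []
for stratum in ("boy", "girl"):
    members = [p for p in population if p[0] == stratum]
    k = round(sample_size * len(members) / len(population))
    stratified_sample.extend(random.sample(members, k))


def count(sample, label):
    """Count how many sampled students belong to the given stratum."""
    return sum(1 for group, _ in sample if group == label)


print("Simple random:", count(simple_sample, "boy"), "boys,", count(simple_sample, "girl"), "girls")
print("Stratified:   ", count(stratified_sample, "boy"), "boys,", count(stratified_sample, "girl"), "girls")
```

The stratified sample reproduces the 60/40 split by construction, whereas the simple random sample only approximates it.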

### Question 3: Define mean, median, mode. Importance of these measures.

**Answer:**
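- **Mean**: The sum of all values divided by the number of values.
- **Median**: The middle value when the data are sorted (the average of the two middle values when the count is even).
- **Mode**: The most frequent value; a dataset can have more than one mode.

**Importance**:
- Mean: uses every value, so it reflects the overall level but is sensitive to outliers.
- Median: robust against outliers and skewed data.
- Mode: the only one of the three that applies to categorical data.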
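
A small experiment illustrates why the median is called robust. The salary figures below are hypothetical (in thousands); adding a single extreme value moves the mean sharply while the median and mode stay put.

```python
import statistics

salaries = [30, 32, 35, 35, 38, 40]   # hypothetical values, in thousands
with_outlier = salaries + [500]       # add one extreme value

mean_before = statistics.mean(salaries)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(salaries)
median_after = statistics.median(with_outlier)
mode_before = statistics.mode(salaries)
mode_after = statistics.mode(with_outlier)

print("Mean:  ", mean_before, "->", round(mean_after, 2))
print("Median:", median_before, "->", median_after)
print("Mode:  ", mode_before, "->", mode_after)
```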

### Question 4: Explain skewness and kurtosis. What does a positive skew imply?

**Answer:**
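- **Skewness**: A measure of the asymmetry of a distribution.
  - Positive skew: long right tail (typically Mean > Median > Mode, though this ordering is a rule of thumb rather than a guarantee).
  - Negative skew: long left tail (typically Mean < Median < Mode).

- **Kurtosis**: A measure of how heavy the tails of a distribution are relative to a normal distribution. It is often described as "peakedness", but it mainly reflects the weight of the tails and the presence of outliers.
  - Mesokurtic: tails like the normal distribution (kurtosis = 3, excess kurtosis = 0).
  - Leptokurtic: heavier tails, more outliers (kurtosis > 3).
  - Platykurtic: lighter tails, fewer outliers (kurtosis < 3).

**Positive skew implication**: Most values are relatively low, and a few extreme high values pull the mean to the right of the median. Example: income distribution.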
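
Both measures can be computed directly from central moments. The sketch below uses the population (biased) moment formulas, skewness = m3 / m2^1.5 and kurtosis = m4 / m2^2; statistical libraries may apply sample-size corrections and often report excess kurtosis (kurtosis minus 3), so their numbers can differ. The income figures and the function name are hypothetical.

```python
import statistics


def moment_skew_kurtosis(values):
    """Return (skewness, kurtosis) from population (biased) central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # 3 for a normal distribution
    return skewness, kurtosis


# Hypothetical right-skewed incomes: mostly low values, one very high value.
incomes = [20, 22, 25, 25, 27, 30, 32, 35, 40, 150]
skewness, kurtosis = moment_skew_kurtosis(incomes)

print("Mean:", statistics.mean(incomes), "Median:", statistics.median(incomes))
print("Skewness:", round(skewness, 3))
print("Kurtosis:", round(kurtosis, 3), "Excess kurtosis:", round(kurtosis - 3, 3))
```

For this dataset the skewness is positive and the mean sits above the median, which is the pattern described for positively skewed data.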

### Question 5: Python program to compute mean, median, mode

In [None]:
import statistics

# Given list
numbers = [12, 15, 12, 18, 19, 12, 20, 22, 19, 19, 24, 24, 24, 26, 28]

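# Compute mean, median, mode
mean_value = statistics.mean(numbers)
median_value = statistics.median(numbers)
# 12, 19 and 24 each occur three times, so the data are multimodal.
# statistics.mode() would return only the first of them (12) on Python 3.8+
# and raise StatisticsError on older versions; multimode() returns all modes.
mode_values = statistics.multimode(numbers)

print("Numbers:", numbers)
print("Mean:", mean_value)
print("Median:", median_value)
print("Mode(s):", mode_values)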

### Question 6: Compute covariance and correlation coefficient between two datasets

In [None]:
import numpy as np

list_x = [10, 20, 30, 40, 50]
list_y = [15, 25, 35, 45, 60]

x = np.array(list_x)
y = np.array(list_y)

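# Covariance: bias=True divides by N (population covariance);
# the default, bias=False, divides by N - 1 (sample covariance).
cov_matrix = np.cov(x, y, bias=True)
cov_xy = cov_matrix[0, 1]

# Pearson correlation coefficient (unaffected by the N vs N - 1 choice)
corr_xy = np.corrcoef(x, y)[0, 1]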

print("List X:", list_x)
print("List Y:", list_y)
print("Covariance:", cov_xy)
print("Correlation Coefficient:", corr_xy)
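
As a cross-check, the same quantities can be computed from their definitions in plain Python. This sketch shows the difference between dividing by N (which is what `bias=True` requests) and dividing by N - 1 (NumPy's default); the variable names are illustrative.

```python
import math

list_x = [10, 20, 30, 40, 50]
list_y = [15, 25, 35, 45, 60]

n = len(list_x)
mean_x = sum(list_x) / n
mean_y = sum(list_y) / n

# Sums of products and squares of deviations from the means.
sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(list_x, list_y))
sum_xx = sum((a - mean_x) ** 2 for a in list_x)
sum_yy = sum((b - mean_y) ** 2 for b in list_y)

population_cov = sum_xy / n      # divides by N, as np.cov(..., bias=True) does
sample_cov = sum_xy / (n - 1)    # divides by N - 1, as np.cov does by default
pearson_r = sum_xy / math.sqrt(sum_xx * sum_yy)

print("Population covariance:", population_cov)
print("Sample covariance:", sample_cov)
print("Correlation coefficient:", round(pearson_r, 4))
```

The correlation coefficient is the same under either convention because the normalizing factor cancels between numerator and denominator.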

### Question 7: Python script to draw a boxplot and identify outliers

In [None]:
import matplotlib.pyplot as plt
import numpy as np

data = [12, 14, 14, 15, 18, 19, 19, 21, 22, 22, 23, 23, 24, 26, 29, 35]

plt.boxplot(data, vert=True, patch_artist=True)
plt.title("Boxplot of Data")
plt.ylabel("Values")
plt.show()

# Outlier detection using IQR
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

outliers = [x for x in data if x < lower_bound or x > upper_bound]

print("Q1:", Q1, "Q3:", Q3, "IQR:", IQR)
print("Lower Bound:", lower_bound, "Upper Bound:", upper_bound)
print("Outliers:", outliers)

### Question 8: Relationship between advertising spend and daily sales using covariance & correlation

In [None]:
import numpy as np

advertising_spend = [200, 250, 300, 400, 500]
daily_sales = [2200, 2450, 2750, 3200, 4000]

x = np.array(advertising_spend)
y = np.array(daily_sales)

# Population covariance (bias=True divides by N) and Pearson correlation
cov_xy = np.cov(x, y, bias=True)[0, 1]
corr_xy = np.corrcoef(x, y)[0, 1]

print("Advertising Spend:", advertising_spend)
print("Daily Sales:", daily_sales)
print("Covariance:", cov_xy)
print("Correlation Coefficient:", corr_xy)

### Question 9: Survey scores distribution - summary stats & histogram

In [None]:
import statistics
import matplotlib.pyplot as plt

survey_scores = [7, 8, 5, 9, 6, 7, 8, 9, 10, 4, 7, 6, 9, 8, 7]

mean_score = statistics.mean(survey_scores)
median_score = statistics.median(survey_scores)
mode_score = statistics.mode(survey_scores)
std_dev = statistics.stdev(survey_scores)

print("Mean:", mean_score)
print("Median:", median_score)
print("Mode:", mode_score)
print("Standard Deviation:", std_dev)

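# Bin edges at 3.5, 4.5, ..., 10.5 give one bar per integer score (4 to 10),
# centered on the matching x-axis tick.
plt.hist(survey_scores, bins=[b - 0.5 for b in range(4, 12)], edgecolor='black', color='skyblue')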
plt.title("Histogram of Customer Satisfaction Scores")
plt.xlabel("Survey Scores")
plt.ylabel("Frequency")
plt.xticks(range(1, 11))
plt.show()