# Q1. What are the three measures of central tendency?
Ans: \
1) Mean \
2) Median \
3) Mode

# Q2. What is the difference between the mean, median, and mode? How are they used to measure the central tendency of a dataset?
Ans: \
###  **Mean (Average)**

- **What it is:**  
  The mean is the **sum of all values** divided by the **number of values**.
  
- **When to use:**  
  Use it when the data is **evenly distributed** (no major outliers).

- **Example:**  
  If students score 70, 75, 80, 85, and 90 in a test:  
  Mean = (70 + 75 + 80 + 85 + 90) / 5 = **80**

- **Weakness:**  
  Affected by **extreme values** (outliers).  
  If one student scored 0 instead of 70, the mean would drop from 80 to 66.

---

###  **Median**

- **What it is:**  
  The **middle value** when the data is arranged in order.

- **When to use:**  
  Great for **skewed data** or data with outliers.

- **Example:**  
  In the scores: 60, 65, 70, 75, 80 — the median is **70**.  
  If we had: 60, 65, 70, 75, 300 → Median still = **70**, not affected by 300.

- **Strength:**  
  More **robust** in the presence of outliers or skewed distributions.

---

###  **Mode**

- **What it is:**  
  The **most frequently occurring** value in a dataset.

- **When to use:**  
  Useful when analyzing **categorical data** or data with **repeating values**.

- **Example:**  
  In a list of shoe sizes: 8, 9, 9, 10, 10, 10 → Mode = **10**.

- **Special note:**  
  A dataset can have:
  - One mode (unimodal),
  - More than one mode (bimodal/multimodal),
  - Or no mode (if all values are unique).

---

###  Summary of Use:
- Use **mean** when you want the arithmetic average (and there are no extreme outliers).
- Use **median** when data is skewed or you want a more resistant measure.
- Use **mode** when you're interested in the **most common value**, especially for categorical or discrete data.
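
The examples above can be checked with Python's standard `statistics` module. This is a minimal sketch that reuses the numbers from the text; the variable names are illustrative only.

```python
import statistics

# Test scores from the mean example
scores = [70, 75, 80, 85, 90]
mean_scores = statistics.mean(scores)

# Same scores with one outlier: the 70 replaced by 0
scores_with_outlier = [0, 75, 80, 85, 90]
mean_with_outlier = statistics.mean(scores_with_outlier)

# Median example: the extreme value 300 does not move the middle value
median_plain = statistics.median([60, 65, 70, 75, 80])
median_outlier = statistics.median([60, 65, 70, 75, 300])

# Mode example: shoe sizes
shoe_sizes = [8, 9, 9, 10, 10, 10]
mode_size = statistics.mode(shoe_sizes)

print("Mean:", mean_scores, "-> with outlier:", mean_with_outlier)
print("Median:", median_plain, "-> with outlier:", median_outlier)
print("Mode:", mode_size)
```

The single outlier shifts the mean but leaves the median where it was, which is the behaviour the summary describes.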


# Q3. Calculate the three measures of central tendency for the given height data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5] \
Ans: \


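```python
import statistics

heights = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

# Mean: sum of the values divided by their count
mean = statistics.mean(heights)

# Median: middle of the sorted data (average of the two middle values for an even count)
median = statistics.median(heights)

# Mode: 177 and 178 each appear three times, so the data is bimodal.
# multimode() returns every most-frequent value; mode() would report only one.
modes = statistics.multimode(heights)

print("Mean:", round(mean, 2))
print("Median:", median)
print("Mode(s):", modes)
```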



# Q4. Find the standard deviation for the given data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]\
Ans: \


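```python
import statistics

heights = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

# stdev() is the sample standard deviation (divides by n - 1);
# use pstdev() instead if the data is the whole population
std_dev = statistics.stdev(heights)

print("Standard Deviation:", round(std_dev, 2))
```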



# Q5. How are measures of dispersion such as range, variance, and standard deviation used to describe the spread of a dataset? Provide an example.
Ans: \

###  1. **Range**
- **Definition**: Difference between the **maximum and minimum** values.
- **Use**: Gives a quick sense of the total spread.
- **Limitation**: Affected by outliers.

 *Example*:  
For data `[5, 7, 8, 10, 50]`,  
**Range = 50 - 5 = 45**

---

###  2. **Variance**
- **Definition**: Average of the **squared differences** from the mean (dividing by n for a population, or by n − 1 for a sample).
- **Use**: Shows how much the data deviates from the mean.
- **Units**: Square of the original unit, so not always directly interpretable.

 *Example*:  
For data `[2, 4, 6]`  
Mean = 4  
Squared differences: (2−4)² = 4, (4−4)² = 0, (6−4)² = 4  
**Population variance = (4+0+4)/3 ≈ 2.67** (the sample variance, dividing by n − 1 = 2, would be 4)

---

###  3. **Standard Deviation**
- **Definition**: Square root of the variance.
- **Use**: Most commonly used measure of spread.
- **Units**: Same as original data — makes it easier to interpret.

 *Using previous variance*:  
**Standard Deviation = √2.67 ≈ 1.63**

---

###  Why This Matters:
- **Small dispersion** means data points are close to the mean.
- **Large dispersion** indicates more variability or spread.
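
The three worked examples can be reproduced with the standard `statistics` module. This is a small sketch using the numbers from the text; note that `pvariance`/`pstdev` divide by n (population), while `variance` divides by n − 1 (sample), so the two give different results on the same data.

```python
import statistics

# Range example from the text
data = [5, 7, 8, 10, 50]
data_range = max(data) - min(data)

# Variance and standard deviation example from the text
values = [2, 4, 6]
pop_variance = statistics.pvariance(values)     # divides by n
pop_std_dev = statistics.pstdev(values)         # square root of the population variance
sample_variance = statistics.variance(values)   # divides by n - 1

print("Range:", data_range)
print("Population variance:", round(pop_variance, 2))
print("Population std dev:", round(pop_std_dev, 2))
print("Sample variance:", sample_variance)
```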

# Q6. What is a Venn diagram?
Ans: \
A **Venn diagram** is a **visual tool** used to show the **relationships between different sets**. It uses **overlapping circles** to illustrate how sets intersect, differ, or share elements.

---

###  Key Features:
- Each **circle** represents a **set**.
- The **overlapping areas** show elements that are **common** between sets.
- The **non-overlapping parts** show elements **unique** to each set.

---

###  Why Use a Venn Diagram?
- To visualize **unions**, **intersections**, and **differences** between sets.
- Helps understand **logical relationships** like:
  - A ∪ B (Union)
  - A ∩ B (Intersection)
  - A − B (Difference)

---

###  Example:
Let’s say:
- Set A = {1, 2, 3}
- Set B = {3, 4, 5}

**Venn diagram would show**:
- Circle A: 1, 2, **3**
- Circle B: **3**, 4, 5  
→ 3 is in both sets, so it's in the **overlap**
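
The regions of this Venn diagram map directly onto Python's built-in `set` operators. The sketch below uses the same two sets; the variable names for the regions are illustrative.

```python
# Sets from the example above
A = {1, 2, 3}
B = {3, 4, 5}

only_a = A - B     # part of circle A outside the overlap
overlap = A & B    # region shared by both circles (intersection)
only_b = B - A     # part of circle B outside the overlap
union = A | B      # everything covered by either circle

print("Only in A:", sorted(only_a))
print("Overlap:", sorted(overlap))
print("Only in B:", sorted(only_b))
print("Union:", sorted(union))
```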

# Q7. For the two given sets A = {2, 3, 4, 5, 6, 7} and B = {0, 2, 6, 8, 10}, find:
(i) A ∩ B \
(ii) A ∪ B \
Ans: \
(i) A ∩ B = {2, 6} \
(ii) A ∪ B = {0, 2, 3, 4, 5, 6, 7, 8, 10}
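
Both results can be verified with Python's built-in `set` type, as a quick check:

```python
A = {2, 3, 4, 5, 6, 7}
B = {0, 2, 6, 8, 10}

intersection = A.intersection(B)   # elements that are in both A and B
union = A.union(B)                 # elements that are in A, in B, or in both

print("A ∩ B =", sorted(intersection))
print("A ∪ B =", sorted(union))
```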

# Q8. What do you understand about skewness in data?
Ans: \
**Skewness** in data refers to the **asymmetry** or **lack of symmetry** in the distribution of data values.
###  Types of Skewness:

1. **Symmetrical Distribution (No Skew):**
   - The left and right sides of the distribution are mirror images.
   - Mean ≈ Median ≈ Mode

2. **Positively Skewed (Right Skewed):**
   - **Tail is longer on the right** side.
   - Most values are concentrated on the left.
   - Typically **Mean > Median > Mode**

    *Example:* Income distribution — a few people earn much more than the average.

3. **Negatively Skewed (Left Skewed):**
   - **Tail is longer on the left** side.
   - Most values are concentrated on the right.
   - Typically **Mean < Median < Mode**

    *Example:* Age at retirement — most people retire around a specific age, with a few retiring much earlier.

---

###  Why Skewness Matters:
- It helps identify the **distribution shape**.
- It affects the choice of **statistical methods** — some methods assume normal (symmetric) data.
- Can indicate **outliers** or **data issues**.
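
To make the direction of skew concrete, the sketch below computes a skewness coefficient (third central moment divided by the cubed population standard deviation) for three small datasets. The `skewness` helper and all three datasets are made up for illustration; they are not taken from real data.

```python
import statistics

def skewness(data):
    """Population skewness: third central moment divided by std_dev cubed."""
    mean = statistics.mean(data)
    std_dev = statistics.pstdev(data)
    third_moment = sum((x - mean) ** 3 for x in data) / len(data)
    return third_moment / std_dev ** 3

# Made-up values for illustration only
right_skewed = [20, 22, 25, 27, 30, 35, 200]   # long tail to the right
left_skewed = [1, 60, 62, 63, 64, 65, 66]      # long tail to the left
symmetric = [1, 2, 3, 4, 5]                    # mirror image around 3

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name,
          "| skewness:", round(skewness(data), 2),
          "| mean:", round(statistics.mean(data), 2),
          "| median:", statistics.median(data))
```

A positive coefficient goes with a mean above the median, a negative one with a mean below it, and the symmetric dataset gives zero.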


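# Q9. If a dataset is right-skewed, what is the position of the median with respect to the mean?
Ans: \
The median lies to the **left of (below) the mean**, because the long right tail pulls the mean upward. Typically: Mean > Median > Mode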

# Q10. Explain the difference between covariance and correlation. How are these measures used in statistical analysis?
Ans: \
**Covariance** and **correlation** both measure the **relationship between two variables**, but they are **not the same thing**:

###  **Covariance**  
Covariance tells us **how two variables change together**.

- If **both variables increase** together, covariance is **positive**.
- If **one increases while the other decreases**, covariance is **negative**.
- If they are **unrelated**, covariance is **around zero**.

 **Formula:**  
\[
\text{Cov}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}
\]

 **Example:**  
If you’re comparing hours studied and exam scores, and both increase together, the covariance is positive.

 **Limitation:**  
Covariance is **scale-dependent**, meaning its value depends on the units of the variables — it can be hard to interpret.

---

###  **Correlation**  
Correlation is the **standardized version of covariance**. It measures both **strength and direction** of a linear relationship.

- Always ranges between **-1 and +1**
  - +1 = Perfect positive correlation
  - -1 = Perfect negative correlation
  - 0 = No linear correlation

 **Formula:**  
\[
\text{Correlation (r)} = \frac{\text{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y}
\]

 **Example:**  
A correlation of **0.9** between height and weight means a strong positive linear relationship, regardless of units.

 **Advantage:**  
Easy to interpret and **unit-free**.

### **When to Use Covariance**
- You're interested in whether two variables move together.
- You **don’t need** a standardized scale.
- Example: In PCA (Principal Component Analysis), the **covariance matrix** is used to understand how features vary together.

---

### **When to Use Correlation**
- You want to understand **how strong** the relationship is.
- You need results that are **comparable across different units or datasets**.
- Example: Comparing the correlation between height & weight **vs.** age & income.
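
The scale-dependence of covariance is easy to see in code. The sketch below implements the sample covariance (dividing by n − 1) and the Pearson correlation by hand; the `hours` and `scores` values are hypothetical, chosen only to illustrate the point.

```python
import math

def covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by both sample standard deviations."""
    std_x = math.sqrt(covariance(x, x))   # covariance of x with itself is its variance
    std_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (std_x * std_y)

# Hypothetical data: hours studied and exam scores
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 80]
minutes = [h * 60 for h in hours]   # same study time, expressed in a different unit

print("Cov (hours):   ", covariance(hours, scores))
print("Cov (minutes): ", covariance(minutes, scores))
print("Corr (hours):  ", round(correlation(hours, scores), 4))
print("Corr (minutes):", round(correlation(minutes, scores), 4))
```

Switching from hours to minutes multiplies the covariance by 60 but leaves the correlation unchanged, which is why correlation is the better choice for comparisons across units.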


# Q11. What is the formula for calculating the sample mean? Provide an example calculation for a dataset.
Ans: \

The **sample mean** is the average of a set of data values taken from a sample of a population.

---

###  **Formula:**

\[
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\]

Where:  
- \( \bar{x} \) = sample mean  
- \( x_i \) = each value in the sample  
- \( n \) = number of values in the sample  

---

###  **Example Calculation:**

Suppose we have a sample dataset:  
`data = [5, 10, 15, 20, 25]`

\[
\bar{x} = \frac{5 + 10 + 15 + 20 + 25}{5} = \frac{75}{5} = 15
\]

 **Sample mean is 15**

---

###  Python Code:

```python
data = [5, 10, 15, 20, 25]
sample_mean = sum(data) / len(data)
print("Sample Mean:", sample_mean)
```

# Q12. For normally distributed data, what is the relationship between its measures of central tendency?
Ans: \

In a **normal distribution** (also called a **Gaussian distribution**), the data is **symmetrically distributed** around the center. This has an important effect on the **measures of central tendency**:

---

### **Key Relationship:**

\[
\textbf{Mean} = \textbf{Median} = \textbf{Mode}
\]

---

### Explanation:

- **Mean**: The average of all data points.
- **Median**: The middle value when data is sorted.
- **Mode**: The most frequently occurring value.

In a perfectly normal distribution:
- The curve is bell-shaped and symmetric.
- All three values lie at the **center** of the distribution.
- There's **no skewness** (neither left nor right).

---

### Visual Insight:
If you were to plot a histogram of a normal distribution:
- The peak of the curve (mode)
- The balancing point (mean)
- The midpoint (median)

All land on the **same vertical line**.
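
This can be illustrated with `statistics.NormalDist` (Python 3.8+). The parameters below (mean 170, standard deviation 7) and the seed are arbitrary assumptions chosen for the sketch.

```python
from statistics import NormalDist, mean, median

# Hypothetical normal distribution: mean 170, standard deviation 7
dist = NormalDist(mu=170, sigma=7)

# For the theoretical distribution the three measures coincide exactly
print("Mean:", dist.mean, "Median:", dist.median, "Mode:", dist.mode)

# For a finite random sample they agree only approximately
sample = dist.samples(10_000, seed=42)
sample_mean = mean(sample)
sample_median = median(sample)
print("Sample mean:", round(sample_mean, 2))
print("Sample median:", round(sample_median, 2))
```

Real samples are never perfectly normal, so in practice expect the three measures to be close rather than identical.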


# Q13. How is covariance different from correlation?
Ans: \
### **Covariance**
Measures how two variables change together.

- **Range:** Can be any real number (positive, negative, or zero).
- **Scale-dependent:** Units of covariance depend on the units of the variables.
- **Interpretation:**
  - Positive → Variables increase together.
  - Negative → One increases while the other decreases.
  - Zero → No linear relationship.

### **Correlation**
Measures both the strength and direction of the linear relationship between two variables.

- **Range:** Always between -1 and +1.
  - +1 → Perfect positive correlation
  - -1 → Perfect negative correlation
  - 0 → No linear correlation
- **Scale-independent:** It's standardized, so you can compare across datasets.

# Q14. How do outliers affect measures of central tendency and dispersion? Provide an example.
Ans: \

Outliers — values that are **significantly higher or lower** than the rest of the data — can **distort** statistical summaries. Here's how they affect both **central tendency** and **dispersion**:

---

###  1. **Central Tendency**:

- **Mean**:  **Highly affected**  
  Because the mean is calculated by summing all values, a single extreme value can **pull** the average up or down.

- **Median**:  **Less affected**  
  Median is the middle value, so it remains **stable** even if outliers are present (unless the dataset is very small).

- **Mode**:  **Not affected**  
  Mode is based on frequency, so it's unaffected by extreme values.

---

###  2. **Dispersion**:

- **Range**:  **Highly affected**  
  Since range is max − min, one outlier can dramatically **inflate** it.

- **Variance / Standard Deviation**:  **Sensitive to outliers**  
  These are based on squared differences from the mean, so outliers create **larger deviations**, increasing both.

---

###  Example:

Let’s take two small datasets:

#### Dataset A (No Outliers):  
`[10, 12, 13, 14, 15]`  
- Mean = 12.8  
- Median = 13  
- Range = 5  
- Std Dev (population) ≈ 1.72

#### Dataset B (With Outlier):  
`[10, 12, 13, 14, 100]`  
- Mean = **29.8** ⬅ Drastically increased  
- Median = 13 ⬅ **Stable**  
- Range = **90** ⬅ Big jump  
- Std Dev (population) ≈ **35.12** ⬅ Huge increase

---

###  Conclusion:

Outliers can **skew** the mean and **inflate** measures of variability. That's why, for datasets with potential outliers, the **median** and **IQR (interquartile range)** are often better choices for summarizing the data.
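
The comparison can be reproduced with the standard `statistics` module. In this sketch the `summarize` helper is illustrative, the standard deviation is the population version (`pstdev`), and the IQR uses `quantiles(..., method="inclusive")`, which is one of several quartile conventions and an assumption of this example.

```python
import statistics

def summarize(data):
    """Return mean, median, range, population std dev, and IQR for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "std_dev": round(statistics.pstdev(data), 2),
        "iqr": q3 - q1,
    }

dataset_a = [10, 12, 13, 14, 15]    # no outliers
dataset_b = [10, 12, 13, 14, 100]   # one outlier

summary_a = summarize(dataset_a)
summary_b = summarize(dataset_b)
print("A:", summary_a)
print("B:", summary_b)
```

With this quartile convention the median and IQR are identical for both datasets, while the mean, range, and standard deviation all change sharply.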