## Q1. What are the three measures of central tendency?

The three measures of central tendency are statistical measures that represent the central or average value of a dataset. They provide a single, representative value around which the data tend to cluster. The three main measures of central tendency are:

1. **Mean (Arithmetic Mean):**
   - The mean is calculated by summing up all the values in a dataset and dividing the sum by the total number of observations. It is often denoted by the symbol \(\bar{x}\) for a sample and \(\mu\) for a population.

   \[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

   where \(n\) is the number of observations, and \(x_i\) represents each individual data point.

2. **Median:**
   - The median is the middle value of a dataset when it is arranged in ascending or descending order. If there is an even number of observations, the median is the average of the two middle values. Because it depends only on the middle of the ordered data, the median is largely unaffected by extreme values, making it a robust measure of central tendency.

   - For an odd number of observations:
     \[ \text{Median} = \text{Middle value} \]

   - For an even number of observations:
     \[ \text{Median} = \frac{\text{Sum of two middle values}}{2} \]

3. **Mode:**
   - The mode is the value or values that appear most frequently in a dataset. A dataset may have one mode (unimodal), more than one mode (multimodal), or no mode if all values occur with the same frequency.

   - For a unimodal dataset, the mode corresponds to the single peak of the distribution; a multimodal dataset has several peaks.

These measures provide different insights into the central tendency of a dataset and are selected based on the characteristics of the data distribution. The mean is sensitive to extreme values, the median is resistant to outliers, and the mode identifies the most common value(s). Each measure is appropriate in different situations depending on the nature of the data.
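The three definitions above can be sketched directly in Python. This is a minimal illustration, not a reference implementation: the function names and the `sample` list are made up for the example, and `modes` follows the convention used above that a dataset has no mode when every value occurs equally often.

```python
from collections import Counter


def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """All most-frequent values; empty list if every value occurs equally often."""
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # every value has the same frequency: no mode
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical sample data, chosen only for illustration
sample = [2, 3, 3, 5, 7, 10]
print(mean(sample), median(sample), modes(sample))
```

For this sample the mean is 5.0, the median is 4.0 (the average of 3 and 5), and the only mode is 3.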

## Q2. What is the difference between the mean, median, and mode? How are they used to measure the central tendency of a dataset?

The mean, median, and mode are measures of central tendency used to describe the central or average value of a dataset. While they all provide a sense of where the "center" of the data lies, they do so in different ways and are sensitive to different aspects of the dataset.

### Mean:
- **Calculation:** The mean is calculated by summing up all the values in the dataset and then dividing by the total number of observations.
  \[ \text{Mean} = \frac{\sum_{i=1}^{n} x_i}{n} \]
- **Sensitivity:** The mean is sensitive to extreme values (outliers) in the dataset. A few extremely high or low values can significantly impact the mean.

### Median:
- **Calculation:** The median is the middle value of a dataset when it is arranged in ascending or descending order. If there is an even number of observations, the median is the average of the two middle values.
- **Sensitivity:** The median is less sensitive to extreme values than the mean. It represents the central value that divides the dataset into two equal halves.

### Mode:
- **Calculation:** The mode is the value or values that appear most frequently in a dataset.
- **Sensitivity:** The mode is not affected by extreme values. It is particularly useful for identifying the most common values in a dataset.
- **Multimodal Distributions:** A dataset may have one mode (unimodal), more than one mode (multimodal), or no mode at all if all values occur with the same frequency.

### Use in Measuring Central Tendency:

1. **Balancing Outliers:**
   - If a dataset has extreme values or outliers, the median may provide a better representation of the central tendency than the mean. The median is less influenced by extreme values.

2. **Identifying Most Common Values:**
   - The mode is useful for identifying the most common values in a dataset, especially in categorical data. It can help highlight peaks in the distribution.

3. **Symmetry and Skewness:**
   - In a symmetrical distribution, the mean, median, and mode are typically close to each other. In a skewed distribution (positively or negatively skewed), they may differ.

4. **Choice in Different Situations:**
   - The choice of which measure to use depends on the characteristics of the data and the goal of the analysis. For example, the mean is often used in situations where the data are approximately normally distributed.

5. **Robustness:**
   - The median is a more robust measure in the presence of outliers, making it a preferred choice when the dataset may be influenced by extreme values.

In summary, the mean, median, and mode are valuable measures for summarizing the central tendency of a dataset. The choice of which measure to use depends on the nature of the data and the specific goals of the analysis. It's common to consider all three measures to gain a comprehensive understanding of the dataset's central tendency.

## Q3. Calculate the three measures of central tendency for the given height data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

Let's calculate the three measures of central tendency (mean, median, and mode) for the given height data:

\[ [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5] \]

### Mean Calculation:
\[ \text{Mean} = \frac{\sum_{i=1}^{n} x_i}{n} \]

\[ \text{Mean} = \frac{178 + 177 + 176 + 177 + 178.2 + 178 + 175 + 179 + 180 + 175 + 178.9 + 176.2 + 177 + 172.5 + 178 + 176.5}{16} \]

\[ \text{Mean} = \frac{2832.3}{16} \]

\[ \text{Mean} \approx 177.02 \]

### Median Calculation:
- Arrange the data in ascending order:
  \[ 172.5, 175, 175, 176, 176.2, 176.5, 177, 177, 177, 178, 178, 178, 178.2, 178.9, 179, 180 \]

- Since there is an even number of observations (16), the median is the average of the two middle values, the 8th and 9th, which are both 177:
  \[ \text{Median} = \frac{177 + 177}{2} \]

\[ \text{Median} = \frac{354}{2} \]

\[ \text{Median} = 177 \]

### Mode Calculation:
- The mode is the value(s) that appear most frequently. Here 177 and 178 each appear three times, more often than any other value (175 appears twice and every remaining value once), so the dataset is bimodal with modes 177 and 178.

### Summary:
- Mean: \( \approx 177.02 \)
- Median: \( 177 \)
- Mode: 177 and 178 (bimodal)

These calculations provide a summary of the central tendency of the given height data. The mean represents the average height, the median represents the middle height value, and the modes 177 and 178 are the most frequently recorded heights.
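The hand calculation can be cross-checked with Python's standard `statistics` module. This is only a quick verification sketch; the variable names are chosen for the example.

```python
import statistics

heights = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175,
           178.9, 176.2, 177, 172.5, 178, 176.5]

mean_height = statistics.mean(heights)
median_height = statistics.median(heights)
# multimode returns every value tied for the highest frequency
mode_heights = sorted(statistics.multimode(heights))

print(f"mean   = {mean_height:.2f}")
print(f"median = {median_height}")
print(f"modes  = {mode_heights}")
```

`statistics.multimode` is used rather than `statistics.mode` because the latter returns only one value even when several values are tied.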

## Q4. Find the standard deviation for the given data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

The standard deviation (\( \sigma \)) measures the amount of variation or dispersion in a set of values. It is calculated using the formula:

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \]

where:
- \( x_i \) is each individual data point,
- \( \bar{x} \) is the mean of the data, and
- \( n \) is the total number of observations.

Let's calculate the standard deviation for the given height data:

\[ [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5] \]

1. Calculate the mean (\( \bar{x} \)):
\[ \bar{x} = \frac{\sum_{i=1}^{16} x_i}{16} \]

2. Calculate the squared differences from the mean:
\[ (x_i - \bar{x})^2 \]

3. Sum up the squared differences.

4. Divide by the total number of observations (\( n = 16 \)).

5. Take the square root of the result to get the standard deviation.

Let's perform the calculations:

\[ \bar{x} = \frac{2832.3}{16} \approx 177.02 \]

\[ (x_i - \bar{x})^2 = [0.98^2, (-0.02)^2, (-1.02)^2, (-0.02)^2, 1.18^2, 0.98^2, (-2.02)^2, 1.98^2, 2.98^2, (-2.02)^2, 1.88^2, (-0.82)^2, (-0.02)^2, (-4.52)^2, 0.98^2, (-0.52)^2] \]

Sum of squared differences \( \approx 51.18 \) (computed with the unrounded mean)

\[ \sigma = \sqrt{\frac{51.18}{16}} \approx \sqrt{3.20} \approx 1.79 \]

Therefore, the population standard deviation for the given height data is approximately \(1.79\). If the data are treated as a sample, dividing by \(n - 1 = 15\) instead gives \(s \approx 1.85\).
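The five steps can be followed literally in Python and compared against the standard library. This is a sketch for checking the arithmetic; `statistics.pstdev` divides by \(n\) (population) and `statistics.stdev` divides by \(n - 1\) (sample).

```python
import math
import statistics

heights = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175,
           178.9, 176.2, 177, 172.5, 178, 176.5]

n = len(heights)
mean_height = sum(heights) / n                            # step 1
squared_diffs = [(x - mean_height) ** 2 for x in heights] # step 2
sum_sq = sum(squared_diffs)                               # step 3
variance = sum_sq / n                                     # step 4
sigma = math.sqrt(variance)                               # step 5

print(f"sum of squared differences = {sum_sq:.2f}")
print(f"population std (manual)    = {sigma:.2f}")
print(f"population std (library)   = {statistics.pstdev(heights):.2f}")
print(f"sample std (library)       = {statistics.stdev(heights):.2f}")
```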

## Q5. How are measures of dispersion such as range, variance, and standard deviation used to describe the spread of a dataset? Provide an example.

Measures of dispersion, including range, variance, and standard deviation, provide information about the spread or variability of a dataset. They quantify how much individual data points deviate from the central tendency measures (mean, median, mode). Here's how these measures are used to describe the spread of a dataset, along with an example:

### 1. Range:

- **Definition:** The range is the difference between the maximum and minimum values in a dataset.

- **Use:** It provides a simple measure of the overall spread of the data. A larger range indicates greater variability.

- **Example:**
  - For a dataset [10, 15, 20, 25, 30], the range is \(30 - 10 = 20\).

### 2. Variance:

- **Definition:** The variance is the average of the squared differences between each data point and the mean of the dataset.

- **Use:** It measures the average degree of dispersion. A higher variance indicates more spread in the data.

- **Example:**
  - For a dataset [5, 8, 12, 15, 18], the mean is \(11.6\).
  - Squared differences: \((5-11.6)^2, (8-11.6)^2, (12-11.6)^2, (15-11.6)^2, (18-11.6)^2 = 43.56, 12.96, 0.16, 11.56, 40.96\)
  - Variance = \(\frac{1}{5} \sum (\text{squared differences}) = \frac{109.2}{5} = 21.84\)

### 3. Standard Deviation:

- **Definition:** The standard deviation is the square root of the variance. It provides a measure of dispersion in the same units as the data.

- **Use:** It is widely used because it is easy to interpret, but like the variance it is sensitive to outliers.

- **Example:**
  - Continuing from the variance example, the standard deviation is the square root of the variance: \(\sqrt{21.84} \approx 4.67\).

### Example:

Consider two datasets:

\[ A: [5, 10, 15, 20, 25] \]
\[ B: [8, 9, 10, 11, 12] \]

Dataset A has mean 15 and range \(25 - 5 = 20\); dataset B has mean 10 and range \(12 - 8 = 4\). The range already suggests that A is more spread out, and the variance and standard deviation quantify this:

#### Dataset A:
- Variance: \(\frac{1}{5} \sum (x_i - \bar{x})^2 = \frac{100 + 25 + 0 + 25 + 100}{5} = 50\)
- Standard Deviation: \(\sqrt{50} \approx 7.07\)

#### Dataset B:
- Variance: \(\frac{1}{5} \sum (x_i - \bar{x})^2 = \frac{4 + 1 + 0 + 1 + 4}{5} = 2\)
- Standard Deviation: \(\sqrt{2} \approx 1.41\)

By comparing the variance and standard deviation of both datasets, we can better understand the spread of the data. A lower variance and standard deviation indicate less variability in the data, while higher values suggest greater variability.
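The comparison of datasets A and B can be reproduced with a short script. It is a sketch using the population formulas (division by \(n\)) to match the definitions above; the helper name `describe_spread` is invented for this example.

```python
import statistics


def describe_spread(values):
    """Return range, population variance and population standard deviation."""
    return {
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),
        "std": statistics.pstdev(values),
    }


A = [5, 10, 15, 20, 25]
B = [8, 9, 10, 11, 12]

spread_a = describe_spread(A)
spread_b = describe_spread(B)

for name, spread in (("A", spread_a), ("B", spread_b)):
    print(name, {key: round(value, 2) for key, value in spread.items()})
```

All three measures are larger for A than for B, which agrees with the visual impression that A's values are further apart.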

## Q6. What is a Venn diagram?

A Venn diagram is a graphical representation used to illustrate the relationships between sets. It consists of overlapping circles or ellipses, each representing a set, and the overlapping regions represent the elements that belong to multiple sets. Venn diagrams are named after the English logician and philosopher John Venn, who introduced them in the late 19th century.

Key features of a Venn diagram:

1. **Sets and Regions:**
   - Each circle in a Venn diagram represents a set. The elements of the set are contained within the circle.
   - The overlapping regions between circles represent the elements that belong to multiple sets.

2. **Overlap:**
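   - In an area-proportional diagram, the extent of overlap indicates the degree of commonality between sets: a larger overlap implies more shared elements. In a standard Venn diagram, the overlap only shows that shared membership is possible, not how many elements are shared.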

3. **Disjoint Sets:**
   - If sets have no elements in common, their overlapping region is empty. Many diagrams (strictly speaking, Euler diagrams) show this by drawing circles that do not overlap at all.

4. **Universal Set:**
   - A rectangle or other shape enclosing all the circles may represent the universal set, which includes all elements under consideration.

5. **Complements:**
   - The regions outside the circles but within the universal set represent the complement of the sets.

### Example Venn Diagram:

Consider three sets: A, B, and C. Here's an example Venn diagram:

```
      A                 B
   ______           ______
  |      |         |      |
  |  1   |         |  2   |
  |______|         |______|
  
         \_________/
            C
```

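- Box A represents set A with element 1.
- Box B represents set B with element 2.
- In a fully drawn Venn diagram the shapes for A and B would overlap, and the overlapping region would hold the elements that belong to both A and B (the intersection); this simplified text sketch draws them apart.
- The region labelled C represents set C, which may contain elements not shared with A or B.
- A rectangle enclosing all the shapes (not drawn in this sketch) would represent the universal set.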

Venn diagrams are commonly used in various fields, including mathematics, logic, statistics, and problem-solving, to visually represent relationships between different sets and their elements. They are a helpful tool for illustrating concepts such as intersections, unions, and complements of sets.

## Q7. For the two given sets A = {2,3,4,5,6,7} & B = {0,2,6,8,10}. Find:
(i) A ∩ B
(ii) A ∪ B

Let's perform the set operations for the given sets:

\[ A = \{2, 3, 4, 5, 6, 7\} \]
\[ B = \{0, 2, 6, 8, 10\} \]

### (i) Intersection (A ∩ B):

The intersection of sets A and B (\(A \cap B\)) consists of elements that are common to both sets.

\[ A \cap B = \{x \mid x \in A \text{ and } x \in B\} \]

\[ A \cap B = \{2, 6\} \]

### (ii) Union (A ∪ B):

The union of sets A and B (\(A \cup B\)) consists of all unique elements from both sets.

\[ A \cup B = \{x \mid x \in A \text{ or } x \in B\} \]

\[ A \cup B = \{0, 2, 3, 4, 5, 6, 7, 8, 10\} \]

In summary:
\[ A \cap B = \{2, 6\} \]
\[ A \cup B = \{0, 2, 3, 4, 5, 6, 7, 8, 10\} \]

These results represent the intersection and union of sets A and B, respectively.
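The same operations map directly onto Python's built-in `set` type, where `&` is intersection and `|` is union. A minimal check:

```python
A = {2, 3, 4, 5, 6, 7}
B = {0, 2, 6, 8, 10}

intersection = A & B   # elements in both A and B
union = A | B          # elements in A or B (or both)

print(sorted(intersection))
print(sorted(union))
```

The equivalent method calls are `A.intersection(B)` and `A.union(B)`.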

## Q8. What do you understand about skewness in data?

Skewness is a statistical measure that describes the asymmetry or lack of symmetry in a distribution of data. In a perfectly symmetrical distribution, the right and left sides of the distribution are mirror images of each other. However, in skewed distributions, the tails on one side are longer or fatter than the other, and the distribution is not symmetric.

There are three main types of skewness:

1. **Positive Skewness (Right Skewness):**
   - In a positively skewed distribution, the right tail (larger values) is longer or fatter than the left tail (smaller values).
   - The mean is typically greater than the median.
   - The distribution is sometimes called "right-skewed" or "positively skewed."

2. **Negative Skewness (Left Skewness):**
   - In a negatively skewed distribution, the left tail (smaller values) is longer or fatter than the right tail (larger values).
   - The mean is typically less than the median.
   - The distribution is sometimes called "left-skewed" or "negatively skewed."

3. **Zero Skewness:**
   - In a perfectly symmetrical distribution, the tails on both sides are of equal length, and the distribution has zero skewness.
   - The mean is equal to the median in a symmetrical distribution.

### Skewness Formula:

The skewness (S) can be calculated using the following formula:

\[ S = \frac{n}{(n-1)(n-2)} \sum \left( \frac{X_i - \bar{X}}{s} \right)^3 \]

where:
- \( n \) is the number of observations.
- \( X_i \) is each individual data point.
- \( \bar{X} \) is the mean of the data.
- \( s \) is the sample standard deviation of the data (computed with \( n - 1 \) in the denominator).

### Interpretation:

- **Positive Skewness:**
  - If skewness is positive, it indicates that the distribution has a longer tail on the right side, and the mean is typically greater than the median.

- **Negative Skewness:**
  - If skewness is negative, it indicates that the distribution has a longer tail on the left side, and the mean is typically less than the median.

- **Zero Skewness:**
  - A skewness of zero is consistent with a symmetrical distribution, although an asymmetric distribution can also have zero skewness if its two tails balance out.

### Practical Significance:

- Skewness is important in understanding the shape of a distribution and its departure from symmetry.
- It helps in identifying the direction and degree of asymmetry in datasets.
- Skewness is useful in finance, economics, and other fields where understanding the distribution of data is critical.

In summary, skewness provides a quantitative measure of the asymmetry in a distribution, helping analysts and researchers characterize the shape and tendencies of datasets.
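The skewness formula given above can be sketched in a few lines of Python. This assumes \(s\) is the sample standard deviation (as returned by `statistics.stdev`); the function name and the three small datasets are hypothetical and serve only to show the sign of the result.

```python
import statistics


def sample_skewness(data):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    cubed_sum = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed_sum


# Hypothetical datasets for illustration
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 20]
left_skewed = [-x for x in right_skewed]

print(sample_skewness(symmetric))     # zero for this symmetric data
print(sample_skewness(right_skewed))  # positive: long right tail
print(sample_skewness(left_skewed))   # negative: long left tail
```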

## Q9. If the data are right-skewed, what is the position of the median with respect to the mean?

In a right-skewed distribution, the tail on the right side (larger values) is longer or fatter than the left side (smaller values). Most values are concentrated on the left side, with relatively few extreme values stretching out to the right.

In terms of the position of the median (\(M\)) with respect to the mean (\(\bar{X}\)) in a right-skewed distribution:

1. **Right-Skewed Distribution:**
   - The mean (\(\bar{X}\)) is influenced by the presence of the longer right tail, as extreme values have a greater impact on the mean.
   - The median (\(M\)) is less affected by extreme values, as it represents the middle value when the data are arranged in ascending or descending order.
   - In a right-skewed distribution, the median is typically less than the mean.

In symbols, a right-skewed distribution typically has \(M < \bar{X}\). The converse is only a rule of thumb: \(M < \bar{X}\) suggests right skew but does not prove it.

This relationship makes sense intuitively. In a right-skewed distribution, the mean is "pulled" toward the higher values in the longer right tail, resulting in a mean that is greater than the median. The median, being a measure of the central tendency less influenced by extreme values, tends to be positioned to the left of the mean in a right-skewed distribution.

In summary, in a right-skewed distribution, the median is typically to the left of the mean.
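A small numerical illustration makes the ordering concrete. The income figures below are hypothetical and chosen only so that one large value creates a long right tail.

```python
import statistics

# Hypothetical monthly incomes (in thousands); 120 forms the long right tail
incomes = [25, 28, 30, 32, 35, 38, 40, 120]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(f"mean   = {mean_income}")
print(f"median = {median_income}")
print("median < mean:", median_income < mean_income)
```

The single large value pulls the mean up to 43.5 while the median stays at 33.5, so the median sits to the left of the mean.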

## Q10. Explain the difference between covariance and correlation. How are these measures used in statistical analysis?

**Covariance:**

- **Definition:** Covariance measures the degree to which two variables change together. It indicates whether an increase in one variable corresponds to an increase or decrease in another.
  
- **Formula:**
  \[ \text{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n} \]
  where:
    - \( \text{Cov}(X, Y) \) is the covariance between variables \(X\) and \(Y\),
    - \( x_i \) and \( y_i \) are individual data points,
    - \( \bar{X} \) and \( \bar{Y} \) are the means of variables \(X\) and \(Y\), respectively,
    - \( n \) is the number of data points.

- **Interpretation:**
  - A positive covariance indicates a positive relationship (as one variable increases, the other tends to increase).
  - A negative covariance indicates a negative relationship (as one variable increases, the other tends to decrease).
  - The magnitude of covariance is not standardized, making it challenging to compare covariances between different pairs of variables.

**Correlation:**

- **Definition:** Correlation is a standardized measure of the strength and direction of the linear relationship between two variables. It ranges from -1 to 1, where -1 indicates a perfect negative linear relationship, 1 indicates a perfect positive linear relationship, and 0 indicates no linear relationship.

- **Formula:**
  \[ \text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y} \]
  where:
    - \( \text{Corr}(X, Y) \) is the correlation coefficient between variables \(X\) and \(Y\),
    - \( \text{Cov}(X, Y) \) is the covariance between variables \(X\) and \(Y\),
    - \( \sigma_X \) and \( \sigma_Y \) are the standard deviations of variables \(X\) and \(Y\), respectively.

- **Interpretation:**
  - \( \text{Corr}(X, Y) \) closer to 1 indicates a strong positive linear relationship.
  - \( \text{Corr}(X, Y) \) closer to -1 indicates a strong negative linear relationship.
  - \( \text{Corr}(X, Y) \) around 0 indicates a weak or no linear relationship.

**Differences:**

1. **Standardization:**
   - Covariance is not standardized, and its magnitude depends on the scales of the variables.
   - Correlation is standardized, making it a unitless measure that allows for easier comparison between different pairs of variables.

2. **Scale:**
   - Covariance can take any value, positive or negative, depending on the direction of the relationship.
   - Correlation always ranges from -1 to 1, providing a clear indication of the strength and direction of the linear relationship.

**Uses in Statistical Analysis:**

- **Covariance:**
  - Covariance is used to understand the direction of the relationship between two variables.
  - It is a crucial component in calculating correlation.

- **Correlation:**
  - Correlation is widely used in statistics and data analysis.
  - It helps assess the strength and direction of the linear relationship between two variables.
  - The correlation coefficient is used in regression analysis to estimate coefficients and make predictions.
  - Correlation is also helpful in identifying multicollinearity in multiple regression analysis.

In summary, while covariance and correlation both quantify relationships between variables, correlation provides a standardized measure that is easier to interpret and compare across different datasets. Correlation is particularly useful when comparing the strength and direction of relationships in different contexts or when dealing with variables on different scales.
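Both formulas can be written out in Python as a minimal sketch. The functions use the population form (division by \(n\)) to match the formulas above, and the study-hours data are hypothetical.

```python
import math


def covariance(xs, ys):
    """Population covariance: mean of the products of paired deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))  # Cov(X, X) is the variance of X
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)


# Hypothetical data: hours studied and exam scores
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

cov = covariance(hours, scores)
corr = correlation(hours, scores)
print(f"covariance  = {cov:.2f}")
print(f"correlation = {corr:.3f}")
```

The covariance (8.2, in units of hours times score points) only tells us the relationship is positive, while the correlation (about 0.99) shows that it is also very close to linear.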

## Q11. What is the formula for calculating the sample mean? Provide an example calculation for a dataset.

The sample mean, denoted by \( \bar{x} \), is a measure of central tendency that represents the average value of a set of data points in a sample. The formula for calculating the sample mean is:

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

where:
- \( \bar{x} \) is the sample mean,
- \( x_i \) is each individual data point in the sample,
- \( \sum \) denotes the sum, and
- \( n \) is the number of data points in the sample.

### Example Calculation:

Let's calculate the sample mean for a dataset:

\[ \text{Dataset: } [12, 15, 18, 22, 25] \]

\[ \bar{x} = \frac{12 + 15 + 18 + 22 + 25}{5} \]

\[ \bar{x} = \frac{92}{5} \]

\[ \bar{x} = 18.4 \]

Therefore, the sample mean for the given dataset is \( \bar{x} = 18.4 \). This means that, on average, the values in the sample are centered around 18.4. The sample mean is a representative measure of central tendency for the dataset.
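The same calculation in Python, as a quick check of the arithmetic:

```python
dataset = [12, 15, 18, 22, 25]

total = sum(dataset)        # sum of the x_i
n = len(dataset)            # number of data points
sample_mean = total / n

print(f"sum = {total}, n = {n}, mean = {sample_mean}")
```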

## Q12. For normally distributed data, what is the relationship between its measures of central tendency?

For a normal distribution, the relationship between its measures of central tendency (mean, median, and mode) is quite specific and follows the principle of symmetry inherent in a normal distribution.

1. **Mean (\(\mu\)):**
   - In a normal distribution, the mean (\(\mu\)) is located at the center of the distribution.
   - The mean is the point of balance for the distribution, and it is equal to the median.
   - Therefore, in a normal distribution, the mean is exactly at the center of the distribution.

2. **Median:**
   - The median in a normal distribution is also located at the center.
   - Since a normal distribution is symmetric, the median is equal to the mean.
   - The median divides the distribution into two equal halves.

3. **Mode:**
   - In a normal distribution, the mode is also located at the center.
   - A normal distribution is unimodal, meaning it has only one peak.
   - The mode, mean, and median are all at the same point in a perfectly symmetric normal distribution.

In summary, for a normal distribution:

\[ \text{Mean (\(\mu\))} = \text{Median} = \text{Mode} \]

This relationship holds true for an idealized normal distribution. However, in real-world scenarios, data may deviate slightly from perfect normality, and small discrepancies between the mean, median, and mode can occur. Nonetheless, the central tendency measures are generally very close in a normal distribution, and any differences are minor.
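Python's `statistics.NormalDist` class illustrates both points: the theoretical mean, median, and mode coincide, while a random sample shows only approximate agreement. The parameters (\(\mu = 170\), \(\sigma = 5\)) and the seed are arbitrary choices for this sketch.

```python
from statistics import NormalDist, mean, median

# Hypothetical normal distribution: mu = 170, sigma = 5
dist = NormalDist(mu=170, sigma=5)

# Theoretical measures of central tendency are identical
print(dist.mean, dist.median, dist.mode)

# A finite sample only approximates this equality
samples = dist.samples(10_000, seed=42)
sample_mean = mean(samples)
sample_median = median(samples)
print(f"sample mean   = {sample_mean:.2f}")
print(f"sample median = {sample_median:.2f}")
```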

## Q13. How is covariance different from correlation?

Covariance and correlation are both measures that describe the relationship between two variables, but they differ in terms of their scale, interpretation, and the extent to which they are affected by the scales of the variables.

**Covariance:**

- **Scale:**
  - Covariance is not standardized and can take any value, positive or negative. The magnitude of covariance depends on the scales of the variables involved.

- **Formula:**
  \[ \text{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n} \]
  where:
    - \( \text{Cov}(X, Y) \) is the covariance between variables \(X\) and \(Y\),
    - \( x_i \) and \( y_i \) are individual data points,
    - \( \bar{X} \) and \( \bar{Y} \) are the means of variables \(X\) and \(Y\), respectively,
    - \( n \) is the number of data points.

- **Interpretation:**
  - Positive covariance indicates a positive relationship (as one variable increases, the other tends to increase).
  - Negative covariance indicates a negative relationship (as one variable increases, the other tends to decrease).

**Correlation:**

- **Scale:**
  - Correlation is a standardized measure that always ranges between -1 and 1, making it unitless. It is not affected by the scales of the variables.

- **Formula:**
  \[ \text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y} \]
  where:
    - \( \text{Corr}(X, Y) \) is the correlation coefficient between variables \(X\) and \(Y\),
    - \( \text{Cov}(X, Y) \) is the covariance between variables \(X\) and \(Y\),
    - \( \sigma_X \) and \( \sigma_Y \) are the standard deviations of variables \(X\) and \(Y\), respectively.

- **Interpretation:**
  - \( \text{Corr}(X, Y) \) closer to 1 indicates a strong positive linear relationship.
  - \( \text{Corr}(X, Y) \) closer to -1 indicates a strong negative linear relationship.
  - \( \text{Corr}(X, Y) \) around 0 indicates a weak or no linear relationship.

**Key Differences:**

1. **Standardization:**
   - Covariance is not standardized and depends on the scales of the variables.
   - Correlation is standardized, allowing for easy comparison between different pairs of variables.

2. **Scale:**
   - Covariance can take any value, positive or negative.
   - Correlation always ranges between -1 and 1.

3. **Interpretation:**
   - Covariance's sign gives the direction of the relationship, but its magnitude is not easily interpretable and does not provide a clear indication of the strength of the relationship.
   - Correlation provides a standardized measure with clear interpretation: the strength and direction of the linear relationship.

In summary, while both covariance and correlation measure the relationship between two variables, correlation is preferred in many cases due to its standardized scale, making it more interpretable and allowing for meaningful comparisons between different pairs of variables.
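The scale dependence can be demonstrated by changing the units of one variable. In this sketch the height and weight values are hypothetical; converting heights from centimetres to metres changes the covariance by a factor of 100 but leaves the correlation unchanged.

```python
import math


def covariance(xs, ys):
    """Population covariance (division by n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical measurements
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 66, 75, 84]
heights_m = [h / 100 for h in heights_cm]  # same data, different unit

cov_cm = covariance(heights_cm, weights_kg)
cov_m = covariance(heights_m, weights_kg)
corr_cm = correlation(heights_cm, weights_kg)
corr_m = correlation(heights_m, weights_kg)

print(f"covariance:  {cov_cm:.2f} (cm) vs {cov_m:.4f} (m)")
print(f"correlation: {corr_cm:.4f} (cm) vs {corr_m:.4f} (m)")
```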

## Q14. How do outliers affect measures of central tendency and dispersion? Provide an example.

Outliers, which are extreme values in a dataset, can significantly impact measures of central tendency and dispersion. The effect of outliers depends on the nature and extent of their deviation from the rest of the data. Here's how outliers can affect these measures:

### 1. Measures of Central Tendency:

#### Mean:
- **Effect:** Outliers can heavily influence the mean, pulling it in the direction of the extreme values.
- **Example:**
  - Dataset: [10, 15, 20, 25, 100]
  - Mean without outlier: \( \frac{10 + 15 + 20 + 25}{4} = 17.5 \)
  - Mean with outlier: \( \frac{10 + 15 + 20 + 25 + 100}{5} = 34 \)

#### Median:
- **Effect:** The median is much less affected by outliers because it depends only on the middle of the ordered data, not on how extreme the largest or smallest values are.
- **Example:**
  - Dataset: [10, 15, 20, 25, 100]
  - Median without outlier: \( \frac{15 + 20}{2} = 17.5 \)
  - Median with outlier: 20

#### Mode:
- **Effect:** Outliers do not affect the mode since the mode is the most frequently occurring value.
- **Example:**
  - Dataset: [10, 15, 20, 25, 100]
  - Mode without outlier: No mode
  - Mode with outlier: No mode

### 2. Measures of Dispersion:

#### Range:
- **Effect:** Outliers can significantly impact the range, especially if they are extreme values.
- **Example:**
  - Dataset: [10, 15, 20, 25, 100]
  - Range without outlier: \(25 - 10 = 15\)
  - Range with outlier: \(100 - 10 = 90\)

#### Variance and Standard Deviation:
- **Effect:** Outliers increase the spread of the data, leading to higher variance and standard deviation.
- **Example:**
  - Dataset: [10, 15, 20, 25, 100]
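  - Variance without outlier: \(Var = \frac{1}{4} \sum (x_i - \bar{x})^2 = \frac{125}{4} = 31.25\), so the standard deviation is \(\approx 5.59\)
  - Variance with outlier: \(Var = \frac{1}{5} \sum (x_i - \bar{x})^2 = \frac{5570}{5} = 1114\), so the standard deviation is \(\approx 33.38\)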

In summary, outliers can distort the interpretation of central tendency and dispersion measures, particularly the mean and measures influenced by extreme values. It is essential to be aware of the presence of outliers and consider their impact on statistical analysis. Robust measures, such as the median and interquartile range, may be preferred in the presence of outliers to provide a more accurate representation of the central tendency and spread of the data.
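The effect of the outlier in the example dataset can be tabulated with a short script. This is a sketch using population formulas; the helper name `summarize` is invented for the example.

```python
import statistics


def summarize(values):
    """Central tendency and dispersion measures for one dataset."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),
        "std": statistics.pstdev(values),
    }


without_outlier = [10, 15, 20, 25]
with_outlier = without_outlier + [100]

before = summarize(without_outlier)
after = summarize(with_outlier)

for key in before:
    print(f"{key:>8}: {before[key]:>8.2f} -> {after[key]:>8.2f}")
```

Adding the single value 100 roughly doubles the mean and multiplies the variance by more than thirty, while the median moves only from 17.5 to 20.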