# Descriptive statistics
Descriptive statistics is the branch of statistics that organizes, summarizes, and presents data in order to describe the main features of a dataset. It focuses on revealing patterns, trends, and key characteristics of the data, making the information at hand easier to understand and interpret.

Descriptive statistics encompasses various statistical measures and techniques, including:

1. **Measures of Central Tendency:** These measures indicate where the center of the data is and include the mean (average), median (middle value), and mode (most frequently occurring value).

2. **Measures of Dispersion:** These measures describe the spread or variability of the data and include variance and standard deviation. They help assess how data points differ from the central tendency.

3. **Data Visualization:** Graphical representations like histograms, box plots, and scatterplots are used to visually represent data, making it easier to spot patterns and trends.

4. **Percentiles and Quartiles:** These divide ordered data into equal-sized groups (100 for percentiles, 4 for quartiles), helping to understand the relative position of data points within the dataset.


## Measures of Central Tendency

**Mean (Average):**
- **Definition**: The mean is the sum of all values divided by the total number of values. It represents the central value of a dataset.

- **Applications**:
  - Finance: Calculating the average return on investments.
  - Education: Determining the average score on a class test.
  - Market Research: Understanding the average customer age.
  - Healthcare: Analyzing the average weight gain in a clinical trial.
  - Sports: Calculating the average score in a tournament.

**Median (Middle Value):**
- **Definition**: The median is the middle value in a dataset when values are arranged in ascending order. If there's an even number of values, it's the average of the two middle values.

- **Applications**:
  - Healthcare: Identifying the median patient age in a dataset.
  - Real Estate: Finding the median home price in a neighborhood.
  - Education: Determining the median score on a standardized test.
  - E-commerce: Analyzing the median order value for an online store.
  - Survey Analysis: Finding the median income in a population.

**Mode (Most Frequently Occurring Value):**
- **Definition**: The mode is the value that appears most frequently in a dataset.

- **Applications**:
  - Healthcare: Identifying the most common blood type in a patient group.
  - Customer Service: Determining the most common customer complaint.
  - Retail: Identifying the most popular product in a store.
  - Manufacturing: Finding the most common defect in a production run.
  - Social Sciences: Analyzing the most common response in a survey.

The following five worked examples for each measure show the calculation step by step:

**Mean Examples with Solutions:**

1. **Example 1: Calculating Test Scores Mean**
   - Dataset: 85, 92, 78, 96, 88.
   - Step 1: Add all the values: 85 + 92 + 78 + 96 + 88 = 439.
   - Step 2: Count the total number of values, which is 5.
   - Step 3: Divide the sum (439) by the total count (5): Mean = 439 / 5 = 87.8.

2. **Example 2: Finding Mean Income**
   - Dataset:  ₦40,000, ₦45,000, ₦38,000, ₦52,000, ₦48,000.
   - Step 1: Add all the income values: ₦40,000 + ₦45,000 + ₦38,000 + ₦52,000 + ₦48,000 = ₦223,000.
   - Step 2: Count the total number of incomes, which is 5.
   - Step 3: Divide the sum (₦223,000) by the total count (5): Mean = ₦223,000 / 5 = ₦44,600.

3. **Example 3: Mean Temperature for a Week**
   - Dataset: 70°F, 75°F, 68°F, 72°F, 73°F, 69°F, 74°F.
   - Step 1: Add all the temperatures: 70 + 75 + 68 + 72 + 73 + 69 + 74 = 501.
   - Step 2: Count the total number of temperatures, which is 7.
   - Step 3: Divide the sum (501) by the total count (7): Mean = 501 / 7 ≈ 71.57°F.

4. **Example 4: Mean Age of Participants**
   - Dataset: 32 years, 45 years, 28 years, 36 years, 40 years.
   - Step 1: Add all the ages: 32 + 45 + 28 + 36 + 40 = 181.
   - Step 2: Count the total number of ages, which is 5.
   - Step 3: Divide the sum (181) by the total count (5): Mean = 181 / 5 = 36.2 years.

5. **Example 5: Mean Weight of Products**
   - Dataset: 4 kg, 5 kg, 4.5 kg, 3.5 kg, 5.2 kg.
   - Step 1: Add all the weights: 4 + 5 + 4.5 + 3.5 + 5.2 = 22.2 kg.
   - Step 2: Count the total number of products, which is 5.
   - Step 3: Divide the sum (22.2 kg) by the total count (5): Mean = 22.2 / 5 = 4.44 kg.
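
The three steps used in each mean example (sum, count, divide) can be sketched in Python. This is a minimal illustration rather than a reference implementation; the function name `mean_by_steps` is an assumption made for this sketch, and the standard library's `statistics.mean` is shown only as a cross-check.

```python
import statistics

def mean_by_steps(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    total = sum(values)    # Step 1: add all the values
    count = len(values)    # Step 2: count the values
    return total / count   # Step 3: divide the sum by the count

test_scores = [85, 92, 78, 96, 88]            # dataset from Example 1
temperatures = [70, 75, 68, 72, 73, 69, 74]   # dataset from Example 3

print(mean_by_steps(test_scores))              # 87.8
print(round(mean_by_steps(temperatures), 2))   # 71.57
print(statistics.mean(test_scores))            # standard-library cross-check
```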

**Median Examples with Solutions:**

1. **Example 1: Finding Median Age**
   - Dataset: 25, 32, 19, 42, 38.
   - Step 1: Arrange the ages in ascending order: 19, 25, 32, 38, 42.
   - Step 2: Since there's an odd count of ages (5), the median is the middle value, which is 32 years.

2. **Example 2: Median Test Scores**
   - Dataset: 85, 92, 78, 96, 88.
   - Step 1: Arrange the scores in ascending order: 78, 85, 88, 92, 96.
   - Step 2: Since there's an odd count of scores (5), the median is the middle value, which is 88.

3. **Example 3: Median Household Incomes**
   - Dataset: ₦48,000, ₦38,000, ₦60,000, ₦42,000.
   - Step 1: Arrange the incomes in ascending order: ₦38,000, ₦42,000, ₦48,000, ₦60,000.
   - Step 2: Since there's an even count of incomes (4), the median is the average of the two middle values: Median = (₦42,000 + ₦48,000) / 2 = ₦45,000.

4. **Example 4: Median Temperature in a Week**
   - Dataset: 70°F, 75°F, 68°F, 72°F.
   - Step 1: Arrange the temperatures in ascending order: 68°F, 70°F, 72°F, 75°F.
   - Step 2: Since there's an even count of temperatures (4), the median is the average of the two middle values: Median = (70°F + 72°F) / 2 = 71°F.


5. **Example 5: Median Product Prices**
   - Dataset: ₦15, ₦20, ₦10, ₦25.
   - Step 1: Arrange the prices in ascending order: ₦10, ₦15, ₦20, ₦25.
   - Step 2: Since there's an even count of prices (4), the median is the average of the two middle values: Median = (₦15 + ₦20) / 2 = ₦17.50.
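
The odd-count and even-count cases in the median examples can be captured in one short function. The sketch below is illustrative; the name `median_by_steps` is an assumption for this example (the standard library offers `statistics.median` for the same purpose).

```python
def median_by_steps(values):
    """Return the median: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)   # Step 1: arrange the values in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the median is the single middle value.
        return ordered[mid]
    # Even count: the median is the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_steps([25, 32, 19, 42, 38]))           # 32
print(median_by_steps([48000, 38000, 60000, 42000]))   # 45000.0
print(median_by_steps([70, 75, 68, 72]))               # 71.0
```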

**Mode Examples with Solutions:**

1. **Example 1: Finding the Mode of Test Scores**
   - Dataset: 85, 92, 78, 85, 96, 88, 85.
   - Step 1: Count the frequency of each score: 85 (3 times), 92 (1 time), 78 (1 time), 96 (1 time), 88 (1 time).
   - Step 2: The mode is the value with the highest frequency, which is 85.

2. **Example 2: Mode of Customer Purchases**
   - Dataset: Laptop, Mouse, Keyboard, Laptop, Monitor, Mouse, Monitor, Mouse.
   - Step 1: Count the frequency of each purchase: Laptop (2 times), Mouse (3 times), Keyboard (1 time), Monitor (2 times).
   - Step 2: The mode is the most frequently purchased item, which is the Mouse.

3. **Example 3: Mode of Exam Letter Grades**
   - Dataset: A, B, A, C, B, A, D, C, B, A.
   - Step 1: Count the frequency of each grade: A (4 times), B (3 times), C (2 times), D (1 time).
   - Step 2: The mode is the most frequent grade, which is A.

4. **Example 4: Mode of Flower Colors**
   - Dataset: Red, Blue, Yellow, Blue, Green, Red, Red, Yellow.
   - Step 1: Count the frequency of each color: Red (3 times), Blue (2 times), Yellow (2 times), Green (1 time).
   - Step 2: The mode is the most frequently occurring color, which is Red.

5. **Example 5: Mode of Traffic Violations**
   - Dataset: Speeding, Parking, Speeding, Running Red Light, Speeding, Parking.
   - Step 1: Count the frequency of each violation: Speeding (3 times), Parking (2 times), Running Red Light (1 time).
   - Step 2: The mode is the most commonly occurring violation, which is Speeding.
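
Counting frequencies and picking the highest, as in the mode examples, maps naturally onto `collections.Counter`. The sketch below is a minimal illustration; the function name `modes` is an assumption, and it returns a list because a dataset can have more than one mode (the last call uses hypothetical data to show that case).

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)          # Step 1: count the frequency of each value
    highest = max(counts.values())    # Step 2: find the highest frequency
    return [value for value, count in counts.items() if count == highest]

print(modes([85, 92, 78, 85, 96, 88, 85]))   # [85]
print(modes(["Red", "Blue", "Yellow", "Blue", "Green", "Red", "Red", "Yellow"]))   # ['Red']
print(modes([1, 1, 2, 2, 3]))                # [1, 2]  (hypothetical bimodal data)
```

Because it works on any hashable values, the same function handles both numeric and categorical data.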


## Measures of Dispersion

**Variance:**
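- **Definition**: Variance measures how far individual data points are from the mean. It is the average of the squared differences between each data point and the mean. Dividing by the number of values n, as the examples in this section do, gives the population variance; the sample variance divides by n - 1 instead.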

- **Applications**:
  - Finance: Assessing the risk associated with investment returns.
  - Quality Control: Measuring variations in product quality.
  - Education: Analyzing variations in student test scores.
  - Health: Evaluating variations in patient recovery times.
  - Sports: Assessing the consistency of athletes' performance.

**Standard Deviation:**
- **Definition**: Standard deviation is the square root of the variance. It provides a measure of how spread out the data is. A low standard deviation indicates that data points are close to the mean, while a high standard deviation means data points are more dispersed.

- **Applications**:
  - Finance: Evaluating portfolio risk and volatility.
  - Manufacturing: Measuring variations in product dimensions.
  - Climate Science: Analyzing temperature fluctuations.
  - Social Sciences: Assessing income inequality in a population.
  - Market Research: Understanding variations in customer preferences.

The following five worked examples for each measure of dispersion show the calculation step by step:

**Variance Examples with Solutions:**

1. **Example 1: Variance of Daily Stock Returns**
   - Dataset: -2%, 1%, 0.5%, -1.5%, 3%.
   - Step 1: Calculate the mean return: (-2% + 1% + 0.5% - 1.5% + 3%) / 5 = 1% / 5 = 0.2%.
   - Step 2: Calculate the squared differences from the mean for each day (in percentage points): (-2.2)^2, (0.8)^2, (0.3)^2, (-1.7)^2, (2.8)^2.
   - Step 3: Calculate the average of the squared differences: Variance = (4.84 + 0.64 + 0.09 + 2.89 + 7.84) / 5 = 16.3 / 5 = 3.26 (in squared percentage points, %^2).

2. **Example 2: Variance of Test Scores**
   - Dataset: 85, 92, 78, 96, 88.
   - Step 1: Calculate the mean score: (85 + 92 + 78 + 96 + 88) / 5 = 87.8.
   - Step 2: Calculate the squared differences from the mean for each score: (85 - 87.8)^2, (92 - 87.8)^2, (78 - 87.8)^2, (96 - 87.8)^2, (88 - 87.8)^2.
   - Step 3: Calculate the average of the squared differences: Variance = (7.84 + 17.64 + 96.04 + 67.24 + 0.04) / 5 = 188.8 / 5 = 37.76.

3. **Example 3: Variance of Monthly Sales**
   - Dataset: ₦12,000, ₦10,000, ₦14,000, ₦11,000.
   - Step 1: Calculate the mean sales: (₦12,000 + ₦10,000 + ₦14,000 + ₦11,000) / 4 = ₦11,750.
   - Step 2: Calculate the squared differences from the mean for each month: (₦12,000 - ₦11,750)^2, (₦10,000 - ₦11,750)^2, (₦14,000 - ₦11,750)^2, (₦11,000 - ₦11,750)^2.
   - Step 3: Calculate the average of the squared differences: Variance = (62,500 + 3,062,500 + 5,062,500 + 562,500) / 4 = 8,750,000 / 4 = 2,187,500 (in ₦^2).

4. **Example 4: Variance of Student Heights**
   - Dataset: 150 cm, 155 cm, 160 cm, 148 cm, 165 cm.
   - Step 1: Calculate the mean height: (150 + 155 + 160 + 148 + 165) / 5 = 155.6 cm.
   - Step 2: Calculate the squared differences from the mean for each height: (150 - 155.6)^2, (155 - 155.6)^2, (160 - 155.6)^2, (148 - 155.6)^2, (165 - 155.6)^2.
   - Step 3: Calculate the average of the squared differences: Variance = (31.36 + 0.36 + 19.36 + 57.76 + 88.36) / 5 = 197.2 / 5 = 39.44 cm^2.

5. **Example 5: Variance of Delivery Times**
   - Dataset: 3 days, 4 days, 2 days, 5 days, 3 days.
   - Step 1: Calculate the mean delivery time: (3 + 4 + 2 + 5 + 3) / 5 = 3.4 days.
   - Step 2: Calculate the squared differences from the mean for each delivery time: (3 - 3.4)^2, (4 - 3.4)^2, (2 - 3.4)^2, (5 - 3.4)^2, (3 - 3.4)^2.
   - Step 3: Calculate the average of the squared differences: Variance = (0.16 + 0.36 + 1.96 + 2.56 + 0.16) / 5 = 5.2 / 5 = 1.04 days^2.
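
Squared differences are easy to get wrong by hand, so it can help to recompute them in code. The sketch below recomputes the variance for the datasets of Examples 4 and 5, dividing by n (population variance) as the steps describe. The function name `population_variance` is an assumption for this illustration; `statistics.variance` is included only to show how the sample variance (dividing by n - 1) differs.

```python
import statistics

def population_variance(values):
    """Average of the squared differences from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n                               # Step 1: mean
    squared_diffs = [(x - mean) ** 2 for x in values]    # Step 2: squared differences
    return sum(squared_diffs) / n                        # Step 3: average them

heights = [150, 155, 160, 148, 165]   # dataset from Example 4 (cm)
delivery_days = [3, 4, 2, 5, 3]       # dataset from Example 5 (days)

print(round(population_variance(heights), 2))         # 39.44
print(round(population_variance(delivery_days), 2))   # 1.04
print(round(statistics.variance(delivery_days), 2))   # 1.3  (sample variance, n - 1)
```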

**Standard Deviation Examples with Solutions:**

1. **Example 1: Standard Deviation of Daily Stock Returns (Continuation from Variance Example 1)**
   - Variance: 3.26 %^2.
   - Step 1: Calculate the standard deviation as the square root of the variance: Standard Deviation = √3.26 ≈ 1.81%.

2. **Example 2: Standard Deviation of Test Scores (Continuation from Variance Example 2)**
   - Variance: 37.76.
   - Step 1: Calculate the standard deviation as the square root of the variance: Standard Deviation = √37.76 ≈ 6.14.

3. **Example 3: Standard Deviation of Monthly Sales (Continuation from Variance Example 3)**
   - Variance: 2,187,500 ₦^2.
   - Step 1: Calculate the standard deviation as the square root of the variance: Standard Deviation = √2,187,500 ≈ ₦1,479.02.

4. **Example 4: Standard Deviation of Student Heights (Continuation from Variance Example 4)**
   - Variance: 39.44 cm^2.
   - Step 1: Calculate the standard deviation as the square root of the variance: Standard Deviation = √39.44 cm^2 ≈ 6.28 cm.

5. **Example 5: Standard Deviation of Delivery Times (Continuation from Variance Example 5)**
   - Variance: 1.04 days^2.
   - Step 1: Calculate the standard deviation as the square root of the variance: Standard Deviation = √1.04 days^2 ≈ 1.02 days.
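
Taking the square root of the variance is a single extra step in code. The following sketch assumes the population form (dividing by n) used in the variance steps; `population_std_dev` is a name chosen for this illustration, and `statistics.pstdev` from the standard library is used as a cross-check.

```python
import math
import statistics

def population_std_dev(values):
    """Square root of the population variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)

test_scores = [85, 92, 78, 96, 88]   # dataset from Example 2

print(round(population_std_dev(test_scores), 2))   # 6.14
print(round(statistics.pstdev(test_scores), 2))    # 6.14
```

The standard deviation is in the same units as the data, which is why it is usually easier to interpret than the variance.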


## Data Visualization

**Histogram:**
- **Definition**: A histogram is a graphical representation of the distribution of a dataset. It divides data into intervals (bins) and displays the frequency or count of data points within each interval.

- **Applications**:
  - Finance: Analyzing stock price distributions.
  - Healthcare: Visualizing patient age distributions.
  - Education: Displaying grade distributions in a class.
  - Marketing: Understanding customer purchase frequency.
  - Social Sciences: Examining income distributions.

**Box Plot (Box-and-Whisker Plot):**
- **Definition**: A box plot is a visual representation of the distribution of a dataset, showing the median, quartiles, and potential outliers. It provides insights into the spread and skewness of the data.

- **Applications**:
  - Quality Control: Identifying variations in product measurements.
  - Sports: Analyzing performance metrics of athletes.
  - Environment: Visualizing temperature fluctuations.
  - Medicine: Comparing patient recovery times.
  - Market Research: Examining customer satisfaction ratings.

**Scatterplot:**
- **Definition**: A scatterplot is used to visualize the relationship between two variables. Each data point is plotted on a Cartesian plane with one variable on the x-axis and the other on the y-axis.

- **Applications**:
  - Economics: Examining the correlation between income and education.
  - Environmental Science: Studying the relationship between temperature and plant growth.
  - Psychology: Analyzing the connection between stress and mental health.
  - Engineering: Visualizing the relationship between variables in a process.
  - Real Estate: Assessing the impact of square footage on property prices.

The following five examples for each data visualization technique outline the steps involved:

**Histogram Examples with Solutions:**

1. **Example 1: Age Distribution in a Survey**
   - Dataset: 25, 28, 32, 36, 28, 45, 42, 30, 28, 38, 34, 29, 27, 43, 31.
   - Step 1: Determine the range of values (minimum and maximum): Min = 25, Max = 45.
   - Step 2: Create intervals (bins): [25-30), [30-35), [35-40), [40-45]. The last bin includes its upper edge so that the maximum value (45) is counted.
   - Step 3: Count the frequency of data points in each interval: [25-30) = 6, [30-35) = 4, [35-40) = 2, [40-45] = 3.
   - Step 4: Create a histogram by plotting the intervals on the x-axis and frequencies on the y-axis.

2. **Example 2: Exam Score Distribution**
   - Dataset: 65, 72, 89, 77, 93, 84, 78, 60, 95, 70, 88, 82.
   - Step 1: Determine the range of values: Min = 60, Max = 95.
   - Step 2: Create intervals: [60-65), [65-70), [70-75), [75-80), [80-85), [85-90), [90-95]. The last bin includes its upper edge so that the maximum value (95) is counted.
   - Step 3: Count the frequency of data points in each interval: [60-65) = 1, [65-70) = 1, [70-75) = 2, [75-80) = 2, [80-85) = 2, [85-90) = 2, [90-95] = 2.
   - Step 4: Create a histogram.

3. **Example 3: Monthly Rainfall in Inches**
   - Dataset: 3.2, 1.7, 0.8, 2.5, 2.1, 0.5, 3.4, 1.2, 1.9, 2.8, 1.1, 3.0.
   - Step 1: Determine the range of values: Min = 0.5, Max = 3.4.
   - Step 2: Create intervals: [0.5-1.0), [1.0-1.5), [1.5-2.0), [2.0-2.5), [2.5-3.0), [3.0-3.5).
   - Step 3: Count the frequency of data points in each interval: [0.5-1.0) = 2, [1.0-1.5) = 2, [1.5-2.0) = 2, [2.0-2.5) = 1, [2.5-3.0) = 2, [3.0-3.5) = 3.
   - Step 4: Create a histogram.

4. **Example 4: Examining Website Traffic**
   - Dataset: Number of visitors in a week.
   - Create intervals based on visitor count ranges.
   - Count the frequency of data points in each interval.
   - Create a histogram to visualize the website traffic distribution.

5. **Example 5: Visualizing Product Sales**
   - Dataset: Sales figures for different products.
   - Create intervals based on sales ranges.
   - Count the frequency of data points in each interval.
   - Create a histogram to analyze product sales distribution.
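
The binning and counting steps behind a histogram can be sketched without any plotting library. The example below is a minimal illustration using the age dataset from Example 1; the function name `bin_counts` and its parameters are assumptions made for this sketch. Bins are half-open, except that the last bin also includes its upper edge so the maximum value is counted. A plotting library such as matplotlib would normally draw the bars; here a text rendering stands in for the chart.

```python
def bin_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width bins beginning at `start`.

    Bins are half-open [low, high), except the last bin, which also
    includes its upper edge so that the maximum value is not dropped.
    """
    counts = [0] * n_bins
    upper_edge = start + width * n_bins
    for x in values:
        index = int((x - start) // width)
        if x == upper_edge:
            index = n_bins - 1          # the maximum goes into the last bin
        if 0 <= index < n_bins:         # values outside all bins are ignored
            counts[index] += 1
    return counts

ages = [25, 28, 32, 36, 28, 45, 42, 30, 28, 38, 34, 29, 27, 43, 31]
counts = bin_counts(ages, start=25, width=5, n_bins=4)
print(counts)   # [6, 4, 2, 3]

# Text histogram: one '#' per data point in each bin.
for i, count in enumerate(counts):
    low = 25 + 5 * i
    print(f"{low}-{low + 5}: {'#' * count}")
```

The sketch uses integer data; with floating-point bin edges, rounding can push a value that sits exactly on an edge into the neighbouring bin, so edge handling deserves extra care there.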

**Box Plot Examples with Solutions:**

1. **Example 1: Box Plot of Test Scores**
   - Dataset: 75, 82, 90, 65, 88, 78, 70, 93, 85, 72.
   - Step 1: Order the data: 65, 70, 72, 75, 78, 82, 85, 88, 90, 93.
   - Step 2: Find the median (middle value): Median = (78 + 82) / 2 = 80.
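   - Step 3: Determine the lower quartile (Q1) as the median of the lower half (65, 70, 72, 75, 78): Q1 = 72. Other quartile conventions can give slightly different values.
   - Step 4: Determine the upper quartile (Q3) as the median of the upper half (82, 85, 88, 90, 93): Q3 = 88.
   - Step 5: Find the interquartile range (IQR): IQR = Q3 - Q1 = 88 - 72 = 16.
   - Step 6: Identify potential outliers (values below Q1 - 1.5 × IQR = 48 or above Q3 + 1.5 × IQR = 112). No score falls outside this range, so there are no outliers.
   - Step 7: Create a box plot.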

2. **Example 2: Box Plot of Monthly Expenses**
   - Dataset: ₦600, ₦800, ₦700, ₦550, ₦900, ₦750, ₦680, ₦950, ₦720, ₦680.
   - Step 1: Order the data.
   - Step 2: Find the median.
   - Step 3: Determine Q1 and Q3.
   - Step 4: Find the IQR and identify potential outliers.
   - Step 5: Create a box plot.

3. **Example 3: Box Plot of Daily Temperatures**
   - Dataset: 68°F, 72°F, 65°F, 78°F, 80°F, 66°F, 74°F, 79°F, 75°F, 72°F.
   - Step 1: Order the data.
   - Step 2: Find the median.
   - Step 3: Determine Q1 and Q3.
   - Step 4: Find the IQR and identify potential outliers.
   - Step 5: Create a box plot.

4. **Example 4: Box Plot of Employee Salaries**
   - Dataset: ₦45,000, ₦52,000, ₦50,000, ₦47,000, ₦55,000, ₦48,000, ₦56,000, ₦53,000, ₦52,000, ₦60,000.
   - Step 1: Order the data.
   - Step 2: Find the median.
   - Step 3: Determine Q1 and Q3.
   - Step 4: Find the IQR and identify potential outliers.
   - Step 5: Create a box plot.

5. **Example 5: Box Plot of Car Mileages**
   - Dataset: 30,000 miles, 35,000 miles, 28,000 miles, 38,000 miles, 40,000 miles, 32,000 miles, 37,000 miles.
   - Step 1: Order the data.
   - Step 2: Find the median.
   - Step 3: Determine Q1 and Q3.
   - Step 4: Find the IQR and identify potential outliers.
   - Step 5: Create a box plot.

**Scatterplot Examples with Solutions:**

1. **Example 1: Correlation Between Study Hours and Exam Scores**
   - Dataset: Hours studied (x) and exam scores (y) for 10 students.
   - Create a scatterplot with study hours on the x-axis and exam scores on the y-axis.
   - Observe the relationship between study hours and exam scores.

2. **Example 2: Relationship Between Age and Income**
   - Dataset: Age (x) and annual income (y) for a sample of individuals.
   - Create a scatterplot to visualize the relationship between age and income.
   - Analyze whether there's a correlation between age and income.

3. **Example 3: Scatterplot of Product Price and Sales**
   - Dataset: Product prices (x) and monthly sales (y) for various products.
   - Create a scatterplot to examine the impact of product price on sales.
   - Identify trends in sales based on price.

4. **Example 4: Analyzing Customer Feedback**
   - Dataset: Customer satisfaction ratings (x) and purchase frequency (y) for a set of customers.
   - Generate a scatterplot to understand how customer satisfaction relates to purchase frequency.
   - Determine if there's a connection between satisfaction and loyalty.

5. **Example 5: Examining Temperature and Ice Cream Sales**
   - Dataset: Daily temperature (x) and ice cream sales (y) over a month.
   - Create a scatterplot to investigate the influence of temperature on ice cream sales.
   - Identify patterns in sales based on temperature variations.
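
A scatterplot shows a relationship visually; the Pearson correlation coefficient is one common way to put a number on how linear that relationship is. The sketch below is illustrative only: the study-hours and exam-score values are hypothetical (the examples above do not list actual data), and `pearson_r` is a name chosen for this example. A plotting library such as matplotlib would be used to draw the scatterplot itself.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only: hours studied and exam scores
# for 10 students.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = [52, 55, 61, 60, 68, 72, 75, 79, 84, 90]

r = pearson_r(hours, scores)
print(round(r, 3))   # close to +1 for a strong positive linear relationship
```

Values of r near +1 or -1 indicate a strong linear relationship and values near 0 a weak one; a correlation on its own does not show that one variable causes the other.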



## Percentiles and Quartiles

**Percentiles** and **Quartiles** are statistical measures used to divide a dataset into equal parts or to identify specific data points' relative positions within a dataset. They are often used in various fields, including education, healthcare, finance, and more.

**Percentiles**:
- **Definition**: Percentiles divide a dataset into 100 equal parts. For example, the 25th percentile (P25) represents the value below which 25% of the data falls.
- **Applications**:
  1. In educational testing, percentiles are used to compare a student's performance with a reference group.
  2. In healthcare, growth percentiles are used to track children's development.
  3. In finance, they help assess an investment's performance compared to others.
  4. In salary analysis, percentiles determine where an individual's income stands relative to a population.

**Quartiles**:
- **Definition**: Quartiles divide a dataset into four equal parts. There are three quartiles: Q1, Q2 (the median), and Q3.
- **Applications**:
  1. In box plots, quartiles help visualize the spread of data.
  2. In statistical analysis, they are used to measure data dispersion.
  3. In healthcare, quartiles help evaluate patient data.
  4. In finance, they provide insights into investment risk and return.

**Examples**:

**Percentile Example**:
Consider a dataset of 20 test scores: 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145.

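1. To find the 25th percentile (P25):
   - Sort the data in ascending order.
   - Compute the position: (25/100) * (20 + 1) = 5.25.
   - Rounding the position to the nearest whole number gives the 5th value, so P25 = 70. (Interpolating between the 5th and 6th values instead gives 71.25.)

2. To find the 75th percentile (P75):
   - Compute the position: (75/100) * (20 + 1) = 15.75.
   - Rounding the position to the nearest whole number gives the 16th value, so P75 = 125. (Interpolating between the 15th and 16th values instead gives 123.75.)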
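
The position rule used above, (p/100) * (n + 1), can be sketched in Python in two variants: rounding the position to the nearest rank, and interpolating linearly between the two neighbouring values. Both function names are assumptions made for this illustration, and this is only one of several percentile conventions; statistical libraries may use a different default and return slightly different numbers.

```python
import math

def percentile_position(p, n):
    """1-based position of the p-th percentile under the (p/100) * (n + 1) rule."""
    return (p / 100) * (n + 1)

def percentile_by_rounding(values, p):
    """Take the value at the position rounded to the nearest rank.

    Note: Python's round() sends exact .5 positions to the nearest even rank.
    """
    ordered = sorted(values)
    n = len(ordered)
    rank = min(max(round(percentile_position(p, n)), 1), n)   # keep within 1..n
    return ordered[rank - 1]

def percentile_interpolated(values, p):
    """Interpolate linearly between the two values around the position."""
    ordered = sorted(values)
    n = len(ordered)
    pos = min(max(percentile_position(p, n), 1), n)           # keep within 1..n
    lower = math.floor(pos)
    if lower == n:
        return ordered[-1]
    fraction = pos - lower
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

scores = list(range(50, 150, 5))   # the 20 test scores: 50, 55, ..., 145

print(percentile_position(25, len(scores)))   # 5.25
print(percentile_by_rounding(scores, 25))     # 70
print(percentile_by_rounding(scores, 75))     # 125
print(percentile_interpolated(scores, 25))    # 71.25
print(percentile_interpolated(scores, 75))    # 123.75
```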

**Quartiles Example**:
Using the same dataset of test scores, we can find the quartiles:

1. **First Quartile (Q1)**:
   - Q1 is the 25th percentile, which we found to be 70 in the previous example.

2. **Second Quartile (Q2)**:
   - Q2 is the median, which is the middle value of the dataset.
   - With 20 values, it is the average of the 10th and 11th values: (95 + 100) / 2 = 97.5.

3. **Third Quartile (Q3)**:
   - Q3 is the 75th percentile, which we found to be 125 in the previous example.

These quartiles can be used to create a box plot, which visually represents the data's distribution.

Please note that in practice, statistical software or calculators are commonly used to find percentiles and quartiles for larger datasets.

