# Lesson 2: Descriptive Statistics


Welcome to our lesson on **Descriptive Statistics**! In this session, we will explore how to summarize and describe the key features of a data set. By the end, you'll be able to calculate the **mean**, **standard deviation**, and **median**, which are essential for understanding data in **machine learning** and beyond.

Descriptive statistics give a quick overview of large data sets. For example, they let you report the **average test score of a class** or the **variability in students' heights** with a single number each.

---

## Key Concepts

### 1. **Mean** 📈  
The **mean** is the average of a set of numbers. For example, take the test scores `80, 85, 90, 75, 95`:

1. Add the scores:  
   \( 80 + 85 + 90 + 75 + 95 = 425 \)
2. Divide by the number of scores:  
   \( \frac{425}{5} = 85 \)

The **mean** score is **85**, which represents the "central" value.
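
The two steps above can be sketched in plain Python without any library. The variable names here are illustrative choices, not part of the lesson's required code.

```python
# Test scores from the example above
scores = [80, 85, 90, 75, 95]

# Step 1: add the scores
total = sum(scores)

# Step 2: divide by the number of scores
mean = total / len(scores)

print("Total:", total)  # 425
print("Mean:", mean)    # 85.0
```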

---

### 2. **Standard Deviation** 📊  
The **standard deviation** measures how spread out the numbers are. For the same test scores:

1. Find the mean: **85**.  
2. Subtract the mean from each score and square the result:  
   - \( (80 - 85)^2 = 25 \)  
   - \( (85 - 85)^2 = 0 \)  
   - \( (90 - 85)^2 = 25 \)  
   - \( (75 - 85)^2 = 100 \)  
   - \( (95 - 85)^2 = 100 \)  
3. Find the average of the squared differences:  
   \( \frac{25 + 0 + 25 + 100 + 100}{5} = 50 \)  
4. Take the square root:  
   \( \sqrt{50} \approx 7.07 \)

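The **standard deviation** is about **7.07**, meaning the scores typically differ from the mean by roughly 7 points. Because step 3 divides by the number of scores (5), this is the *population* standard deviation; the *sample* version divides by one less than the count (4) instead.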
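
As a minimal sketch, the four steps can be followed line by line in plain Python. It assumes the population formula used above (dividing by the number of scores), and the variable names are illustrative.

```python
import math

scores = [80, 85, 90, 75, 95]

# Step 1: find the mean
mean = sum(scores) / len(scores)

# Step 2: squared difference of each score from the mean
squared_diffs = [(x - mean) ** 2 for x in scores]

# Step 3: average the squared differences (the population variance)
variance = sum(squared_diffs) / len(scores)

# Step 4: take the square root
std_dev = math.sqrt(variance)

print("Squared differences:", squared_diffs)
print("Variance:", variance)
print("Standard deviation:", round(std_dev, 2))
```

Printing the intermediate list makes it easy to check each squared difference against the hand calculation.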

---

### 3. **Median** 🔢  
The **median** is the middle value of a data set once the values are sorted. Because it depends only on the ordering, a few extreme values (**outliers**) affect it far less than they affect the mean.

Example 1 (Odd Data Set):  
- Test scores: `75, 80, 85, 90, 95`  
- Median: **85**

Example 2 (Even Data Set):  
- Test scores: `75, 80, 85, 95`  
- Middle scores: `80` and `85`  
- Median: \( \frac{80 + 85}{2} = 82.5 \)
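
Both cases can be sketched with a small helper. The function `median` below is a hypothetical illustration written for this lesson, not a library function, and it assumes a non-empty list. The last lines use a made-up outlier (`950`) to show how the mean and median react differently.

```python
def median(values):
    """Return the middle value of a non-empty list (illustrative helper)."""
    ordered = sorted(values)  # the median is defined on ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([75, 80, 85, 90, 95]))  # odd data set
print(median([75, 80, 85, 95]))      # even data set

# A made-up outlier shifts the mean far more than the median
with_outlier = [75, 80, 85, 90, 950]
mean_with_outlier = sum(with_outlier) / len(with_outlier)
print("Mean:", mean_with_outlier, "Median:", median(with_outlier))
```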

---

## Calculating in Python 🐍💻  

Here's how to calculate the **mean**, **standard deviation**, and **median** using the NumPy library:

```python
# Calculating Mean, Standard Deviation, and Median
import numpy as np

data = [1.2, 2.3, 3.1, 4.5, 5.7]

mean = np.mean(data)
std_dev = np.std(data)
median = np.median(data)

print("Mean:", mean)
print("Standard Deviation:", std_dev)
print("Median:", median)
```

### Output:
```plaintext
Mean: 3.36
Standard Deviation: 1.589465318904442
Median: 3.1
```

---

### Explanation:
1. **Import NumPy**: We use NumPy for numerical operations.  
2. **Data Set**: Define the data points: `[1.2, 2.3, 3.1, 4.5, 5.7]`.  
3. **Calculate Mean**: `np.mean(data)` computes the average.  
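4. **Calculate Standard Deviation**: `np.std(data)` measures how spread out the data is. By default it computes the population standard deviation (dividing by the number of values).  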
5. **Calculate Median**: `np.median(data)` finds the middle value.  
6. **Print Results**: Outputs the calculated values.
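
One detail worth checking is which standard deviation you are computing. `np.std(data)` divides by the number of values (population), while `np.std(data, ddof=1)` divides by one less (sample). The sketch below shows the same distinction with Python's standard `statistics` module, chosen here only so the example runs without extra installs.

```python
import statistics

data = [1.2, 2.3, 3.1, 4.5, 5.7]

# Population standard deviation: divides by N (comparable to np.std(data))
population_std = statistics.pstdev(data)

# Sample standard deviation: divides by N - 1 (comparable to np.std(data, ddof=1))
sample_std = statistics.stdev(data)

print("Population std:", round(population_std, 4))
print("Sample std:", round(sample_std, 4))
```

The sample value is always the larger of the two, and the gap shrinks as the data set grows.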

---

## Lesson Summary 📚  
- Descriptive statistics help summarize and understand data sets.  
- **Mean**: Central value of data.  
- **Standard Deviation**: Measures data spread.  
- **Median**: Middle value, robust against outliers.  
- Python simplifies these calculations with **NumPy**.

Now it’s **your turn to practice**! In the next session, you'll work with data sets to reinforce your understanding. Let's get started! 🚀


## Calculate Mean and Standard Deviation of Monthly Sales

Hello, Galactic Pioneer! 🌌🚀

A company tracks the **monthly sales** of its top 5 products over a year. You are provided with the sales data for each product, and your mission is to calculate the **mean** and **standard deviation** of the monthly sales for each product. This analysis will help evaluate their performance.

You got this! 🛠️💡

---

### Python Code 🐍📊

```python
import numpy as np

# Monthly sales data for 5 products over a year
sales_data = {
    "Product_A": [120, 130, 150, 170, 160, 180, 200, 220, 210, 230, 250, 270],
    "Product_B": [140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250],
    "Product_C": [110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220],
    "Product_D": [130, 125, 140, 145, 135, 150, 155, 160, 170, 175, 185, 195],
    "Product_E": [100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210]
}

for product, sales in sales_data.items():
    # Calculate the mean for each product
    mean_sales = np.mean(sales)

    # Calculate the (population) standard deviation for each product
    std_dev_sales = np.std(sales)

    # Print the results, rounded to two decimal places
    print(f"{product} - Mean Sales: {mean_sales:.2f}, Standard Deviation: {std_dev_sales:.2f}")
```

---

### Explanation 🧠
1. **Sales Data**:  
   The `sales_data` dictionary contains sales for each product over 12 months.
2. **Iteration**:  
   Use `.items()` to iterate through the dictionary, retrieving both the product name (`product`) and its sales data (`sales`).
3. **Mean Sales**:  
   Use `np.mean(sales)` to calculate the average sales for each product.
4. **Standard Deviation**:  
   Use `np.std(sales)` to measure the variability of sales for each product. By default this is the population standard deviation.
5. **Formatted Output**:  
   Print the results using formatted strings for clarity.

---

### Sample Output 📈
When you run the code, you will get:
```plaintext
Product_A - Mean Sales: 190.83, Standard Deviation: 45.18
Product_B - Mean Sales: 195.00, Standard Deviation: 34.52
Product_C - Mean Sales: 165.00, Standard Deviation: 34.52
Product_D - Mean Sales: 155.42, Standard Deviation: 21.26
Product_E - Mean Sales: 155.00, Standard Deviation: 34.52
```

---

### Next Steps 🚀
Try visualizing the data! Plot the sales trends or compare the variability of sales across products. Statistics and data visualization go hand in hand for deeper insights. Happy exploring! 🌟📊

## Calculate the Median of Salaries

Let's practice finding the **median** in a dataset.  
You are given a list of salaries. Complete the code to calculate the median of these salaries.  

You've got this, Space Explorer! 🌌

```python
import numpy as np

salaries = [35000, 47000, 56000, 73000, 46000, 67000, 52000]

# TODO: Compute the median using NumPy
median_salary = np.median(salaries)

# TODO: Print the median salary
print(f"The median salary is: {median_salary}")
```


## Increase the Standard Deviation of a Dataset


In this task, you'll adjust a dataset to increase its **standard deviation**.  
Start with the given dataset and modify it to create higher variability.  

Let's increase that spread! 🌌  
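
Before changing the data, it may help to see which kinds of change affect the spread. The sketch below is an illustration under simple assumptions: multiplying every value by a factor multiplies the standard deviation by that factor, while adding the same constant to every value shifts the mean and leaves the spread unchanged. It uses the standard `statistics` module, and the factor `2` and offset `10` are arbitrary choices.

```python
import statistics

data = [1.2, 2.3, 3.1, 4.5, 5.7]

scaled = [x * 2 for x in data]    # multiply every value by 2
shifted = [x + 10 for x in data]  # add the same constant to every value

original_std = statistics.pstdev(data)
scaled_std = statistics.pstdev(scaled)
shifted_std = statistics.pstdev(shifted)

print("Original:", round(original_std, 3))
print("Scaled by 2:", round(scaled_std, 3))      # about twice the original
print("Shifted by 10:", round(shifted_std, 3))   # about the same as the original
```

Scaling is therefore a valid way to complete the task, whereas shifting alone is not.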

```python
import numpy as np

data = [1.2, 2.3, 3.1, 4.5, 5.7]
mean = np.mean(data)
std_dev = np.std(data)

# TODO: Create a new dataset with higher standard deviation
# You can achieve this by scaling or adding greater variability
new_data = [x * 2 for x in data]  # Example of increasing variability

new_mean = np.mean(new_data)
new_std_dev = np.std(new_data)

print("Original Data:", data)
print("Original Standard Deviation:", std_dev)
print("New Data:", new_data)
print("New Standard Deviation:", new_std_dev)
```


## Analyzing Drug Effectiveness with Descriptive Statistics

A pharmaceutical company has collected data on the effectiveness of a new drug, measured by the decrease in blood pressure (in mmHg) of 10 patients:  
`[8, 10, 5, 7, 6, 10, 8, 6, 5, 9]`  

Complete the code to calculate and print the **mean**, **standard deviation**, and **median** blood pressure decrease to understand the drug's effectiveness.  

You've got this! 🌌  

```python
import numpy as np

data = [8, 10, 5, 7, 6, 10, 8, 6, 5, 9]

# TODO: Calculate mean
mean = np.mean(data)

# TODO: Calculate standard deviation
std_dev = np.std(data)

# TODO: Calculate median
median = np.median(data)

print("Mean:", mean)
print("Standard Deviation:", std_dev)
print("Median:", median)
```


## Comparing Mean Temperatures of Two Cities

Imagine you have two sets of daily temperatures (in °C) for two different cities over a week:  
- City 1: `[23, 25, 22, 20, 24, 26, 28]`  
- City 2: `[18, 20, 22, 19, 21, 22, 20]`  

Your mission is to calculate and compare the average temperatures for these two cities to determine which city enjoyed the warmer week.  

You got this! 🚀  

```python
import numpy as np

city1_temps = [23, 25, 22, 20, 24, 26, 28]
city2_temps = [18, 20, 22, 19, 21, 22, 20]

# TODO: Calculate mean temperature for city1
mean_city1 = np.mean(city1_temps)

# TODO: Calculate mean temperature for city2
mean_city2 = np.mean(city2_temps)

# TODO: Print calculated means
print("Average Temperature for City 1:", mean_city1)
print("Average Temperature for City 2:", mean_city2)

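# Compare the means and report which city had the higher average temperature
if mean_city1 > mean_city2:
    print("City 1 had the higher average temperature.")
elif mean_city2 > mean_city1:
    print("City 2 had the higher average temperature.")
else:
    # Equal means: neither city was warmer on average
    print("Both cities had the same average temperature.")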
```
