# Lesson 4: Introduction to Pivot Tables

## Lesson Introduction

Pivot tables are powerful tools that allow us to summarize, analyze, and explore data in different ways. They are commonly used in data analysis to generate insights by grouping, reorganizing, and aggregating data.

In this lesson, we will learn how to create and use pivot tables in Pandas. By the end of this lesson, you will understand the basics of pivot tables, how to create them, and how they can help you extract valuable insights from your data.

## Pivot Table Basics

A pivot table allows us to summarize data by grouping it in a way that makes it easier to extract meaningful insights. Think of it like organizing toys in a toy store. Instead of having all toys mixed up, you sort them by category and then further by different attributes. Pivot tables help us do something similar with our data.

The Pandas library provides a `pivot_table()` function that makes creating pivot tables in Python straightforward. It is available both as a DataFrame method (`df.pivot_table(...)`) and as the top-level function `pd.pivot_table(df, ...)`. One reason pivot tables are so useful is that they make it easy to apply aggregate functions such as mean, sum, and count to groups of data.

Here are the important parameters of the `pivot_table()` function:

- **index**: The column(s) to group by; each distinct value becomes a row, like aisles in a store.
- **columns**: The column(s) whose distinct values will form the columns of the pivot table.
- **values**: The column(s) containing the data you want to aggregate, like toy prices.
- **aggfunc**: The function used to aggregate the data (e.g., `'mean'` or `'sum'`); it defaults to `'mean'`.
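The sketch below puts all four parameters into a single call. The `toys` DataFrame and its `Aisle` and `Brand` columns are hypothetical, made up only to mirror the toy-store analogy.

```python
import pandas as pd

# Hypothetical toy-store data: each row is one toy on the shelf
toys = pd.DataFrame({
    'Aisle': ['Games', 'Games', 'Dolls', 'Dolls', 'Games'],
    'Brand': ['X', 'Y', 'X', 'Y', 'X'],
    'Price': [12, 18, 25, 30, 8],
})

summary = toys.pivot_table(
    index='Aisle',    # one row per aisle
    columns='Brand',  # one column per distinct brand
    values='Price',   # the numbers being aggregated
    aggfunc='sum',    # how each group of prices is combined
)
print(summary)
```

Each cell holds the total price of one brand in one aisle; for example, the Games/X cell adds the two prices 12 and 8.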

## Creating a Simple Pivot Table

Let's start by creating a simple pivot table. Suppose you have data about different products, and you want to see the average price of each product:

```python
import pandas as pd

# Sample DataFrame
data = {
    'Product': ['Toy', 'Toy', 'Book', 'Book', 'Electronic'],
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Price': [10, 15, 7, 12, 100]
}
df = pd.DataFrame(data)

# Create a pivot table
pivot_table = df.pivot_table(index='Product', values='Price', aggfunc='mean')
print(pivot_table)
```

**Output:**

```
            Price
Product          
Book          9.5
Electronic  100.0
Toy          12.5
```

This code groups the data by the `Product` column and calculates the average price for each product. The `Category` column is ignored because it is not listed in `index` or `values`, and the rows come out sorted alphabetically (Book, Electronic, Toy).
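With one grouping column and one value column, `pivot_table` gives the same numbers as a `groupby` followed by `mean()`. The sketch below rebuilds the sample DataFrame and compares the two results; it is only a sanity check, not a statement about how `pivot_table` is implemented internally.

```python
import pandas as pd

data = {
    'Product': ['Toy', 'Toy', 'Book', 'Book', 'Electronic'],
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Price': [10, 15, 7, 12, 100]
}
df = pd.DataFrame(data)

# Pivot table: one row per product, average price
pivot = df.pivot_table(index='Product', values='Price', aggfunc='mean')

# Equivalent groupby: group rows by product, then average the Price column
grouped = df.groupby('Product')['Price'].mean()

# Both list the averages in the same (alphabetical) row order
print(pivot['Price'].tolist())  # [9.5, 100.0, 12.5]
print(grouped.tolist())         # [9.5, 100.0, 12.5]
```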

## Complex Pivot Table Example

Let's consider a more complex example. It uses the Titanic dataset that ships with seaborn and looks at more dimensions at once: we will analyze the average fare and survival rate by passenger class and gender.

```python
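import seaborn as sns

# Load the Titanic dataset bundled with seaborn
# (downloaded from the seaborn-data repository on first use, then cached)
titanic = sns.load_dataset('titanic')

# Average fare and survival rate by class (rows) and sex (columns).
# 'class' is a categorical column; passing observed=True explicitly avoids
# a FutureWarning that recent pandas versions may emit for categorical groupers.
pivot = titanic.pivot_table(index='class', columns='sex', values=['survived', 'fare'],
                            aggfunc='mean', observed=True)
print(pivot)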
```

**Output:**

```
             fare             survived         
sex        female      male    female      male
class                                        
First    106.125798  67.226127  0.968085  0.368852
Second    21.970121  19.741782  0.921053  0.157407
Third     16.118810  12.661633  0.500000  0.135447
```

### Detailed Explanation of the Complex Example

This pivot table shows the average fare and survival rate for passengers of different classes and genders.

Let's break down the arguments:

- **Index**: The rows of the table (one row per class).
- **Columns**: The columns (one group per sex).
- **Values**: The data being summarized (`fare` and `survived`).
- **Aggfunc**: `'mean'`, so every cell is an average.

Here is how to interpret the result:

- The class index groups the data by the passenger class (First, Second, Third).
- The sex columns categorize data by gender (female, male).
- Under fare, we can see the average fare paid by female and male passengers in each class.
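- Under survived, we see the average survival rate for female and male passengers in each class. Because `survived` is 1 for passengers who survived and 0 otherwise, this average is the fraction of passengers who survived.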
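The sketch below checks two details of this layout on a tiny, made-up passenger list (hypothetical values, not the real Titanic data, so it runs without downloading anything): the columns form a two-level MultiIndex (value column, then sex), and averaging a 0/1 `survived` column yields a survival rate.

```python
import pandas as pd

# Hypothetical mini passenger list (NOT the real Titanic data)
passengers = pd.DataFrame({
    'class': ['First', 'First', 'First', 'Third', 'Third', 'Third'],
    'sex': ['female', 'male', 'male', 'female', 'female', 'male'],
    'survived': [1, 1, 0, 1, 0, 0],
    'fare': [100.0, 60.0, 40.0, 10.0, 14.0, 8.0],
})

pivot = passengers.pivot_table(index='class', columns='sex',
                               values=['survived', 'fare'], aggfunc='mean')

# Columns have two levels: the value column, then the sex
print(pivot.columns.nlevels)  # 2

# Select a single cell with a (value column, sex) tuple.
# Mean of 0/1 values = share of 1s: one of the two First-class men survived.
print(pivot.loc['First', ('survived', 'male')])  # 0.5
print(pivot.loc['Third', ('fare', 'female')])    # 12.0
```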

## Example Insights

Let's consider insights we could find using this table.

- The average fare for female passengers in First Class is 106.13, compared with 67.23 for male passengers.
- The survival rate for female passengers in First Class is far higher than for male passengers: 96.81% vs 36.89%.
- In the Third Class, only 50% of female passengers survived.

By breaking the data down this way, you can uncover meaningful insights and trends that may not be immediately obvious in a more granular or ungrouped dataset.
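The percentages in the insights list can be computed instead of read off by eye. This sketch retypes the `survived` block of the Titanic output into a small DataFrame so it runs on its own; with the real table you would start from `pivot['survived']` instead.

```python
import pandas as pd

# Survival rates copied from the Titanic pivot table output (6 decimals)
survived = pd.DataFrame(
    {'female': [0.968085, 0.921053, 0.500000],
     'male': [0.368852, 0.157407, 0.135447]},
    index=pd.Index(['First', 'Second', 'Third'], name='class'),
)

# Convert the rates to percentages with two decimals
percent = (survived * 100).round(2)
print(percent)

# Percentage-point gap between female and male survival in each class
gap = ((survived['female'] - survived['male']) * 100).round(2)
print(gap)
```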

## Lesson Summary

Great job! We've covered a fundamental aspect of data analysis: pivot tables. Let's recap what we've learned:

- What pivot tables are and why they are useful.
- How to create a basic pivot table using Pandas.
- How to create a more complex pivot table with multiple dimensions and several value columns.
- How to interpret the data from a pivot table.

Using pivot tables, you can summarize and analyze large amounts of data efficiently, helping you draw meaningful insights with ease.

Now it's time to put what you've learned into practice! In the upcoming practice session, you will get hands-on experience creating and working with pivot tables. This will solidify your understanding and prepare you for more advanced data analysis tasks. Let's dive in and start practicing with pivot tables!

## Analyzing Toy Prices with Pivot Tables

This example uses a pivot table to find the average price of each toy type in a toy store. Here's a breakdown of how the script works and what it accomplishes:

1. **Importing the Pandas Library**: The script begins by importing the `pandas` library, which is essential for data manipulation in Python.

2. **Creating a DataFrame**: A DataFrame `df` is created from a dictionary `data`. This dictionary includes three keys: 'Product', 'Type', and 'Price'. Each key corresponds to a list of values representing different attributes of toys. All entries under 'Product' are labeled 'Toy', indicating that the data pertains only to toys. The 'Type' key differentiates between 'Car' and 'Doll'.

3. **Creating a Pivot Table**:
   - **Index**: The pivot table is indexed on the 'Type' column, which includes toy types ('Car' and 'Doll'). This means the resulting table will have rows labeled with these toy types.
   - **Values**: The 'Price' column is used as the values that will be aggregated. This column contains the prices of the toys.
   - **Aggfunc**: The aggregation function used is `mean`, which calculates the average price for each type of toy.

4. **Printing the Pivot Table**: The pivot table is printed to the console, showing the average price for each toy type.

The output of this code will display a simple table with each toy type and its corresponding average price, making it easy to see the average cost of each type of toy in the store.

Here's what the output might look like:

```
       Price
Type        
Car     12.5
Doll    22.5
```

This output indicates that the average price of 'Car' toys is 12.5 and 'Doll' toys is 22.5. This information is valuable for understanding pricing strategies or inventory decisions in a toy store.
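The script itself is not reproduced in this lesson. The sketch below follows the steps described above; the four prices are hypothetical, chosen only so that the averages come out to the 12.5 and 22.5 shown in the sample output.

```python
import pandas as pd

# Hypothetical toy data: every row is a 'Toy', 'Type' distinguishes Car vs Doll.
# Prices are illustrative only (not the original script's data).
data = {
    'Product': ['Toy', 'Toy', 'Toy', 'Toy'],
    'Type': ['Car', 'Doll', 'Car', 'Doll'],
    'Price': [10, 20, 15, 25],
}
df = pd.DataFrame(data)

# One row per toy type, averaging the Price column
pivot_table = df.pivot_table(index='Type', values='Price', aggfunc='mean')
print(pivot_table)
```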

## Change the Aggregation Function

Space Explorer, let's tweak the pivot table to sum the total prices of each toy instead of averaging them. Change the aggregation function in the `pivot_table` call.

```py
import pandas as pd

# Toys data in toy store
data = {
    'Toy': ['Car', 'Doll', 'Car', 'Puzzle', 'Doll', 'Puzzle'],
    'Type': ['Vehicle', 'Figure', 'Vehicle', 'Game', 'Figure', 'Game'],
    'Price': [15, 25, 10, 20, 30, 18]
}
df = pd.DataFrame(data)

# Create a pivot table
pivot_table = df.pivot_table(index='Toy', values='Price', aggfunc='mean')
print(pivot_table)

```

To sum the total price of each toy rather than averaging, change the aggregation function from `'mean'` to `'sum'` in the `pivot_table` call. Here's the adjusted code:

```python
import pandas as pd

# Toys data in toy store
data = {
    'Toy': ['Car', 'Doll', 'Car', 'Puzzle', 'Doll', 'Puzzle'],
    'Type': ['Vehicle', 'Figure', 'Vehicle', 'Game', 'Figure', 'Game'],
    'Price': [15, 25, 10, 20, 30, 18]
}
df = pd.DataFrame(data)

# Create a pivot table to sum the total prices of each toy
pivot_table = df.pivot_table(index='Toy', values='Price', aggfunc='sum')
print(pivot_table)
```

### Explanation:
- **Index**: The pivot table is indexed on the 'Toy' column, which holds the toy names ('Car', 'Doll', 'Puzzle').
- **Values**: The 'Price' column is used as the values that will be aggregated.
- **Aggfunc**: The aggregation function `sum` adds up the prices for each toy.

This modification produces a pivot table with the total price of each toy, which is useful for inventory valuation or sales analysis in a toy store.

Here's what the expected output might look like:

```
        Price
Toy          
Car        25
Doll       55
Puzzle     38
```

This output shows the total price for each toy: 'Car' totals 25 (15 + 10), 'Doll' totals 55 (25 + 30), and 'Puzzle' totals 38 (20 + 18). This can help compare how much each toy contributes to the total listed value of the store's stock.
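`aggfunc` also accepts a list of functions, which is handy for comparing several summaries side by side. A minimal sketch with the same toy data:

```python
import pandas as pd

data = {
    'Toy': ['Car', 'Doll', 'Car', 'Puzzle', 'Doll', 'Puzzle'],
    'Type': ['Vehicle', 'Figure', 'Vehicle', 'Game', 'Figure', 'Game'],
    'Price': [15, 25, 10, 20, 30, 18]
}
df = pd.DataFrame(data)

# One column group per function: total, average, and number of rows per toy
summary = df.pivot_table(index='Toy', values='Price', aggfunc=['sum', 'mean', 'count'])
print(summary)

# Columns are (function, value column) pairs
print(summary.loc['Doll', ('mean', 'Price')])  # 27.5
```

Each function becomes the top level of the column labels, so the same `Price` column appears once under `sum`, once under `mean`, and once under `count`.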

## Toy Store Sales Pivot Table

Great job so far! Now, let's continue our cosmic journey.

Add the necessary code to create the pivot table that shows the sum of toy sales by toy type and store.

```py
import pandas as pd

# Sample DataFrame with toy data
data = {
    'Toy': ['Car', 'Doll', 'Blocks', 'Car', 'Blocks'],
    'Store': ['A', 'A', 'B', 'B', 'A'],
    'Sales': [100, 150, 200, 50, 300]
}
df = pd.DataFrame(data)

# TODO: create a pivot table that summarizes the sum of toy sales by toy type and store.
print(pivot_table)


```

To create a pivot table that shows the sum of toy sales by toy type and store, pass both the 'Toy' and 'Store' columns as the index and aggregate the 'Sales' values with the sum function. Here's how you can complete the code:

```python
import pandas as pd

# Sample DataFrame with toy data
data = {
    'Toy': ['Car', 'Doll', 'Blocks', 'Car', 'Blocks'],
    'Store': ['A', 'A', 'B', 'B', 'A'],
    'Sales': [100, 150, 200, 50, 300]
}
df = pd.DataFrame(data)

# Create a pivot table that summarizes the sum of toy sales by toy type and store
pivot_table = df.pivot_table(index=['Toy', 'Store'], values='Sales', aggfunc='sum')
print(pivot_table)
```

### Explanation:
- **Index**: The pivot table uses both 'Toy' and 'Store' as indices. This means the resulting table will have rows labeled with combinations of toy types and store identifiers, providing a detailed breakdown of sales.
- **Values**: The 'Sales' column is used as the values that will be aggregated.
- **Aggfunc**: The aggregation function `sum` is used to calculate the total sales for each combination of toy type and store.

This setup allows you to see how much each type of toy sold in each store, which is very useful for analyzing sales performance across different locations.

### Expected Output:
The output will display the total sales for each toy type in each store, something like this:

```
              Sales
Toy    Store       
Blocks A        300
       B        200
Car    A        100
       B         50
Doll   A        150
```

This table shows how much each toy type sold in each respective store, providing clear insights into the sales distribution across different products and locations.
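Moving `Store` from `index` to `columns` turns the same totals into a toy-by-store grid. In this sketch, `fill_value=0` fills the Doll/Store B cell, which has no sales rows, and `margins=True` adds an `All` row and column (with `aggfunc='sum'`, these margins are totals).

```python
import pandas as pd

data = {
    'Toy': ['Car', 'Doll', 'Blocks', 'Car', 'Blocks'],
    'Store': ['A', 'A', 'B', 'B', 'A'],
    'Sales': [100, 150, 200, 50, 300]
}
df = pd.DataFrame(data)

# Rows: toys, columns: stores, cells: total sales; missing combinations become 0
grid = df.pivot_table(index='Toy', columns='Store', values='Sales',
                      aggfunc='sum', fill_value=0, margins=True)
print(grid)
```

The long layout is easier to filter, while the grid is easier to scan when comparing stores.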

## Average Price and Quantity Pivot Table

Great job analyzing toy store data so far, Space Voyager! Now, let's create a pivot table that shows the average (mean) price and quantity of products grouped by category and product type.

```py
import pandas as pd

# Sample DataFrame
data = {
    'Product': ['Toy', 'Toy', 'Book', 'Book', 'Electronic', 'Toy', 'Book'],
    'Category': ['A', 'B', 'A', 'B', 'A', 'A', 'A'],
    'Price': [10, 15, 7, 12, 100, 8, 9],
    'Quantity': [1, 2, 3, 4, 1, 5, 2]
}
df = pd.DataFrame(data)

# TODO: Create a pivot table that shows the average (mean) price and quantity of products grouped by category and product type
print(pivot_table)

```

To create a pivot table that displays the average (mean) price and quantity of products, grouped by both category and product type, pass both grouping columns as the index and both numeric columns as the values. Here's how you can set up the pivot table:

```python
import pandas as pd

# Sample DataFrame
data = {
    'Product': ['Toy', 'Toy', 'Book', 'Book', 'Electronic', 'Toy', 'Book'],
    'Category': ['A', 'B', 'A', 'B', 'A', 'A', 'A'],
    'Price': [10, 15, 7, 12, 100, 8, 9],
    'Quantity': [1, 2, 3, 4, 1, 5, 2]
}
df = pd.DataFrame(data)

# Create a pivot table that shows the average (mean) price and quantity of products grouped by category and product type
pivot_table = df.pivot_table(index=['Category', 'Product'], values=['Price', 'Quantity'], aggfunc='mean')

# Print the pivot table
print(pivot_table)
```

### Explanation:
- **Index**: The pivot table uses 'Category' and 'Product' as indices. This groups the data first by category and then within each category by product type.
- **Values**: The 'Price' and 'Quantity' columns are used as the values to be aggregated.
- **Aggfunc**: The aggregation function `mean` is specified to calculate the average for both price and quantity.

### What This Does:
This pivot table configuration will provide the average price and quantity for each product within each category. It's a useful analysis tool for understanding how different types of products perform across various categories in terms of both pricing and sales volume.

### Expected Output:
The output will likely look something like this, showing the average price and quantity for each combination of product and category:

```
                     Price  Quantity
Category Product                    
A        Book          8.0       2.5
         Electronic  100.0       1.0
         Toy           9.0       3.0
B        Book         12.0       4.0
         Toy          15.0       2.0
```

This table makes it easy to compare the average price and quantity sold for each product type across different categories, providing clear insights into product performance and customer preferences.
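If each value column needs a different summary, `aggfunc` also accepts a dictionary that maps column names to functions. The sketch below keeps the average price but totals the quantity; pairing `'mean'` with `'sum'` this way is just one illustrative choice.

```python
import pandas as pd

data = {
    'Product': ['Toy', 'Toy', 'Book', 'Book', 'Electronic', 'Toy', 'Book'],
    'Category': ['A', 'B', 'A', 'B', 'A', 'A', 'A'],
    'Price': [10, 15, 7, 12, 100, 8, 9],
    'Quantity': [1, 2, 3, 4, 1, 5, 2]
}
df = pd.DataFrame(data)

# Average price, but total quantity, for each (category, product) pair
summary = df.pivot_table(index=['Category', 'Product'],
                         values=['Price', 'Quantity'],
                         aggfunc={'Price': 'mean', 'Quantity': 'sum'})
print(summary)
```

Rows are selected with a (category, product) tuple, for example `summary.loc[('A', 'Toy'), 'Quantity']` for the total quantity of toys in category A.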