# Week 5: Grouping and Aggregating Data


## Objectives:
In this week, you will:
1. Learn how to group data based on categories or conditions.
2. Perform aggregations such as sums, means, and counts on grouped data.
3. Use advanced aggregation functions for custom analyses.




## 1. Introduction to Grouping Data
Grouping data is an important step when you need to analyze or summarize data based on categories. For example, you may want to group visitor data by location or by date.

### Common Functions:
- `groupby()`: Group the data by one or more columns.
- Aggregation functions: `sum()`, `mean()`, `count()`, etc.

Let's start by loading a dataset and grouping the data by location to calculate the total number of visitors.


In [1]:

# Import pandas and load a sample dataset
import pandas as pd

# Sample dataset
data = {
    'Location': ['Park A', 'Museum B', 'Beach C', 'Park A', 'Museum B'],
    'Visitors': [200, 150, 100, 300, 180],
    'Revenue': [1000, 750, 500, 1500, 900]
}

df = pd.DataFrame(data)

# Group data by 'Location' and calculate the total number of visitors
df_grouped = df.groupby('Location')['Visitors'].sum()

# Show the result
df_grouped


Location
Beach C     100
Museum B    330
Park A      500
Name: Visitors, dtype: int64
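
The other aggregation functions listed above work the same way as `sum()`. The following is a minimal sketch, assuming the same sample data as the cell above (rebuilt here so the block runs on its own); the variable names `mean_visitors` and `row_counts` are illustrative only.

```python
import pandas as pd

# Same sample data as above, rebuilt so this block is self-contained
df = pd.DataFrame({
    'Location': ['Park A', 'Museum B', 'Beach C', 'Park A', 'Museum B'],
    'Visitors': [200, 150, 100, 300, 180],
    'Revenue': [1000, 750, 500, 1500, 900],
})

# Select the 'Visitors' column of the grouped object once and reuse it
visitors_by_location = df.groupby('Location')['Visitors']

# Average number of visitors per record at each location
mean_visitors = visitors_by_location.mean()

# Number of non-missing 'Visitors' records at each location
row_counts = visitors_by_location.count()

print(mean_visitors)
print(row_counts)
```

Note that `count()` counts non-missing values in the selected column, so it can differ from the number of rows when the data contains missing values.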


## 2. Aggregating Multiple Columns
You can perform aggregations on multiple columns at the same time. For example, you might want to calculate the total visitors and the total revenue for each location.

### Example:
Let's group the data by location and calculate both total visitors and total revenue.


In [2]:

# Group by 'Location' and calculate total visitors and total revenue
df_agg = df.groupby('Location').agg({
    'Visitors': 'sum',
    'Revenue': 'sum'
})

# Show the result
df_agg


| Location | Visitors | Revenue |
|----------|----------|---------|
| Beach C  | 100      | 500     |
| Museum B | 330      | 1650    |
| Park A   | 500      | 2500    |
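
`agg()` is not limited to one function shared by every column. As a sketch, assuming the same sample data, the keyword form of `agg()` (named aggregation) lets you pick a different function per column and name the output columns yourself. The output names `total_visitors`, `mean_revenue`, and `n_records` are made up for this illustration.

```python
import pandas as pd

# Same sample data as above, rebuilt so this block is self-contained
df = pd.DataFrame({
    'Location': ['Park A', 'Museum B', 'Beach C', 'Park A', 'Museum B'],
    'Visitors': [200, 150, 100, 300, 180],
    'Revenue': [1000, 750, 500, 1500, 900],
})

# Each keyword is an output column name (hypothetical names);
# each value is a (source column, aggregation function) pair
summary = df.groupby('Location').agg(
    total_visitors=('Visitors', 'sum'),
    mean_revenue=('Revenue', 'mean'),
    n_records=('Visitors', 'count'),
)

print(summary)
```

Naming the output columns this way can make the result easier to read than reusing the original column names, since each name states which aggregation was applied.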



## 3. Applying Custom Aggregation Functions
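In addition to built-in aggregation functions like `sum()` and `mean()`, you can write your own function and apply it to each group. With `apply()`, the function receives each group's rows as a DataFrame, so it can combine several columns in one calculation.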

### Example:
Let's apply a custom function to calculate the average revenue per visitor for each location.


In [3]:

# Define a custom function to calculate average revenue per visitor
def avg_revenue_per_visitor(x):
    return x['Revenue'].sum() / x['Visitors'].sum()

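# Apply the custom function to each group.
# Selecting only the columns the function needs keeps the grouping column
# out of the data passed to it (newer pandas versions warn about that otherwise).
df_custom = df.groupby('Location')[['Visitors', 'Revenue']].apply(avg_revenue_per_visitor)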

# Show the result
df_custom


Location
Beach C     5.0
Museum B    5.0
Park A      5.0
dtype: float64
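
For a ratio of two sums like this one, a custom function is not strictly required. As an alternative sketch (not a replacement for the approach above), assuming the same sample data, you can aggregate both columns first and then divide the resulting columns. The names `totals` and `revenue_per_visitor` are illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this block is self-contained
df = pd.DataFrame({
    'Location': ['Park A', 'Museum B', 'Beach C', 'Park A', 'Museum B'],
    'Visitors': [200, 150, 100, 300, 180],
    'Revenue': [1000, 750, 500, 1500, 900],
})

# Step 1: total visitors and total revenue per location
totals = df.groupby('Location')[['Visitors', 'Revenue']].sum()

# Step 2: divide the two aggregated columns element-wise
revenue_per_visitor = totals['Revenue'] / totals['Visitors']

print(revenue_per_visitor)
```

This relies only on built-in aggregations, which tends to be faster than calling a Python function once per group on larger datasets. A custom function remains useful when the calculation cannot be expressed with built-in aggregations.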


## 4. Grouping by Multiple Columns
You can also group data by multiple columns to perform more detailed analyses. For example, you might want to group by both location and date to analyze trends over time.

### Example:
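The sample dataset has no date column, so this example groups by both location and visitors instead, simply to show what a two-level grouping looks like. Because every (location, visitors) pair in the data is unique, each row ends up in its own group.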


In [4]:

# Group by 'Location' and 'Visitors'
df_multi_group = df.groupby(['Location', 'Visitors']).sum()

# Show the result
df_multi_group


| Location | Visitors | Revenue |
|----------|----------|---------|
| Beach C  | 100      | 500     |
| Museum B | 150      | 750     |
| Museum B | 180      | 900     |
| Park A   | 200      | 1000    |
| Park A   | 300      | 1500    |
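
To illustrate the location-and-date case mentioned at the start of this section, here is a minimal sketch. It assumes a `Date` column that the sample dataset does not have; the dates below are invented purely for illustration, as are the names `df_dated`, `daily`, and `daily_flat`.

```python
import pandas as pd

# Sample data with a hypothetical 'Date' column (dates are made up)
df_dated = pd.DataFrame({
    'Location': ['Park A', 'Museum B', 'Beach C', 'Park A', 'Museum B'],
    'Date': pd.to_datetime([
        '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02',
    ]),
    'Visitors': [200, 150, 100, 300, 180],
    'Revenue': [1000, 750, 500, 1500, 900],
})

# Group by both keys; the result is indexed by (Location, Date)
daily = df_dated.groupby(['Location', 'Date'])[['Visitors', 'Revenue']].sum()

# Turn the two index levels back into ordinary columns if preferred
daily_flat = daily.reset_index()

print(daily)
print(daily_flat)
```

Passing a list of column names to `groupby()` produces one group per unique combination of values, and `reset_index()` converts the resulting index levels back into regular columns.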



## 5. Summary
This week, you learned how to:
1. Group data using `groupby()` and apply basic aggregation functions such as `sum()`.
2. Aggregate multiple columns at once with `agg()`.
3. Apply custom functions to each group with `apply()` for more detailed analysis.
4. Group data by multiple columns for more detailed breakdowns.

### Homework:
- Practice grouping and aggregating data on a new dataset.
- Experiment with custom aggregation functions and see how they can help with your analysis.

