# Summarizing Grouped Data


## Applied Review

### DataFrame Structure

* We will start by importing the `planes` data set as a DataFrame:

In [1]:
import pandas as pd
planes_df = pd.read_csv('../data/planes.csv')

* Each DataFrame variable is a **Series** and can be accessed with bracket subsetting notation: `DataFrame['SeriesName']`
* The DataFrame has an **Index** that is visible on the far left side and can be used to slice the DataFrame
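
As a minimal sketch of both points, the snippet below uses a small made-up DataFrame (`toy_df` and its values are hypothetical, not the `planes` data) to pull out one variable as a Series and to slice rows by index label with `.loc`:

```python
import pandas as pd

# Hypothetical stand-in data; not taken from planes.csv
toy_df = pd.DataFrame({
    'model': ['A', 'B', 'A', 'C'],
    'seats': [180, 149, 182, 50],
})

# Bracket notation returns a single variable as a Series
seats = toy_df['seats']
print(type(seats))

# The default index is 0, 1, 2, ...; .loc slices by index label
# and includes both endpoints
first_three = toy_df.loc[0:2]
print(first_three)
```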

### Summary Operations

* Summary operations occur when we collapse a Series down to a single value, or a DataFrame down to a single row
* This is an aggregation of a variable across its rows

![aggregate-series.png](images/aggregate-series.png)
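
A small illustration of this collapse, assuming a made-up Series named `seats` (the values are hypothetical):

```python
import pandas as pd

# Hypothetical values, used only for illustration
seats = pd.Series([180, 149, 182, 50], name='seats')

# Each summary method collapses the whole Series to a single value
total = seats.sum()
average = seats.mean()
print(total, average)
```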

### Summarizing Data Frames

* We can perform summary operations on DataFrames in a number of ways:
  * Summary methods for a specific summary operation: `DataFrame.sum()`
  * Describe method for a collection of summary operations: `DataFrame.describe()`
  * Agg method for flexibility in summary operations: `DataFrame.agg({'VariableName': ['sum', 'mean']})`

* An example of the agg method:

In [2]:
planes_df.agg({
    'year': ['mean', 'median'],
    'seats': ['mean', 'max']
})

|        | year       | seats      |
|--------|------------|------------|
| max    | NaN        | 450.0      |
| mean   | 2000.48401 | 154.316376 |
| median | 2001.0     | NaN        |


* We will primarily use the agg method moving forward.
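
For comparison, the three approaches listed above can be sketched side by side on a small made-up DataFrame (`toy_df` and its values are hypothetical, not the `planes` data):

```python
import pandas as pd

# Hypothetical stand-in data; not taken from planes.csv
toy_df = pd.DataFrame({
    'year': [1998, 2001, 2004, 2001],
    'seats': [180, 149, 182, 50],
})

# A specific summary method: one value per variable
totals = toy_df.sum()

# describe(): a standard collection of summaries (count, mean, std, min, ...)
overview = toy_df.describe()

# agg(): choose which summaries to compute for each variable
custom = toy_df.agg({'year': ['mean', 'median'], 'seats': ['mean', 'max']})
print(custom)
```

Summaries that were not requested for a variable, such as `max` for `year`, show up as `NaN` in the `agg` result.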

## General Model

### Variable Groups

* We can group DataFrame rows together by the value in a Series/variable:

![dataframe-groups.png](images/dataframe-groups.png)

* If we "group by A", then rows with the same value in variable A are in the same group
* Note that groups do not need to be ordered by their values:

![dataframe-groups-unordered.png](images/dataframe-groups-unordered.png)
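
A minimal sketch of this idea, assuming a made-up DataFrame in which rows sharing a value of `A` are deliberately not adjacent (the names `toy_df`, `A`, and `B` are hypothetical):

```python
import pandas as pd

# Hypothetical data; rows with the same value of 'A' are scattered
toy_df = pd.DataFrame({
    'A': ['x', 'y', 'x', 'z', 'y'],
    'B': [1, 2, 3, 4, 5],
})

grouped = toy_df.groupby('A')

# Show which row labels belong to each group
for key, rows in grouped.groups.items():
    print(key, list(rows))
```

Rows 0 and 2 land in the same group even though another row sits between them.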

<font style="color:#008;">
    <strong>Question</strong>:<br><em>Why might we be interested in grouping by a variable?</em>
</font>

### Summarizing by Groups

* So far, when we've talked about **summary** operations, we've meant collapsing a DataFrame to a single row
* This is not always the case -- we sometimes collapse to a *single row per group*
* This is known as a grouped aggregation:

![summarizing-by-groups.png](images/summarizing-by-groups.png)

* This can be useful when we want to do an aggregation by category:
  * Maximum temperature *by month*
  * Total home runs *by team*
  * Average number of seats *by plane model*

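The last example in the list above could be sketched in pandas roughly as follows. The data here is made up (`toy_df`, the model labels, and the seat counts are hypothetical), and `groupby` followed by `agg` is one common way to express a grouped aggregation:

```python
import pandas as pd

# Hypothetical stand-in data; not taken from planes.csv
toy_df = pd.DataFrame({
    'model': ['A', 'B', 'A', 'C', 'B'],
    'seats': [180, 150, 182, 50, 148],
})

# Collapse to one row per model, holding the mean of 'seats' in that group
avg_seats = toy_df.groupby('model').agg({'seats': 'mean'})
print(avg_seats)
```

Five input rows collapse to three output rows, one for each distinct model.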
## Summarizing Grouped Data

### Groups as Index
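
Assuming this heading refers to pandas' default `groupby` behavior, the group labels become the Index of the aggregated result. A minimal sketch on made-up data (`toy_df` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in data; not taken from planes.csv
toy_df = pd.DataFrame({
    'model': ['A', 'B', 'A', 'C', 'B'],
    'seats': [180, 150, 182, 50, 148],
})

# By default, the grouping variable moves into the Index of the result
by_model = toy_df.groupby('model').agg({'seats': 'max'})
print(by_model.index)

# A group's summary can then be looked up by its label
print(by_model.loc['A'])
```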

### Groups as Variables
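
Assuming this heading refers to keeping the group labels as an ordinary variable rather than as the Index, one option in pandas is to pass `as_index=False` to `groupby`. A minimal sketch on made-up data (`toy_df` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in data; not taken from planes.csv
toy_df = pd.DataFrame({
    'model': ['A', 'B', 'A', 'C', 'B'],
    'seats': [180, 150, 182, 50, 148],
})

# as_index=False keeps 'model' as a regular column in the result
by_model = toy_df.groupby('model', as_index=False).agg({'seats': 'mean'})
print(by_model)
```

Calling `.reset_index()` on a result whose groups are in the Index is another way to reach a similar layout.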