<center> <img src="res/ds3000.png"> </center>

<center> <h1> Week 6 - Day 2 </h1> </center>

<center> <h2> Part 3: Data Aggregation </h2></center>

## Outline
1. <a href='#1'>Data Aggregation Using GroupBy</a>
2. <a href='#2'>Iterating through a GroupBy Object</a>
3. <a href='#3'>Grouping by Index Levels</a>
4. <a href='#4'>Grouping by Multiple Indices</a>
5. <a href='#5'>Aggregation Methods for GroupBy Objects</a>

<a id="1"></a>

## 1. Data Aggregation Using GroupBy

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv("res/ave_grades.csv")

In [None]:
df

In [None]:
df = df.set_index(["House", "Student"])

In [None]:
df.loc["Gryffindor"].mean()

### 1.1. groupby() method
* **groupby()** method splits the rows of a DataFrame into groups that share the same value in a column (or index level)
    * Returns an iterable GroupBy object; the actual numbers are only computed when an aggregation method such as **mean()** is called on it

In [None]:
df.groupby("House")

In [None]:
df.groupby("House").mean()
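
The CSV file is not reproduced in these notes, so here is a minimal, self-contained sketch of the same idea on a small made-up DataFrame. The student names and grades are invented for illustration; only the column names follow the ones used above.

```python
import pandas as pd

# Hypothetical stand-in for res/ave_grades.csv; names and grades are invented.
toy = pd.DataFrame(
    {
        "House": ["Gryffindor", "Gryffindor", "Slytherin", "Slytherin"],
        "Student": ["Student A", "Student B", "Student C", "Student D"],
        "Potion_Ave": [80.0, 90.0, 70.0, 60.0],
        "Charm_Ave": [75.0, 85.0, 65.0, 95.0],
    }
).set_index(["House", "Student"])

# One row per house, one column per grade column.
house_means = toy.groupby("House").mean()

# The same numbers for a single house, computed by selecting its rows first.
gryffindor_means = toy.loc["Gryffindor"].mean()

print(house_means)
print(gryffindor_means)
```

The groupby() call produces in one step what the `.loc[...]` approach would need to repeat once per house.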

In [None]:
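# The full option name is needed in recent pandas versions,
# where the short name "precision" matches more than one option.
pd.set_option("display.precision", 2)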

### 1.2. Indexing a GroupBy Object
* Select a column from a GroupBy object with square brackets, just as you would from a DataFrame

In [None]:
df.groupby("House")["Potion_Ave"].mean()
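
As a small sketch of how the selection affects the result type, assuming a made-up DataFrame with the same column names: a single label gives a Series, while a list of labels gives a DataFrame.

```python
import pandas as pd

# Hypothetical data; the grades are invented for illustration.
toy = pd.DataFrame(
    {
        "House": ["Gryffindor", "Gryffindor", "Slytherin", "Slytherin"],
        "Potion_Ave": [80.0, 90.0, 70.0, 60.0],
        "Charm_Ave": [75.0, 85.0, 65.0, 95.0],
    }
)

grouped = toy.groupby("House")

# A single column label: the aggregated result is a Series.
potion_means = grouped["Potion_Ave"].mean()

# A list of column labels: the aggregated result is a DataFrame.
both_means = grouped[["Potion_Ave", "Charm_Ave"]].mean()

print(type(potion_means).__name__)
print(type(both_means).__name__)
```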

<a id="2"></a>

## 2. Iterating through a GroupBy Object
* Iterating over the GroupBy object returned by groupby() yields 2-tuples containing
    * the group name and
    * the segment of the dataframe corresponding to that group

In [None]:
for house, frame in df.groupby("House"):
    print(frame)
    print("\n")
    

In [None]:
for house, frame in df.groupby("House"):
    print(house)
    print("\tPotions Ave: ", frame["Potion_Ave"].mean())
    print("\tCharms Ave: ", frame["Charm_Ave"].mean())
    print("\n")
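
Because each iteration step is a (name, frame) pair, the groups can also be collected into a dictionary, or a single group can be fetched directly with get_group(). The sketch below uses a made-up DataFrame with invented grades.

```python
import pandas as pd

# Hypothetical data; the grades are invented for illustration.
toy = pd.DataFrame(
    {
        "House": ["Gryffindor", "Gryffindor", "Slytherin", "Slytherin"],
        "Potion_Ave": [80.0, 90.0, 70.0, 60.0],
    }
)

# Collect every (group name, sub-frame) pair into a dictionary.
groups = {house: frame for house, frame in toy.groupby("House")}

# Fetch one group without looping.
slytherin = toy.groupby("House").get_group("Slytherin")

print(sorted(groups))
print(slytherin)
```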
    

In [None]:
df

<a id="3"></a>

## 3. Grouping by Index Levels
* For hierarchically indexed dataframes, it is possible to aggregate using one of the levels of an axis index.
* Specify the level name (or its integer position) using the **level** keyword.

In [None]:
df

In [None]:
df.groupby(level = "House").mean()
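
A minimal sketch, on made-up data, showing that the level can be given either by name or by position. Here "House" is the outermost index level, so position 0 refers to the same level.

```python
import pandas as pd

# Hypothetical data with a two-level index; names and grades are invented.
toy = pd.DataFrame(
    {
        "House": ["Gryffindor", "Gryffindor", "Slytherin", "Slytherin"],
        "Student": ["Student A", "Student B", "Student C", "Student D"],
        "Potion_Ave": [80.0, 90.0, 70.0, 60.0],
    }
).set_index(["House", "Student"])

# Group by the index level called "House" ...
by_name = toy.groupby(level="House").mean()

# ... or by its position in the index (0 is the outermost level).
by_position = toy.groupby(level=0).mean()

print(by_name)
print(by_name.equals(by_position))
```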

<a id="4"></a>

## 4. Grouping by Multiple Indices
* Pass a list of column names to **groupby()** to group a dataframe by multiple columns; the result has one index level per grouping column

In [None]:
df_player = pd.read_csv("res/ave_grades_quidditch.csv")

In [None]:
df_player.sort_values(by="House")

In [None]:
grouped_df = df_player.groupby(["House", "Quidditch"])

In [None]:
grouped_df.mean()
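
The result of grouping by two columns has a two-level index. The sketch below illustrates this on made-up data; the "Yes"/"No" values in the Quidditch column are an assumption, since the contents of the CSV file are not shown in these notes.

```python
import pandas as pd

# Hypothetical data; grades and the "Yes"/"No" coding are invented.
toy = pd.DataFrame(
    {
        "House": ["Gryffindor", "Gryffindor", "Gryffindor",
                  "Slytherin", "Slytherin", "Slytherin"],
        "Quidditch": ["Yes", "No", "Yes", "No", "No", "Yes"],
        "Potion_Ave": [80.0, 90.0, 70.0, 60.0, 80.0, 100.0],
    }
)

result = toy.groupby(["House", "Quidditch"]).mean()

# The index has one level per grouping column.
print(list(result.index.names))

# A (house, player) pair selects one row; a house alone selects its sub-table.
gryffindor_players = result.loc[("Gryffindor", "Yes"), "Potion_Ave"]
gryffindor_only = result.loc["Gryffindor"]

print(result)
```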

<a id="5"></a>

  
## 5. Aggregation Methods for GroupBy Objects
* The **agg()** method applies one or more aggregation functions to a GroupBy object
* **groupby_object.agg([...])** takes a list of aggregation functions or their string names, e.g. "count", "mean", "std"

In [None]:
grouped_df.agg("mean")

In [None]:
grouped_df.agg(["count", "mean", "std"])
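
When a list of functions is passed, the columns of the result become hierarchical: the original column name on top and the function name underneath. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical data; the grades are invented for illustration.
toy = pd.DataFrame(
    {
        "House": ["Gryffindor", "Gryffindor", "Slytherin", "Slytherin"],
        "Potion_Ave": [80.0, 90.0, 70.0, 60.0],
        "Charm_Ave": [75.0, 85.0, 65.0, 95.0],
    }
)

stats = toy.groupby("House").agg(["count", "mean", "std"])

# Select all statistics for one original column ...
potion_stats = stats["Potion_Ave"]

# ... or one statistic with a (column, function) tuple.
potion_std = stats[("Potion_Ave", "std")]

print(stats)
```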

#### How would you modify the previous code snippet to get the same stats (count, mean, std) for Quidditch players vs. non-players?

In [None]:
df_player.groupby("Quidditch").agg(["count", "mean", "std"])

### 5.1. Custom Aggregation Functions
* Can define your own aggregation functions and pass them to the **agg()** method; each function receives one column of one group as a Series and should return a single value

In [None]:
def letter_grade(grade):
    '''simple function that takes a Series object containing grades and
    returns the letter grade for the average of these grades:
    "B" if the average is above 80, otherwise "C+"
    '''
    if grade.mean() > 80:
        return "B"
    else:
        return "C+"

In [None]:
df_player.groupby("House").agg(letter_grade)

In [None]:
df_player.groupby("Quidditch").agg(["count", "mean", "std", letter_grade])
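
As a further sketch, custom functions can return numbers as well as labels. The helper `grade_range` below is a hypothetical example, not part of the course material, and the data is made up. When a function is mixed into a list with string names, its result column is labelled with the function's name.

```python
import pandas as pd

# Hypothetical data; the grades are invented for illustration.
toy = pd.DataFrame(
    {
        "House": ["Gryffindor", "Gryffindor", "Slytherin", "Slytherin"],
        "Potion_Ave": [80.0, 90.0, 70.0, 60.0],
        "Charm_Ave": [75.0, 85.0, 65.0, 95.0],
    }
)


def grade_range(grades):
    """Hypothetical helper: spread between the best and worst grade."""
    return grades.max() - grades.min()


# Built-in names and custom functions can share one list.
summary = toy.groupby("House").agg(["mean", grade_range])

print(summary)
```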

### 5.2. Applying Different Functions to Different Columns
* Pass in a dictionary that maps column names to aggregation functions (or their string names)

In [None]:
def pass_fail_grade(grade):
    '''simple function that takes a Series object containing grades and
    returns "Pass" if the average of these grades is above 65, otherwise "Fail"
    '''
    if grade.mean() > 65:
        return "Pass"
    else:
        return "Fail"

In [None]:
df_player.groupby(["House", "Quidditch"]).agg({"Potion_Ave": letter_grade, "Charm_Ave": pass_fail_grade})
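
The dictionary form also accepts string names of built-in aggregations. The sketch below shows this on made-up data, next to pandas' keyword-based "named aggregation", which is an alternative not covered above and lets you choose the output column names yourself.

```python
import pandas as pd

# Hypothetical data; the grades are invented for illustration.
toy = pd.DataFrame(
    {
        "House": ["Gryffindor", "Gryffindor", "Slytherin", "Slytherin"],
        "Potion_Ave": [80.0, 90.0, 70.0, 60.0],
        "Charm_Ave": [75.0, 85.0, 65.0, 95.0],
    }
)

# Dictionary: column name -> aggregation to apply to that column.
by_dict = toy.groupby("House").agg({"Potion_Ave": "mean", "Charm_Ave": "max"})

# Named aggregation: new column name = (source column, aggregation).
named = toy.groupby("House").agg(
    potion_mean=("Potion_Ave", "mean"),
    charm_max=("Charm_Ave", "max"),
)

print(by_dict)
print(named)
```

With the dictionary the result keeps the original column names, so named aggregation may be easier to read when the same column is summarized in several ways.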