# Alternate GroupBy Syntax

This notebook covers alternative syntax for the **`groupby`** method. Its purpose is to show other syntaxes that you might see in the wild that accomplish exactly the same tasks. These alternatives can easily confuse beginning pandas users because they give you no extra power for data analysis; they simply express the same aggregation in a different way.

In [None]:
import pandas as pd
import numpy as np

# Use City of Houston Employee Data
Read in employee data and add a column for years of experience.

In [None]:
emp = pd.read_csv('../data/employee.csv', parse_dates=['HIRE_DATE', 'JOB_DATE'])
emp['EXPERIENCE'] = 2016 - emp['HIRE_DATE'].dt.year
emp.head()

# Grouping a single column, aggregating a single column, applying a single function
Originally taught:

In [None]:
emp.groupby('RACE').agg({'BASE_SALARY': 'mean'})

# Alternative
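You can select the aggregating column with brackets and pass the aggregating function as a string to the **`agg`** method. Note that this example aggregates with `'sum'` rather than `'mean'`, and that the result is a Series instead of a one-column DataFrame.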

In [None]:
emp.groupby('RACE')['BASE_SALARY'].agg('sum')

You can even bypass the **`agg`** method and call the **`sum`** method directly.

In [None]:
emp.groupby('RACE')['BASE_SALARY'].sum()
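
As a quick check that the three spellings above agree, here is a minimal sketch on a small made-up frame. The `toy` data is hypothetical and only stands in for the employee file; the column names are borrowed from it for readability.

```python
import pandas as pd

# Hypothetical toy data, not the Houston employee file
toy = pd.DataFrame({
    'RACE': ['A', 'A', 'B', 'B'],
    'BASE_SALARY': [10.0, 20.0, 30.0, 50.0],
})

# Dictionary syntax: returns a one-column DataFrame
as_frame = toy.groupby('RACE').agg({'BASE_SALARY': 'sum'})

# Bracket selection plus agg with a string: returns a Series
via_agg = toy.groupby('RACE')['BASE_SALARY'].agg('sum')

# Bracket selection plus a direct method call: returns a Series
direct = toy.groupby('RACE')['BASE_SALARY'].sum()

# All three hold the same numbers; only the container differs
same_values = via_agg.equals(direct) and as_frame['BASE_SALARY'].equals(direct)
print(same_values)
```

The only practical difference is the return type, which matters if later code expects a DataFrame.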

# Multiple aggregation functions
Original:

In [None]:
emp.groupby('RACE').agg({'BASE_SALARY': ['mean', 'sum']})

# Alternative

In [None]:
emp.groupby('RACE')['BASE_SALARY'].agg(['mean', 'sum'])
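
The two forms above return the same numbers but label the columns differently. The sketch below illustrates this on hypothetical toy data (the `toy` frame is an assumption, not the employee file): the dictionary form produces two-level column labels, while the bracket form produces flat ones.

```python
import pandas as pd

# Hypothetical toy data, not the Houston employee file
toy = pd.DataFrame({
    'RACE': ['A', 'A', 'B', 'B'],
    'BASE_SALARY': [10.0, 20.0, 30.0, 50.0],
})

# Dictionary syntax: columns are a MultiIndex such as ('BASE_SALARY', 'mean')
nested = toy.groupby('RACE').agg({'BASE_SALARY': ['mean', 'sum']})

# Bracket syntax: columns are simply 'mean' and 'sum'
flat = toy.groupby('RACE')['BASE_SALARY'].agg(['mean', 'sum'])

print(nested.columns.tolist())
print(flat.columns.tolist())
```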

# Multiple grouping columns, multiple aggregating columns, same functions
The alternative bracket syntax (the second cell below) only works if you are applying the same functions to each aggregating column.

In [None]:
emp.groupby(['RACE', 'GENDER']).agg({'BASE_SALARY': ['min', 'max', 'mean', 'median'],
                                     'EXPERIENCE': ['min', 'max', 'mean', 'median']})

In [None]:
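# Select several columns with a list inside the brackets (double brackets);
# the bare 'BASE_SALARY', 'EXPERIENCE' form is no longer accepted by recent pandas
emp.groupby(['RACE', 'GENDER'])[['BASE_SALARY', 'EXPERIENCE']].agg(['min', 'max', 'mean', 'median'])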
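
To see that the dictionary form and the bracket form line up when every column gets the same functions, here is a small sketch on hypothetical toy data. The `toy` frame and the `funcs` name are assumptions for illustration, and the bracket form is written with a list of column names, which recent pandas versions expect.

```python
import pandas as pd

# Hypothetical toy data, not the Houston employee file
toy = pd.DataFrame({
    'RACE': ['A', 'A', 'B', 'B'],
    'GENDER': ['F', 'F', 'F', 'M'],
    'BASE_SALARY': [10.0, 20.0, 30.0, 50.0],
    'EXPERIENCE': [1, 3, 5, 7],
})

funcs = ['min', 'max', 'mean', 'median']

# Dictionary syntax: the function list is repeated for each column
by_dict = toy.groupby(['RACE', 'GENDER']).agg({'BASE_SALARY': funcs,
                                               'EXPERIENCE': funcs})

# Bracket syntax: select the columns once, pass the function list once
by_brackets = toy.groupby(['RACE', 'GENDER'])[['BASE_SALARY', 'EXPERIENCE']].agg(funcs)

# Both results carry the same two-level column labels
same_columns = by_dict.columns.tolist() == by_brackets.columns.tolist()
print(same_columns)
```

If different columns need different functions, the dictionary form is still required.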

# Alternative - No Aggregating Columns
You do not actually need to specify the aggregating columns when grouping. Older versions of pandas silently drop the columns that don't work for a particular aggregation function. For instance, only numeric columns have a mean, so all other columns are dropped. The only numeric columns are salary and experience. Pandas 2.0 and later raise a `TypeError` instead of dropping those columns.

In [None]:
# Runs as written on older pandas; pandas 2.0+ raises a TypeError for the non-numeric columns
emp.groupby(['RACE', 'GENDER']).agg(['min', 'max', 'mean', 'median'])
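
For pandas versions that no longer drop unsupported columns, one possible workaround is to pick out the numeric columns first. The sketch below uses hypothetical toy data (the `toy` frame and the `TITLE` column are assumptions) together with `select_dtypes` and the `numeric_only` parameter.

```python
import pandas as pd

# Hypothetical toy data, not the Houston employee file
toy = pd.DataFrame({
    'RACE': ['A', 'A', 'B', 'B'],
    'GENDER': ['F', 'M', 'F', 'F'],
    'TITLE': ['clerk', 'engineer', 'clerk', 'analyst'],  # non-numeric column
    'BASE_SALARY': [10.0, 20.0, 30.0, 50.0],
    'EXPERIENCE': [1, 2, 3, 5],
})

# Collect the numeric column names instead of typing them out
numeric_cols = toy.select_dtypes('number').columns.tolist()

# Aggregate only those columns with the full list of functions
summary = toy.groupby(['RACE', 'GENDER'])[numeric_cols].agg(['min', 'max', 'mean', 'median'])

# For a single method, numeric_only=True skips the string column
means = toy.groupby(['RACE', 'GENDER']).mean(numeric_only=True)
print(means)
```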

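You can even call an aggregation method directly after grouping to apply it to every column it supports. Passing `numeric_only=True` restricts **`mean`** to the numeric columns, which pandas 2.0 and later require when string columns are present.

In [None]:
emp.groupby(['RACE', 'GENDER']).mean(numeric_only=True)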

The **`count`** method works for all columns, not just numeric ones.

In [None]:
emp.groupby(['RACE', 'GENDER']).count()
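
One detail worth keeping in mind is that **`count`** tallies non-missing values, so columns with gaps can report different counts within the same group. A minimal sketch on hypothetical toy data (the `toy` frame is an assumption, not the employee file):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing title and one missing salary
toy = pd.DataFrame({
    'RACE': ['A', 'A', 'B', 'B'],
    'TITLE': ['clerk', None, 'clerk', 'analyst'],
    'BASE_SALARY': [10.0, 20.0, np.nan, 50.0],
})

# count works on the string column as well as the numeric one,
# and it skips missing values in each column independently
counts = toy.groupby('RACE').count()
print(counts)
```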