# Grouping by an Amount of Time
Grouping by date is a powerful built-in feature of pandas. The **`resample`** method works almost exactly like **`groupby`**, except that you pass it the amount of time each group should span. You specify that amount of time with an **offset alias**. You must also select the column that holds the datetimes with the **`on`** parameter. If you omit **`on`**, pandas uses the index and raises an error if the index is not a datetime type.

The example below shows how to group by the column **`hire_date`** in increments of 10 years.

In [2]:
import pandas as pd
emp = pd.read_csv('../data/employee.csv', parse_dates=['hire_date', 'job_date'])

In [3]:
emp.resample('10A', on='hire_date').agg({'salary': ['mean', 'size']})

,salary,salary
,mean,size
hire_date,,
1958-12-31,81239.0,1
1968-12-31,89590.0,1
1978-12-31,85376.142857,26
1988-12-31,68083.643312,212
1998-12-31,62567.805556,437
2008-12-31,56660.056923,653
2018-12-31,46806.263323,670
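
If you want to try **`resample`** without the employee file, the following is a minimal sketch on a small made-up DataFrame (the rows are invented for illustration and are not from `employee.csv`). It suggests that naming the datetime column with **`on`** and moving that column into the index give the same groups, and that omitting **`on`** with a non-datetime index raises an error. It uses the `YS` (year start) alias, which is an assumption made here so the sketch runs on both older and newer pandas releases.

```python
import pandas as pd

# Made-up data for illustration only; not the employee dataset.
df = pd.DataFrame({
    'hire_date': pd.to_datetime(['2001-03-15', '2001-11-02',
                                 '2002-07-30', '2004-01-09']),
    'salary': [50000, 60000, 70000, 80000],
})

# Group into one-year bins by naming the datetime column with `on`.
by_column = df.resample('YS', on='hire_date')['salary'].mean()

# Alternative: move the datetime column into the index and omit `on`.
by_index = df.set_index('hire_date').resample('YS')['salary'].mean()

# Without `on`, resample needs a datetime-like index. The default
# integer index is not one, so pandas raises a TypeError.
try:
    df.resample('YS').mean()
    raised = False
except TypeError:
    raised = True

print(by_column)
print(by_index)
print('TypeError raised without a datetime index:', raised)
```

Years with no hires, such as 2003 in this made-up data, still get a row, with a missing value for the mean.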


### Offset alias
An offset alias is a short string that tells pandas how long each time group should be. The table below, scraped from the pandas documentation, lists some of the offset aliases available to you.

In [4]:
# Keep only the tables on the page whose text matches 'Alias'
df_list = pd.read_html('http://pandas.pydata.org/pandas-docs/stable/timeseries.html', match='Alias')
df_offset_alias = df_list[0]
df_offset_alias

,Alias,Description
0,B,business day frequency
1,C,custom business day frequency
2,D,calendar day frequency
3,W,weekly frequency
4,M,month end frequency
5,SM,semi-month end frequency (15th and end of month)
6,BM,business month end frequency
7,CBM,custom business month end frequency
8,MS,month start frequency
9,SMS,semi-month start frequency (1st and 15th)


Prefix an offset alias with an integer (inside the string) to signify a multiple of that time period. In the first example, **10A** signified 10 years. Some other examples:
* **5W** - 5 weeks
* **15H** - 15 hours
* **6Q** - 6 quarters
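
To see how pandas reads a multiple such as **5W**, here is a small sketch. It uses `to_offset` from `pandas.tseries.frequencies` to parse the string and `pd.date_range` to generate timestamps at that spacing. The start date is an arbitrary Sunday picked for illustration.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Parse the alias string into an offset object.
offset = to_offset('5W')
print(offset.n)  # the integer multiple parsed from the string

# Three timestamps spaced five weeks apart. A bare 'W' is anchored
# on Sundays, and 2020-01-05 is a Sunday.
stamps = pd.date_range('2020-01-05', periods=3, freq='5W')
print(stamps)
```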

You can append **S** to some of the aliases to use the start of the time period. By default, the end of the time period is used. In the first example, which used **10A**, the first group consists of employees hired from Jan 1, 1949 to Dec 31, 1958. The first group in the example below consists of employees hired from Jan 1, 1958 to Dec 31, 1967.

In [5]:
emp.resample('10AS', on='hire_date').agg({'salary': ['mean', 'size']})

,salary,salary
,mean,size
hire_date,,
1958-01-01,81239.0,1
1968-01-01,106477.333333,19
1978-01-01,69560.486301,204
1988-01-01,62301.624079,411
1998-01-01,58195.369932,596
2008-01-01,47236.630936,769
2018-01-01,,0
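
The sketch below checks where the group boundaries fall for a start-anchored alias. The three hire dates are hypothetical and were chosen to sit on either side of a boundary. It uses `10YS`, which is assumed here to be an equivalent spelling of the year-start alias that also runs on recent pandas releases.

```python
import pandas as pd

# Hypothetical hire dates chosen to sit on either side of a bin edge.
df = pd.DataFrame({
    'hire_date': pd.to_datetime(['1958-06-01', '1967-12-31', '1968-01-01']),
    'salary': [40000, 50000, 60000],
})

# Ten-year bins labeled by their first day. Each bin includes its own
# start date and excludes the start date of the next bin.
counts = df.resample('10YS', on='hire_date')['salary'].size()

# Show only the bins that contain at least one row.
print(counts[counts > 0])
```

The hire on Dec 31, 1967 lands in the group labeled 1958-01-01, while the hire on Jan 1, 1968 starts the next group.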


## Grouping by an Amount of Time and Another Column

It's possible to group by an amount of time and another column. Instead of using **`resample`**, use **`groupby`**. The syntax is a little unusual: you need a **`pd.Grouper`** object to handle the time grouping. Use its **`key`** parameter to name the datetime column you want to group by and **`freq`** for the amount of time (given as an offset alias string).

Let's create our time grouped object below and assign it to a variable. This object does nothing on its own.

In [6]:
tg = pd.Grouper(key='hire_date', freq='10AS')

### Use the Time Grouped Object as a grouping column
In the **`groupby`** method, use **`tg`** just as you would any other grouping column. Put it in a list along with all the other grouping columns. Then call **`agg`** as you normally would.

In [7]:
emp.groupby([tg, 'gender']).agg({'salary': ['mean', 'size']})

,,salary,salary
,,mean,size
hire_date,gender,,
1958-01-01,Male,81239.0,1
1968-01-01,Female,,1
1968-01-01,Male,106477.333333,18
1978-01-01,Female,57072.461538,32
1978-01-01,Male,72266.225,172
1988-01-01,Female,57117.769841,127
1988-01-01,Male,64626.05694,284
1998-01-01,Female,54738.440678,180
1998-01-01,Male,59669.771084,416
2008-01-01,Female,47305.262097,263
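
The same pattern can be tried on a few made-up rows, which makes it easier to check the groups by hand. This is only a sketch: the data is invented, the column names mirror the employee data, and `10YS` is assumed as a spelling of the year-start alias that runs on recent pandas releases.

```python
import pandas as pd

# Made-up rows for illustration; column names mirror the employee data.
df = pd.DataFrame({
    'hire_date': pd.to_datetime(['1990-05-01', '1995-08-12',
                                 '1999-02-20', '2003-10-03']),
    'gender': ['Female', 'Male', 'Female', 'Male'],
    'salary': [40000, 50000, 60000, 70000],
})

# The Grouper only describes the time grouping; groupby does the work.
tg = pd.Grouper(key='hire_date', freq='10YS')

# Group by ten-year bin and gender, then aggregate salary.
result = df.groupby([tg, 'gender']).agg({'salary': ['mean', 'size']})
print(result)
```

Only combinations that actually occur appear in the result, so there is no row for women hired in the decade starting in 2000 in this made-up data.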


## Pivoting for more readability
You can also pass the time grouped object to **`pivot_table`** as the index. This puts each gender in its own column, which makes the groups easier to compare side by side.

In [8]:
emp.pivot_table(index=tg, columns='gender', values='salary', aggfunc='mean')

gender,Female,Male
hire_date,,
1958-01-01,,81239.0
1968-01-01,,106477.333333
1978-01-01,57072.461538,72266.225
1988-01-01,57117.769841,64626.05694
1998-01-01,54738.440678,59669.771084
2008-01-01,47305.262097,47201.824131
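
One way to think about the pivot table is as the grouped result with the gender level moved into the columns. The sketch below compares the two on made-up data. The rows are invented for illustration, and `10YS` is assumed as a spelling of the year-start alias that runs on recent pandas releases.

```python
import pandas as pd

# Made-up rows for illustration; column names mirror the employee data.
df = pd.DataFrame({
    'hire_date': pd.to_datetime(['1990-05-01', '1995-08-12',
                                 '1999-02-20', '2003-10-03']),
    'gender': ['Female', 'Male', 'Female', 'Male'],
    'salary': [40000, 50000, 60000, 70000],
})
tg = pd.Grouper(key='hire_date', freq='10YS')

# Wide shape: one row per time bin, one column per gender.
wide = df.pivot_table(index=tg, columns='gender', values='salary',
                      aggfunc='mean')

# Long shape from groupby, then gender moved into the columns.
unstacked = df.groupby([tg, 'gender'])['salary'].mean().unstack('gender')

print(wide)
print(unstacked)
```

Combinations with no rows show up as missing values in the wide shape.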
