On March 13, 2016, version 0.18.0 of pandas was released, with significant changes to how resampling works. This tutorial is written for v0.18.0 and will not work with earlier versions of pandas.

First, let's load the modules we need.

## Preliminaries

```python
# Display the result of every expression in a cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
```

```python
# Import required packages
import pandas as pd
import datetime
import numpy as np
```

Next, let's create some sample data to group by time. In this example I create a DataFrame with two columns and 365 rows: one column holds a date, the other a numeric value.

## Create Data

```python
# Create a datetime variable for right now
base = datetime.datetime.today()
# Create a list of 365 datetime values, one per day, counting back from base
date_list = [base - datetime.timedelta(days=x) for x in range(0, 365)]
```

```python
# Inspect the base timestamp (the value depends on when the notebook runs)
base
```

```
datetime.datetime(2019, 6, 26, 23, 11, 11, 25526)
```

```python
# Create a list variable of 365 numeric values
score_list = list(np.random.randint(low=1, high=1000, size=365))
```

```python
# Create an empty dataframe
df = pd.DataFrame()

# Create a column of datetime values; pd.to_datetime gives it a datetime64 dtype
df['datetime'] = pd.to_datetime(date_list)
# Set the datetime column as the index
df.index = df['datetime']
# Create a column from the numeric score variable
df['score'] = score_list
```
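If you want the same kind of frame without the list comprehension, `pd.date_range` can build the daily index directly. The sketch below is an alternative, not the tutorial's method: it assumes a fixed end timestamp (`2019-06-26 23:11:11`, taken from the `base` output above) instead of `datetime.today()`, uses a seeded `np.random.default_rng` so the scores repeat between runs, and keeps the dates only in the index rather than duplicating them in a column.

```python
import numpy as np
import pandas as pd

# Fixed end timestamp (an assumption) so results are reproducible;
# the tutorial uses datetime.today() instead.
end = pd.Timestamp("2019-06-26 23:11:11")

# 365 daily timestamps ending at `end`, oldest first, with the index named "datetime".
index = pd.date_range(end=end, periods=365, freq="D", name="datetime")

# Seeded generator so the "random" scores are identical on every run.
# integers() excludes `high`, matching np.random.randint(1, 1000).
rng = np.random.default_rng(seed=0)
scores = rng.integers(low=1, high=1000, size=365)

df = pd.DataFrame({"score": scores}, index=index)

print(df.index.min(), df.index.max(), len(df))
```

Because this index is ascending, `df.head()` shows the oldest rows first, the reverse of the tutorial's ordering; `resample` sorts by the index internally, so the monthly groups come out the same either way.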

```python
# Let's take a look at the data
df.head()
# List the distinct month numbers present in the index
set(df.index.month)
```

| datetime (index) | datetime | score |
|---|---|---|
| 2019-06-26 23:11:11.025526 | 2019-06-26 23:11:11.025526 | 609 |
| 2019-06-25 23:11:11.025526 | 2019-06-25 23:11:11.025526 | 392 |
| 2019-06-24 23:11:11.025526 | 2019-06-24 23:11:11.025526 | 817 |
| 2019-06-23 23:11:11.025526 | 2019-06-23 23:11:11.025526 | 775 |
| 2019-06-22 23:11:11.025526 | 2019-06-22 23:11:11.025526 | 301 |


```
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
```
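The `set(df.index.month)` output lists all twelve months, but that is slightly misleading for grouping: the 365 rows run from late June 2018 to late June 2019, so month 6 occurs in two different years. The sketch below (with a fixed end date and a placeholder score column, both assumptions) shows that grouping on the month number alone merges the two Junes, while grouping on year and month keeps them apart, which matches how `resample` bins the data.

```python
import numpy as np
import pandas as pd

# 365 daily rows ending 2019-06-26 (fixed date is an assumption), placeholder scores.
index = pd.date_range(end="2019-06-26", periods=365, freq="D", name="datetime")
df = pd.DataFrame({"score": np.arange(365)}, index=index)

# Month number alone: June 2018 and June 2019 land in the same bucket.
month = df.index.month.rename("month")
by_month_number = df.groupby(month)["score"].count()

# (year, month) pairs keep the two Junes apart.
year = df.index.year.rename("year")
by_year_month = df.groupby([year, month])["score"].count()

print(by_month_number.loc[6])   # rows from both Junes combined
print(len(by_month_number), len(by_year_month))
```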

## Group Data By Date

In pandas, the most common way to group by time is the `.resample()` function. As of v0.18.0 it works in two stages: `df.resample('M')` returns a Resampler object that only describes the binning, and you then apply an aggregation such as `mean`, `count`, or `sum` to it.
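To make the two stages visible, the sketch below stores the resampler in a variable before aggregating it. It passes `pd.offsets.MonthEnd()` rather than the string `'M'`: the offset object describes the same month-end binning, and it also works in recent pandas releases, which deprecated `'M'` in favour of `'ME'`. The fixed end date and the all-ones score column are assumptions chosen so the counts are easy to check.

```python
import numpy as np
import pandas as pd

# 365 daily rows ending 2019-06-26 (assumed), every score equal to 1.
index = pd.date_range(end="2019-06-26", periods=365, freq="D", name="datetime")
df = pd.DataFrame({"score": np.ones(365, dtype=int)}, index=index)

# Stage 1: resample() only defines the monthly bins; nothing is aggregated yet.
monthly = df.resample(pd.offsets.MonthEnd())
print(type(monthly).__name__)

# Stage 2: apply one or more aggregations to those same bins.
counts = monthly["score"].count()
summary = monthly["score"].agg(["count", "sum", "mean"])
print(summary.head(3))
```

Because the bins are defined once, `agg` can compute several statistics over the same monthly groups in a single call.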

```python
# Group the data by month, and take the mean for each group (i.e. each month).
# Selecting [['score']] keeps the datetime column out of the aggregation.
df.resample('M')[['score']].mean()
```

| datetime | score |
|---|---|
| 2018-06-30 | 511.0 |
| 2018-07-31 | 430.322581 |
| 2018-08-31 | 487.870968 |
| 2018-09-30 | 556.833333 |
| 2018-10-31 | 485.548387 |
| 2018-11-30 | 478.0 |
| 2018-12-31 | 472.580645 |
| 2019-01-31 | 490.258065 |
| 2019-02-28 | 427.214286 |
| 2019-03-31 | 560.483871 |


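```python
# Group the data by month, and take the sum for each group (i.e. each month).
# Newer pandas versions raise a TypeError when summing a datetime column,
# so only the score column is aggregated.
df.resample('M')[['score']].sum()
```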

| datetime | score |
|---|---|
| 2018-06-30 | 2044 |
| 2018-07-31 | 13340 |
| 2018-08-31 | 15124 |
| 2018-09-30 | 16705 |
| 2018-10-31 | 15052 |
| 2018-11-30 | 14340 |
| 2018-12-31 | 14650 |
| 2019-01-31 | 15198 |
| 2019-02-28 | 11962 |
| 2019-03-31 | 17375 |
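The mean and sum outputs are linked: each monthly mean is that month's sum divided by the number of rows in it. The first bin holds only the four days from June 27 to June 30, 2018, so its sum (2044) is far below the other months while its mean (2044 / 4 = 511.0) is in the same range; July has 31 rows, giving 13340 / 31 ≈ 430.32. The sketch below checks this relationship on synthetic data, with a fixed end date and seed that are assumptions rather than the tutorial's values.

```python
import numpy as np
import pandas as pd

# Synthetic daily data ending 2019-06-26 (assumed), seeded for repeatability.
index = pd.date_range(end="2019-06-26", periods=365, freq="D", name="datetime")
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"score": rng.integers(1, 1000, size=365)}, index=index)

# Month-end bins via an offset object (equivalent to the 'M' alias).
monthly = df.resample(pd.offsets.MonthEnd())["score"]

check = pd.DataFrame({
    "sum": monthly.sum(),
    "count": monthly.count(),
    "mean": monthly.mean(),
})
# Each monthly mean should equal that month's sum divided by its row count.
check["sum_over_count"] = check["sum"] / check["count"]
print(check.head())
```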


## Grouping Options

There are many options for grouping. You can learn more about them in [pandas' time series docs](http://pandas.pydata.org/pandas-docs/stable/timeseries.html); for convenience, I have also listed them below.

| Value | Description |
|---|---|
| B | business day frequency |
| C | custom business day frequency (experimental) |
| D | calendar day frequency |
| W | weekly frequency |
| M | month end frequency |
| BM | business month end frequency |
| CBM | custom business month end frequency |
| MS | month start frequency |
| BMS | business month start frequency |
| CBMS | custom business month start frequency |
| Q | quarter end frequency |
| BQ | business quarter end frequency |
| QS | quarter start frequency |
| BQS | business quarter start frequency |
| A | year end frequency |
| BA | business year end frequency |
| AS | year start frequency |
| BAS | business year start frequency |
| BH | business hour frequency |
| H | hourly frequency |
| T | minutely frequency |
| S | secondly frequency |
| L | milliseconds |
| U | microseconds |
| N | nanoseconds |
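Switching the alias is all it takes to change the bin size. The sketch below sums the same 365 daily rows by week, month start, and quarter start. It sticks to `'W'`, `'MS'`, and `'QS'` because some end-anchored aliases in the table (for example `'M'`, now `'ME'`) were deprecated in recent pandas releases, while these three are spelled the same across versions. The fixed end date and all-ones scores are assumptions so the bin counts are easy to read.

```python
import numpy as np
import pandas as pd

# 365 daily rows ending 2019-06-26 (assumed), every score equal to 1.
index = pd.date_range(end="2019-06-26", periods=365, freq="D", name="datetime")
df = pd.DataFrame({"score": np.ones(365, dtype=int)}, index=index)

# Bin the same data with three aliases from the table.
results = {alias: df.resample(alias)["score"].sum() for alias in ["W", "MS", "QS"]}

for alias, binned in results.items():
    # Number of bins plus the first and last bin labels.
    print(alias, len(binned), binned.index[0].date(), binned.index[-1].date())
```

Weekly bins (`'W'`, anchored on Sunday) are labelled with the last day of each week, whereas the start aliases `'MS'` and `'QS'` are labelled with the first day of each period.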