**Group Data by Time**

https://chrisalbon.com/python/data_wrangling/pandas_group_data_by_time/

On March 13, 2016, version 0.18.0 of Pandas was released, with significant changes in how the resampling function operates. This tutorial follows v0.18.0 and will not work for previous versions of pandas.

First, let’s load the modules we care about:

In [1]:
import pandas as pd
import numpy as np
import datetime

**Next, let’s create some sample data that we can group by time. In this example I am creating a dataframe with two columns and 365 rows. One column is a date, the second column is a numeric value.**

In [2]:
#Create a datetime variable for today

base = datetime.datetime.today()

#Create a list of 365 datetime values, one per day, counting back from today

date_list = [base - datetime.timedelta(x) for x in range(0,365)]

In [3]:
#Create a list of 365 random integers from 1 to 999 (high is exclusive)

score_list = list(np.random.randint(low=1, high=1000, size=365))

In [4]:
#Create an empty dataframe

df = pd.DataFrame()

# Create a column from the datetime variable
df['datetime'] = date_list
# Convert that column into a datetime datatype
df['datetime'] = pd.to_datetime(df['datetime'])
# Set the datetime column as the index
df.index = df['datetime'] 
# Create a column from the numeric score variable
df['score'] = score_list

#View the data
df

| datetime (index) | datetime | score |
|---|---|---|
| 2018-01-30 15:14:40.766823 | 2018-01-30 15:14:40.766823 | 69 |
| 2018-01-29 15:14:40.766823 | 2018-01-29 15:14:40.766823 | 610 |
| 2018-01-28 15:14:40.766823 | 2018-01-28 15:14:40.766823 | 443 |
| 2018-01-27 15:14:40.766823 | 2018-01-27 15:14:40.766823 | 433 |
| 2018-01-26 15:14:40.766823 | 2018-01-26 15:14:40.766823 | 883 |
| 2018-01-25 15:14:40.766823 | 2018-01-25 15:14:40.766823 | 441 |
| 2018-01-24 15:14:40.766823 | 2018-01-24 15:14:40.766823 | 338 |
| 2018-01-23 15:14:40.766823 | 2018-01-23 15:14:40.766823 | 306 |
| 2018-01-22 15:14:40.766823 | 2018-01-22 15:14:40.766823 | 935 |
| 2018-01-21 15:14:40.766823 | 2018-01-21 15:14:40.766823 | 123 |
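Because the data above depends on today's date and on unseeded random numbers, every run produces a different table. If you want a repeatable version, here is a minimal sketch. The fixed end date, the seed, and the name `df_sample` are my own assumptions for illustration, not part of the original cells.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed end date and a fixed seed, so every run gives the same frame
dates = pd.date_range(end="2018-01-30", periods=365, freq="D")
rng = np.random.RandomState(0)
scores = rng.randint(low=1, high=1000, size=365)  # integers from 1 to 999

# Build the frame with the dates as the index from the start
df_sample = pd.DataFrame({"score": scores}, index=dates)
df_sample.index.name = "datetime"

print(df_sample.head())
```

Here `pd.date_range` builds the 365 daily timestamps in one call, so the separate list comprehension and the `pd.to_datetime` conversion are not needed.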


In [5]:
df.dtypes

datetime    datetime64[ns]
score                int64
dtype: object
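The datetime dtype matters because resampling needs a datetime-like index to work with. The sketch below uses a hypothetical three-row frame (the names `frame`, `resample_failed`, and `daily_sum` are mine) to show what happens before and after the dates are converted and moved into the index.

```python
import pandas as pd

# Hypothetical three-row frame, with the dates still stored as plain strings
frame = pd.DataFrame(
    {"datetime": ["2017-01-31", "2017-02-01", "2017-02-02"],
     "score": [10, 20, 30]}
)

# Without a datetime-like index, resample() refuses to run
try:
    frame.resample("D").sum()
    resample_failed = False
except TypeError:
    resample_failed = True

# Convert the strings to datetimes and move them into the index
frame["datetime"] = pd.to_datetime(frame["datetime"])
frame = frame.set_index("datetime")

has_datetime_index = isinstance(frame.index, pd.DatetimeIndex)
daily_sum = frame.resample("D").sum()
print(resample_failed, has_datetime_index)
```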

**Group Data by Date**

In pandas, the most common way to group by time is to use the .resample() method. As of v0.18.0 this method works in two stages: df.resample('M') creates a resampler object, to which we then apply an aggregation function (mean, count, sum, etc.).

In [7]:
# Group the data by month, and take the mean for each group (i.e. each month)
df.resample('M').mean()

| datetime | score |
|---|---|
| 2017-01-31 | 668.0 |
| 2017-02-28 | 497.035714 |
| 2017-03-31 | 555.870968 |
| 2017-04-30 | 565.133333 |
| 2017-05-31 | 431.967742 |
| 2017-06-30 | 479.0 |
| 2017-07-31 | 485.322581 |
| 2017-08-31 | 480.967742 |
| 2017-09-30 | 511.666667 |
| 2017-10-31 | 448.225806 |
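To make the two stages easier to see, here is a small sketch on hypothetical toy data (six daily scores, names of my choosing). I use the month start alias 'MS' rather than 'M' here only because, as far as I know, it behaves the same across pandas versions; the group labels are therefore the first day of each month.

```python
import pandas as pd

# Hypothetical toy data: six daily scores spanning two calendar months
idx = pd.date_range("2017-01-29", periods=6, freq="D")
toy = pd.DataFrame({"score": [10, 20, 30, 40, 50, 60]}, index=idx)

# Stage 1: resample() only defines the monthly groups; nothing is computed yet
monthly = toy.resample("MS")
print(type(monthly).__name__)

# Stage 2: an aggregation turns those groups into a result
monthly_mean = monthly["score"].mean()

# Several aggregations can be applied to the same resampler at once
monthly_stats = monthly["score"].agg(["mean", "sum", "count"])
print(monthly_stats)
```

Keeping the resampler in a variable means the groups are defined once and can be reused for as many aggregations as you need.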


In [8]:
# Group the data by month, and take the sum for each group (i.e. each month)
df.resample('M').sum()

| datetime | score |
|---|---|
| 2017-01-31 | 668 |
| 2017-02-28 | 13917 |
| 2017-03-31 | 17232 |
| 2017-04-30 | 16954 |
| 2017-05-31 | 13391 |
| 2017-06-30 | 14370 |
| 2017-07-31 | 15045 |
| 2017-08-31 | 14910 |
| 2017-09-30 | 15350 |
| 2017-10-31 | 13895 |
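This tutorial sticks to .resample(), but the same monthly grouping can also be written with groupby and `pd.Grouper`. The sketch below is an alternative I am adding for comparison, on hypothetical toy data, and it again assumes the 'MS' (month start) alias.

```python
import pandas as pd

# Hypothetical toy data: six daily scores spanning two calendar months
idx = pd.date_range("2017-01-29", periods=6, freq="D")
toy = pd.DataFrame({"score": [10, 20, 30, 40, 50, 60]}, index=idx)

# Monthly sums via resample
by_resample = toy.resample("MS")["score"].sum()

# The same monthly sums via groupby with a time-based Grouper
by_grouper = toy.groupby(pd.Grouper(freq="MS"))["score"].sum()

print(by_resample)
print(by_grouper)
```

The Grouper form can be handy when you also want to group by a non-time column in the same call, since groupby accepts a list of keys.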


**Grouping Options**

There are many options for grouping. You can learn more about them in the pandas time series docs; I have also listed them below for your convenience.

| Value | Description |
|---|---|
| B | business day frequency |
| C | custom business day frequency (experimental) |
| D | calendar day frequency |
| W | weekly frequency |
| M | month end frequency |
| BM | business month end frequency |
| CBM | custom business month end frequency |
| MS | month start frequency |
| BMS | business month start frequency |
| CBMS | custom business month start frequency |
| Q | quarter end frequency |
| BQ | business quarter end frequency |
| QS | quarter start frequency |
| BQS | business quarter start frequency |
| A | year end frequency |
| BA | business year end frequency |
| AS | year start frequency |
| BAS | business year start frequency |
| BH | business hour frequency |
| H | hourly frequency |
| T | minutely frequency |
| S | secondly frequency |
| L | milliseconds |
| U | microseconds |
| N | nanoseconds |
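Here is a short sketch of how swapping the alias changes the grouping, using hypothetical data (90 daily values starting on 2017-01-01, names of my choosing). I picked 'W', 'MS', and 'QS' because, to my knowledge, they are spelled the same in old and new pandas; several other aliases in the table were renamed in pandas 2.2 (for example 'M' became 'ME'), so check the docs for the version you have installed.

```python
import pandas as pd

# Hypothetical data: 90 daily values covering January through March 2017
idx = pd.date_range("2017-01-01", periods=90, freq="D")
daily = pd.DataFrame({"score": range(90)}, index=idx)

# Weekly groups (by default each week is labelled by the Sunday that ends it)
weekly_sum = daily["score"].resample("W").sum()

# Monthly groups labelled by the first day of the month
month_start_sum = daily["score"].resample("MS").sum()

# Quarterly groups labelled by the first day of the quarter
quarter_start_sum = daily["score"].resample("QS").sum()

print(month_start_sum)
```

Whatever the alias, the total is unchanged; only the way the rows are bucketed and labelled differs.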