# Pandas Time Series

Standard imports. Not all of them are used below, but these are the basics.

In [68]:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np

## Date Range
Create a time series index of 100 one-second samples starting at midnight on December 1st, 2015, and inspect the first few entries to check the result.

### 100 seconds

In [4]:
rng = pd.date_range('2015-12-01', periods=100, freq='S')
rng[:5]

DatetimeIndex(['2015-12-01 00:00:00', '2015-12-01 00:00:01',
               '2015-12-01 00:00:02', '2015-12-01 00:00:03',
               '2015-12-01 00:00:04'],
              dtype='datetime64[ns]', freq='S')

### 72 hours
72 hourly timestamps starting at midnight on January 1st, 2011

In [3]:
rng = pd.date_range('1/1/2011', periods=72, freq='H')
rng[:5]

DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
               '2011-01-01 02:00:00', '2011-01-01 03:00:00',
               '2011-01-01 04:00:00'],
              dtype='datetime64[ns]', freq='H')
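
A range can be pinned down either by a number of periods or by an explicit end point. The sketch below is only an illustration of that equivalence; it assumes a recent pandas, where the lowercase alias `'h'` is the preferred spelling of the hourly frequency, and the variable names are made up for the example.

```python
import pandas as pd

# Start + number of periods + frequency, as in the cell above.
by_periods = pd.date_range('2011-01-01', periods=72, freq='h')

# The same range described by start + end + frequency instead.
by_end = pd.date_range('2011-01-01', '2011-01-03 23:00', freq='h')

print(len(by_periods), by_periods[-1])
print(by_periods.equals(by_end))
```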

## Series with time series index
Create a Pandas Series of random integers indexed by the 100-second date range from above (`rng` as created in cell `In [4]`), and look at the first rows.

In [5]:
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
ts.head()

2015-12-01 00:00:00    259
2015-12-01 00:00:01    332
2015-12-01 00:00:02     37
2015-12-01 00:00:03    122
2015-12-01 00:00:04    345
Freq: S, dtype: int64

## Resample
### Summarize into 5-minute samples

In [12]:
# the `how` argument is gone from recent pandas; aggregate on the resampler instead
ts.resample('5min').sum()

2015-12-01    24850
Freq: 5T, dtype: int64
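
The result above has a single row because 100 one-second samples fit inside one 5-minute bin. A small sketch with a shorter bin makes the grouping easier to see. It uses a series of ones instead of random values (an assumption made so the sums are predictable) and the lowercase aliases `'s'` and `'min'` used by recent pandas.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2015-12-01', periods=100, freq='s')
ts = pd.Series(np.ones(len(rng), dtype=int), index=rng)  # one count per second

# 30-second bins: three full bins of 30 samples and a last bin with 10.
per_30s = ts.resample('30s').sum()
print(per_30s)

# A 5-minute bin is wider than the whole series, so everything lands in one row.
per_5min = ts.resample('5min').sum()
print(per_5min)
```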

There is more information on time series [here](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#timeseries) and a good crash course [here](http://pandas.pydata.org/pandas-docs/stable/10min.html).

# Other Examples
This dataset was shared by Chris Albon

## Create DataFrame

In [35]:
data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994', '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592', '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109', '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'],
        'battle_deaths': [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]}
df = pd.DataFrame(data, columns = ['date', 'battle_deaths'])
df

| | date | battle_deaths |
|---|---|---|
| 0 | 2014-05-01 18:47:05.069722 | 34 |
| 1 | 2014-05-01 18:47:05.119994 | 25 |
| 2 | 2014-05-02 18:47:05.178768 | 26 |
| 3 | 2014-05-02 18:47:05.230071 | 15 |
| 4 | 2014-05-02 18:47:05.230071 | 15 |
| 5 | 2014-05-02 18:47:05.280592 | 14 |
| 6 | 2014-05-03 18:47:05.332662 | 26 |
| 7 | 2014-05-03 18:47:05.385109 | 25 |
| 8 | 2014-05-04 18:47:05.436523 | 62 |
| 9 | 2014-05-04 18:47:05.486877 | 41 |


The DataFrame already holds all the right information, but we still need to get the types right to take advantage of the Pandas datetime functions. First, let's have a look at the current types:

In [30]:
df.dtypes

date             object
battle_deaths     int64
dtype: object

### Convert string to datetime
This is simple once you know the function, `pd.to_datetime`:

In [36]:
df['date'] = pd.to_datetime(df['date'])
df

| | date | battle_deaths |
|---|---|---|
| 0 | 2014-05-01 18:47:05.069722 | 34 |
| 1 | 2014-05-01 18:47:05.119994 | 25 |
| 2 | 2014-05-02 18:47:05.178768 | 26 |
| 3 | 2014-05-02 18:47:05.230071 | 15 |
| 4 | 2014-05-02 18:47:05.230071 | 15 |
| 5 | 2014-05-02 18:47:05.280592 | 14 |
| 6 | 2014-05-03 18:47:05.332662 | 26 |
| 7 | 2014-05-03 18:47:05.385109 | 25 |
| 8 | 2014-05-04 18:47:05.436523 | 62 |
| 9 | 2014-05-04 18:47:05.486877 | 41 |


In [32]:
df.dtypes

date             datetime64[ns]
battle_deaths             int64
dtype: object

Notice how the date field type changed from object to datetime64[ns]. This was just what we were looking for.
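
Real data is rarely as clean as this example. As a hedged sketch, the snippet below shows one way to handle a string that cannot be parsed, using the `errors='coerce'` option of `pd.to_datetime`. The `raw` series and its bad entry are invented for the illustration.

```python
import pandas as pd

# Hypothetical input: one valid timestamp and one unparseable string.
raw = pd.Series(['2014-05-01 18:47:05.069722', 'not a date'])

# errors='coerce' turns unparseable strings into NaT instead of raising.
parsed = pd.to_datetime(raw, errors='coerce')

print(parsed.dtype)
print(parsed.isna().tolist())
```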

## Set date series as index
There are a couple of ways to accomplish that. Chris writes in his notes that you should do it like this:

```
df.index = df['date']
del df['date']
df
```

There is a one-liner version of this that I prefer. The `inplace` parameter means the update happens directly on the DataFrame instead of returning a modified copy. The first argument names the column to use as the index (a single label or a list of labels), and `drop=True` removes that column from the DataFrame once it has become the index.

In [42]:
df.set_index(['date'], drop=True, inplace=True)
df

| date | battle_deaths |
|---|---|
| 2014-05-01 18:47:05.069722 | 34 |
| 2014-05-01 18:47:05.119994 | 25 |
| 2014-05-02 18:47:05.178768 | 26 |
| 2014-05-02 18:47:05.230071 | 15 |
| 2014-05-02 18:47:05.230071 | 15 |
| 2014-05-02 18:47:05.280592 | 14 |
| 2014-05-03 18:47:05.332662 | 26 |
| 2014-05-03 18:47:05.385109 | 25 |
| 2014-05-04 18:47:05.436523 | 62 |
| 2014-05-04 18:47:05.486877 | 41 |
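
If you would rather keep the original DataFrame untouched, `set_index` without `inplace=True` returns a new frame. This is a minimal sketch on a made-up two-row frame (`df_demo` is a name chosen for the example), not a replacement for the cell above.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'date': pd.to_datetime(['2014-05-01 18:47:05', '2014-05-02 18:47:05']),
    'battle_deaths': [34, 26],
})

# Without inplace=True a new frame is returned and df_demo is left as it was.
indexed = df_demo.set_index('date')

print(indexed.index.name)
print('date' in indexed.columns)
print('date' in df_demo.columns)
```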


## Lookup for 2014
View all observations that occurred in 2014

In [43]:
# recent pandas needs .loc for string-based row lookups on a DatetimeIndex
df.loc['2014']

| date | battle_deaths |
|---|---|
| 2014-05-01 18:47:05.069722 | 34 |
| 2014-05-01 18:47:05.119994 | 25 |
| 2014-05-02 18:47:05.178768 | 26 |
| 2014-05-02 18:47:05.230071 | 15 |
| 2014-05-02 18:47:05.230071 | 15 |
| 2014-05-02 18:47:05.280592 | 14 |
| 2014-05-03 18:47:05.332662 | 26 |
| 2014-05-03 18:47:05.385109 | 25 |
| 2014-05-04 18:47:05.436523 | 62 |
| 2014-05-04 18:47:05.486877 | 41 |


A time series index is convenient for lookups by partial date strings like the one above. For example, add the month and day like this:

In [44]:
df.loc['2014-05-02']

| date | battle_deaths |
|---|---|
| 2014-05-02 18:47:05.178768 | 26 |
| 2014-05-02 18:47:05.230071 | 15 |
| 2014-05-02 18:47:05.230071 | 15 |
| 2014-05-02 18:47:05.280592 | 14 |
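
The same kind of lookup works at any resolution coarser than the index itself. Here is a small self-contained sketch using `.loc` with a month string and a day string; the four-row `demo` frame is invented for the illustration.

```python
import pandas as pd

stamps = pd.to_datetime(['2014-05-01 18:47:05', '2014-05-02 18:47:05',
                         '2014-05-02 19:00:00', '2014-05-03 18:47:05'])
demo = pd.DataFrame({'battle_deaths': [34, 26, 15, 26]}, index=stamps)

whole_month = demo.loc['2014-05']    # every row in May 2014
one_day = demo.loc['2014-05-02']     # every row on May 2nd

print(len(whole_month), len(one_day))
```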


## Slicing with dates
Observations from May 3rd, 2014 onwards

In [46]:
# pd.datetime is no longer available in recent pandas; pd.Timestamp does the same job
df[pd.Timestamp(2014, 5, 3):]

| date | battle_deaths |
|---|---|
| 2014-05-03 18:47:05.332662 | 26 |
| 2014-05-03 18:47:05.385109 | 25 |
| 2014-05-04 18:47:05.436523 | 62 |
| 2014-05-04 18:47:05.486877 | 41 |


Or slice between two dates, here for observations between May 3rd and May 4th. Notice the flexibility in the date format here...

In [47]:
df['5/3/2014':'5/4/2014']

| date | battle_deaths |
|---|---|
| 2014-05-03 18:47:05.332662 | 26 |
| 2014-05-03 18:47:05.385109 | 25 |
| 2014-05-04 18:47:05.436523 | 62 |
| 2014-05-04 18:47:05.486877 | 41 |


### Truncation
Almost the same as slicing. For example, cut off everything on and after May 3rd (the date string is read as midnight at the start of that day):

In [49]:
df.truncate(after='5/3/2014')

| date | battle_deaths |
|---|---|
| 2014-05-01 18:47:05.069722 | 34 |
| 2014-05-01 18:47:05.119994 | 25 |
| 2014-05-02 18:47:05.178768 | 26 |
| 2014-05-02 18:47:05.230071 | 15 |
| 2014-05-02 18:47:05.230071 | 15 |
| 2014-05-02 18:47:05.280592 | 14 |


Or use `truncate` to create a slice in the middle

In [59]:
df.truncate(before='2014-MAY-2', after='2014-MAY-3')

| date | battle_deaths |
|---|---|
| 2014-05-02 18:47:05.178768 | 26 |
| 2014-05-02 18:47:05.230071 | 15 |
| 2014-05-02 18:47:05.230071 | 15 |
| 2014-05-02 18:47:05.280592 | 14 |
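
The "almost" matters: `truncate` and a string slice treat the same date string differently. The sketch below shows the difference on a made-up three-row frame (`demo` and its values are only for illustration).

```python
import pandas as pd

stamps = pd.to_datetime(['2014-05-02 18:47:05', '2014-05-03 18:47:05',
                         '2014-05-04 18:47:05'])
demo = pd.DataFrame({'battle_deaths': [26, 25, 62]}, index=stamps)

# truncate reads 'after' as the timestamp 2014-05-03 00:00:00 and drops later rows.
truncated = demo.truncate(after='2014-05-03')

# A string slice treats '2014-05-03' as the whole day, so the 18:47 row is kept.
sliced = demo.loc[:'2014-05-03']

print(len(truncated), len(sliced))
```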


## Group and count
Count the number of observations per timestamp like this. Instead of `level=0` you can also pass `df.index` to `groupby`.

In [51]:
df.groupby(level=0).count()

| date | battle_deaths |
|---|---|
| 2014-05-01 18:47:05.069722 | 1 |
| 2014-05-01 18:47:05.119994 | 1 |
| 2014-05-02 18:47:05.178768 | 1 |
| 2014-05-02 18:47:05.230071 | 2 |
| 2014-05-02 18:47:05.280592 | 1 |
| 2014-05-03 18:47:05.332662 | 1 |
| 2014-05-03 18:47:05.385109 | 1 |
| 2014-05-04 18:47:05.436523 | 1 |
| 2014-05-04 18:47:05.486877 | 1 |
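
A short sketch of both spellings side by side, plus one possible variation: grouping on `index.normalize()` to count per calendar day instead of per exact timestamp. The `demo` frame is a cut-down, made-up version of the data above.

```python
import pandas as pd

stamps = pd.to_datetime(['2014-05-01 18:47:05.069722', '2014-05-02 18:47:05.230071',
                         '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592'])
demo = pd.DataFrame({'battle_deaths': [34, 15, 15, 14]}, index=stamps)

# Two ways of grouping on the index values themselves.
by_level = demo.groupby(level=0).count()
by_index = demo.groupby(demo.index).count()
print(by_level)

# normalize() resets each timestamp to midnight, so rows group by calendar day.
per_day = demo.groupby(demo.index.normalize()).count()
print(per_day)
```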


## Resample per day
### Count samples

In [55]:
df.resample('D').count()

| date | battle_deaths |
|---|---|
| 2014-05-01 | 2 |
| 2014-05-02 | 4 |
| 2014-05-03 | 2 |
| 2014-05-04 | 2 |


### Total values per day

In [57]:
df.resample('D').sum()

| date | battle_deaths |
|---|---|
| 2014-05-01 | 59 |
| 2014-05-02 | 70 |
| 2014-05-03 | 51 |
| 2014-05-04 | 103 |


### Mean values per day

In [58]:
df.resample('D').mean()

| date | battle_deaths |
|---|---|
| 2014-05-01 | 29.5 |
| 2014-05-02 | 17.5 |
| 2014-05-03 | 25.5 |
| 2014-05-04 | 51.5 |
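
The three daily summaries can also be produced in one pass. This is a sketch assuming the resampler's `agg` method with a list of function names; the four-row `demo` frame is invented so the numbers are easy to check by hand.

```python
import pandas as pd

stamps = pd.to_datetime(['2014-05-01 18:47:05', '2014-05-01 18:47:06',
                         '2014-05-02 18:47:05', '2014-05-02 18:47:06'])
demo = pd.DataFrame({'battle_deaths': [34, 25, 26, 15]}, index=stamps)

# One column per aggregation: count, sum and mean for each day.
daily = demo['battle_deaths'].resample('D').agg(['count', 'sum', 'mean'])
print(daily)
```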


### Open, High, Low, Close
We need a new dataset to better demonstrate this one:

In [66]:
df1 = pd.DataFrame()

df1['ACME'] = np.random.randint(low=20000, high=30000, size=100)
df1['JUBII'] = np.random.randint(low=20000, high=40000, size=100)
df1.index = pd.date_range('1/1/2014', periods=100, freq='H')

In [67]:
df1.resample('D').ohlc()

| | ACME open | ACME high | ACME low | ACME close | JUBII open | JUBII high | JUBII low | JUBII close |
|---|---|---|---|---|---|---|---|---|
| 2014-01-01 | 29022 | 29794 | 20008 | 28731 | 25011 | 37067 | 20093 | 36896 |
| 2014-01-02 | 21729 | 29319 | 20688 | 27172 | 34098 | 38290 | 21314 | 24415 |
| 2014-01-03 | 22427 | 29992 | 20638 | 23782 | 25578 | 38864 | 20396 | 36798 |
| 2014-01-04 | 21649 | 29806 | 20311 | 20982 | 30814 | 39255 | 20216 | 39255 |
| 2014-01-05 | 20400 | 26423 | 20400 | 20475 | 33347 | 33347 | 20156 | 31536 |
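
Because the data above is random, it is hard to check the result by eye. A tiny hand-made series (values chosen arbitrarily for this sketch) shows what each of the four columns means: the first, largest, smallest and last value in every bin.

```python
import pandas as pd

idx = pd.date_range('2014-01-01', periods=6, freq='h')
prices = pd.Series([10, 14, 8, 12, 9, 11], index=idx)

# Two 3-hour bins: values (10, 14, 8) and (12, 9, 11).
bars = prices.resample('3h').ohlc()
print(bars)
```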


# DateOffset

In [60]:
df.index

DatetimeIndex(['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994',
               '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071',
               '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592',
               '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109',
               '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'],
              dtype='datetime64[ns]', name=u'date', freq=None)

In [61]:
df.index + pd.DateOffset(months=2, days=5)

DatetimeIndex(['2014-07-06 18:47:05.069722', '2014-07-06 18:47:05.119994',
               '2014-07-07 18:47:05.178768', '2014-07-07 18:47:05.230071',
               '2014-07-07 18:47:05.230071', '2014-07-07 18:47:05.280592',
               '2014-07-08 18:47:05.332662', '2014-07-08 18:47:05.385109',
               '2014-07-09 18:47:05.436523', '2014-07-09 18:47:05.486877'],
              dtype='datetime64[ns]', name=u'date', freq=None)
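
A `DateOffset` follows the calendar, which is different from adding a fixed duration. The sketch below contrasts it with `pd.Timedelta` on a single timestamp; the date is picked for the example because the end of January makes the difference visible.

```python
import pandas as pd

stamp = pd.Timestamp('2014-01-31')

# DateOffset is calendar-aware: one month after Jan 31 is clipped to Feb 28.
plus_month = stamp + pd.DateOffset(months=1)

# Timedelta is a fixed length of time: 30 days later lands in March.
plus_30_days = stamp + pd.Timedelta(days=30)

print(plus_month, plus_30_days)
```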

# Shifting data columns

In [62]:
df

| date | battle_deaths |
|---|---|
| 2014-05-01 18:47:05.069722 | 34 |
| 2014-05-01 18:47:05.119994 | 25 |
| 2014-05-02 18:47:05.178768 | 26 |
| 2014-05-02 18:47:05.230071 | 15 |
| 2014-05-02 18:47:05.230071 | 15 |
| 2014-05-02 18:47:05.280592 | 14 |
| 2014-05-03 18:47:05.332662 | 26 |
| 2014-05-03 18:47:05.385109 | 25 |
| 2014-05-04 18:47:05.436523 | 62 |
| 2014-05-04 18:47:05.486877 | 41 |


In [63]:
df.shift(1)

| date | battle_deaths |
|---|---|
| 2014-05-01 18:47:05.069722 | NaN |
| 2014-05-01 18:47:05.119994 | 34.0 |
| 2014-05-02 18:47:05.178768 | 25.0 |
| 2014-05-02 18:47:05.230071 | 26.0 |
| 2014-05-02 18:47:05.230071 | 15.0 |
| 2014-05-02 18:47:05.280592 | 15.0 |
| 2014-05-03 18:47:05.332662 | 14.0 |
| 2014-05-03 18:47:05.385109 | 26.0 |
| 2014-05-04 18:47:05.436523 | 25.0 |
| 2014-05-04 18:47:05.486877 | 62.0 |


In [64]:
df.shift(-1)

| date | battle_deaths |
|---|---|
| 2014-05-01 18:47:05.069722 | 25.0 |
| 2014-05-01 18:47:05.119994 | 26.0 |
| 2014-05-02 18:47:05.178768 | 15.0 |
| 2014-05-02 18:47:05.230071 | 15.0 |
| 2014-05-02 18:47:05.230071 | 14.0 |
| 2014-05-02 18:47:05.280592 | 26.0 |
| 2014-05-03 18:47:05.332662 | 25.0 |
| 2014-05-03 18:47:05.385109 | 62.0 |
| 2014-05-04 18:47:05.436523 | 41.0 |
| 2014-05-04 18:47:05.486877 | NaN |
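
A typical use of `shift` is to compare each row with the previous one. The sketch below computes a row-to-row change and also shows the `freq` argument, which moves the index instead of the values. The daily series is made up for the illustration.

```python
import pandas as pd

idx = pd.date_range('2014-05-01', periods=4, freq='D')
deaths = pd.Series([34, 25, 26, 15], index=idx)

# Subtracting the shifted series gives the change from the previous row;
# the first entry has no predecessor and becomes NaN.
change = deaths - deaths.shift(1)
print(change)

# With freq, the index moves by one day and the values stay where they are.
moved = deaths.shift(1, freq='D')
print(moved.index[0])
```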
