# Time Series
- Basics
- slicing into timezones
- ranges and frequencies
- resampling
- shift and tshift
- interpolation
- moving windows - rolling and expanding
- aggregating data

# Import the libraries

In [None]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning) 
warnings.filterwarnings("ignore", category=FutureWarning) 

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# format for floats
pd.options.display.float_format = '{:,.2f}'.format

# Time Series Basics

- Always check how pandas builds a time series index
- This matters most when loading data from multiple data sources
- Once the time series index is sorted (ascending or descending), accessing rows and columns is fairly flexible
- Special care **MUST** be taken when loading data from Excel spreadsheets and CSV files
- Also be careful with date formats
- e.g. 2010-03-01 and 2010-01-03 use the same digits but are different dates (1 March and 3 January when read as year-month-day)
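
As a small sketch of the date format pitfall, the string below is an assumed example (it is not taken from the GOOGL data). It shows how the same text parses to two different dates depending on whether pandas reads the day or the month first, and how stating the format removes the ambiguity.

```python
import pandas as pd

raw = "01/03/2010"  # assumed example string: 3 January or 1 March?

# With no hints, pandas reads this slash-separated string month first
month_first = pd.to_datetime(raw)

# dayfirst=True tells pandas the first field is the day
day_first = pd.to_datetime(raw, dayfirst=True)

# Stating the format explicitly leaves nothing to guesswork
explicit = pd.to_datetime(raw, format="%d/%m/%Y")

print(month_first.date(), day_first.date(), explicit.date())
```

When reading files, checking a few parsed dates against the raw text is a cheap way to catch this early.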

In [None]:
df_GOOGL = pd.read_csv(filepath_or_buffer='../Data/GOOGL.csv', index_col='Date', parse_dates=True)

df_GOOGL.tail()


# Date Ranges

Note that sometimes when slicing by date range, you can be caught out by the order of the dates in your index.

i.e. is the first row the earliest date OR the latest date?

It's good practice when dealing with dates as your index, to explicitly sort the index before filtering by a slice of dates. This avoids any surprises.

The slice you filter by must match the sorted order of the index:
- if the index is sorted ascending (earliest date first) then the slice will be: **df['early_date' : 'late_date']**
- if the index is sorted descending (earliest date last) then the slice will be: **df['late_date' : 'early_date']**
- If your index and slice order aren't the same then an empty DataFrame will be returned
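
The effect of sort order can be sketched on a small synthetic frame (the data and the column name `Close` are assumptions for illustration). The index is stored newest first, and the slice bounds are `Timestamp` objects passed to `.loc`.

```python
import pandas as pd

# Synthetic data, deliberately stored newest first
idx = pd.date_range("2010-01-01", periods=10, freq="D")[::-1]
df = pd.DataFrame({"Close": range(10)}, index=idx)

early, late = pd.Timestamp("2010-01-03"), pd.Timestamp("2010-01-07")

# Index is descending but the slice runs early to late: nothing matches
wrong_order = df.loc[early:late]

# Slice direction matches the descending index
right_order = df.loc[late:early]

# Sorting first means the slice can always be written early to late
sorted_slice = df.sort_index().loc[early:late]

print(len(wrong_order), len(right_order), len(sorted_slice))
```

Calling `sort_index()` before slicing keeps the slice direction predictable regardless of how the file was ordered.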

In [None]:
from datetime import datetime

# slice between specific dates
df_GOOGL['2010-12-02':'2010-12-25']

# every 30th row (the step counts rows, not calendar days)
df_GOOGL['2010-12':'2012-1':30]

# between months, every 45th row
df_GOOGL['2010-Nov':'2011-MAY':45]

# use variables
start = datetime(2015, 11, 2)
stop = datetime(2015,12,23)

df_GOOGL[start:stop]

# Date Ranges and Frequencies

- Extremely useful in the field of finance
- Convenient syntax
- version 1 - start, stop, frequency
- version 2 - start, frequency, periods

- full list of date frequencies here - http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
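
A minimal sketch of the two call styles listed above, using a daily frequency and assumed dates. Both calls describe the same ten days, once by end date and once by number of periods.

```python
import pandas as pd

# Version 1: start, stop and frequency
by_end = pd.date_range(start="2010-01-01", end="2010-01-10", freq="D")

# Version 2: start, frequency and number of periods
by_periods = pd.date_range(start="2010-01-01", periods=10, freq="D")

print(by_end.equals(by_periods))
```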

### Calendar Quarters

In [None]:
pd.date_range(start='2010-01-01', end='2015-12-31', freq='Q')


### Quarters for a Year Ending in January


In [None]:
pd.date_range(start='2010-01-01', end='2015-12-31', freq='Q-JAN')

### 3rd Friday of Every Month

Note that there are special 'business rules' for some dates

The pandas library designers built anchors and similar rules into some of their frequency aliases:

http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#anchored-offsets
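
One way to convince yourself that a rule such as `WOM-3FRI` does what its name suggests is to inspect the dates it generates. The sketch below uses an assumed one-year range and checks the weekday and the day of the month.

```python
import pandas as pd

third_fridays = pd.date_range(start="2010-01-01", end="2010-12-31", freq="WOM-3FRI")

# Monday is 0, so Friday is 4
print(third_fridays.dayofweek.unique().tolist())

# The third Friday of a month always falls on day 15 to 21
print(third_fridays.day.min(), third_fridays.day.max())
```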

In [None]:
pd.date_range(start='2010-01-01', end='2015-12-31', freq='WOM-3FRI')


### Date Range in 4 Hour intervals

In [None]:
# A frequency can carry a multiplier, e.g. every 4 hours
pd.date_range(start='2010-01-01', end='2010-03-01', freq='4h')


### Date Range in 1 hour and 13 min intervals

In [None]:
pd.date_range(start='2010-01-01', periods=10, freq='1h13min')



### Use a Start, Frequency and periods for other variations

In [None]:
pd.date_range(start='2010-01-01', freq='WOM-3FRI', periods=5)



### Use a date range to lookup/retrieve data from a DataFrame

In [None]:
days_of_month = pd.date_range(start='2010', end='2011', freq='BM')

df_GOOGL.reindex(labels=days_of_month)


# Shifting

Sliding data along a time series index

- Shift forward by one row: with an ascending index the first row becomes NaN and the last (most recent) value drops off the end
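
A small sketch on synthetic values (an assumption, not the GOOGL data) shows the two flavours of shifting. Plain `shift(1)` moves the values and leaves the index alone. Passing `freq` moves the index instead, which is the job `tshift` used to do before it was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2010-01-01", periods=4, freq="D")
s = pd.Series([10.0, 11.0, 12.0, 13.0], index=idx)

# Values move down one row: first row becomes NaN, 13.0 falls off the end
shifted_values = s.shift(1)

# Index moves one day later: every value is kept
shifted_index = s.shift(1, freq="D")

print(shifted_values)
print(shifted_index)
```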



In [None]:
df_GOOGL.shift(1).head()


In [None]:
df_GOOGL.shift(1).tail()


# Resampling

Resampling is a conversion between frequencies

- **Downsampling** - the easiest - going from a finer grained frequency to a coarser one, e.g. days to months, months to years
- **Upsampling** - slightly more involved - the reverse, e.g. months to days, days to minutes

Upsampling will require some interpolation
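
The difference can be sketched on four weeks of synthetic daily values (an assumption for illustration). Downsampling to weeks needs an aggregate such as the mean. Upsampling back to days leaves gaps that have to be filled, here by linear interpolation.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2021-01-04", periods=28, freq="D")  # starts on a Monday
daily = pd.Series(np.arange(28.0), index=idx)

# Downsample: 28 daily values become 4 weekly means
weekly = daily.resample("W").mean()

# Upsample: only the weekly dates have values, every other day is NaN
back_to_daily = weekly.resample("D").asfreq()

# Interpolate (linear by default) to fill the gaps
filled = back_to_daily.interpolate()

print(weekly)
print(back_to_daily.isna().sum(), filled.isna().sum())
```

Note that the interpolated daily values are estimates built from the weekly means, so they do not recover the original daily series.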

### Downsample - Days into Years

In [None]:
df_GOOGL.resample(rule='Y').mean()

### Downsample - all days in year 2010 into months

In [None]:
df_MON = df_GOOGL.loc['2010'].resample(rule = 'M').mean()
df_MON

### Upsample - Months into Weeks

In [None]:
df_MON.resample(rule='W').mean()

### Forward Fill to replace NaN

In [None]:
df_MON.resample(rule='W').ffill()

### Backward fill to replace NaN

In [None]:
df_MON.resample(rule='W').bfill()

### Interpolate to replace NaN

In [None]:
# Interpolate
# default is linear
df_MON.resample(rule='D').interpolate()


# Plot some different interpolations

### Create an Empty DataFrame

In [None]:
df_tmp = pd.DataFrame()


### Create a column called `Linear`

For linear interpolation

In [None]:
df_tmp['Linear'] = df_MON['Open'].resample(rule='D').interpolate(method='linear')


### Create a column called `Quadratic`

For quadratic interpolation

In [None]:
df_tmp['Quadratic']  = df_MON['Open'].resample(rule='D').interpolate(method='quadratic')


### Create a column called `Cubic`

For cubic interpolation

In [None]:
df_tmp['Cubic'] = df_MON['Open'].resample(rule='D').interpolate(method='cubic')


### Plot the Interpolations

In [None]:
df_tmp.plot()

# Moving Windows

- `rolling()` - slide a fixed-size window along the data and compute a statistic at each position
- `expanding()` - start the window at the first row and grow it by one row at each step
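
On a tiny synthetic series (an assumption for illustration) the two window types are easy to compare by hand. The rolling mean has no value until the window is full, while the expanding mean uses every row seen so far.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Window of 3 rows: the first two positions cannot fill the window
rolling_mean = s.rolling(window=3).mean()

# Window grows from 1 row to all 5 rows
expanding_mean = s.expanding().mean()

print(rolling_mean.tolist())    # [nan, nan, 2.0, 3.0, 4.0]
print(expanding_mean.tolist())  # [1.0, 1.5, 2.0, 2.5, 3.0]
```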

### Plot Moving Averages

`Adj Close`

42 day moving average of `Adj Close`

252 day moving average of `Adj Close`

In [None]:
df_GOOGL['Adj Close'].plot()
df_GOOGL['Adj Close'].rolling(window=42).mean().plot()
df_GOOGL['Adj Close'].rolling(window=252).mean().plot()

### Plot Expanding Windows

`Adj Close`

Expanding `Adj Close`


In [None]:
df_GOOGL['Adj Close'].plot()
df_GOOGL['Adj Close'].expanding().mean().plot()


## Aggregating Data

- Quite often you will want to resample and apply a function to the aggregate
- You have already done this, e.g. **df.resample(rule='Y').mean()**
- A more convenient way is to use the `agg()` method and supply it with the name of the function you want to apply to your aggregate
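
As a quick sketch on synthetic data (the values are assumptions, the column names `High` and `Low` mirror the ones used later), passing a single name to `agg()` matches calling the method directly, while passing a list produces one sub-column per function.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2021-01-04", periods=14, freq="D")
df = pd.DataFrame({"High": np.arange(14.0) + 1.0, "Low": np.arange(14.0)}, index=idx)

weekly = df.resample("W")

# A single function name gives the same result as calling .mean()
by_name = weekly.agg("mean")

# A list of names gives a two-level column index: (column, function)
by_list = weekly.agg(["mean", "max", "min"])

print(by_name)
print(by_list.columns.tolist())
```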


### Calculate mean aggregated by year - Option 1 use `mean`

In [None]:
df_GOOGL.resample(rule='Y').mean()

### Calculate mean aggregated by year - Option 2 use `agg`

In [None]:
df_GOOGL.resample(rule='Y').agg('mean') 

### Use a variable to store the name of the function

In [None]:
func = 'mean'
df_GOOGL.resample(rule='Y').agg(func)

### Aggregate by `mean`, `max` and `min`

In [None]:
funcs = ['mean', 'max', 'min']
df_GOOGL.resample(rule='Y').agg(funcs)

### More sophisticated aggregations

Functions
- 'mean', 'max', 'min'


Columns
- 'High', 'Low'


Date Range
- 2016 to 2017


Period
- Business Quarter

In [None]:
funcs = ['mean', 'max', 'min']
cols = ['High', 'Low']
from_date = '2016'
to_date = '2017'
freq = 'BQ'

# High and Low only, 2016 to 2017, aggregated by business quarter
df_GOOGL[from_date:to_date][cols].resample(rule=freq).agg(funcs)


### Same as above but transpose results

In [None]:
df_GOOGL[from_date:to_date][cols].resample(rule=freq).agg(funcs).transpose()
