<div style="color:#006666; padding:0px 10px; border-radius:5px; font-size:18px;"><h1 style='margin:10px 5px'>Time Series Methods</h1>
</div>

© Copyright Machine Learning Plus

 <div class="alert alert-info" style="background-color:#006666; color:white; padding:0px 10px; border-radius:5px;"><h2 style='margin:10px 5px'>1. Datetime Refresher</h2>
</div>

The `datetime` module is Python's standard library module for handling date and time data.

The `datetime` class within the `datetime` module is used to create `datetime` objects.

In [None]:
from datetime import datetime

In [None]:
# Arguments for datetime
Y = 1998
m = 11
d = 2
H = 14
M = 10
S = 12

In [None]:
# November 2nd, 1998
date = datetime(Y, m, d)
date

datetime.datetime(1998, 11, 2, 0, 0)

__Pass all the components in order: year, month, day, hour, minute, second__

In [None]:
# November 2nd, 1998 at 14:10:12
dt = datetime(Y, m, d, H, M, S)

In [None]:
dt

datetime.datetime(1998, 11, 2, 14, 10, 12)

You can extract any part of the datetime easily.

In [None]:
# day of month
dt.day

2

In [None]:
# hour
dt.hour

14

In [None]:
dt.minute

10

In [None]:
dt.month

11

In [None]:
dt.weekday()

0
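The `0` above is easier to read once you know the numbering convention. A minimal sketch, reusing the same date as above, comparing `weekday()` with `isoweekday()` and the weekday name from `strftime`; the variable names are only illustrative.

```python
from datetime import datetime

dt = datetime(1998, 11, 2, 14, 10, 12)

# weekday() counts from Monday = 0 up to Sunday = 6
weekday_index = dt.weekday()

# isoweekday() counts from Monday = 1 up to Sunday = 7
iso_weekday_index = dt.isoweekday()

# '%A' gives the full weekday name (in the current locale)
weekday_name = dt.strftime('%A')

print(weekday_index, iso_weekday_index, weekday_name)
```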

### Parse as date

Time data usually does not arrive as raw numbers. It typically comes in a human-readable format that the software needs to parse.

Two popular options exist:
1. `parse` from `dateutil.parser`
2. `pandas.to_datetime`

In [None]:
from dateutil.parser import parse
timetext = 'January 31, 2010'
parse(timetext)

datetime.datetime(2010, 1, 31, 0, 0)

In [None]:
import pandas as pd
ts = pd.to_datetime(timetext)
ts

Timestamp('2010-01-31 00:00:00')

Convert the pandas `Timestamp` to a `datetime`:

In [None]:
ts.to_pydatetime()

datetime.datetime(2010, 1, 31, 0, 0)
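Parsers have to guess when a string is ambiguous, for example when both the day and the month could be either number. The sketch below uses a made-up string, `'01-02-2010'`, to show the `dayfirst` argument of `dateutil.parser.parse` and the `format` argument of `pd.to_datetime`, both of which remove the guesswork.

```python
from dateutil.parser import parse
import pandas as pd

# Hypothetical ambiguous input: 2 January or 1 February?
ambiguous = '01-02-2010'

# By default dateutil reads ambiguous strings month-first
month_first = parse(ambiguous)

# dayfirst=True tells the parser to read the first number as the day
day_first = parse(ambiguous, dayfirst=True)

# With pandas, an explicit format string avoids any guessing
explicit = pd.to_datetime(ambiguous, format='%d-%m-%Y')

print(month_first, day_first, explicit)
```

When you know the layout of the incoming strings, passing an explicit format is usually the safer choice.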

__To represent a datetime in human-readable form, use `strftime()`__

Each component of the datetime has its own [format code](http://strftime.org/).

In [None]:
print(dt.strftime('%Y-%m-%d::%H-%M'))

1998-11-02::14-10


Another format:

In [None]:
print(dt.strftime('%d %B, %Y %A'))

02 November, 1998 Monday
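The same format codes also work in the other direction. As a small sketch, `datetime.strptime` parses a string back into a `datetime` when given the format that produced it. Note that any component left out of the format (seconds here) is lost in the round trip.

```python
from datetime import datetime

dt = datetime(1998, 11, 2, 14, 10, 12)
fmt = '%Y-%m-%d::%H-%M'

# datetime -> string
text = dt.strftime(fmt)

# string -> datetime, using the same format codes
parsed = datetime.strptime(text, fmt)

print(text)
print(parsed)
```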


If you subtract two `datetime` objects, you get a `timedelta` object.

In [None]:
dt1 = datetime(2002, 1, 31, 10, 10, 0)
dt2 = datetime(2001, 1, 31, 10, 10, 0)
td = dt1 - dt2
td

datetime.timedelta(days=365)

In [None]:
td.days

365

In [None]:
td.total_seconds()

31536000.0
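A `timedelta` can also be built directly and added to a `datetime`. The sketch below, with illustrative values, moves a date forward by 45 days and 6 hours, and converts a `timedelta` to hours by dividing it by a one-hour `timedelta`.

```python
from datetime import datetime, timedelta

start = datetime(2001, 1, 31, 10, 10, 0)

# Add a duration to a datetime to get a new datetime
later = start + timedelta(days=45, hours=6)

# Dividing one timedelta by another gives a plain float
td = datetime(2002, 1, 31, 10, 10, 0) - start
hours = td / timedelta(hours=1)

print(later)
print(hours)
```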

### Challenge

1. Parse the following strings to datetime.

```
s1 = "2010 Jan 1"
s2 = '31-1-2000' 
s3 = 'October10, 1996, 10:40pm'
```

2. How many days passed between the end of the First World War and the beginning of the Second World War?

    _"11 November 1918"_ to _"1 September 1939"_

__Code URL:__ https://git.io/JZI3f

In [None]:
# Solution 1
s1 = "2010 Jan 1"
s2 = '31-1-2000' 
s3 = 'October10, 1996, 10:40pm'

from dateutil.parser import parse
print(parse(s1))
print(parse(s2))
print(parse(s3))

In [None]:
import pandas as pd
pd.to_datetime(s1).to_pydatetime()

__Solution 2__

In [None]:
dt1 = pd.to_datetime("11 November 1918")
dt2 = pd.to_datetime("1 September 1939")
td = dt2 - dt1
td.days
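As a cross-check that does not depend on string parsing, the same difference can be computed with the standard library's `date` class. This is only a sketch, and the variable names are illustrative.

```python
from datetime import date

# Build the two dates from explicit year, month, day values
armistice = date(1918, 11, 11)
invasion = date(1939, 9, 1)

# Subtracting two dates gives a timedelta
gap = invasion - armistice
print(gap.days)
```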

 <div class="alert alert-info" style="background-color:#006666; color:white; padding:0px 10px; border-radius:5px;"><h2 style='margin:10px 5px'>2. Time Series Data</h2>
</div>

A time series is data recorded over time, typically at regular intervals. In pandas, it is a `Series` whose index is made of dates.

Examples:
- Stock Prices
- Sales data
- Weather

In [None]:
import numpy as np
import pandas as pd
from datetime import datetime

__A time series is one where the index is made of datetime objects.__

In [None]:
dates = [datetime(2001, 1, 1), 
         datetime(2001, 1, 4), 
         datetime(2001, 1, 6), 
         datetime(2001, 1, 9), 
         datetime(2001, 1, 10), 
         datetime(2001, 1, 12)]

np.random.seed(10)
ts = pd.Series(np.random.randn(6), index=dates)
ts

2001-01-01    1.331587
2001-01-04    0.715279
2001-01-06   -1.545400
2001-01-09   -0.008384
2001-01-10    0.621336
2001-01-12   -0.720086
dtype: float64

In [None]:
ts.index

DatetimeIndex(['2001-01-01', '2001-01-04', '2001-01-06', '2001-01-09',
               '2001-01-10', '2001-01-12'],
              dtype='datetime64[ns]', freq=None)

__You can create a series of dates using `pd.date_range()`__

In [None]:
pd.date_range('2001-01-01', '2001-01-10')

DatetimeIndex(['2001-01-01', '2001-01-02', '2001-01-03', '2001-01-04',
               '2001-01-05', '2001-01-06', '2001-01-07', '2001-01-08',
               '2001-01-09', '2001-01-10'],
              dtype='datetime64[ns]', freq='D')

Alternate days

In [None]:
pd.date_range('2001-01-01', '2001-01-10', freq='2d')

DatetimeIndex(['2001-01-01', '2001-01-03', '2001-01-05', '2001-01-07',
               '2001-01-09'],
              dtype='datetime64[ns]', freq='2D')

Every week, anchored on Monday --> 'W-MON'

In [None]:
pd.date_range('2001-01-01', periods=10, freq='W-MON')

DatetimeIndex(['2001-01-01', '2001-01-08', '2001-01-15', '2001-01-22',
               '2001-01-29', '2001-02-05', '2001-02-12', '2001-02-19',
               '2001-02-26', '2001-03-05'],
              dtype='datetime64[ns]', freq='W-MON')

Hourly

In [None]:
pd.date_range('2001-01-01', periods=10, freq='H')

DatetimeIndex(['2001-01-01 00:00:00', '2001-01-01 01:00:00',
               '2001-01-01 02:00:00', '2001-01-01 03:00:00',
               '2001-01-01 04:00:00', '2001-01-01 05:00:00',
               '2001-01-01 06:00:00', '2001-01-01 07:00:00',
               '2001-01-01 08:00:00', '2001-01-01 09:00:00'],
              dtype='datetime64[ns]', freq='H')

Every 2.5 hours

In [None]:
pd.date_range('2001-01-01', periods=10, freq='2.5H')

DatetimeIndex(['2001-01-01 00:00:00', '2001-01-01 02:30:00',
               '2001-01-01 05:00:00', '2001-01-01 07:30:00',
               '2001-01-01 10:00:00', '2001-01-01 12:30:00',
               '2001-01-01 15:00:00', '2001-01-01 17:30:00',
               '2001-01-01 20:00:00', '2001-01-01 22:30:00'],
              dtype='datetime64[ns]', freq='150T')

So it's a powerful way to generate whatever frequency or range you need.
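Two more frequency aliases are sketched below: `'B'` for business days (Monday to Friday) and `'MS'` for month starts. The start dates are chosen only for illustration; 5 January 2001 is a Friday, so the business-day range skips the weekend.

```python
import pandas as pd

# Business days only: the weekend after Friday 2001-01-05 is skipped
business_days = pd.date_range('2001-01-05', periods=3, freq='B')

# First day of each month
month_starts = pd.date_range('2001-01-01', periods=3, freq='MS')

print(business_days)
print(month_starts)
```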

Let's create a long series and see how to subset date ranges.

In [None]:
ts = pd.Series(np.random.rand(500), index=pd.date_range('2001-01-01', periods=500))
ts

2001-01-01    0.169111
2001-01-02    0.088340
2001-01-03    0.685360
2001-01-04    0.953393
2001-01-05    0.003948
                ...   
2002-05-11    0.061431
2002-05-12    0.598174
2002-05-13    0.885920
2002-05-14    0.412134
2002-05-15    0.038272
Freq: D, Length: 500, dtype: float64

Select all datapoints in year 2001

In [None]:
ts['2001']

2001-01-01    0.169111
2001-01-02    0.088340
2001-01-03    0.685360
2001-01-04    0.953393
2001-01-05    0.003948
                ...   
2001-12-27    0.537832
2001-12-28    0.754724
2001-12-29    0.272526
2001-12-30    0.566517
2001-12-31    0.476685
Freq: D, Length: 365, dtype: float64

Select all datapoints in January 2001

In [None]:
ts['2001-01']

2001-01-01    0.169111
2001-01-02    0.088340
2001-01-03    0.685360
2001-01-04    0.953393
2001-01-05    0.003948
2001-01-06    0.512192
2001-01-07    0.812621
2001-01-08    0.612526
2001-01-09    0.721755
2001-01-10    0.291876
2001-01-11    0.917774
2001-01-12    0.714576
2001-01-13    0.542544
2001-01-14    0.142170
2001-01-15    0.373341
2001-01-16    0.674134
2001-01-17    0.441833
2001-01-18    0.434014
2001-01-19    0.617767
2001-01-20    0.513138
2001-01-21    0.650397
2001-01-22    0.601039
2001-01-23    0.805223
2001-01-24    0.521647
2001-01-25    0.908649
2001-01-26    0.319236
2001-01-27    0.090459
2001-01-28    0.300700
2001-01-29    0.113984
2001-01-30    0.828681
2001-01-31    0.046896
Freq: D, dtype: float64

__Select all records between two dates, say, Jan 1st to Jan 14th (both endpoints are included)__

In [None]:
ts['2001-01-01':'2001-01-14']

2001-01-01    0.169111
2001-01-02    0.088340
2001-01-03    0.685360
2001-01-04    0.953393
2001-01-05    0.003948
2001-01-06    0.512192
2001-01-07    0.812621
2001-01-08    0.612526
2001-01-09    0.721755
2001-01-10    0.291876
2001-01-11    0.917774
2001-01-12    0.714576
2001-01-13    0.542544
2001-01-14    0.142170
Freq: D, dtype: float64
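Date-label slices behave differently from ordinary Python slices, because the end label is included. A minimal sketch, using a stand-in series of integers in place of the random one, compares label slicing with positional slicing and shows an open-ended slice.

```python
import numpy as np
import pandas as pd

# Stand-in series: 500 daily values starting on 2001-01-01
ts = pd.Series(np.arange(500), index=pd.date_range('2001-01-01', periods=500))

# Label slicing includes both endpoints
by_label = ts.loc['2001-01-01':'2001-01-14']

# Positional slicing excludes the stop position, as usual in Python
by_position = ts.iloc[0:14]

# An open-ended slice runs to the end of the series
from_may_2002 = ts.loc['2002-05':]

print(len(by_label), len(by_position), len(from_may_2002))
```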

 <div class="alert alert-info" style="background-color:#006666; color:white; padding:0px 10px; border-radius:5px;"><h2 style='margin:10px 5px'>3. Resampling and Shifting</h2>
</div>

In [None]:
import pandas as pd
import numpy as np

Let's create a series with non-continuous dates, then fill in the gaps.

In [None]:
ts = pd.Series(np.random.rand(20), pd.date_range('2001-01-01', periods=20, freq='2d'))
ts

2001-01-01    0.080842
2001-01-03    0.081915
2001-01-05    0.809367
2001-01-07    0.800694
2001-01-09    0.016910
2001-01-11    0.861294
2001-01-13    0.911300
2001-01-15    0.400877
2001-01-17    0.286891
2001-01-19    0.843045
2001-01-21    0.617045
2001-01-23    0.140599
2001-01-25    0.228756
2001-01-27    0.841498
2001-01-29    0.844358
2001-01-31    0.220146
2001-02-02    0.811138
2001-02-04    0.535954
2001-02-06    0.610428
2001-02-08    0.523870
Freq: 2D, dtype: float64

__A common way to fill the gaps is to forward (or backward) fill them.__

In [None]:
ts.resample('D').ffill()

2001-01-01    0.080842
2001-01-02    0.080842
2001-01-03    0.081915
2001-01-04    0.081915
2001-01-05    0.809367
2001-01-06    0.809367
2001-01-07    0.800694
2001-01-08    0.800694
2001-01-09    0.016910
2001-01-10    0.016910
2001-01-11    0.861294
2001-01-12    0.861294
2001-01-13    0.911300
2001-01-14    0.911300
2001-01-15    0.400877
2001-01-16    0.400877
2001-01-17    0.286891
2001-01-18    0.286891
2001-01-19    0.843045
2001-01-20    0.843045
2001-01-21    0.617045
2001-01-22    0.617045
2001-01-23    0.140599
2001-01-24    0.140599
2001-01-25    0.228756
2001-01-26    0.228756
2001-01-27    0.841498
2001-01-28    0.841498
2001-01-29    0.844358
2001-01-30    0.844358
2001-01-31    0.220146
2001-02-01    0.220146
2001-02-02    0.811138
2001-02-03    0.811138
2001-02-04    0.535954
2001-02-05    0.535954
2001-02-06    0.610428
2001-02-07    0.610428
2001-02-08    0.523870
Freq: D, dtype: float64
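Forward fill is only one option. The sketch below uses a tiny series with easy-to-check values to compare forward fill, backward fill, and linear interpolation (via `asfreq` followed by `interpolate`) on the same gaps.

```python
import pandas as pd

# Three values, two days apart, so every other day is missing
ts_small = pd.Series([1.0, 3.0, 5.0],
                     index=pd.date_range('2001-01-01', periods=3, freq='2D'))

# Forward fill: carry the last known value into the gap
forward = ts_small.resample('D').ffill()

# Backward fill: pull the next known value back into the gap
backward = ts_small.resample('D').bfill()

# Linear interpolation: insert the missing days as NaN, then fill between neighbours
linear = ts_small.asfreq('D').interpolate()

print(pd.DataFrame({'ffill': forward, 'bfill': backward, 'linear': linear}))
```

Which one is appropriate depends on the data: forward fill suits values that hold until they change, while interpolation suits quantities that vary smoothly.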

You can aggregate it to a weekly level as well, taking the mean.

In [None]:
ts.resample('W').mean()

2001-01-07    0.443205
2001-01-14    0.596501
2001-01-21    0.536964
2001-01-28    0.403617
2001-02-04    0.602899
2001-02-11    0.567149
Freq: W-SUN, dtype: float64
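Each weekly bin is labelled with the Sunday that ends it, which is why the first label above is 2001-01-07. The sketch below uses integer values so the bins are easy to check by hand, and applies several aggregations at once with `agg`.

```python
import numpy as np
import pandas as pd

# Values 0..19 on every second day, starting Monday 2001-01-01
ts_int = pd.Series(np.arange(20, dtype=float),
                   index=pd.date_range('2001-01-01', periods=20, freq='2D'))

# Several aggregations per weekly bin, one column per aggregation
weekly = ts_int.resample('W').agg(['count', 'mean', 'min', 'max'])

print(weekly)
```

The first bin covers Jan 1 to Jan 7 and holds the four observations dated Jan 1, 3, 5 and 7.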


In [None]:
# start of month
ts.resample('MS').mean()

2001-01-01    0.499096
2001-02-01    0.620347
Freq: MS, dtype: float64

In [None]:
# Business month end
ts.resample('BM').mean()

2001-01-31    0.499096
2001-02-28    0.620347
Freq: BM, dtype: float64

### Shifting

You can create lags or leads of a time series using the `shift` method. A lag moves the values forward in time, so each date holds the value from an earlier period. A lead moves the values backward in time, so each date holds the value from a later period.

A lag will have missing values at the beginning.

In [None]:
ts

2001-01-01    0.080842
2001-01-03    0.081915
2001-01-05    0.809367
2001-01-07    0.800694
2001-01-09    0.016910
2001-01-11    0.861294
2001-01-13    0.911300
2001-01-15    0.400877
2001-01-17    0.286891
2001-01-19    0.843045
2001-01-21    0.617045
2001-01-23    0.140599
2001-01-25    0.228756
2001-01-27    0.841498
2001-01-29    0.844358
2001-01-31    0.220146
2001-02-02    0.811138
2001-02-04    0.535954
2001-02-06    0.610428
2001-02-08    0.523870
Freq: 2D, dtype: float64

In [None]:
# lag 2
lag_2 = ts.shift(2)
lag_2

2001-01-01         NaN
2001-01-03         NaN
2001-01-05    0.080842
2001-01-07    0.081915
2001-01-09    0.809367
2001-01-11    0.800694
2001-01-13    0.016910
2001-01-15    0.861294
2001-01-17    0.911300
2001-01-19    0.400877
2001-01-21    0.286891
2001-01-23    0.843045
2001-01-25    0.617045
2001-01-27    0.140599
2001-01-29    0.228756
2001-01-31    0.841498
2001-02-02    0.844358
2001-02-04    0.220146
2001-02-06    0.811138
2001-02-08    0.535954
Freq: 2D, dtype: float64

A lead will have missing values at the end.

In [None]:
# lead 2
lead_2 = ts.shift(-2)
lead_2

2001-01-01    0.809367
2001-01-03    0.800694
2001-01-05    0.016910
2001-01-07    0.861294
2001-01-09    0.911300
2001-01-11    0.400877
2001-01-13    0.286891
2001-01-15    0.843045
2001-01-17    0.617045
2001-01-19    0.140599
2001-01-21    0.228756
2001-01-23    0.841498
2001-01-25    0.844358
2001-01-27    0.220146
2001-01-29    0.811138
2001-01-31    0.535954
2001-02-02    0.610428
2001-02-04    0.523870
2001-02-06         NaN
2001-02-08         NaN
Freq: 2D, dtype: float64
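Two related uses of `shift` are sketched below with a small illustrative series. Subtracting a lag gives the period-over-period change (the same result as `diff`), and passing `freq` moves the index instead of the values, so no missing values appear.

```python
import pandas as pd

ts_small = pd.Series([10.0, 12.0, 15.0, 11.0],
                     index=pd.date_range('2001-01-01', periods=4, freq='2D'))

# Change from the previous observation; the first entry is NaN
change = ts_small - ts_small.shift(1)

# With freq, the dates move by one day and the values stay attached to them
moved_index = ts_small.shift(1, freq='D')

print(change)
print(moved_index)
```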

### Mini Challenge

Create a dataframe that contains `ts` and the first 6 lags of the series `ts` as columns. The date should be the index and no missing values should be present.

__Code URL:__ https://git.io/JZIrW

In [None]:
# input
import pandas as pd
import numpy as np

ts = pd.Series(np.arange(20), pd.date_range('2001-01-01', periods=20, freq='2d'), name="values")
ts

2001-01-01     0
2001-01-03     1
2001-01-05     2
2001-01-07     3
2001-01-09     4
2001-01-11     5
2001-01-13     6
2001-01-15     7
2001-01-17     8
2001-01-19     9
2001-01-21    10
2001-01-23    11
2001-01-25    12
2001-01-27    13
2001-01-29    14
2001-01-31    15
2001-02-02    16
2001-02-04    17
2001-02-06    18
2001-02-08    19
Freq: 2D, Name: values, dtype: int32

__Solution__

In [None]:
# Solution
ts = pd.Series(np.arange(20), pd.date_range('2001-01-01', periods=20, freq='2d'), name="values")

df = ts.reset_index()
df['lag1'] = df['values'].shift(1)
df['lag2'] = df['values'].shift(2)
df['lag3'] = df['values'].shift(3)
df['lag4'] = df['values'].shift(4)
df['lag5'] = df['values'].shift(5)
df['lag6'] = df['values'].shift(6)


# Drop na and set index to dates
df.dropna(inplace=True)
df.set_index('index', inplace=True)
df

index,values,lag1,lag2,lag3,lag4,lag5,lag6
2001-01-13,6,5.0,4.0,3.0,2.0,1.0,0.0
2001-01-15,7,6.0,5.0,4.0,3.0,2.0,1.0
2001-01-17,8,7.0,6.0,5.0,4.0,3.0,2.0
2001-01-19,9,8.0,7.0,6.0,5.0,4.0,3.0
2001-01-21,10,9.0,8.0,7.0,6.0,5.0,4.0
2001-01-23,11,10.0,9.0,8.0,7.0,6.0,5.0
2001-01-25,12,11.0,10.0,9.0,8.0,7.0,6.0
2001-01-27,13,12.0,11.0,10.0,9.0,8.0,7.0
2001-01-29,14,13.0,12.0,11.0,10.0,9.0,8.0
2001-01-31,15,14.0,13.0,12.0,11.0,10.0,9.0
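The repeated `shift` lines can also be generated in a loop. This is one possible alternative, not the reference solution: it builds the lag columns with a dictionary comprehension and keeps the dates as the index throughout, so no `reset_index` or `set_index` step is needed. The name `df_lags` is only illustrative.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(20),
               pd.date_range('2001-01-01', periods=20, freq='2D'),
               name='values')

# One shifted copy of the series per lag, keyed by column name
lags = {f'lag{k}': ts.shift(k) for k in range(1, 7)}

# Join the original series with its lags, then drop rows with missing values
df_lags = pd.concat([ts, pd.DataFrame(lags)], axis=1).dropna()

print(df_lags.head())
```

Changing `range(1, 7)` is then enough to produce a different number of lags.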


 <div class="alert alert-info" style="background-color:#006666; color:white; padding:0px 10px; border-radius:5px;"><h2 style='margin:10px 5px'>4. Time Methods</h2>
</div>

Once you've created several time series, you can put them together to form a dataframe. Then, access the date-related attributes and methods of a date column using the `dt` accessor.

__Put them in a dataframe__

In [None]:
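# `ts`, `lag_2` and `lead_2` are the random series from the Shifting section.
# The Mini Challenge reassigns `ts`, so recreate the random series first if you ran that cell.
df = pd.DataFrame({'ts': ts, 'lag2': lag_2, 'lead2': lead_2})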
df

,ts,lag2,lead2
2001-01-01,0.080842,,0.809367
2001-01-03,0.081915,,0.800694
2001-01-05,0.809367,0.080842,0.01691
2001-01-07,0.800694,0.081915,0.861294
2001-01-09,0.01691,0.809367,0.9113
2001-01-11,0.861294,0.800694,0.400877
2001-01-13,0.9113,0.01691,0.286891
2001-01-15,0.400877,0.861294,0.843045
2001-01-17,0.286891,0.9113,0.617045
2001-01-19,0.843045,0.400877,0.140599


To access date-related attributes and methods, use the `.dt` accessor on a datetime column.

So, first convert the index to a column, then use `df['column'].dt`.

In [None]:
df = df.reset_index()
df

,index,ts,lag2,lead2
0,2001-01-01,0.080842,,0.809367
1,2001-01-03,0.081915,,0.800694
2,2001-01-05,0.809367,0.080842,0.01691
3,2001-01-07,0.800694,0.081915,0.861294
4,2001-01-09,0.01691,0.809367,0.9113
5,2001-01-11,0.861294,0.800694,0.400877
6,2001-01-13,0.9113,0.01691,0.286891
7,2001-01-15,0.400877,0.861294,0.843045
8,2001-01-17,0.286891,0.9113,0.617045
9,2001-01-19,0.843045,0.400877,0.140599


__day of month__

In [None]:
df['index'].dt.day

0      1
1      3
2      5
3      7
4      9
5     11
6     13
7     15
8     17
9     19
10    21
11    23
12    25
13    27
14    29
15    31
16     2
17     4
18     6
19     8
Name: index, dtype: int64

__week of the year__

In [None]:
df['index'].dt.isocalendar().week

0     1
1     1
2     1
3     1
4     2
5     2
6     2
7     3
8     3
9     3
10    3
11    4
12    4
13    4
14    5
15    5
16    5
17    5
18    6
19    6
Name: week, dtype: UInt32

In [None]:
df['index'].dt.isocalendar().year

0     2001
1     2001
2     2001
3     2001
4     2001
5     2001
6     2001
7     2001
8     2001
9     2001
10    2001
11    2001
12    2001
13    2001
14    2001
15    2001
16    2001
17    2001
18    2001
19    2001
Name: year, dtype: UInt32

In [None]:
df['index'].dt.is_leap_year

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
Name: index, dtype: bool
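The `.dt` accessor is needed for a datetime column, but a `DatetimeIndex` exposes the same attributes directly. The sketch below, on a small illustrative dataframe named `df_idx`, reads the day and weekday name straight from the index without resetting it.

```python
import pandas as pd

idx = pd.date_range('2001-01-01', periods=5, freq='2D')
df_idx = pd.DataFrame({'value': range(5)}, index=idx)

# Attributes are available on the index itself, with no .dt
days = df_idx.index.day
names = df_idx.index.day_name()

# The column route gives the same numbers through .dt
days_via_dt = df_idx.index.to_series().dt.day

print(list(days))
print(list(names))
```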

In [None]:
print(dir(df['index'].dt))

['__annotations__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__frozen', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_accessors', '_add_delegate_accessors', '_constructor', '_delegate_method', '_delegate_property_get', '_delegate_property_set', '_dir_additions', '_dir_deletions', '_freeze', '_get_values', '_hidden_attrs', '_parent', '_reset_cache', 'ceil', 'date', 'day', 'day_name', 'day_of_week', 'day_of_year', 'dayofweek', 'dayofyear', 'days_in_month', 'daysinmonth', 'floor', 'freq', 'hour', 'is_leap_year', 'is_month_end', 'is_month_start', 'is_quarter_end', 'is_quarter_start', 'is_year_end', 'is_year_start', 'isocalendar', 'microsecond', 'minute', 'month', 'month_name', 'nanosecond', 'normalize', 'quarter', 'round', 'second', '