# Time Series with Python

Based on Aileen Nielsen's presentation at SciPy 2016. The assignment is spread over the nine included IPython notebooks, in numbered order.

In [1]:
import pandas as pd
import numpy as np

In [2]:
# TIMES
rng = pd.date_range('1 Jul 2016', periods = 10, freq = '8H')
rng

DatetimeIndex(['2016-07-01 00:00:00', '2016-07-01 08:00:00',
               '2016-07-01 16:00:00', '2016-07-02 00:00:00',
               '2016-07-02 08:00:00', '2016-07-02 16:00:00',
               '2016-07-03 00:00:00', '2016-07-03 08:00:00',
               '2016-07-03 16:00:00', '2016-07-04 00:00:00'],
              dtype='datetime64[ns]', freq='8H')

Which of these formats DON'T work?

'2016 Jul 1', '7/1/2016', '1/7/2016', 'July 1, 2016', '2016-07-01', '2016/07/01'

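_Reply:_ All of these parse, but `'1/7/2016'` is read month-first as January 7, 2016, so it is the one that does not give July 1.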

Is '7/1/2016' in January or July?

_Reply:_ In US usage this is July 1, and pandas defaults to the US month-first convention.
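
The ambiguity can be made explicit. The following is a minimal sketch, assuming `pd.to_datetime` with its `dayfirst` and `format` parameters (neither is used in the original notebook):

```python
import pandas as pd

# Default parsing is month-first, so this is January 7, not July 1.
month_first = pd.to_datetime('1/7/2016')

# dayfirst=True switches to the day-first reading.
day_first = pd.to_datetime('1/7/2016', dayfirst=True)

# An explicit format string removes the ambiguity altogether.
explicit = pd.to_datetime('1/7/2016', format='%d/%m/%Y')

print(month_first, day_first, explicit)
```

An unambiguous format such as `'2016-07-01'` avoids the question altogether.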

What is the class of an individual object held in the date_range?

_Reply:_ Each element is a `Timestamp` object.
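
One way to check this is sketched below. The frequency is passed as a `pd.Timedelta` rather than the `'8H'` string; that is an assumption made here so the snippet does not depend on which frequency aliases a given pandas version accepts.

```python
import datetime
import pandas as pd

# Three points, eight hours apart (same spacing as the range above).
rng = pd.date_range('2016-07-01', periods=3, freq=pd.Timedelta(hours=8))

first = rng[0]
print(type(first))

# Timestamp is a subclass of the standard library datetime.
print(isinstance(first, datetime.datetime))
```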

## Time Stamps v. Time Spans

### Timestamps

In [3]:
# TIME STAMPS VS TIME SPANS
pd.Timestamp('2016-07-10')

Timestamp('2016-07-10 00:00:00')

In [4]:
# You can also add more details
pd.Timestamp('2016-07-10 10')

Timestamp('2016-07-10 10:00:00')

In [5]:
# Or even more...
pd.Timestamp('2016-07-10 10:15')

Timestamp('2016-07-10 10:15:00')

How much detail can you add?

_Reply_: Pandas allows for times down to nanoseconds.
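
A short sketch of the nanosecond resolution, using a made-up timestamp string with nine fractional digits:

```python
import pandas as pd

# Nine fractional digits: microseconds plus three more digits of nanoseconds.
t_ns = pd.Timestamp('2016-07-10 10:15:00.123456789')

print(t_ns.microsecond)  # the first six fractional digits
print(t_ns.nanosecond)   # the remaining three digits
```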

What are some properties of timestamps? Try them out.

_Reply_: There is a wide range, listed here: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-date-components

In [6]:
# What are some properties of timestamps? Try them out.
# hint: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#time-date-components
t = pd.Timestamp('2016-07-10 10:15')
print(t.dayofweek)
print(t.dayofyear)
print(t.day_name())  # replaces the weekday_name attribute, removed in pandas 1.0

6
192
Sunday
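
A few more components can be tried the same way. This is only a sample, chosen here for illustration; `day_name()` is the method form available in current pandas releases.

```python
import pandas as pd

t = pd.Timestamp('2016-07-10 10:15')

print(t.day_name())     # name of the weekday
print(t.quarter)        # calendar quarter, 1 to 4
print(t.is_leap_year)   # 2016 is a leap year
print(t.is_month_end)   # July 10 is not the last day of the month
```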


### Timespans

In [7]:
# TIME SPANS
pd.Period('2016-01')

Period('2016-01', 'M')

What's that extra info above? How does it get set?

_Reply:_ The `'M'` is the period's frequency: it indicates that the span covers the whole month. pandas infers the frequency from the amount of detail in the string passed to `Period`, so a year and month give a monthly period.
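
A small sketch of what the frequency means in practice. The explicit `freq` argument is an addition here (the notebook only relies on inference):

```python
import pandas as pd

# Frequency inferred from the string: a monthly period.
p = pd.Period('2016-01')
print(p.freqstr)

# The span covered by the period.
print(p.start_time, p.end_time)

# The frequency can also be set explicitly; the same string now means one day.
p_day = pd.Period('2016-01', freq='D')
print(p_day)
```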

In [8]:
pd.Period('2016-01-01')

Period('2016-01-01', 'D')

In [9]:
pd.Period('2016-01-01 10')

Period('2016-01-01 10:00', 'H')

In [10]:
pd.Period('2016-01-01 10:10')

Period('2016-01-01 10:10', 'T')

In [11]:
pd.Period('2016-01-01 10:10:10')

Period('2016-01-01 10:10:10', 'S')

What's the most detailed Period you can get?

_Reply:_ Like the `Timestamp` object, this should be nanoseconds.

How can you make multiple time periods?
Hint: look for an analogy with `pd.date_range()` above.

In [12]:
# using the code provided by the documentation:
print(pd.period_range('1/1/2011', '1/1/2012', freq='M'))
# t is the Timestamp from In [6]; this line only tests whether it is a member of the range
t in pd.period_range('1/1/2011 8:00', '1/1/2012 8:00', freq ='8H')

PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
             '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
             '2012-01'],
            dtype='period[M]', freq='M')


False
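
As with `pd.date_range()`, a period range can be given either an end point or a number of periods. A minimal sketch with monthly periods:

```python
import pandas as pd

# Start and end, both inclusive.
months = pd.period_range('2011-01', '2012-01', freq='M')

# Start and a count give the same index.
same = pd.period_range('2011-01', periods=13, freq='M')

print(len(months))
print(months.equals(same))

# Membership is meaningful for a Period with the same frequency.
print(pd.Period('2011-03', freq='M') in months)
```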

## Time Offsets

In [13]:
# TIME OFFSETS
pd.Timedelta('1 day')

Timedelta('1 days 00:00:00')

In [14]:
pd.Period('2016-01-01 10:10') + pd.Timedelta('1 day')

Period('2016-01-02 10:10', 'T')

In [15]:
pd.Timestamp('2016-01-01 10:10') + pd.Timedelta('1 day')

Timestamp('2016-01-02 10:10:00')

In [16]:
pd.Timestamp('2016-01-01 10:10') + pd.Timedelta('15 ns')

Timestamp('2016-01-01 10:10:00.000000015')
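
Offsets also come out of arithmetic: subtracting two timestamps yields a `Timedelta`. The sketch below additionally uses `pd.DateOffset`, which is not part of the original notebook, to show a calendar-aware shift for comparison.

```python
import pandas as pd

start = pd.Timestamp('2016-01-01 10:10')
end = pd.Timestamp('2016-01-02 10:10')

# The difference of two Timestamps is a Timedelta.
delta = end - start
print(delta)
print(delta.total_seconds())

# A Timedelta is a fixed duration; DateOffset follows the calendar instead.
month_later = pd.Timestamp('2016-01-31') + pd.DateOffset(months=1)
print(month_later)
```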

## Fancy Frequency Setting

In [17]:
# FANCY FREQUENCY SETTING
# Only want business days
pd.period_range('2016-01-01 10:10', freq = 'B', periods = 10)

PeriodIndex(['2016-01-01', '2016-01-04', '2016-01-05', '2016-01-06',
             '2016-01-07', '2016-01-08', '2016-01-11', '2016-01-12',
             '2016-01-13', '2016-01-14'],
            dtype='period[B]', freq='B')
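
The same `'B'` frequency works with `pd.date_range()`. A quick sketch that also confirms no weekend days are produced:

```python
import pandas as pd

# 2016-01-01 is a Friday, so the range skips straight to Monday the 4th.
bdays = pd.date_range('2016-01-01', periods=5, freq='B')

print(bdays)
print(bdays.dayofweek)  # Monday is 0, Friday is 4
```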

In [18]:
# It's possible to combine frequencies. What if you want to advance by 25 hours each step? What are the 2 ways to do it?
p1 = pd.period_range('2016-01-01 10:10', freq = '25H', periods = 10)

In [19]:
p2 = pd.period_range('2016-01-01 10:10', freq = '1D1H', periods = 10)

In [20]:
p1

PeriodIndex(['2016-01-01 10:00', '2016-01-02 11:00', '2016-01-03 12:00',
             '2016-01-04 13:00', '2016-01-05 14:00', '2016-01-06 15:00',
             '2016-01-07 16:00', '2016-01-08 17:00', '2016-01-09 18:00',
             '2016-01-10 19:00'],
            dtype='period[25H]', freq='25H')

In [21]:
p2

PeriodIndex(['2016-01-01 10:00', '2016-01-02 11:00', '2016-01-03 12:00',
             '2016-01-04 13:00', '2016-01-05 14:00', '2016-01-06 15:00',
             '2016-01-07 16:00', '2016-01-08 17:00', '2016-01-09 18:00',
             '2016-01-10 19:00'],
            dtype='period[25H]', freq='25H')
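
Both spellings describe the same step. The sketch below checks this with `pd.Timedelta` objects and `pd.date_range()` instead of the string aliases and `pd.period_range()` used above; that substitution is an assumption made to keep the example independent of alias spelling.

```python
import pandas as pd

# 25 hours and "1 day plus 1 hour" are the same duration.
print(pd.Timedelta(hours=25) == pd.Timedelta(days=1, hours=1))

a = pd.date_range('2016-01-01 10:00', periods=3, freq=pd.Timedelta(hours=25))
b = pd.date_range('2016-01-01 10:00', periods=3, freq=pd.Timedelta(days=1, hours=1))

# The two ranges are element-for-element identical.
print(a.equals(b))
print(a)
```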

What are some other combos you can produce?
Hint: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases

In [22]:
# work weeks
p3 = pd.period_range('2016-01-01 10:10', freq = '5B8H', periods = 10)
p3

PeriodIndex(['2016-01-01', '2016-01-08', '2016-01-15', '2016-01-22',
             '2016-01-29', '2016-02-05', '2016-02-12', '2016-02-19',
             '2016-02-26', '2016-03-04'],
            dtype='period[5B+8H]', freq='5B+8H')

## Indexing with Time Objects

In [23]:
# INDEXING WITH TIME OBJECTS
# You can use these objects for indices
# Let's start with using a date range as above
rng = pd.date_range('2016 Jul 1', periods = 10, freq = 'D')
print(rng)
pd.Series(range(len(rng)), index = rng)

DatetimeIndex(['2016-07-01', '2016-07-02', '2016-07-03', '2016-07-04',
               '2016-07-05', '2016-07-06', '2016-07-07', '2016-07-08',
               '2016-07-09', '2016-07-10'],
              dtype='datetime64[ns]', freq='D')


2016-07-01    0
2016-07-02    1
2016-07-03    2
2016-07-04    3
2016-07-05    4
2016-07-06    5
2016-07-07    6
2016-07-08    7
2016-07-09    8
2016-07-10    9
Freq: D, dtype: int32
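
With a `DatetimeIndex` in place, values can be looked up by date string or by `Timestamp`. A minimal sketch using `.loc` (the series name `s` is introduced here for illustration):

```python
import pandas as pd

rng = pd.date_range('2016-07-01', periods=10, freq='D')
s = pd.Series(range(len(rng)), index=rng)

# A single date returns a single value.
print(s.loc['2016-07-03'])

# Label slices include both end points.
print(s.loc['2016-07-03':'2016-07-05'])

# Timestamp objects work as labels too.
print(s.loc[pd.Timestamp('2016-07-10')])
```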

In [24]:
# You can also use time period indices, in cases where it makes more sense 
# to think about your index as a time span rather than a single point in time

periods = [pd.Period('2016-01'), pd.Period('2016-02'), pd.Period('2016-03')]
ts = pd.Series(np.random.randn(len(periods)), index = periods)
ts

2016-01    0.494303
2016-02    1.156688
2016-03   -1.092093
Freq: M, dtype: float64

In [25]:
# What type is the index for ts?
type(ts.index)

pandas.core.indexes.period.PeriodIndex

In [26]:
# Experiment with various indices
# Hint: does ts['2016'] work? 

print(ts['2016'])
print(ts['2016-02'])

2016-01    0.494303
2016-02    1.156688
2016-03   -1.092093
Freq: M, dtype: float64
1.15668837674
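
Partial strings are easier to see when the index crosses a year boundary. The sketch below uses fixed values instead of random ones so the result is predictable; the index and values are made up for illustration.

```python
import pandas as pd

idx = pd.period_range('2015-11', periods=4, freq='M')
s = pd.Series([10, 20, 30, 40], index=idx)

# A year selects every monthly period that falls in that year.
print(s.loc['2016'])

# A full year-month matches exactly one period and returns a scalar.
print(s.loc['2015-12'])
```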


In [27]:
# Timestamped data can be converted to period indices with to_period and vice versa with to_timestamp
ts = pd.Series(range(10), pd.date_range('07-10-16 8:15', periods = 10, freq = 'H'))
ts

2016-07-10 08:15:00    0
2016-07-10 09:15:00    1
2016-07-10 10:15:00    2
2016-07-10 11:15:00    3
2016-07-10 12:15:00    4
2016-07-10 13:15:00    5
2016-07-10 14:15:00    6
2016-07-10 15:15:00    7
2016-07-10 16:15:00    8
2016-07-10 17:15:00    9
Freq: H, dtype: int32

In [28]:
ts_period = ts.to_period(freq = '60T')
ts_period

2016-07-10 08:15    0
2016-07-10 09:15    1
2016-07-10 10:15    2
2016-07-10 11:15    3
2016-07-10 12:15    4
2016-07-10 13:15    5
2016-07-10 14:15    6
2016-07-10 15:15    7
2016-07-10 16:15    8
2016-07-10 17:15    9
Freq: 60T, dtype: int32
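
The reverse direction, `to_timestamp`, is not run in the cells above. Here is a small round-trip sketch with daily data converted to monthly periods and back (the data is made up for illustration):

```python
import pandas as pd

days = pd.date_range('2016-07-10', periods=3, freq='D')
s = pd.Series(range(3), index=days)

# Every day in July maps to the same monthly period.
monthly = s.to_period('M')
print(monthly.index)

# Converting back gives the start of each period by default,
# so the original day of the month is not recovered.
back = monthly.to_timestamp()
print(back.index)
```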

In [29]:
ts_period['2016-07-10 08:30':'2016-07-10 11:45'] # we have the concept of overlap with time periods

2016-07-10 09:15    1
2016-07-10 10:15    2
2016-07-10 11:15    3
Freq: 60T, dtype: int32

In [30]:
ts['2016-07-10 08:30':'2016-07-10 11:45'] # with timestamps, slicing includes only the points inside the range

2016-07-10 09:15:00    1
2016-07-10 10:15:00    2
2016-07-10 11:15:00    3
Freq: H, dtype: int32
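
The difference between "include" and "overlap" can be spelled out with explicit masks. This is a sketch of the idea, not a description of how pandas slices internally; it assumes hourly periods (taken from the index frequency by `to_period()`) rather than the `'60T'` periods used above.

```python
import pandas as pd

stamps = pd.date_range('2016-07-10 08:15', periods=10, freq=pd.Timedelta(hours=1))
ts = pd.Series(range(10), index=stamps)
hourly = ts.to_period()  # hourly periods: 08:00, 09:00, ...

lo = pd.Timestamp('2016-07-10 08:30')
hi = pd.Timestamp('2016-07-10 11:45')

# Include: keep timestamps that fall inside [lo, hi].
inside = ts[(ts.index >= lo) & (ts.index <= hi)]

# Overlap: keep periods whose span intersects [lo, hi].
overlapping = hourly[(hourly.index.end_time >= lo) & (hourly.index.start_time <= hi)]

print(inside)
print(overlapping)
```

The 08:00 period is kept in the second result because it runs until just before 09:00 and therefore overlaps the requested range, while the 08:15 timestamp lies before it.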