# Working with Time Series

## Dates and Times in Python


### Native Python dates and times: ``datetime`` and ``dateutil``

In [1]:
from datetime import datetime

In [2]:
datetime(2020, 11, 11)

datetime.datetime(2020, 11, 11, 0, 0)

In [4]:
from dateutil import parser

In [6]:
date = parser.parse("2020-11-11")

In [7]:
date

datetime.datetime(2020, 11, 11, 0, 0)

In [8]:
date.strftime('%A')

'Wednesday'

In [9]:
date.strftime('%a')

'Wed'

In [10]:
date.weekday()

2
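
As a small illustration of how these pieces fit together, the sketch below round-trips a date through `strftime` and `strptime` and shifts it with `timedelta`. The variable names and the format string `'%Y-%m-%d (%A)'` are chosen for this example only, and the weekday name assumes the default English (C) locale.

```python
from datetime import datetime, timedelta

date = datetime(2020, 11, 11)

# strftime turns a datetime into text using format codes
label = date.strftime('%Y-%m-%d (%A)')

# strptime parses text back into a datetime using the same codes
parsed = datetime.strptime('2020-11-11', '%Y-%m-%d')

# timedelta does calendar arithmetic on a single date
next_week = date + timedelta(days=7)

print(label)
print(parsed == date)
print(next_week.weekday())  # Monday is 0, so a Wednesday is 2
```

Unlike `parser.parse`, `strptime` needs an explicit format, which is stricter but avoids guessing how an ambiguous string should be read.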

For the full list of format codes, see the [strftime section](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) of Python's [datetime documentation](https://docs.python.org/3/library/datetime.html). Other useful date utilities are described in [dateutil's online documentation](http://labix.org/python-dateutil).


### Typed arrays of times: NumPy's ``datetime64``

To complement Python's built-in ``datetime``, NumPy added the ``datetime64`` dtype, which encodes dates as 64-bit integers so that arrays of dates can be stored and processed efficiently.

In [11]:
import numpy as np

In [12]:
date = np.array('2020-11-11', dtype=np.datetime64)

In [13]:
date

array('2020-11-11', dtype='datetime64[D]')

In [14]:
date + np.arange(11+1)

array(['2020-11-11', '2020-11-12', '2020-11-13', '2020-11-14',
       '2020-11-15', '2020-11-16', '2020-11-17', '2020-11-18',
       '2020-11-19', '2020-11-20', '2020-11-21', '2020-11-22'],
      dtype='datetime64[D]')

In [15]:
np.datetime64('2020-10-22')

numpy.datetime64('2020-10-22')

In [16]:
np.datetime64('2020-10-22 13:23')

numpy.datetime64('2020-10-22T13:23')

In [18]:
np.datetime64('2020-10-22 13:23:58.500', 'ns')

numpy.datetime64('2020-10-22T13:23:58.500000000')

| Code   | Meaning     | Time span (relative) | Time span (absolute)   |
|--------|-------------|----------------------|------------------------|
| ``Y``  | Year        | ± 9.2e18 years       | [9.2e18 BC, 9.2e18 AD] |
| ``M``  | Month       | ± 7.6e17 years       | [7.6e17 BC, 7.6e17 AD] |
| ``W``  | Week        | ± 1.7e17 years       | [1.7e17 BC, 1.7e17 AD] |
| ``D``  | Day         | ± 2.5e16 years       | [2.5e16 BC, 2.5e16 AD] |
| ``h``  | Hour        | ± 1.0e15 years       | [1.0e15 BC, 1.0e15 AD] |
| ``m``  | Minute      | ± 1.7e13 years       | [1.7e13 BC, 1.7e13 AD] |
| ``s``  | Second      | ± 2.9e11 years       | [2.9e11 BC, 2.9e11 AD] |
| ``ms`` | Millisecond | ± 2.9e8 years        | [2.9e8 BC, 2.9e8 AD]   |
| ``us`` | Microsecond | ± 2.9e5 years        | [290301 BC, 294241 AD] |
| ``ns`` | Nanosecond  | ± 292 years          | [1678 AD, 2262 AD]     |
| ``ps`` | Picosecond  | ± 106 days           | [1969 AD, 1970 AD]     |
| ``fs`` | Femtosecond | ± 2.6 hours          | [1969 AD, 1970 AD]     |
| ``as`` | Attosecond  | ± 9.2 seconds        | [1969 AD, 1970 AD]     |
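
The trade-off between resolution and time span follows from storing every value in a signed 64-bit integer. The sketch below is one way to check this: it inspects the unit NumPy infers from each string, shows that mixing units promotes the result to the finer one, and estimates the nanosecond span by hand. The variable names are illustrative, and the year length of 365.25 days is an approximation assumed for the estimate.

```python
import numpy as np

day = np.datetime64('2020-10-22')            # unit inferred from the string: days
minute = np.datetime64('2020-10-22 13:23')   # unit inferred: minutes

# np.datetime_data reports the (unit, count) pair behind a datetime64 dtype
day_unit = np.datetime_data(day.dtype)
minute_unit = np.datetime_data(minute.dtype)

# Mixing units promotes the result to the finer unit (minutes here)
elapsed = minute - day

# Rough span of a signed 64-bit nanosecond counter, in years
ns_per_year = 1e9 * 60 * 60 * 24 * 365.25   # assumes 365.25-day years
span_years = 2**63 / ns_per_year

print(day_unit, minute_unit, elapsed, round(span_years))
```

The estimate comes out at roughly 292 years on either side of the 1970 epoch, which matches the ``ns`` row.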

### Dates and times in pandas: best of both worlds



In [19]:
import pandas as pd

In [22]:
dates = ["2020, 1, 4", "2020, 1, 7", "2020, 1, 8", "2020, 1, 22"]

In [24]:
idx = pd.to_datetime(dates)

In [26]:
idx[1].strftime('%A')

'Tuesday'
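
The single-element call above has vectorized counterparts that work on the whole index at once. The following is a minimal sketch; it uses ISO-formatted strings rather than the comma-separated ones above, as an assumption made to keep the parsing unambiguous, and it assumes the default English locale for the day names.

```python
import pandas as pd

idx = pd.to_datetime(['2020-01-04', '2020-01-07', '2020-01-08', '2020-01-22'])

# day_name() returns the weekday name for every entry in the index
names = idx.day_name()

# strftime is also available on the index and returns an Index of strings
short = idx.strftime('%a')

print(list(names))
print(list(short))
```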

In [27]:
date

array('2020-11-11', dtype='datetime64[D]')

In [28]:
date + pd.to_timedelta(np.arange(11+1), 'D')

DatetimeIndex(['2020-11-11', '2020-11-12', '2020-11-13', '2020-11-14',
               '2020-11-15', '2020-11-16', '2020-11-17', '2020-11-18',
               '2020-11-19', '2020-11-20', '2020-11-21', '2020-11-22'],
              dtype='datetime64[ns]', freq=None)

In [30]:
date + pd.to_timedelta(np.arange(11+1), 'm')

DatetimeIndex(['2020-11-11 00:00:00', '2020-11-11 00:01:00',
               '2020-11-11 00:02:00', '2020-11-11 00:03:00',
               '2020-11-11 00:04:00', '2020-11-11 00:05:00',
               '2020-11-11 00:06:00', '2020-11-11 00:07:00',
               '2020-11-11 00:08:00', '2020-11-11 00:09:00',
               '2020-11-11 00:10:00', '2020-11-11 00:11:00'],
              dtype='datetime64[ns]', freq=None)

In [31]:
date + pd.to_timedelta(np.arange(11+1), 'W')

DatetimeIndex(['2020-11-11', '2020-11-18', '2020-11-25', '2020-12-02',
               '2020-12-09', '2020-12-16', '2020-12-23', '2020-12-30',
               '2021-01-06', '2021-01-13', '2021-01-20', '2021-01-27'],
              dtype='datetime64[ns]', freq=None)

In [1]:
import pandas as pd
import numpy as np

## Pandas Time Series: Indexing by Time



In [6]:
date_index = pd.DatetimeIndex(['2020-11-01', '2020-11-04','2020-11-08',
                               '2020-11-22', '2021-01-01'])

In [7]:
data = pd.Series([0, 1, 2, 3, 4], index=date_index)

In [9]:
data

2020-11-01    0
2020-11-04    1
2020-11-08    2
2020-11-22    3
2021-01-01    4
dtype: int64

In [10]:
data['2020-11-04':]

2020-11-04    1
2020-11-08    2
2020-11-22    3
2021-01-01    4
dtype: int64

In [13]:
data['2020-11-10':'2021-02-01']

2020-11-22    3
2021-01-01    4
dtype: int64

In [14]:
data['2020']

2020-11-01    0
2020-11-04    1
2020-11-08    2
2020-11-22    3
dtype: int64

In [15]:
data['2021']

2021-01-01    4
dtype: int64

In [18]:
data['2020-11']

2020-11-01    0
2020-11-04    1
2020-11-08    2
2020-11-22    3
dtype: int64
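
The same selections can be written with `.loc`, which makes it explicit that the strings are index labels rather than positions. This is a sketch that rebuilds the series from the cells above; `nov` and `window` are names introduced only for this example.

```python
import pandas as pd

date_index = pd.DatetimeIndex(['2020-11-01', '2020-11-04', '2020-11-08',
                               '2020-11-22', '2021-01-01'])
data = pd.Series([0, 1, 2, 3, 4], index=date_index)

# A partial date string selects every entry that falls inside that period
nov = data.loc['2020-11']

# Slice endpoints do not have to exist in the index; both ends are inclusive
window = data.loc['2020-11-10':'2021-02-01']

print(nov.tolist())
print(window.tolist())
```

Slicing by labels that are not in the index relies on the index being sorted, as it is here.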

### Regular sequences: ``pd.date_range()``



In [19]:
dates = pd.to_datetime(['2020-11-01', '2020-11-04', '2020-11-08',
                        '2020-11-22', '2021-01-01'])

In [20]:
dates

DatetimeIndex(['2020-11-01', '2020-11-04', '2020-11-08', '2020-11-22',
               '2021-01-01'],
              dtype='datetime64[ns]', freq=None)

In [21]:
dates[0]

Timestamp('2020-11-01 00:00:00')

In [22]:
dates - dates[0]

TimedeltaIndex(['0 days', '3 days', '7 days', '21 days', '61 days'], dtype='timedelta64[ns]', freq=None)

In [23]:
pd.date_range('2020-11-01', periods=8)

DatetimeIndex(['2020-11-01', '2020-11-02', '2020-11-03', '2020-11-04',
               '2020-11-05', '2020-11-06', '2020-11-07', '2020-11-08'],
              dtype='datetime64[ns]', freq='D')

In [24]:
pd.date_range('2020-11-01', periods=8, freq='H')

DatetimeIndex(['2020-11-01 00:00:00', '2020-11-01 01:00:00',
               '2020-11-01 02:00:00', '2020-11-01 03:00:00',
               '2020-11-01 04:00:00', '2020-11-01 05:00:00',
               '2020-11-01 06:00:00', '2020-11-01 07:00:00'],
              dtype='datetime64[ns]', freq='H')

In [26]:
pd.date_range('2020-11', periods=8, freq='M')

DatetimeIndex(['2020-11-30', '2020-12-31', '2021-01-31', '2021-02-28',
               '2021-03-31', '2021-04-30', '2021-05-31', '2021-06-30'],
              dtype='datetime64[ns]', freq='M')

In [27]:
pd.date_range('2020', periods=8, freq='Y')

DatetimeIndex(['2020-12-31', '2021-12-31', '2022-12-31', '2023-12-31',
               '2024-12-31', '2025-12-31', '2026-12-31', '2027-12-31'],
              dtype='datetime64[ns]', freq='A-DEC')

In [29]:
pd.date_range('2020-11-12', periods=20, freq='B')

DatetimeIndex(['2020-11-12', '2020-11-13', '2020-11-16', '2020-11-17',
               '2020-11-18', '2020-11-19', '2020-11-20', '2020-11-23',
               '2020-11-24', '2020-11-25', '2020-11-26', '2020-11-27',
               '2020-11-30', '2020-12-01', '2020-12-02', '2020-12-03',
               '2020-12-04', '2020-12-07', '2020-12-08', '2020-12-09'],
              dtype='datetime64[ns]', freq='B')

In [30]:
pd.date_range('2020-11-12', periods=8, freq='MS')

DatetimeIndex(['2020-12-01', '2021-01-01', '2021-02-01', '2021-03-01',
               '2021-04-01', '2021-05-01', '2021-06-01', '2021-07-01'],
              dtype='datetime64[ns]', freq='MS')
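
The cells above all pass a start and a number of periods. As a hedged sketch of the other common patterns, the block below gives an explicit end date instead, and shows how anchored frequencies such as ``'B'`` and ``'MS'`` pick the first valid date on or after the start. The variable names are illustrative only.

```python
import pandas as pd

# start and end given: daily steps by default, both endpoints included
by_end = pd.date_range('2020-11-01', '2020-11-08')

# start and periods given, with a business-day frequency (weekends skipped)
business = pd.date_range('2020-11-12', periods=3, freq='B')

# 'MS' anchors to month starts, so a mid-month start rolls forward
month_starts = pd.date_range('2020-11-12', periods=3, freq='MS')

print(len(by_end))
print(list(business.strftime('%Y-%m-%d')))
print(month_starts[0])
```

Because 2020-11-12 is not the first day of a month, the ``'MS'`` range begins at 2020-12-01 rather than at the start date itself.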

## Frequencies and Offsets

| Code   | Description         | Code   | Description          |
|--------|---------------------|--------|----------------------|
| ``D``  | Calendar day        | ``B``  | Business day         |
| ``W``  | Weekly              |        |                      |
| ``M``  | Month end           | ``BM`` | Business month end   |
| ``Q``  | Quarter end         | ``BQ`` | Business quarter end |
| ``A``  | Year end            | ``BA`` | Business year end    |
| ``H``  | Hours               | ``BH`` | Business hours       |
| ``T``  | Minutes             |        |                      |
| ``S``  | Seconds             |        |                      |
| ``L``  | Milliseconds        |        |                      |
| ``U``  | Microseconds        |        |                      |
| ``N``  | Nanoseconds         |        |                      |

In [31]:
pd.timedelta_range(0, periods=9, freq="2H30T")

TimedeltaIndex(['00:00:00', '02:30:00', '05:00:00', '07:30:00', '10:00:00',
                '12:30:00', '15:00:00', '17:30:00', '20:00:00'],
               dtype='timedelta64[ns]', freq='150T')
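
Frequencies do not have to be spelled as alias strings. The sketch below expresses the same 2.5-hour step as a `pd.Timedelta` and uses the `BDay` offset object from `pandas.tseries.offsets` for business-day arithmetic. This is one possible way to write it, not the only one, and the variable names are assumptions for this example. Passing objects also avoids depending on the exact alias spelling, which has changed for some codes in newer pandas releases.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

# The same 2 hours 30 minutes step, given as a Timedelta instead of "2H30T"
step = pd.Timedelta(hours=2, minutes=30)
steps = pd.timedelta_range('0 days', periods=9, freq=step)

# Offset objects can be added to timestamps directly
friday = pd.Timestamp('2020-11-13')
next_business_day = friday + BDay()   # skips the weekend

print(steps[-1])
print(next_business_day)
```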