# Working with Time Series

## Dates and Times in Python


### Native Python dates and times: ``datetime`` and ``dateutil``

In [1]:
from datetime import datetime

In [2]:
datetime(2021, 2, 15)

datetime.datetime(2021, 2, 15, 0, 0)

In [3]:
from dateutil import parser

In [4]:
parser.parse("2021-02-10")

datetime.datetime(2021, 2, 10, 0, 0)

In [5]:
my_date = parser.parse("2021-02-10")

In [6]:
my_date.strftime('%A')

'Wednesday'

In [7]:
my_date.year

2021

In [8]:
# Reference manuals:

[strftime section](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) 

[datetime documentation](https://docs.python.org/3/library/datetime.html)

[dateutil's online documentation](http://labix.org/python-dateutil)
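
The format codes from the strftime section linked above work in both directions: `strptime` parses a string with an explicit format, and `strftime` renders a datetime back into text. The following is a minimal sketch using only the standard library; the variable names are illustrative, and the weekday name assumes the default English locale.

```python
from datetime import datetime

# Parse a string using an explicit format instead of dateutil's guessing.
parsed = datetime.strptime("2021-02-10", "%Y-%m-%d")

# Render the same datetime with different format codes.
weekday_name = parsed.strftime("%A")            # full weekday name
custom_layout = parsed.strftime("%Y/%m/%d %H:%M")

print(weekday_name)
print(custom_layout)
```

An explicit format is stricter than `parser.parse`, which can be useful when the input layout is known in advance.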


### Typed arrays of times: NumPy's ``datetime64``

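To address the limitations of Python's built-in datetime, NumPy added the ``datetime64`` dtype, which encodes dates as 64-bit integers so that arrays of dates can be stored and processed compactly.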

In [9]:
import numpy as np

In [10]:
# Without an explicit dtype, np.array stores the value as a string
my_date = np.array('2021-01-13')

In [11]:
my_date

array('2021-01-13', dtype='<U10')

Passing `dtype=np.datetime64` to `np.array()` creates a date type instead.

In [12]:
my_date = np.array('2021-01-13', dtype=np.datetime64)

In [13]:
my_date

array('2021-01-13', dtype='datetime64[D]')

In [14]:
my_date = np.array('2021-01-13 14:23:11.500', dtype=np.datetime64)

In [15]:
my_date

array('2021-01-13T14:23:11.500', dtype='datetime64[ms]')

In [16]:
# The second parameter 'us' sets the unit to microseconds
np.datetime64('2021-01-13 14:23:11.500', 'us')

numpy.datetime64('2021-01-13T14:23:11.500000')

In [17]:
np.datetime64('2021-01-13 14:23:11.500', 'D')

numpy.datetime64('2021-01-13')
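
Because `datetime64` values live in typed arrays, arithmetic on them is vectorized. The sketch below adds an integer range to a single day-resolution date; the names `base`, `days`, and `gap` are illustrative only.

```python
import numpy as np

base = np.datetime64("2021-01-13")   # day resolution: datetime64[D]

# Integers added to a datetime64[D] value are interpreted as days.
days = base + np.arange(5)
print(days)

# Subtracting two datetime64 values yields a timedelta64.
gap = days[-1] - days[0]
print(gap)
```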

| Code   | Meaning     | Time span (relative) | Time span (absolute)   |
|--------|-------------|----------------------|------------------------|
| ``Y``  | Year        | ± 9.2e18 years       | [9.2e18 BC, 9.2e18 AD] |
| ``M``  | Month       | ± 7.6e17 years       | [7.6e17 BC, 7.6e17 AD] |
| ``W``  | Week        | ± 1.7e17 years       | [1.7e17 BC, 1.7e17 AD] |
| ``D``  | Day         | ± 2.5e16 years       | [2.5e16 BC, 2.5e16 AD] |
| ``h``  | Hour        | ± 1.0e15 years       | [1.0e15 BC, 1.0e15 AD] |
| ``m``  | Minute      | ± 1.7e13 years       | [1.7e13 BC, 1.7e13 AD] |
| ``s``  | Second      | ± 2.9e11 years       | [2.9e11 BC, 2.9e11 AD] |
| ``ms`` | Millisecond | ± 2.9e8 years        | [2.9e8 BC, 2.9e8 AD]   |
| ``us`` | Microsecond | ± 2.9e5 years        | [290301 BC, 294241 AD] |
| ``ns`` | Nanosecond  | ± 292 years          | [1678 AD, 2262 AD]     |
| ``ps`` | Picosecond  | ± 106 days           | [1969 AD, 1970 AD]     |
| ``fs`` | Femtosecond | ± 2.6 hours          | [1969 AD, 1970 AD]     |
| ``as`` | Attosecond  | ± 9.2 seconds        | [1969 AD, 1970 AD]     |
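
The spans in the table follow from the 64-bit encoding: a finer unit leaves fewer representable years. The sketch below converts one timestamp between units with `astype` and reproduces the nanosecond row with plain arithmetic; it assumes a year of 365.25 days for the estimate.

```python
import numpy as np

stamp = np.datetime64("2021-01-13T14:23:11.500")   # inferred as datetime64[ms]

as_ns = stamp.astype("datetime64[ns]")   # finer unit, narrower range
as_day = stamp.astype("datetime64[D]")   # coarser unit, time of day is dropped

print(stamp.dtype, as_ns.dtype, as_day)

# Rough span for nanosecond resolution: 2**63 ticks on each side of 1970.
seconds = 2**63 / 1e9
years_at_ns = seconds / 86400 / 365.25
print(round(years_at_ns))
```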

### Dates and times in pandas: best of both worlds



In [18]:
import pandas as pd

In [19]:
dates = [ '2021, 1, 4', '2021, 1, 7', '2021, 1, 8', '2021, 1, 9']

In [20]:
# Convert strings to datetime format

In [21]:
pd.to_datetime(dates)


DatetimeIndex(['2021-01-04', '2021-01-07', '2021-01-08', '2021-01-09'], dtype='datetime64[ns]', freq=None)

In [22]:
idx = pd.to_datetime(dates)

Use `strftime()` on the result; the available format codes are listed in the strftime link above.

In [23]:
idx[0].strftime('%A')

'Monday'
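
When the input strings use an unusual layout such as `'2021, 1, 4'`, passing an explicit `format` to `pd.to_datetime` avoids relying on format inference. This is a small sketch under that assumption; `parsed_idx` is an illustrative name, and `day_name()` is used as a vectorized counterpart to calling `strftime('%A')` on each element.

```python
import pandas as pd

dates = ["2021, 1, 4", "2021, 1, 7", "2021, 1, 8", "2021, 1, 9"]

# Spell out the layout with the same codes that strftime uses.
parsed_idx = pd.to_datetime(dates, format="%Y, %m, %d")
print(parsed_idx)

# Weekday names for the whole index at once.
print(list(parsed_idx.day_name()))
```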

In [24]:
my_date

array('2021-01-13T14:23:11.500', dtype='datetime64[ms]')

In [25]:
# Generate several dates from a single base date

pd.to_timedelta(np.arange(10), 'D')


In [26]:
my_date + pd.to_timedelta(np.arange(12+1), 'D')

DatetimeIndex(['2021-01-13 14:23:11.500000', '2021-01-14 14:23:11.500000',
               '2021-01-15 14:23:11.500000', '2021-01-16 14:23:11.500000',
               '2021-01-17 14:23:11.500000', '2021-01-18 14:23:11.500000',
               '2021-01-19 14:23:11.500000', '2021-01-20 14:23:11.500000',
               '2021-01-21 14:23:11.500000', '2021-01-22 14:23:11.500000',
               '2021-01-23 14:23:11.500000', '2021-01-24 14:23:11.500000',
               '2021-01-25 14:23:11.500000'],
              dtype='datetime64[ns]', freq=None)

In [27]:
my_date + pd.to_timedelta(np.arange(12+1), 'm')

DatetimeIndex(['2021-01-13 14:23:11.500000', '2021-01-13 14:24:11.500000',
               '2021-01-13 14:25:11.500000', '2021-01-13 14:26:11.500000',
               '2021-01-13 14:27:11.500000', '2021-01-13 14:28:11.500000',
               '2021-01-13 14:29:11.500000', '2021-01-13 14:30:11.500000',
               '2021-01-13 14:31:11.500000', '2021-01-13 14:32:11.500000',
               '2021-01-13 14:33:11.500000', '2021-01-13 14:34:11.500000',
               '2021-01-13 14:35:11.500000'],
              dtype='datetime64[ns]', freq=None)
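
The same pattern works with a pandas `Timestamp` as the base instead of a NumPy array. The sketch below is an illustrative variant, not a replacement for the cells above; `start`, `offsets`, and `shifted` are hypothetical names.

```python
import numpy as np
import pandas as pd

start = pd.Timestamp("2021-01-13 14:23:11.500")

# A TimedeltaIndex of 0, 1, and 2 days.
offsets = pd.to_timedelta(np.arange(3), unit="D")

# Timestamp + TimedeltaIndex broadcasts to a DatetimeIndex.
shifted = start + offsets
print(shifted)
print(list(shifted.day_name()))
```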

## Pandas Time Series: Indexing by Time



`pd.DatetimeIndex([...])` converts a list of date strings into an index of dates.

In [28]:
pd.DatetimeIndex( ['2021-02-10', '2021-02-4', '2021-02-8', '2021-02-22', '2021-02-28'] )

DatetimeIndex(['2021-02-10', '2021-02-04', '2021-02-08', '2021-02-22',
               '2021-02-28'],
              dtype='datetime64[ns]', freq=None)

In [29]:
date_index = pd.DatetimeIndex( ['2021-02-10', '2021-02-4', '2021-02-8', '2021-02-22', '2021-02-28'] )

In [30]:
pd.Series(data=[0, 1, 2, 3, 4], index=date_index)

2021-02-10    0
2021-02-04    1
2021-02-08    2
2021-02-22    3
2021-02-28    4
dtype: int64

In [31]:
my_data = pd.Series(data=[0, 1, 2, 3, 4], index=date_index)

In [32]:
# The advantage of a date index is that slicing by date works

my_data [ '2021-02-10' : ]

2021-02-10    0
2021-02-04    1
2021-02-08    2
2021-02-22    3
2021-02-28    4
dtype: int64

In [33]:
my_data [ '2021-02-10' : '2021-02-22']

2021-02-10    0
2021-02-22    3
dtype: int64

In [34]:
# This would not work with plain strings, but because the index holds dates it can be used flexibly
# Select by year only
my_data['2021']

2021-02-10    0
2021-02-04    1
2021-02-08    2
2021-02-22    3
2021-02-28    4
dtype: int64

In [35]:
# Select by month only
my_data['2021-02']

2021-02-10    0
2021-02-04    1
2021-02-08    2
2021-02-22    3
2021-02-28    4
dtype: int64
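
The index in these cells is not in chronological order, which makes range slices harder to reason about. A common approach is to sort the index first so that a slice returns a contiguous window of dates. The sketch below rebuilds the same series and assumes `sort_index()` is acceptable for the use case; `ordered` and `window` are illustrative names.

```python
import pandas as pd

date_index = pd.DatetimeIndex(
    ["2021-02-10", "2021-02-04", "2021-02-08", "2021-02-22", "2021-02-28"]
)
my_data = pd.Series([0, 1, 2, 3, 4], index=date_index)

# Sort chronologically so that slices cover a contiguous date range.
ordered = my_data.sort_index()

# Both endpoints are included in a label-based slice.
window = ordered["2021-02-08":"2021-02-22"]
print(window)

# Partial-string selection of a whole month via .loc.
february = ordered.loc["2021-02"]
print(len(february))
```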

### Regular sequences: ``pd.date_range()``



In [36]:
idx

DatetimeIndex(['2021-01-04', '2021-01-07', '2021-01-08', '2021-01-09'], dtype='datetime64[ns]', freq=None)

To find the difference between dates, subtract one date value from another.

In [37]:
idx - idx[0]

TimedeltaIndex(['0 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq=None)

`pd.date_range('date', periods=N)` generates `N` consecutive dates starting from the given date.

In [38]:
pd.date_range('2021-02-01', periods= 8)

DatetimeIndex(['2021-02-01', '2021-02-02', '2021-02-03', '2021-02-04',
               '2021-02-05', '2021-02-06', '2021-02-07', '2021-02-08'],
              dtype='datetime64[ns]', freq='D')

The `freq` parameter changes the spacing, for example `pd.date_range('2021-02-01', periods=8, freq='H')` for hourly steps.

In [39]:
pd.date_range('2021-02-01', periods= 8, freq='H')

DatetimeIndex(['2021-02-01 00:00:00', '2021-02-01 01:00:00',
               '2021-02-01 02:00:00', '2021-02-01 03:00:00',
               '2021-02-01 04:00:00', '2021-02-01 05:00:00',
               '2021-02-01 06:00:00', '2021-02-01 07:00:00'],
              dtype='datetime64[ns]', freq='H')

In [40]:
pd.date_range('2021-02-01', periods= 8, freq='M')

DatetimeIndex(['2021-02-28', '2021-03-31', '2021-04-30', '2021-05-31',
               '2021-06-30', '2021-07-31', '2021-08-31', '2021-09-30'],
              dtype='datetime64[ns]', freq='M')

In [41]:
# freq='Y'
# Shows the last day of each year, 12-31
pd.date_range('2021-02-01', periods= 8, freq='Y')

DatetimeIndex(['2021-12-31', '2022-12-31', '2023-12-31', '2024-12-31',
               '2025-12-31', '2026-12-31', '2027-12-31', '2028-12-31'],
              dtype='datetime64[ns]', freq='A-DEC')

In [42]:
# freq='B'
# Business days only: weekends are skipped (public holidays are not excluded)
pd.date_range('2021-02-01', periods= 8, freq='B')

DatetimeIndex(['2021-02-01', '2021-02-02', '2021-02-03', '2021-02-04',
               '2021-02-05', '2021-02-08', '2021-02-09', '2021-02-10'],
              dtype='datetime64[ns]', freq='B')
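
`pd.date_range` also accepts an end date instead of `periods`, and a custom offset object when specific holidays should be skipped. The sketch below uses `pd.offsets.CustomBusinessDay`; the holiday list is hypothetical and chosen only for illustration.

```python
import pandas as pd

# Start and end dates instead of a number of periods (daily by default).
span = pd.date_range(start="2021-02-01", end="2021-02-08")
print(len(span))

# Hypothetical holidays, for illustration only.
holidays = ["2021-02-11", "2021-02-12"]
workdays = pd.offsets.CustomBusinessDay(holidays=holidays)

# Weekends and the listed holidays are both skipped.
working = pd.date_range("2021-02-08", periods=5, freq=workdays)
print(working)
```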

## Frequencies and Offsets

| Code   | Description         | Code   | Description          |
|--------|---------------------|--------|----------------------|
| ``D``  | Calendar day        | ``B``  | Business day         |
| ``W``  | Weekly              |        |                      |
| ``M``  | Month end           | ``BM`` | Business month end   |
| ``Q``  | Quarter end         | ``BQ`` | Business quarter end |
| ``A``  | Year end            | ``BA`` | Business year end    |
| ``H``  | Hours               | ``BH`` | Business hours       |
| ``T``  | Minutes             |        |                      |
| ``S``  | Seconds             |        |                      |
| ``L``  | Milliseconds        |        |                      |
| ``U``  | Microseconds        |        |                      |
| ``N``  | Nanoseconds         |        |                      |

In [43]:
pd.timedelta_range(0, periods=9, freq='2H30T')

TimedeltaIndex(['0 days 00:00:00', '0 days 02:30:00', '0 days 05:00:00',
                '0 days 07:30:00', '0 days 10:00:00', '0 days 12:30:00',
                '0 days 15:00:00', '0 days 17:30:00', '0 days 20:00:00'],
               dtype='timedelta64[ns]', freq='150T')
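
Frequencies can also be given as objects rather than alias strings. This may be worth considering because recent pandas releases (2.2 and later, as far as I am aware) deprecate several of the aliases in the table, for example `H`, `T`, and `M` in favor of `h`, `min`, and `ME`. The sketch below reproduces the 2 hour 30 minute step with a `pd.Timedelta` and a month-end sequence with `pd.offsets.MonthEnd`; the variable names are illustrative.

```python
import pandas as pd

# Same 2h30m spacing as above, expressed as a Timedelta object.
step = pd.Timedelta(hours=2, minutes=30)
steps = pd.timedelta_range(start="0 days", periods=9, freq=step)
print(steps)

# Month-end dates using an offset object instead of an alias string.
month_ends = pd.date_range("2021-02-01", periods=3, freq=pd.offsets.MonthEnd())
print(month_ends)
```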