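### Time series data
* Time series data
    - Data recorded over time, used to analyze and forecast how values change as time passes
    - Timestamp: a data type that represents a specific point in time
* Features
    - to_datetime: a function that converts data to a datetime type
    - to_period: converts datetime values to periods at a chosen resolution such as year, month, or day
        * Options
            - freq selects the resolution of the resulting period
            - freq = A (year; newer pandas versions prefer the alias Y), M (month), D (day)
            - With D the result looks like 2020-01-01, with M it is 2020-01, and with A it is 2020.
    - DatetimeIndex type
        * Exposes year, month, day, hour, minute, second, and similar attributes directly; a datetime Series offers the same attributes through its dt accessor (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, ...)
* For details, see the following page.
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html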
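
The notes above describe to_period and its freq option, but the cells that follow never call it. A minimal sketch of the three resolutions, assuming a small set of sample dates made up for illustration and using the `Y` alias for the yearly frequency:

```python
import pandas as pd

# Hypothetical sample dates, chosen only for illustration.
dates = pd.Series(pd.to_datetime(['2020-01-01', '2020-02-15', '2021-03-31']))

# Each freq keeps a different resolution of the same dates.
by_day = dates.dt.to_period('D')
by_month = dates.dt.to_period('M')
by_year = dates.dt.to_period('Y')  # yearly period; 'A' is the older alias

print([str(p) for p in by_day])    # daily periods, e.g. 2020-01-01
print([str(p) for p in by_month])  # monthly periods, e.g. 2020-01
print([str(p) for p in by_year])   # yearly periods, e.g. 2020
```

The result is a Series of Period values rather than timestamps, so two dates in the same month compare equal once converted with `freq='M'`.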

In [1]:
import pandas as pd
import numpy as np



In [2]:
dates = ['2021-01-01','2021-03-01','2021-05-01']
df = pd.DataFrame(dates, columns=['date'])
df

Unnamed: 0,date
0,2021-01-01
1,2021-03-01
2,2021-05-01


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   date    3 non-null      object
dtypes: object(1)
memory usage: 156.0+ bytes


In [4]:
df['new_date'] = pd.to_datetime(df['date'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   date      3 non-null      object        
 1   new_date  3 non-null      datetime64[ns]
dtypes: datetime64[ns](1), object(1)
memory usage: 180.0+ bytes


In [5]:
df['year'] = df['new_date'].dt.year
df['month'] = df['new_date'].dt.month
df['day'] = df['new_date'].dt.day
df

Unnamed: 0,date,new_date,year,month,day
0,2021-01-01,2021-01-01,2021,1,1
1,2021-03-01,2021-03-01,2021,3,1
2,2021-05-01,2021-05-01,2021,5,1
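
The cell above reads the date parts through the `.dt` accessor because `new_date` is a Series. As a rough sketch of the difference, a DatetimeIndex exposes the same attributes without `.dt`. The variable names here are illustrative and not part of the notebook:

```python
import pandas as pd

# A DatetimeIndex built from the same three dates used in the notebook.
idx = pd.DatetimeIndex(['2021-01-01', '2021-03-01', '2021-05-01'])

# On an index the attributes are available directly.
index_years = idx.year
index_months = idx.month

# On a Series the same attributes live under the .dt accessor.
ser = pd.Series(idx)
series_months = ser.dt.month

print(list(index_years))
print(list(index_months))
print(series_months.tolist())
```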


In [6]:
dates = [1,2,3]
df = pd.DataFrame(dates, columns=['date'])
df

Unnamed: 0,date
0,1
1,2
2,3


In [7]:
for i in df['date']:
    print(i * 10)

10
20
30


In [8]:
df['date'] = df['date'].apply(lambda x : x * 10)
df

Unnamed: 0,date
0,10
1,20
2,30
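
For simple arithmetic like this, `apply` with a lambda is usually not needed, since a pandas Series supports element-wise operators. A small sketch comparing the two approaches on the same values (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3], columns=['date'])

# Element-by-element call through a Python lambda.
scaled_apply = df['date'].apply(lambda x: x * 10)

# Vectorized arithmetic on the whole column at once.
scaled_vectorized = df['date'] * 10

print(scaled_vectorized.tolist())
print(scaled_apply.equals(scaled_vectorized))
```

Both give the same values; the vectorized form tends to be shorter and faster on large columns.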


In [9]:
dates = ['2021-01-01','2021-03-01','2021-05-01']
df = pd.DataFrame(dates, columns=['date'])
df

Unnamed: 0,date
0,2021-01-01
1,2021-03-01
2,2021-05-01


In [10]:
df.dtypes

date    object
dtype: object

In [11]:
date = df['date'].apply(lambda x: pd.to_datetime(x))
date

0   2021-01-01
1   2021-03-01
2   2021-05-01
Name: date, dtype: datetime64[ns]

In [12]:
df['year'] = date.apply(lambda x: x.year)
df['month'] = date.apply(lambda x: x.month)
df['day'] = date.apply(lambda x: x.day)
df['요일'] = date.apply(lambda x: x.dayofweek)
df

Unnamed: 0,date,year,month,day,요일
0,2021-01-01,2021,1,1,4
1,2021-03-01,2021,3,1,0
2,2021-05-01,2021,5,1,5


In [13]:
# Return 1111 when the condition is true; otherwise keep the original value
df['month'] = df['month'].apply(lambda x: 1111 if x > 2 else x)
df

Unnamed: 0,date,year,month,day,요일
0,2021-01-01,2021,1,1,4
1,2021-03-01,2021,1111,1,0
2,2021-05-01,2021,1111,1,5
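
The conditional lambda above can also be written without `apply`. One possible sketch uses `Series.mask`, which replaces the values where a condition holds and leaves the rest unchanged; the standalone `month` Series here is a stand-in for the notebook's column:

```python
import pandas as pd

# Stand-in for df['month'] before the replacement.
month = pd.Series([1, 3, 5], name='month')

# Replace every value greater than 2 with 1111, keep the others.
replaced = month.mask(month > 2, 1111)

print(replaced.tolist())
```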


In [23]:
df[(df['year'] == 2021) & (df['month'] == 1111)]['요일'] = \
    df[(df['year'] == 2021) & (df['month'] == 1111)]['요일'].apply(lambda x: 1111 if x > 2 else x)
df  # Not applied: chained indexing assigns to a copy, so use .loc instead

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[(df['year'] == 2021) & (df['month'] == 1111)]['요일'] = \


Unnamed: 0,date,year,month,day,요일
0,2021-01-01,2021,1,1,4
1,2021-03-01,2021,1111,1,0
2,2021-05-01,2021,1111,1,5


In [24]:
df.loc[(df['year'] == 2021) & (df['month'] == 1111), '요일'] = \
    df[(df['year'] == 2021) & (df['month'] == 1111)]['요일'].apply(lambda x: 1111 if x > 2 else x)
df  # Applied correctly

Unnamed: 0,date,year,month,day,요일
0,2021-01-01,2021,1,1,4
1,2021-03-01,2021,1111,1,0
2,2021-05-01,2021,1111,1,1111
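
The `.loc` assignment works because it selects rows and column in a single indexing step on the original DataFrame. As a sketch of a shorter variant, the `x > 2` test can be folded into the row mask so that no `apply` is needed. The DataFrame below is rebuilt by hand to match the state before the assignment:

```python
import pandas as pd

# Hand-built copy of the notebook's DataFrame before the .loc assignment.
df = pd.DataFrame({
    'year': [2021, 2021, 2021],
    'month': [1, 1111, 1111],
    '요일': [4, 0, 5],
})

# Rows that match year and month AND whose day-of-week value exceeds 2.
target = (df['year'] == 2021) & (df['month'] == 1111) & (df['요일'] > 2)

# One indexing step on the original frame, so the assignment sticks.
df.loc[target, '요일'] = 1111

print(df['요일'].tolist())
```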


In [27]:
df_test = df[['date', 'year']]
df_test

Unnamed: 0,date,year
0,2021-01-01,2021
1,2021-03-01,2021
2,2021-05-01,2021


In [28]:
df_test = df_test.to_dict("list")
df_test

{'date': ['2021-01-01', '2021-03-01', '2021-05-01'],
 'year': [2021, 2021, 2021]}

In [29]:
df_test['date']

['2021-01-01', '2021-03-01', '2021-05-01']

In [31]:
df_date = df_test['date']
df_year = df_test['year']

# Walk both lists in parallel instead of indexing by position
for d, y in zip(df_date, df_year):
    print(d, " : ", y)

2021-01-01  :  2021
2021-03-01  :  2021
2021-05-01  :  2021
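
Converting to a dict of lists is one way to loop over rows. Another option, sketched here under the assumption that only a few columns are needed, is `DataFrame.itertuples`, which yields one named tuple per row without the intermediate dict:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2021-01-01', '2021-03-01', '2021-05-01'],
    'year': [2021, 2021, 2021],
})

# Each row arrives as a named tuple; fields are accessed by column name.
lines = [f"{row.date}  :  {row.year}" for row in df.itertuples(index=False)]

for line in lines:
    print(line)
```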
