# Time Series Analysis

**References**:   
1. Wes McKinney, Python for Data Analysis, 2017
2. https://www.cnblogs.com/foley/p/5582358.html   
3. Wang Yan, Applied Time Series Analysis (应用时间序列分析)

### Handling Time Series with Pandas

In [1]:
from datetime import datetime
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

#### Converting Between Date and Time Data Types

**DateTime format codes**:  
%Y Four-digit year  
%y Two-digit year  
%m Two-digit month [01, 12]  
%d Two-digit day [01, 31]   
%H Hour (24-hour clock) [00, 23]   
%I Hour (12-hour clock) [01, 12]     
%M Two-digit minute [00, 59]   
%S Second [00, 61] (seconds 60, 61 account for leap seconds)   
%w Weekday as integer [0 (Sunday), 6]   
%U Week number of the year [00, 53]; Sunday is considered the first day of the week, and days before the first Sunday of the year are “week 0”    
%W Week number of the year [00, 53]; Monday is considered the first day of the week, and days before the first Monday of the year are “week 0”    
%z UTC time zone offset as +HHMM or -HHMM; empty if time zone naive    
%F Shortcut for %Y-%m-%d (e.g., 2012-04-18)    
%D Shortcut for %m/%d/%y (e.g., 04/18/12)   

**Locale-specific date formats**   
%a Abbreviated weekday name   
%A Full weekday name   
%b Abbreviated month name    
%B Full month name    
%c Full date and time (e.g., ‘Tue 01 May 2012 04:20:57 PM’)    
%p Locale equivalent of AM or PM   
%x Locale-appropriate formatted date (e.g., in the United States, May 1, 2012 yields ’05/01/2012’)   
%X Locale-appropriate time (e.g., ’04:24:12 PM’)    

In [2]:
# Converting between dates and strings
stamp = datetime(2018, 1, 1, 12, 1, 10)
print(type(stamp), stamp)
print(str(stamp))
print(stamp.strftime('%Y-%m-%d'), '\n')

datestr = '2019-01-03'
date = datetime.strptime(datestr, '%Y-%m-%d')
print(type(date), date, '\n')

timestr = '12:10:10'
time = datetime.strptime(timestr, '%H:%M:%S')
print(type(time), time, '\n')

datestrs = ['7/16/2018', '12/30/2018']
pd.to_datetime(datestrs)

<class 'datetime.datetime'> 2018-01-01 12:01:10
2018-01-01 12:01:10
2018-01-01 

<class 'datetime.datetime'> 2019-01-03 00:00:00 

<class 'datetime.datetime'> 1900-01-01 12:10:10 



DatetimeIndex(['2018-07-16', '2018-12-30'], dtype='datetime64[ns]', freq=None)
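
When the input strings may be ambiguous or dirty, it can help to pass an explicit format to `pd.to_datetime` and decide how failures are handled. The sketch below is only an illustration: the list `raw` (including its deliberately invalid entry) is made up, and `errors='coerce'` is one of several possible policies.

```python
import pandas as pd

# Hypothetical input: two valid month/day/year strings and one bad entry
raw = ['7/16/2018', '12/30/2018', 'not a date']

# An explicit format removes month-first vs. day-first ambiguity;
# errors='coerce' turns unparseable entries into NaT instead of raising
parsed = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')
print(parsed)

# Drop the failed entries and format the rest back into strings
formatted = parsed.dropna().strftime('%Y-%m-%d').tolist()
print(formatted)
```

Coercing to `NaT` keeps the result aligned with the input, so bad rows can be inspected later rather than aborting the whole conversion.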

#### Time Series Basics

In [3]:
dates = [datetime(2019, 1, 2), datetime(2018, 1, 5), datetime(2017, 1, 7), 
         datetime(2019, 1, 8), datetime(2018, 1, 10), datetime(2017, 1, 12)]

ts = pd.Series(np.random.randn(6), index=dates)
print(ts)
print(ts.index)
# Selection
print(ts['20180105'], ts['2017-01-7'])
print(ts[ts.index[2]])
# Slicing
print(ts[::2])

2019-01-02   -1.703825
2018-01-05   -1.622203
2017-01-07   -0.725515
2019-01-08    0.124481
2018-01-10    1.469449
2017-01-12   -0.044182
dtype: float64
DatetimeIndex(['2019-01-02', '2018-01-05', '2017-01-07', '2019-01-08',
               '2018-01-10', '2017-01-12'],
              dtype='datetime64[ns]', freq=None)
2018-01-05   -1.622203
dtype: float64 2017-01-07   -0.725515
dtype: float64
-0.7255148264011163
2019-01-02   -1.703825
2017-01-07   -0.725515
2018-01-10    1.469449
dtype: float64
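
The index in the cell above is not in chronological order, which limits label-based slicing. A minimal sketch of sorting first and then selecting by year or by date range follows; it reuses the same dates but replaces the random values with `np.arange(6)` (an assumption made here so the result is reproducible).

```python
from datetime import datetime
import numpy as np
import pandas as pd

dates = [datetime(2019, 1, 2), datetime(2018, 1, 5), datetime(2017, 1, 7),
         datetime(2019, 1, 8), datetime(2018, 1, 10), datetime(2017, 1, 12)]
# Deterministic values (0.0 .. 5.0) instead of random numbers
ts = pd.Series(np.arange(6, dtype=float), index=dates)

# Sort chronologically so that range slicing by label is well defined
ts_sorted = ts.sort_index()

# Partial string indexing: every observation in 2018
year_2018 = ts_sorted.loc['2018']
print(year_2018)

# Label-based slice; both endpoints are inclusive
window = ts_sorted.loc['2017-01-10':'2018-01-05']
print(window)
```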


#### Date Ranges, Frequencies, and Shifting

In [4]:
index1 = pd.date_range('2018-04-01', '2019-01-01')
index2 = pd.date_range(start='2018-04-01', periods=20)
index3 = pd.date_range(start='2019-01-01', periods=20)
index4 = pd.date_range('2012-05-02 12:56:31', periods=5)
print(index1, '\n', index2, '\n', index3, '\n', index4)

DatetimeIndex(['2018-04-01', '2018-04-02', '2018-04-03', '2018-04-04',
               '2018-04-05', '2018-04-06', '2018-04-07', '2018-04-08',
               '2018-04-09', '2018-04-10',
               ...
               '2018-12-23', '2018-12-24', '2018-12-25', '2018-12-26',
               '2018-12-27', '2018-12-28', '2018-12-29', '2018-12-30',
               '2018-12-31', '2019-01-01'],
              dtype='datetime64[ns]', length=276, freq='D') 
 DatetimeIndex(['2018-04-01', '2018-04-02', '2018-04-03', '2018-04-04',
               '2018-04-05', '2018-04-06', '2018-04-07', '2018-04-08',
               '2018-04-09', '2018-04-10', '2018-04-11', '2018-04-12',
               '2018-04-13', '2018-04-14', '2018-04-15', '2018-04-16',
               '2018-04-17', '2018-04-18', '2018-04-19', '2018-04-20'],
              dtype='datetime64[ns]', freq='D') 
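 DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06', '2019-01-07', '2019-01-08',
               '2019-01-09', '2019-01-10', '2019-01-11', '2019-01-12',
               '2019-01-13', '2019-01-14', '2019-01-15', '2019-01-16',
               '2019-01-17', '2019-01-18', '2019-01-19', '2019-01-20'],
              dtype='datetime64[ns]', freq='D') 
 DatetimeIndex(['2012-05-02 12:56:31', '2012-05-03 12:56:31',
               '2012-05-04 12:56:31', '2012-05-05 12:56:31',
               '2012-05-06 12:56:31'],
              dtype='datetime64[ns]', freq='D')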


**Base time series frequencies**   

| Alias | Offset type | Description |
|---|---|---|
| D | Day | Calendar daily |
| B | BusinessDay | Business daily |
| H | Hour | Hourly |
| T or min | Minute | Minutely |
| S | Second | Secondly |
| L or ms | Milli | Millisecond (1/1,000 of 1 second) |
| U | Micro | Microsecond (1/1,000,000 of 1 second) |
| M | MonthEnd | Last calendar day of month |
| BM | BusinessMonthEnd | Last business day (weekday) of month |
| MS | MonthBegin | First calendar day of month |
| BMS | BusinessMonthBegin | First weekday of month |
| W-MON, W-TUE, ... | Week | Weekly on given day of week (MON, TUE, WED, THU, FRI, SAT, or SUN) |
| WOM-1MON, WOM-2MON, ... | WeekOfMonth | Generate weekly dates in the first, second, third, or fourth week of the month (e.g., WOM-3FRI for the third Friday of each month) |
| Q-JAN, Q-FEB, ... | QuarterEnd | Quarterly dates anchored on last calendar day of each month, for year ending in indicated month (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, or DEC) |
| BQ-JAN, BQ-FEB, ... | BusinessQuarterEnd | Quarterly dates anchored on last weekday of each month, for year ending in indicated month |
| QS-JAN, QS-FEB, ... | QuarterBegin | Quarterly dates anchored on first calendar day of each month, for year ending in indicated month |
| BQS-JAN, BQS-FEB, ... | BusinessQuarterBegin | Quarterly dates anchored on first weekday of each month, for year ending in indicated month |
| A-JAN, A-FEB, ... | YearEnd | Annual dates anchored on last calendar day of given month (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, or DEC) |
| BA-JAN, BA-FEB, ... | BusinessYearEnd | Annual dates anchored on last weekday of given month |
| AS-JAN, AS-FEB, ... | YearBegin | Annual dates anchored on first day of given month |
| BAS-JAN, BAS-FEB, ... | BusinessYearBegin | Annual dates anchored on first weekday of given month |

Note: pandas 2.2 and later deprecate several of these aliases in favor of new spellings (for example `h`, `min`, `s`, `ms`, `us` for the sub-daily units, and `ME`, `QE`, `YE` for month, quarter, and year end), so check the documentation for the installed version.

In [5]:
print(pd.date_range('2019-01-01', periods=10, freq='1h30min'))
# WOM-3FRI: the third Friday of each month
print(pd.date_range('2019-01-01', '2019-09-01', freq='WOM-3FRI'))

DatetimeIndex(['2019-01-01 00:00:00', '2019-01-01 01:30:00',
               '2019-01-01 03:00:00', '2019-01-01 04:30:00',
               '2019-01-01 06:00:00', '2019-01-01 07:30:00',
               '2019-01-01 09:00:00', '2019-01-01 10:30:00',
               '2019-01-01 12:00:00', '2019-01-01 13:30:00'],
              dtype='datetime64[ns]', freq='90T')
DatetimeIndex(['2019-01-18', '2019-02-15', '2019-03-15', '2019-04-19',
               '2019-05-17', '2019-06-21', '2019-07-19', '2019-08-16'],
              dtype='datetime64[ns]', freq='WOM-3FRI')
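
Shifting moves data backward or forward along the time axis. The sketch below, using a small made-up series, contrasts the two common forms of `Series.shift`: without `freq` the values move and the index stays fixed, while with `freq` the index itself is moved.

```python
import pandas as pd

# Small illustrative series (values chosen arbitrarily)
idx = pd.date_range('2019-01-01', periods=4, freq='D')
ts = pd.Series([1.0, 2.0, 4.0, 8.0], index=idx)

# Values move down one step; the index is unchanged and the first value becomes NaN
lagged = ts.shift(1)
print(lagged)

# A typical use: period-over-period relative change
pct = ts / ts.shift(1) - 1
print(pct)

# With freq, the timestamps move instead, so no data is lost
moved = ts.shift(2, freq='D')
print(moved)
```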


### Time Series Preprocessing

Properties to check before modeling: stationarity, normality, independence, periodicity, and trend.
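
As a rough illustration of these checks, the sketch below builds a synthetic series (a linear trend plus Gaussian noise, an assumption made purely for demonstration) and applies simple diagnostics with NumPy, pandas, and SciPy. The specific tools (first differencing, a half-sample mean comparison, the Shapiro-Wilk test, and lag-1 autocorrelation) are one possible selection, not a prescribed procedure.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data (assumption): slope 0.5 per day plus standard normal noise
rng = np.random.default_rng(0)
idx = pd.date_range('2019-01-01', periods=200, freq='D')
y = pd.Series(0.5 * np.arange(200) + rng.normal(size=200), index=idx)

# Trend: first differencing removes a linear trend
dy = y.diff().dropna()

# Stationarity (rough check): compare the mean of the two halves of each series
half = len(y) // 2
raw_gap = y.iloc[half:].mean() - y.iloc[:half].mean()
diff_gap = dy.iloc[half:].mean() - dy.iloc[:half].mean()
print('mean gap, raw series:', raw_gap)
print('mean gap, differenced series:', diff_gap)

# Normality: Shapiro-Wilk test on the differenced series
stat, p_value = stats.shapiro(dy)
print('Shapiro-Wilk p-value:', p_value)

# Independence: lag-1 autocorrelation; differencing white noise makes it
# negative (about -0.5 in theory), so differenced data are not independent
r1 = dy.autocorr(lag=1)
print('lag-1 autocorrelation:', r1)
```

Periodicity can be probed in the same spirit by computing `autocorr` at the suspected seasonal lag.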

### Stationary Time Series Analysis

### Non-Stationary Time Series Analysis

### Multivariate Time Series Analysis