#### pandas.resample

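Resampling reprocesses an existing sample at a new frequency. In pandas, `resample()` is a convenience method for resampling regular time series data and converting its frequency.

- Downsampling: converting higher-frequency data to a lower frequency
- Upsampling: converting lower-frequency data to a higher frequency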

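Main function: resample()
Main parameters (several belong to older pandas releases and have since been deprecated or removed, as noted):
- rule (listed as freq in some references): the target frequency, given as a string or an offset object, e.g. 'M', '5min', Second(15)

- how: name of the aggregation function, or an array function, e.g. 'mean', np.max, 'median', 'max'. Deprecated in pandas 0.18 and later removed; chain the aggregation instead, e.g. .resample('5min').mean()

- axis = 0: resample along the index (rows) by default; axis = 1 resamples along the columns

- fill_method = None: how to fill the new slots when upsampling, 'ffill' or 'bfill'. Deprecated in pandas 0.18 and later removed; chain .ffill() or .bfill() instead

- closed = None: which side of each bin interval is closed, 'right' or 'left'. The default is 'left' for most frequencies and 'right' for month-end, quarter-end, year-end, and weekly frequencies (e.g. 'M', 'Q', 'A', 'W')

- label = None: which bin edge is used to label the result, 'right' or 'left'. The defaults follow the same rule as closed

- loffset = None: time adjustment applied to the bin labels. Removed in pandas 2.0

- limit = None: maximum number of consecutive periods to fill when forward or backward filling. Now passed to the fill method itself, e.g. .ffill(limit=2)

- kind = None: aggregate to periods ('period') or timestamps ('timestamp'); defaults to the type of index the series already has

- convention = 'start': for a PeriodIndex only, whether to use the start or the end of each period ('start' or 'end')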
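
In recent pandas versions the `how`, `fill_method`, and `limit` arguments are no longer passed to `resample()`; the aggregation or fill is chained onto the resampler instead. The following is a minimal sketch of that style, assuming a small made-up series (`s`, `down`, and `up` are illustrative names, not part of the original notebook) and the `'min'` / `'s'` frequency aliases:

```python
import pandas as pd

# Hypothetical sample data: nine one-minute observations with values 0..8.
idx = pd.date_range("2020-01-01", periods=9, freq="min")
s = pd.Series(range(9), index=idx)

# Old style: s.resample("3min", how="mean")
# Chained style: pick the aggregation as a method on the resampler.
down = s.resample("3min").mean()

# Old style: s.resample("30s", fill_method="ffill", limit=1)
# Chained style: the fill method takes the limit itself.
up = s.resample("30s").ffill(limit=1)

print(down)
print(up.head())
```

Keeping the aggregation as a separate method call also makes it easy to swap `.mean()` for `.sum()`, `.max()`, or `.agg([...])` without touching the binning arguments.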


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

In [7]:
index = pd.date_range('1/1/2020', periods = 9, freq = 'T')
print(index)
series = pd.Series(range(9), index = index)
series

DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 00:01:00',
               '2020-01-01 00:02:00', '2020-01-01 00:03:00',
               '2020-01-01 00:04:00', '2020-01-01 00:05:00',
               '2020-01-01 00:06:00', '2020-01-01 00:07:00',
               '2020-01-01 00:08:00'],
              dtype='datetime64[ns]', freq='T')


2020-01-01 00:00:00    0
2020-01-01 00:01:00    1
2020-01-01 00:02:00    2
2020-01-01 00:03:00    3
2020-01-01 00:04:00    4
2020-01-01 00:05:00    5
2020-01-01 00:06:00    6
2020-01-01 00:07:00    7
2020-01-01 00:08:00    8
Freq: T, dtype: int64

Downsample to three-minute bins

In [28]:
series.resample('3T').sum()    # default closed='left': bins [0, 3), [3, 6), [6, 9)

2020-01-01 00:00:00     3
2020-01-01 00:03:00    12
2020-01-01 00:06:00    21
Freq: 3T, dtype: int64

In [31]:
series.resample('3T', closed = 'right').sum() # bins (23:57, 00:00], (00:00, 00:03], (00:03, 00:06], (00:06, 00:09]; labels are still the left edges

2019-12-31 23:57:00     0
2020-01-01 00:00:00     6
2020-01-01 00:03:00    15
2020-01-01 00:06:00    15
Freq: 3T, dtype: int64

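Downsample to three minutes with the right side of each bin closed, i.e. left-open, right-closed intervals; here each bin is also labeled by its right edge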

In [32]:
series.resample('3T', label = 'right', closed = 'right').sum()

2020-01-01 00:00:00     0
2020-01-01 00:03:00     6
2020-01-01 00:06:00    15
2020-01-01 00:09:00    15
Freq: 3T, dtype: int64
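
To see why the three sums above differ, it can help to list which values land in each bin. The sketch below iterates over the resampler, which yields `(label, group)` pairs; `bin_members` is a hypothetical helper written for this illustration, and the series is rebuilt here with the `'min'` alias so the block runs on its own:

```python
import pandas as pd

# Same data as the notebook's `series`: values 0..8 at one-minute spacing.
idx = pd.date_range("2020-01-01", periods=9, freq="min")
s = pd.Series(range(9), index=idx)

def bin_members(resampler):
    """Map each bin label (formatted HH:MM) to the list of values in that bin."""
    return {label.strftime("%H:%M"): group.tolist() for label, group in resampler}

# Default for minute frequencies: closed='left', label='left'.
left_closed = bin_members(s.resample("3min"))

# Right-closed bins: the 00:00 observation falls into a bin that starts at 23:57.
right_closed = bin_members(s.resample("3min", closed="right"))

# Same bins as above, but each one is named after its right edge.
right_labeled = bin_members(s.resample("3min", closed="right", label="right"))

for name, members in [("left_closed", left_closed),
                      ("right_closed", right_closed),
                      ("right_labeled", right_labeled)]:
    print(name, members)
```

Note that `closed` changes which observations are grouped together, while `label` only changes the timestamp attached to each group.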

Upsample to 30 seconds

In [33]:
series.resample('30S').asfreq()[:] # asfreq() converts to the new frequency without filling; the new time points are NaN

2020-01-01 00:00:00    0.0
2020-01-01 00:00:30    NaN
2020-01-01 00:01:00    1.0
2020-01-01 00:01:30    NaN
2020-01-01 00:02:00    2.0
2020-01-01 00:02:30    NaN
2020-01-01 00:03:00    3.0
2020-01-01 00:03:30    NaN
2020-01-01 00:04:00    4.0
2020-01-01 00:04:30    NaN
2020-01-01 00:05:00    5.0
2020-01-01 00:05:30    NaN
2020-01-01 00:06:00    6.0
2020-01-01 00:06:30    NaN
2020-01-01 00:07:00    7.0
2020-01-01 00:07:30    NaN
2020-01-01 00:08:00    8.0
Freq: 30S, dtype: float64

In [34]:
series.resample('30S').bfill()[:]

2020-01-01 00:00:00    0
2020-01-01 00:00:30    1
2020-01-01 00:01:00    1
2020-01-01 00:01:30    2
2020-01-01 00:02:00    2
2020-01-01 00:02:30    3
2020-01-01 00:03:00    3
2020-01-01 00:03:30    4
2020-01-01 00:04:00    4
2020-01-01 00:04:30    5
2020-01-01 00:05:00    5
2020-01-01 00:05:30    6
2020-01-01 00:06:00    6
2020-01-01 00:06:30    7
2020-01-01 00:07:00    7
2020-01-01 00:07:30    8
2020-01-01 00:08:00    8
Freq: 30S, dtype: int64

In [35]:
series.resample('30S').ffill()[:] # forward fill; pad() is an older alias for ffill() that was removed in pandas 2.0

2020-01-01 00:00:00    0
2020-01-01 00:00:30    0
2020-01-01 00:01:00    1
2020-01-01 00:01:30    1
2020-01-01 00:02:00    2
2020-01-01 00:02:30    2
2020-01-01 00:03:00    3
2020-01-01 00:03:30    3
2020-01-01 00:04:00    4
2020-01-01 00:04:30    4
2020-01-01 00:05:00    5
2020-01-01 00:05:30    5
2020-01-01 00:06:00    6
2020-01-01 00:06:30    6
2020-01-01 00:07:00    7
2020-01-01 00:07:30    7
2020-01-01 00:08:00    8
Freq: 30S, dtype: int64
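
Backward and forward filling copy a neighboring value into each new slot. As a possible alternative, not used in the original cells, the gaps can be filled by linear interpolation with `Resampler.interpolate()`. A minimal sketch, assuming the same nine-point series rebuilt under the illustrative name `s`:

```python
import pandas as pd

# Same data as the notebook's `series`: values 0..8 at one-minute spacing.
idx = pd.date_range("2020-01-01", periods=9, freq="min")
s = pd.Series(range(9), index=idx)

# Upsample to 30 seconds and fill each new slot with the midpoint of its
# neighbors (linear interpolation is the default method).
interpolated = s.resample("30s").interpolate()

print(interpolated.head())
```

Because the inserted values are fractional, the result is float64 rather than int64.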

In [36]:
def custom_resampler(array_like):
    return np.sum(array_like) + 5
series.resample('3T').apply(custom_resampler)

2020-01-01 00:00:00     8
2020-01-01 00:03:00    17
2020-01-01 00:06:00    26
Freq: 3T, dtype: int64
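
The custom function passed to `apply()` receives the values of one bin at a time and returns a single number. The sketch below restates that cell in self-contained form, checks it against the built-in `sum()`, and shows `agg()` with a list of function names as one way to compute several statistics at once; `via_apply`, `via_builtin`, and `summary` are illustrative names:

```python
import numpy as np
import pandas as pd

# Same data as the notebook's `series`: values 0..8 at one-minute spacing.
idx = pd.date_range("2020-01-01", periods=9, freq="min")
s = pd.Series(range(9), index=idx)

def custom_resampler(values):
    # Called once per bin with that bin's values; must return a scalar.
    return np.sum(values) + 5

via_apply = s.resample("3min").apply(custom_resampler)

# The same result using the built-in aggregation plus a vectorized offset.
via_builtin = s.resample("3min").sum() + 5

# Several aggregations at once: the result is a DataFrame, one column per function.
summary = s.resample("3min").agg(["sum", "mean", "max"])

print(via_apply)
print(summary)
```

When a built-in aggregation exists, it is usually preferable to a Python-level function, which is called separately for every bin.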

In [37]:
times = pd.date_range('2/1/2020', periods = 30)
ts = pd.Series(np.arange(1, 31), index = times)
ts

2020-02-01     1
2020-02-02     2
2020-02-03     3
2020-02-04     4
2020-02-05     5
2020-02-06     6
2020-02-07     7
2020-02-08     8
2020-02-09     9
2020-02-10    10
2020-02-11    11
2020-02-12    12
2020-02-13    13
2020-02-14    14
2020-02-15    15
2020-02-16    16
2020-02-17    17
2020-02-18    18
2020-02-19    19
2020-02-20    20
2020-02-21    21
2020-02-22    22
2020-02-23    23
2020-02-24    24
2020-02-25    25
2020-02-26    26
2020-02-27    27
2020-02-28    28
2020-02-29    29
2020-03-01    30
Freq: D, dtype: int64

In [38]:
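# A series of the values 1..30 (2020-02-01 through 2020-03-01), aggregated into five
# left-closed, right-open 7-day bins: [1, 8), [8, 15), [15, 22), [22, 29), [29, 36).
# The last bin holds only 29 and 30. Each result is the sum of the values in its bin.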
ts_7d = ts.resample('7D').sum()
ts_7d

2020-02-01     28
2020-02-08     77
2020-02-15    126
2020-02-22    175
2020-02-29     59
Freq: 7D, dtype: int64

In [40]:
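# Left-open, right-closed bins, so the first bin reaches back before the data:
# (Jan 25, Feb 1], (Feb 1, Feb 8], (Feb 8, Feb 15], (Feb 15, Feb 22], (Feb 22, Feb 29], (Feb 29, Mar 7]
# label='right' labels each bin with its right edge; label='left' labels it with its left edge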
ts_7d = ts.resample('7D', closed = 'right', label = 'left').sum()
ts_7d

2020-01-25      1
2020-02-01     35
2020-02-08     84
2020-02-15    133
2020-02-22    182
2020-02-29     30
Freq: 7D, dtype: int64
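
The two sets of 7-day sums can be checked by hand. The sketch below rebuilds the series and compares each resampled result with plain `sum(range(...))` arithmetic over the values that fall in each bin; `expected_left` and `expected_right` are illustrative names:

```python
import numpy as np
import pandas as pd

# Values 1..30 on consecutive days from 2020-02-01 through 2020-03-01.
times = pd.date_range("2020-02-01", periods=30)
ts = pd.Series(np.arange(1, 31), index=times)

left = ts.resample("7D").sum()
right = ts.resample("7D", closed="right", label="left").sum()

# Left-closed bins start on Feb 1 and hold the values 1-7, 8-14, 15-21, 22-28, 29-30.
expected_left = [sum(range(1, 8)), sum(range(8, 15)), sum(range(15, 22)),
                 sum(range(22, 29)), 29 + 30]

# Right-closed bins end on Feb 1, Feb 8, ...: the first holds only the value 1,
# then 2-8, 9-15, 16-22, 23-29, and finally the value 30 on its own.
expected_right = [1, sum(range(2, 9)), sum(range(9, 16)), sum(range(16, 23)),
                  sum(range(23, 30)), 30]

print(left.tolist() == expected_left)
print(right.tolist() == expected_right)
```

Shifting the closed side by one day moves exactly one observation across each bin boundary, which is why every full bin in the second result is 7 larger than its counterpart in the first.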