[Time Series: `Instance Methods and Resampling`](https://mp.weixin.qq.com/s/nxjGNSuqKi6LynA2UFiPjw)

# Shifting and Lagging

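> Sometimes you need to move all the values in a time series forward or backward in time; this is called shifting or lagging. The method for this is `shift()`, which is available on all pandas objects.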

In [18]:
import numpy as np
import pandas as pd

rng = pd.date_range(start='2020-07-9', periods=3)
rng

DatetimeIndex(['2020-07-09', '2020-07-10', '2020-07-11'], dtype='datetime64[ns]', freq='D')

In [19]:
ts = pd.Series(range(len(rng)), index=rng)
ts = ts[:5]
ts.shift(1)

2020-07-09    NaN
2020-07-10    0.0
2020-07-11    1.0
Freq: D, dtype: float64
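
To make the direction of the move explicit, here is a minimal sketch (the names `s`, `lagged`, and `led` are illustrative only) that compares a positive and a negative shift. The index stays fixed while the values slide, and the vacated slot is filled with `NaN`, which is why an integer series comes back as `float64`.

```python
import pandas as pd

idx = pd.date_range(start="2020-07-09", periods=3)
s = pd.Series([0, 1, 2], index=idx)

lagged = s.shift(1)   # values move one step later; the first slot becomes NaN
led = s.shift(-1)     # values move one step earlier; the last slot becomes NaN

# The index is identical in all three columns; only the values slide.
print(pd.DataFrame({"original": s, "shift(1)": lagged, "shift(-1)": led}))
```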

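## `shift` accepts a `freq` argument, which can be a `DateOffset`, a `timedelta`-like object, or an offset alias

> Without `freq`, `shift` moves the values and leaves the index fixed, so shifting by more than the length of the series yields only `NaN`. With `freq`, the index itself is shifted and the values are left unchanged.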

In [20]:
ts.shift(5)

2020-07-09   NaN
2020-07-10   NaN
2020-07-11   NaN
Freq: D, dtype: float64

In [21]:
ts.shift(5, freq=pd.offsets.BDay())

2020-07-16    0
2020-07-17    1
2020-07-17    2
dtype: int64

In [22]:
ts.shift(5, freq='BM')

2020-11-30    0
2020-11-30    1
2020-11-30    2
Freq: D, dtype: int64
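
The following is a small sketch of the same idea using offset objects rather than string aliases. That choice is an assumption made for portability: recent pandas releases have renamed several aliases (for example `'BM'`), whereas `pd.offsets.BDay()` and `pd.offsets.BMonthEnd()` should behave the same across versions. The variable names are illustrative.

```python
import pandas as pd

idx = pd.date_range(start="2020-07-09", periods=3)  # Thursday, Friday, Saturday
s = pd.Series([0, 1, 2], index=idx)

# With freq, the index moves and the values stay as they are.
by_bday = s.shift(5, freq=pd.offsets.BDay())         # 5 business days later
by_delta = s.shift(2, freq=pd.Timedelta(hours=12))   # 2 x 12 hours later
by_bmonth = s.shift(5, freq=pd.offsets.BMonthEnd())  # 5 business month-ends later

print(by_bday)
print(by_delta)
print(by_bmonth)
```

Friday and Saturday land on the same business day after the `BDay` shift, which is why the shifted index can contain duplicates.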

# Frequency Conversion

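> The primary function for changing frequency is `asfreq()`. For a `DatetimeIndex`, it is essentially a thin convenience wrapper that generates a `date_range` and calls `reindex()` on it.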

In [23]:
dr = pd.date_range('1/1/2010', periods=3, freq=3 * pd.offsets.BDay())
dr

DatetimeIndex(['2010-01-01', '2010-01-06', '2010-01-11'], dtype='datetime64[ns]', freq='3B')

In [24]:
ts = pd.Series(np.random.randn(3), index=dr)
ts

2010-01-01    1.367401
2010-01-06   -0.671192
2010-01-11   -0.027697
Freq: 3B, dtype: float64

In [25]:
ts.asfreq(pd.offsets.BDay())

2010-01-01    1.367401
2010-01-04         NaN
2010-01-05         NaN
2010-01-06   -0.671192
2010-01-07         NaN
2010-01-08         NaN
2010-01-11   -0.027697
Freq: B, dtype: float64

In [26]:
# Forward-fill the missing values

ts.asfreq(pd.offsets.BDay(), method='ffill')

2010-01-01    1.367401
2010-01-04    1.367401
2010-01-05    1.367401
2010-01-06   -0.671192
2010-01-07   -0.671192
2010-01-08   -0.671192
2010-01-11   -0.027697
Freq: B, dtype: float64

In [27]:
# Backward-fill the missing values

ts.asfreq(pd.offsets.BDay(), method='bfill')

2010-01-01    1.367401
2010-01-04   -0.671192
2010-01-05   -0.671192
2010-01-06   -0.671192
2010-01-07   -0.027697
2010-01-08   -0.027697
2010-01-11   -0.027697
Freq: B, dtype: float64
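
The "wrapper around `reindex()`" description can be checked directly. The sketch below, which uses fixed values instead of random ones so that it is reproducible, builds the business-day range by hand and reindexes onto it; under that assumption the result should match what `asfreq()` returns.

```python
import pandas as pd

dr = pd.date_range("2010-01-01", periods=3, freq=3 * pd.offsets.BDay())
s = pd.Series([1.0, 2.0, 3.0], index=dr)

via_asfreq = s.asfreq(pd.offsets.BDay(), method="ffill")

# Manual equivalent: build the full business-day range, then reindex onto it.
full_range = pd.date_range(s.index.min(), s.index.max(), freq=pd.offsets.BDay())
via_reindex = s.reindex(full_range, method="ffill")

print(via_asfreq.equals(via_reindex))
```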

# Resampling

> `resample()` is a time-based groupby: it splits the series into time bins and then applies a reduction method to each bin.

In [29]:
rng = pd.date_range('1/1/2012', periods=100, freq='S')
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
ts

2012-01-01 00:00:00    457
2012-01-01 00:00:01    304
2012-01-01 00:00:02    387
2012-01-01 00:00:03    399
2012-01-01 00:00:04    286
                      ... 
2012-01-01 00:01:35    332
2012-01-01 00:01:36    231
2012-01-01 00:01:37     65
2012-01-01 00:01:38    310
2012-01-01 00:01:39    472
Freq: S, Length: 100, dtype: int64

In [30]:
ts.resample('5Min').sum()

2012-01-01    26754
Freq: 5T, dtype: int64
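
To illustrate the groupby analogy, here is a minimal sketch that computes the same per-minute sums twice, once with `resample()` and once with `groupby()` plus `pd.Grouper`. The data (`np.arange(100)`) and the one-minute bin width are assumptions chosen so that the series spans two bins; the lowercase aliases `"s"` and `"1min"` are used because newer pandas versions prefer them.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2012-01-01", periods=100, freq="s")
s = pd.Series(np.arange(100), index=idx)

by_resample = s.resample("1min").sum()
by_grouper = s.groupby(pd.Grouper(freq="1min")).sum()

print(by_resample)
print(by_resample.equals(by_grouper))
```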

## Common methods available on the object returned by `resample`: sum, mean, std, sem, max, min, median, first, last, ohlc

In [31]:
ts.resample('5Min').mean()

2012-01-01    267.54
Freq: 5T, dtype: float64

In [32]:
ts.resample('5Min').ohlc()

            open  high  low  close
2012-01-01   457   472    6    472


In [33]:
ts.resample('5Min').max()

2012-01-01    472
Freq: 5T, dtype: int64
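
Several reductions can also be requested in one pass. The sketch below, on assumed monotonically increasing data so the expected values are easy to verify by eye, shows `ohlc()` next to `agg()` with a list of function names; `agg()` is not used in the cells above and is included here only as an illustration.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2012-01-01", periods=100, freq="s")
s = pd.Series(np.arange(100), index=idx)

# One row per minute: first, highest, lowest, and last value in each bin.
bars = s.resample("1min").ohlc()

# Several named reductions at once; each becomes a column.
stats = s.resample("1min").agg(["min", "max", "mean"])

print(bars)
print(stats)
```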

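## For downsampling, `closed` can be set to `left` or `right`
> + left: each bin is closed on the left and open on the right
> + right: each bin is open on the left and closed on the right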

In [34]:
ts.resample(rule='5Min', closed='left').mean()

2012-01-01    267.54
Freq: 5T, dtype: float64

In [35]:
ts.resample(rule='5Min', closed='right').mean()

2011-12-31 23:55:00    457.000000
2012-01-01 00:00:00    265.626263
Freq: 5T, dtype: float64

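> In the example above, with `closed='right'` each bin includes its right edge, so the first data point, `2012-01-01 00:00:00`, moves back one bin and falls into the one labeled `2011-12-31 23:55:00`.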
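
The effect of `closed` is easier to follow on a tiny series. In this sketch (ten one-minute samples with values 0 to 9, chosen purely for illustration), the sample at `00:05` belongs to the second bin under `closed='left'` but to the bin ending at `00:05` under `closed='right'`.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2012-01-01", periods=10, freq="min")  # 00:00 through 00:09
s = pd.Series(np.arange(10), index=idx)

# Bins [00:00, 00:05) and [00:05, 00:10)
left = s.resample("5min", closed="left").sum()

# Bins (23:55, 00:00], (00:00, 00:05], (00:05, 00:10], labeled by their left edge
right = s.resample("5min", closed="right").sum()

print(left)
print(right)
```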

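## `label` and `loffset` control the generated labels
> + label: whether each bin is labeled with its left (start) or right (end) edge
> + loffset: a time adjustment applied to the output labels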

> The default for both `label` and `closed` is `left` for every frequency offset except M, A, Q, BM, BA, BQ, and W, which default to `right`.

In [44]:
ts.resample('5Min', closed='right').mean()

2011-12-31 23:55:00    457.000000
2012-01-01 00:00:00    265.626263
Freq: 5T, dtype: float64

In [45]:
ts.resample('5Min', closed='right', label='right').mean()

2012-01-01 00:00:00    457.000000
2012-01-01 00:05:00    265.626263
Freq: 5T, dtype: float64

In [46]:
ts.resample('5Min', closed='right', label='right', loffset='5s').mean()

2012-01-01 00:00:05    457.000000
2012-01-01 00:05:05    265.626263
dtype: float64
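
One caveat: `loffset` was deprecated in pandas 1.1 and appears to have been removed in 2.0, so the cell above may raise an error on a recent installation. A minimal sketch of a commonly suggested workaround, assuming a plain `DatetimeIndex` result, is to add a `pd.Timedelta` to the index after resampling. The sample data here is illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2012-01-01", periods=10, freq="min")
s = pd.Series(np.arange(10), index=idx)

res = s.resample("5min", closed="right", label="right").sum()

# Equivalent in effect to loffset='5s': shift every label by five seconds.
res.index = res.index + pd.Timedelta(seconds=5)

print(res)
```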

## Upsampling

In [47]:
ts[:2]

2012-01-01 00:00:00    457
2012-01-01 00:00:01    304
Freq: S, dtype: int64

In [50]:
ts[:2].resample('250L')

<pandas.core.resample.DatetimeIndexResampler object at 0x7f8cc87ddef0>

In [51]:
# Upsample from one-second to 250-millisecond frequency

ts[:2].resample('250L').asfreq()

2012-01-01 00:00:00.000    457.0
2012-01-01 00:00:00.250      NaN
2012-01-01 00:00:00.500      NaN
2012-01-01 00:00:00.750      NaN
2012-01-01 00:00:01.000    304.0
Freq: 250L, dtype: float64

In [52]:
ts[:2].resample('250L').ffill()

2012-01-01 00:00:00.000    457
2012-01-01 00:00:00.250    457
2012-01-01 00:00:00.500    457
2012-01-01 00:00:00.750    457
2012-01-01 00:00:01.000    304
Freq: 250L, dtype: int64

In [53]:
ts[:2].resample('250L').ffill(limit=2)

2012-01-01 00:00:00.000    457.0
2012-01-01 00:00:00.250    457.0
2012-01-01 00:00:00.500    457.0
2012-01-01 00:00:00.750      NaN
2012-01-01 00:00:01.000    304.0
Freq: 250L, dtype: float64

In [54]:
ts[:2].resample('250L').bfill()

2012-01-01 00:00:00.000    457
2012-01-01 00:00:00.250    304
2012-01-01 00:00:00.500    304
2012-01-01 00:00:00.750    304
2012-01-01 00:00:01.000    304
Freq: 250L, dtype: int64
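
As a compact recap, the sketch below repeats the upsampling steps with the `"250ms"` alias, which newer pandas versions prefer over `'250L'`, and uses `limit=1` to show how the cap on consecutive fills works. The two values are copied from the example above; the variable names are illustrative.

```python
import pandas as pd

idx = pd.date_range("2012-01-01", periods=2, freq="s")
s = pd.Series([457, 304], index=idx)

up = s.resample("250ms")

raw = up.asfreq()            # new slots are left as NaN
capped = up.ffill(limit=1)   # fill at most one consecutive missing slot

print(raw)
print(capped)
```

With `limit=1`, only the slot at `.250` is filled; `.500` and `.750` stay `NaN` because they are more than one step away from the last observed value.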