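Time series in pandas are generally assumed to be irregular, that is, they have no fixed frequency. For many applications this is fine. However, it is often desirable to work at a relatively fixed frequency such as daily, monthly, or every minute, even if that means introducing missing values into the series. pandas has a full suite of standard time series frequencies, along with tools for resampling, inferring frequencies, and generating fixed-frequency date ranges.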
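
As a quick illustration of the difference between an irregular and a fixed-frequency index, the sketch below uses `pd.infer_freq`. The variable names and dates are made up for this example.

```python
from datetime import datetime
import pandas as pd

# An index built from arbitrary dates carries no frequency
irregular = pd.DatetimeIndex([datetime(2015, 1, 2), datetime(2015, 1, 5),
                              datetime(2015, 1, 7), datetime(2015, 1, 8)])

# An index generated by date_range has a fixed (here daily) frequency
regular = pd.date_range('2015-01-01', periods=5, freq='D')

irregular_freq = pd.infer_freq(irregular)  # no single frequency fits the gaps
regular_freq = pd.infer_freq(regular)      # evenly spaced, one day apart

print(irregular.freq, irregular_freq, regular_freq)
```

`pd.infer_freq` returns `None` when the spacing between timestamps is not uniform, which is the case for the irregular index here.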


In [24]:
from pandas import Series
import pandas as pd
import numpy as np
from datetime import datetime
dates = [datetime(2015, 1, 2), datetime(2015, 1, 5),
         datetime(2015, 1, 7), datetime(2015, 1, 8),
         datetime(2015, 1, 10), datetime(2015, 1, 12)]
ts = Series(np.random.randn(6), index=dates)

In [7]:
ts

2015-01-02   -0.624200
2015-01-05   -0.253471
2015-01-07    0.114922
2015-01-08   -0.306915
2015-01-10   -0.520817
2015-01-12   -0.834741
dtype: float64

In [21]:
# Convert the series to a fixed (daily) frequency
ts.resample('D').ffill()  # forward fill: carry the last observed value forward

2015-01-02   -0.624200
2015-01-03   -0.624200
2015-01-04   -0.624200
2015-01-05   -0.253471
2015-01-06   -0.253471
2015-01-07    0.114922
2015-01-08   -0.306915
2015-01-09   -0.306915
2015-01-10   -0.520817
2015-01-11   -0.520817
2015-01-12   -0.834741
Freq: D, dtype: float64

In [22]:
# Backward fill: fill each gap with the next observed value
ts.resample('D').bfill()

2015-01-02   -0.624200
2015-01-03   -0.253471
2015-01-04   -0.253471
2015-01-05   -0.253471
2015-01-06    0.114922
2015-01-07    0.114922
2015-01-08   -0.306915
2015-01-09   -0.520817
2015-01-10   -0.520817
2015-01-11   -0.834741
2015-01-12   -0.834741
Freq: D, dtype: float64
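
To make the effect of the two fill directions easier to see, here is a minimal sketch with small, fixed values instead of random ones. The series `s` and its values are illustrative assumptions; `asfreq`, `ffill`, `bfill`, and the `limit` argument are the resampler methods being compared.

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2015, 1, 2), datetime(2015, 1, 5), datetime(2015, 1, 7)]
s = pd.Series([1.0, 2.0, 3.0], index=dates)

daily = s.resample('D').asfreq()         # newly created days are NaN
filled_fwd = s.resample('D').ffill()     # last observation carried forward
filled_bwd = s.resample('D').bfill()     # next observation carried backward
capped = s.resample('D').ffill(limit=1)  # fill at most one consecutive missing day

print(pd.DataFrame({'asfreq': daily, 'ffill': filled_fwd,
                    'bfill': filled_bwd, 'ffill(limit=1)': capped}))
```

Without a fill method, converting to a daily frequency simply exposes the missing days as `NaN`, which is the behavior the introduction alludes to.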

## Generating date ranges

In [26]:
index = pd.date_range('4/1/2015','6/1/2015')
index

DatetimeIndex(['2015-04-01', '2015-04-02', '2015-04-03', '2015-04-04',
               '2015-04-05', '2015-04-06', '2015-04-07', '2015-04-08',
               '2015-04-09', '2015-04-10', '2015-04-11', '2015-04-12',
               '2015-04-13', '2015-04-14', '2015-04-15', '2015-04-16',
               '2015-04-17', '2015-04-18', '2015-04-19', '2015-04-20',
               '2015-04-21', '2015-04-22', '2015-04-23', '2015-04-24',
               '2015-04-25', '2015-04-26', '2015-04-27', '2015-04-28',
               '2015-04-29', '2015-04-30', '2015-05-01', '2015-05-02',
               '2015-05-03', '2015-05-04', '2015-05-05', '2015-05-06',
               '2015-05-07', '2015-05-08', '2015-05-09', '2015-05-10',
               '2015-05-11', '2015-05-12', '2015-05-13', '2015-05-14',
               '2015-05-15', '2015-05-16', '2015-05-17', '2015-05-18',
               '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22',
               '2015-05-23', '2015-05-24', '2015-05-25', '2015-05-26',
      

In [29]:
# The 20 days ending on 6/1/2012
pd.date_range(end='6/1/2012',periods=20)

DatetimeIndex(['2012-05-13', '2012-05-14', '2012-05-15', '2012-05-16',
               '2012-05-17', '2012-05-18', '2012-05-19', '2012-05-20',
               '2012-05-21', '2012-05-22', '2012-05-23', '2012-05-24',
               '2012-05-25', '2012-05-26', '2012-05-27', '2012-05-28',
               '2012-05-29', '2012-05-30', '2012-05-31', '2012-06-01'],
              dtype='datetime64[ns]', freq='D')

In [33]:
# You can pass the "BM" frequency (business end of month)
pd.date_range('1/1/2015','4/1/2015',freq='BM')

DatetimeIndex(['2015-01-30', '2015-02-27', '2015-03-31'], dtype='datetime64[ns]', freq='BM')

In [36]:
# By default, date_range preserves the time (if any) of the start or end timestamp:
pd.date_range('5/2/2015 12:56:31',periods=5)

DatetimeIndex(['2015-05-02 12:56:31', '2015-05-03 12:56:31',
               '2015-05-04 12:56:31', '2015-05-05 12:56:31',
               '2015-05-06 12:56:31'],
              dtype='datetime64[ns]', freq='D')

Sometimes the start or end dates carry time information, but you want a set of timestamps normalized to midnight. The normalize option does this:

In [39]:
pd.date_range('5/2/2015 12:56:31',periods=5,  normalize = True)

DatetimeIndex(['2015-05-02', '2015-05-03', '2015-05-04', '2015-05-05',
               '2015-05-06'],
              dtype='datetime64[ns]', freq='D')
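
The calls above can be collected into one small sketch. It passes a `BMonthEnd` offset object rather than the `'BM'` string, on the assumption that alias spellings may differ between pandas versions while the offset classes stay the same. The variable names are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import BMonthEnd

full = pd.date_range('4/1/2015', '6/1/2015')        # start and end, daily by default
tail = pd.date_range(end='6/1/2012', periods=20)    # 20 days ending on the given date
bme = pd.date_range('1/1/2015', '4/1/2015', freq=BMonthEnd())  # last business day of each month
keep_time = pd.date_range('5/2/2015 12:56:31', periods=5)      # time of day preserved
midnight = pd.date_range('5/2/2015 12:56:31', periods=5, normalize=True)

print(len(full), tail[0].date(), list(bme.date))
print(keep_time[0], midnight[0])
```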

## Frequencies and date offsets

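Frequencies in pandas are composed of a base frequency and a multiplier. Base frequencies are typically referred to by a string alias, such as "M" for monthly or "H" for hourly. For each base frequency there is a corresponding object known as a date offset.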

In [43]:
# Hourly frequency can be represented with the Hour class
from pandas.tseries.offsets import Hour, Minute
hour = Hour()
hour

<Hour>

In [46]:
# Pass an integer to define a multiple of an offset
four_hours = Hour(4)
four_hours

<4 * Hours>

In [48]:
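# In most cases you don't need to create these objects explicitly; a string alias such as "H" or "4H" is enough.
# Putting an integer before the base frequency creates a multiple.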

pd.date_range('1/1/2015','1/3/2015 23:59',freq='4h')

DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 04:00:00',
               '2015-01-01 08:00:00', '2015-01-01 12:00:00',
               '2015-01-01 16:00:00', '2015-01-01 20:00:00',
               '2015-01-02 00:00:00', '2015-01-02 04:00:00',
               '2015-01-02 08:00:00', '2015-01-02 12:00:00',
               '2015-01-02 16:00:00', '2015-01-02 20:00:00',
               '2015-01-03 00:00:00', '2015-01-03 04:00:00',
               '2015-01-03 08:00:00', '2015-01-03 12:00:00',
               '2015-01-03 16:00:00', '2015-01-03 20:00:00'],
              dtype='datetime64[ns]', freq='4H')

In [50]:
# Most offset objects can be combined by addition

Hour(2)+Minute(30)

<150 * Minutes>
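
The same combined offset can also be written as a frequency string. The sketch below assumes the lowercase `'h'` and `'min'` spellings, which should be accepted across pandas versions, and uses `to_offset` to parse the string.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset
from pandas.tseries.offsets import Hour, Minute

combined = Hour(2) + Minute(30)   # adding two offsets
parsed = to_offset('2h30min')     # the same span written as a frequency string

rng = pd.date_range('2015-01-01', periods=3, freq='2h30min')
print(combined, parsed)
print(rng)
```

Both spellings describe a step of 150 minutes, so consecutive timestamps in `rng` are two and a half hours apart.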

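Base time series frequencies:

Alias | Offset type | Description
-|-|-
D | Day | Calendar daily
B | BusinessDay | Business daily
H | Hour | Hourly
T or min | Minute | Once a minute
S | Second | Once a second
L or ms | Milli | Millisecond (1/1,000 of a second)
U | Micro | Microsecond (1/1,000,000 of a second)
M | MonthEnd | Last calendar day of the month
BM | BusinessMonthEnd | Last business day (weekday) of the month
MS | MonthBegin | First calendar day of the month
BMS | BusinessMonthBegin | First business day (weekday) of the month
W-MON, W-TUE, ... | Week | Weekly on the given day of the week (MON, TUE, WED, THU, FRI, SAT, or SUN)
WOM-1MON, WOM-2MON, ... | WeekOfMonth | The given weekday in the first, second, third, or fourth week of the month (for example, WOM-3FRI is the third Friday of each month)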
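
One way to check which offset class an alias maps to is `to_offset`. The sketch below sticks to aliases that are believed to be spelled the same across pandas versions; some recent releases deprecate a few of the spellings in the table (for example `H` in favor of `h`, or `M` in favor of `ME`), so treat the alias list as an assumption to verify against your installed version.

```python
from pandas.tseries.frequencies import to_offset

aliases = ['D', 'B', 'min', 'MS', 'BMS', 'W-MON', 'WOM-3FRI']

# Map each alias to the name of the offset class pandas builds for it
classes = {alias: type(to_offset(alias)).__name__ for alias in aliases}

for alias, name in classes.items():
    print(f'{alias:>9} -> {name}')
```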

In [53]:
# WOM (week of month) is a useful frequency class; its aliases start with WOM.
# Third Friday of each month
rng = pd.date_range('1/1/2015','8/1/2015',
     freq = 'WOM-3FRI')

list(rng)

[Timestamp('2015-01-16 00:00:00', freq='WOM-3FRI'),
 Timestamp('2015-02-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2015-03-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2015-04-17 00:00:00', freq='WOM-3FRI'),
 Timestamp('2015-05-15 00:00:00', freq='WOM-3FRI'),
 Timestamp('2015-06-19 00:00:00', freq='WOM-3FRI'),
 Timestamp('2015-07-17 00:00:00', freq='WOM-3FRI')]
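
As a sanity check, the following sketch confirms that every generated date is a Friday falling on days 15 through 21, and builds the equivalent offset object directly. `WeekOfMonth` takes a zero-based `week` and a `weekday` where Monday is 0, so the third Friday is `week=2, weekday=4`.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset
from pandas.tseries.offsets import WeekOfMonth

third_friday = WeekOfMonth(week=2, weekday=4)   # zero-based week, Monday=0
same_offset = to_offset('WOM-3FRI') == third_friday

rng = pd.date_range('1/1/2015', '8/1/2015', freq=third_friday)
for stamp in rng:
    print(stamp.date(), stamp.day_name())
```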

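## Shifting (leading and lagging) data
Shifting refers to moving data backward or forward through time. Both Series and DataFrame have a shift method for doing naive shifts forward or backward, leaving the index unmodified: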

In [56]:
ts = Series(np.random.randn(4),
   index = pd.date_range('1/1/2015',periods=4,freq='M'))
ts

2015-01-31    2.313713
2015-02-28   -0.258610
2015-03-31    1.117984
2015-04-30   -0.992700
Freq: M, dtype: float64

In [59]:
# Shift forward in time (lag): each value moves two periods later
ts.shift(2)

2015-01-31         NaN
2015-02-28         NaN
2015-03-31    2.313713
2015-04-30   -0.258610
Freq: M, dtype: float64

In [61]:
# Shift backward in time (lead): each value moves two periods earlier
ts.shift(-2)


2015-01-31    1.117984
2015-02-28   -0.992700
2015-03-31         NaN
2015-04-30         NaN
Freq: M, dtype: float64

In [63]:
# A common use of shift is computing percent changes in one time series or in several (for example, the columns of a DataFrame). This is expressed as:

ts / ts.shift(1) - 1

2015-01-31         NaN
2015-02-28   -1.111773
2015-03-31   -5.323047
2015-04-30   -1.887937
Freq: M, dtype: float64
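
The percent-change expression is easier to follow with round numbers. This sketch uses a made-up `prices` series and compares the manual formula with the built-in `pct_change` method.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0, 108.9],
                   index=pd.date_range('2015-01-01', periods=4, freq='D'))

manual = prices / prices.shift(1) - 1   # first entry is NaN: nothing to compare against
builtin = prices.pct_change()           # pandas shorthand for the same calculation

print(pd.DataFrame({'manual': manual, 'pct_change': builtin}))
```

Each row is the relative change from the previous observation, so a move from 100 to 110 shows up as roughly 0.1.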

In [70]:
# Passing freq shifts the timestamps instead of the data, so no values are discarded
ts.shift(2,freq='M')

2015-03-31    2.313713
2015-04-30   -0.258610
2015-05-31    1.117984
2015-06-30   -0.992700
Freq: M, dtype: float64

In [72]:
ts.shift(3,freq='D')

2015-02-03    2.313713
2015-03-03   -0.258610
2015-04-03    1.117984
2015-05-03   -0.992700
dtype: float64

In [74]:
ts.shift(1, freq='3D')

2015-02-03    2.313713
2015-03-03   -0.258610
2015-04-03    1.117984
2015-05-03   -0.992700
dtype: float64

In [76]:
# Shift the timestamps by 90 minutes
ts.shift(1,freq='90T')

2015-01-31 01:30:00    2.313713
2015-02-28 01:30:00   -0.258610
2015-03-31 01:30:00    1.117984
2015-04-30 01:30:00   -0.992700
Freq: M, dtype: float64
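
The difference between a naive shift and a shift with `freq` can be sketched with fixed values. The example passes `MonthEnd()` objects instead of the `'M'` string, on the assumption that the offset class is a more version-stable way to ask for month ends; the series `s` is illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

s = pd.Series([1.0, 2.0, 3.0, 4.0],
              index=pd.date_range('1/1/2015', periods=4, freq=MonthEnd()))

naive = s.shift(2)                       # data moves, index stays, NaN appears
by_month = s.shift(2, freq=MonthEnd())   # index moves two month ends, data intact
by_minutes = s.shift(1, freq='90min')    # index moves 90 minutes later

print(naive)
print(by_month)
print(by_minutes)
```

With `freq`, the values stay attached to their (moved) timestamps, so the shifted series has the same length and no missing data.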

pandas date offsets can also be used with datetime or Timestamp objects:

In [79]:
from pandas.tseries.offsets import Day,MonthEnd
now = datetime(2015,8,1)

In [81]:
now + 3*Day()

Timestamp('2015-08-04 00:00:00')

In [89]:
# Adding an anchored offset such as MonthEnd rolls the date forward to the next month end
now + MonthEnd()

Timestamp('2015-08-31 00:00:00')

In [85]:
# The rollforward and rollback methods explicitly roll a date forward or backward to the nearest offset date

In [87]:
offset = MonthEnd()

In [90]:
offset.rollforward(now)

Timestamp('2015-08-31 00:00:00')

In [92]:
offset.rollback(now)

Timestamp('2015-07-31 00:00:00')
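
One subtlety worth sketching is what happens when the date already falls on the offset. In the example below (dates chosen for illustration), `rollforward` and `rollback` leave a month-end date unchanged, while addition always advances to the following month end.

```python
from datetime import datetime
from pandas.tseries.offsets import MonthEnd

offset = MonthEnd()
mid_month = datetime(2015, 8, 15)
month_end = datetime(2015, 8, 31)

rolled_mid = offset.rollforward(mid_month)   # moves to the end of August
rolled_end = offset.rollforward(month_end)   # already a month end, unchanged
rolled_back = offset.rollback(month_end)     # already a month end, unchanged
added = month_end + offset                   # addition moves to the next month end

print(rolled_mid, rolled_end, rolled_back, added)
```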

In [94]:
# A clever use of date offsets is to pass one of these "rolling" methods to groupby

ts = Series(np.random.randn(20),
  index=pd.date_range('1/15/2015',periods=20,freq='4d'))

ts.groupby(offset.rollforward).mean()

2015-01-31   -0.249989
2015-02-28    0.018464
2015-03-31    0.256275
2015-04-30   -0.441662
dtype: float64

In [95]:
# An easier and faster way to do this is resample
ts.resample('M').mean()

2015-01-31   -0.249989
2015-02-28    0.018464
2015-03-31    0.256275
2015-04-30   -0.441662
Freq: M, dtype: float64
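
To check that the two approaches agree, here is a sketch with deterministic data. It passes the `MonthEnd` offset object to `resample` rather than a string alias, which is an assumption made to sidestep alias differences between pandas versions.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

offset = MonthEnd()
s = pd.Series(np.arange(20, dtype=float),
              index=pd.date_range('1/15/2015', periods=20, freq='4D'))

by_group = s.groupby(offset.rollforward).mean()  # key = month end for each timestamp
by_resample = s.resample(offset).mean()          # bins labeled by month end

print(by_group)
print(by_resample)
```

Both results have one row per month end from January through April; `resample` additionally keeps the frequency on the resulting index.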