# [Time series and date functionality in pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects)

- pandas extends NumPy's `datetime64` and `timedelta64` dtypes and consolidates features from third-party packages to provide rich time series and date functionality.



In [12]:
import datetime
import numpy as np
import pandas as pd

## Examples

### Parsing objects in different formats
> Parsing time series information from various sources and formats

In [13]:
dti = pd.to_datetime(['1/1/2018', np.datetime64('2018-01-01'), datetime.datetime(2018, 1, 1)])
dti

DatetimeIndex(['2018-01-01', '2018-01-01', '2018-01-01'], dtype='datetime64[ns]', freq=None)

### Generating an hourly time series with `pd.date_range`
> Generate sequences of fixed-frequency dates and time spans

In [38]:
dti = pd.date_range('2018-01-01', periods=3, freq='H')
dti

DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 01:00:00',
               '2018-01-01 02:00:00'],
              dtype='datetime64[ns]', freq='H')

### Manipulating and converting time zone information
> Manipulating and converting date times with timezone information

In [39]:
dti = dti.tz_localize('UTC')
dti

DatetimeIndex(['2018-01-01 00:00:00+00:00', '2018-01-01 01:00:00+00:00',
               '2018-01-01 02:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='H')

In [40]:
dti.tz_convert('US/Pacific') # returns a new index; dti itself is not modified

DatetimeIndex(['2017-12-31 16:00:00-08:00', '2017-12-31 17:00:00-08:00',
               '2017-12-31 18:00:00-08:00'],
              dtype='datetime64[ns, US/Pacific]', freq='H')
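
Because `tz_convert` returns a new index instead of changing the original, the result has to be assigned to a name to be kept. The following is a minimal sketch of that behavior. The names `utc_idx` and `pacific_idx` are illustrative, and the hourly frequency is passed as a `pd.Timedelta` so that the snippet does not depend on how a given pandas version spells the hourly alias.

```python
import pandas as pd

# Three hourly timestamps, localized to UTC.
utc_idx = pd.date_range(
    '2018-01-01', periods=3, freq=pd.Timedelta(hours=1)
).tz_localize('UTC')

# tz_convert returns a new index; utc_idx is left untouched.
pacific_idx = utc_idx.tz_convert('US/Pacific')

print(utc_idx[0])      # 2018-01-01 00:00:00+00:00
print(pacific_idx[0])  # 2017-12-31 16:00:00-08:00

# Both indexes describe the same instants; only the wall-clock view differs.
print((utc_idx == pacific_idx).all())  # True
```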

### Resampling a time series to a particular frequency
> Resampling or converting a time series to a particular frequency

In [44]:
idx = pd.date_range('2018-01-01', periods=5, freq='H')
idx

DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 01:00:00',
               '2018-01-01 02:00:00', '2018-01-01 03:00:00',
               '2018-01-01 04:00:00'],
              dtype='datetime64[ns]', freq='H')

In [45]:
ts = pd.Series(range(len(idx)), index=idx) # use idx as the index
ts

2018-01-01 00:00:00    0
2018-01-01 01:00:00    1
2018-01-01 02:00:00    2
2018-01-01 03:00:00    3
2018-01-01 04:00:00    4
Freq: H, dtype: int64

In [47]:
ts.resample('2H').mean() # group ts into 2-hour bins and take the mean of each bin

2018-01-01 00:00:00    0.5
2018-01-01 02:00:00    2.5
2018-01-01 04:00:00    4.0
Freq: 2H, dtype: float64
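
`resample` only defines the bins; the aggregation applied afterwards decides what each bin reports. The sketch below reuses the same five hourly values and applies a few other aggregations to the same 2-hour bins. The bin width is given as a `pd.Timedelta` (an assumption made here to keep the snippet independent of frequency alias spelling), and `bins` is an illustrative name.

```python
import pandas as pd

idx = pd.date_range('2018-01-01', periods=5, freq=pd.Timedelta(hours=1))
ts = pd.Series(range(len(idx)), index=idx)

# Bins are [00:00, 02:00), [02:00, 04:00), [04:00, 06:00).
bins = ts.resample(pd.Timedelta(hours=2))

print(bins.sum().tolist())    # [1, 5, 4]
print(bins.max().tolist())    # [1, 3, 4]
print(bins.count().tolist())  # [2, 2, 1]
```

The last bin holds a single observation, which is why its mean in the example above equals the value itself.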

### Date and time arithmetic with absolute and relative time increments
> Performing date and time arithmetic with absolute or relative time increments

In [51]:
friday = pd.Timestamp('2018-01-05') # define a point in time
friday.day_name()

'Friday'

In [52]:
saturday = friday + pd.Timedelta('1 day')
saturday.day_name()

'Saturday'

In [55]:
monday = friday + pd.offsets.BDay()
monday.day_name()

'Monday'

# [Overview](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamps-vs-time-spans)

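- Date times: a specific date and time, with time zone support. Main creation methods: `to_datetime` and `date_range`.
- Time deltas: an absolute time duration. Main creation methods: `to_timedelta` and `timedelta_range`.
- Time spans: a span of time defined by a point in time and a frequency. Main creation methods: `Period` and `period_range`.
- Date offsets: a relative time duration that follows calendar arithmetic. Main creation method: `DateOffset`.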
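
The difference between an absolute time delta and a relative date offset becomes visible around a daylight saving time change. The sketch below assumes the time zone database is available and uses Helsinki, where clocks were set back one hour on 2016-10-30, so that calendar day lasted 25 hours.

```python
import pandas as pd

ts = pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki')

# Timedelta adds exactly 24 elapsed hours, landing at 23:00 on the same date.
print(ts + pd.Timedelta(days=1))   # 2016-10-30 23:00:00+02:00

# DateOffset moves to the same wall-clock time on the next calendar day.
print(ts + pd.DateOffset(days=1))  # 2016-10-31 00:00:00+02:00
```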

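For time series data, the convention is to put the time component in the index of a Series or DataFrame, so that operations can be performed with respect to the time element.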

In [58]:
pd.Series(range(3), index=pd.date_range('2000', freq='D', periods=3))

2000-01-01    0
2000-01-02    1
2000-01-03    2
Freq: D, dtype: int64

Series and DataFrame have extended data type support and functionality for `datetime`, `timedelta`, and `Period` data. `DateOffset` data is stored as `object` data.

In [60]:
pd.Series(pd.period_range('1/1/2011', freq='M', periods=3))

0    2011-01
1    2011-02
2    2011-03
dtype: period[M]

In [61]:
pd.Series([pd.DateOffset(1), pd.DateOffset(2)])

0         <DateOffset>
1    <2 * DateOffsets>
dtype: object

In [63]:
pd.Series(pd.date_range('1/1/2011', freq='M', periods=3))

0   2011-01-31
1   2011-02-28
2   2011-03-31
dtype: datetime64[ns]

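pandas uses `NaT` to represent null date times, time deltas, and time spans, that is, missing or invalid date-like values. It plays the role that `np.nan` plays for float data.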

In [64]:
pd.Timestamp(pd.NaT)

NaT

In [65]:
pd.Timedelta(pd.NaT)

NaT

In [66]:
pd.Period(pd.NaT)

NaT

In [67]:
pd.NaT == pd.NaT  # note: NaT follows the same comparison rule as np.nan

False
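
Since `NaT` never compares equal to itself, equality cannot be used to find missing dates. A minimal sketch of the usual alternative, `isna`, on an illustrative Series `s`:

```python
import pandas as pd

# None is converted to NaT during parsing.
s = pd.Series(pd.to_datetime(['2018-01-01', None, '2018-01-03']))

print(pd.isna(pd.NaT))       # True
print(s.isna().tolist())     # [False, True, False]
print(int(s.isna().sum()))   # 1
print(len(s.dropna()))       # 2
```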

# [Timestamps vs. time spans](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamps-vs-time-spans)

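Timestamped data is the basic form of time series data: it associates a value with a point in time. In pandas, a point in time is represented by `Timestamp`.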
> Timestamped data is the most basic type of time series data that associates values with points in time. For pandas objects it means using the points in time.

In [72]:
pd.Timestamp(datetime.datetime(2012, 5, 1))

Timestamp('2012-05-01 00:00:00')

In [73]:
pd.Timestamp('2012-05-01')

Timestamp('2012-05-01 00:00:00')

In [74]:
pd.Timestamp(2012, 5, 1)

Timestamp('2012-05-01 00:00:00')

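In many cases it is more natural to associate quantities such as change variables with a time span. A `Period` span can be given explicitly or inferred from the format of the datetime string.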
> However, in many cases it is more natural to associate things like change variables with a time span instead. The span represented by Period can be specified explicitly, or inferred from datetime string format.

In [75]:
pd.Period('2011-01') # the whole month of January 2011

Period('2011-01', 'M')

In [76]:
pd.Period('2012-05', freq='D')  # the whole day of May 1, 2012

Period('2012-05-01', 'D')
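
A `Period` covers an interval rather than a single instant, which can be checked through its `start_time` and `end_time` attributes. The sketch below is illustrative; the variable names are not from the original notes.

```python
import pandas as pd

month = pd.Period('2011-01')  # frequency inferred as monthly from the string

print(month.start_time)  # 2011-01-01 00:00:00

# Any timestamp inside January 2011 falls within the span.
mid_month = pd.Timestamp('2011-01-15 12:00')
print(month.start_time <= mid_month <= month.end_time)  # True

# Arithmetic moves in units of the period's frequency.
print(month + 1)  # 2011-02
```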

`Timestamp` and `Period` can serve as an index. Lists of `Timestamp` and `Period` objects are automatically coerced to `DatetimeIndex` and `PeriodIndex`, respectively.

In [79]:
dates = [pd.Timestamp('2012-05-01'), pd.Timestamp('2012-05-02'), pd.Timestamp('2012-05-03')] # a list of timestamps
dates

[Timestamp('2012-05-01 00:00:00'),
 Timestamp('2012-05-02 00:00:00'),
 Timestamp('2012-05-03 00:00:00')]

In [80]:
ts = pd.Series(np.random.randn(3), dates) # pass the list to the Series constructor as the index

In [81]:
ts.index

DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)

In [82]:
ts

2012-05-01   -0.324026
2012-05-02   -1.674873
2012-05-03   -0.885818
dtype: float64

In [85]:
periods = [pd.Period('2012-01'), pd.Period('2012-02'), pd.Period('2012-03')]
ts = pd.Series(np.random.randn(3), periods)

In [86]:
type(ts.index)

pandas.core.indexes.period.PeriodIndex

In [87]:
ts.index

PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]', freq='M')

In [88]:
ts

2012-01    0.023036
2012-02    0.407289
2012-03   -0.986179
Freq: M, dtype: float64

# [Converting to timestamps](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#converting-to-timestamps)

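Use `to_datetime` to convert a Series or list of date-like objects, such as strings, epochs, or a mixture of these, to datetimes. When passed a `Series`, it returns a `Series` with the same index. When passed a list, it returns a `DatetimeIndex`.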

In [89]:
pd.to_datetime(pd.Series(['Jul 31, 2009', '2010-01-10', None])) # Series in, Series out

0   2009-07-31
1   2010-01-10
2          NaT
dtype: datetime64[ns]

In [91]:
pd.to_datetime(['2005/11/23', '2010.12.31']) # list in, DatetimeIndex out

DatetimeIndex(['2005-11-23', '2010-12-31'], dtype='datetime64[ns]', freq=None)

If the dates start with the day (European style), pass the `dayfirst` flag.

In [92]:
pd.to_datetime(['04-01-2012 10:00'], dayfirst=True)  

DatetimeIndex(['2012-01-04 10:00:00'], dtype='datetime64[ns]', freq=None)

In [95]:
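pd.to_datetime(['2012-01-14', '01-14-2012'], dayfirst=True) # dayfirst is a single bool, not a per-element list, and it is not strict: '01-14-2012' cannot be day-first, so it is still parsed month-first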

DatetimeIndex(['2012-01-14', '2012-01-14'], dtype='datetime64[ns]', freq=None)

In [96]:
pd.to_datetime('2010/11/12') # a single string returns a single Timestamp

Timestamp('2010-11-12 00:00:00')

In [98]:
pd.Timestamp('2010/11/12')  # same result as constructing a Timestamp directly

Timestamp('2010-11-12 00:00:00')

The `DatetimeIndex` constructor can also be used directly.

In [100]:
pd.DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05'])

DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05'], dtype='datetime64[ns]', freq=None)

Passing the string `'infer'` as the `freq` argument sets the index frequency to the one inferred from the data.

In [101]:
pd.DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05'], freq='infer')

DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05'], dtype='datetime64[ns]', freq='2D')
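
Frequency inference only succeeds when the spacing is regular. As a hedged illustration, the sketch below calls `pd.infer_freq` directly on a regular index and on an irregular one; the index names are illustrative.

```python
import pandas as pd

# Evenly spaced: every two days.
regular = pd.DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05'])

# Unevenly spaced: gaps of two days, then three days.
irregular = pd.DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-06'])

print(pd.infer_freq(regular))    # 2D
print(pd.infer_freq(irregular))  # None
```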

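## [Providing a format argument](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#providing-a-format-argument)

In addition to the required datetime string, a `format` argument can be passed to specify exactly how the string is parsed. This can also speed up the conversion. For the available `format` codes, see the [Python datetime documentation](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).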

In [103]:
pd.to_datetime('2010/11/12', format='%Y/%m/%d')

Timestamp('2010-11-12 00:00:00')

In [105]:
pd.to_datetime('12-11-2010 00:00', format='%d-%m-%Y %H:%M')

Timestamp('2010-11-12 00:00:00')
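
An explicit `format` also removes ambiguity: the same string can be read as two different dates depending on the field order. A small sketch with illustrative variable names:

```python
import pandas as pd

raw = '12-11-2010'

# Day first: 12 November 2010.
as_day_first = pd.to_datetime(raw, format='%d-%m-%Y')

# Month first: 11 December 2010.
as_month_first = pd.to_datetime(raw, format='%m-%d-%Y')

print(as_day_first)    # 2010-11-12 00:00:00
print(as_month_first)  # 2010-12-11 00:00:00
```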

## [Assembling datetimes from multiple DataFrame columns](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#assembling-datetime-from-multiple-dataframe-columns)

In [110]:
df = pd.DataFrame({'year': [2015, 2016],'month': [2, 3],'day': [4, 5],'hour': [2, 3]})
df

   year  month  day  hour
0  2015      2    4     2
1  2016      3    5     3


In [109]:
pd.to_datetime(df)

0   2015-02-04 02:00:00
1   2016-03-05 03:00:00
dtype: datetime64[ns]

You can pass only the columns that need to be assembled.

In [112]:
pd.to_datetime(df[['year', 'month', 'day']])

0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

## [Invalid data](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#invalid-data)

In [114]:
pd.to_datetime(['2009/07/31', 'asd'], errors='raise') # the default, errors='raise', raises when a value cannot be parsed

ValueError: ('Unknown string format:', 'asd')

In [116]:
pd.to_datetime(['2009/07/31', 'asd'], errors='ignore') # errors='ignore' returns the input unchanged (deprecated since pandas 2.2)

Index(['2009/07/31', 'asd'], dtype='object')

In [117]:
pd.to_datetime(['2009/07/31', 'asd'], errors='coerce') # errors='coerce' converts unparseable values to NaT

DatetimeIndex(['2009-07-31', 'NaT'], dtype='datetime64[ns]', freq=None)
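
One common use of `errors='coerce'` is to locate the entries that failed to parse. The sketch below is an illustration under assumed data: the Series `raw` and the explicit `format` are not from the original notes.

```python
import pandas as pd

raw = pd.Series(['2009/07/31', 'asd', '2009/08/02'])

# Unparseable entries become NaT instead of raising.
parsed = pd.to_datetime(raw, format='%Y/%m/%d', errors='coerce')

# Use the NaT positions to pull out the offending raw values.
bad = raw[parsed.isna()]

print(bad.tolist())                # ['asd']
print(int(parsed.notna().sum()))   # 2
```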

## [Epoch timestamps](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#epoch-timestamps)

In [118]:
pd.to_datetime([1349720105, 1349806505, 1349892905,1349979305, 1350065705], unit='s')

DatetimeIndex(['2012-10-08 18:15:05', '2012-10-09 18:15:05',
               '2012-10-10 18:15:05', '2012-10-11 18:15:05',
               '2012-10-12 18:15:05'],
              dtype='datetime64[ns]', freq=None)

In [119]:
pd.to_datetime([1349720105100, 1349720105200, 1349720105300,1349720105400, 1349720105500], unit='ms')

DatetimeIndex(['2012-10-08 18:15:05.100000', '2012-10-08 18:15:05.200000',
               '2012-10-08 18:15:05.300000', '2012-10-08 18:15:05.400000',
               '2012-10-08 18:15:05.500000'],
              dtype='datetime64[ns]', freq=None)

In [120]:
pd.Timestamp(1262347200000000000).tz_localize('US/Pacific')

Timestamp('2010-01-01 12:00:00-0800', tz='US/Pacific')

In [121]:
pd.DatetimeIndex([1262347200000000000]).tz_localize('US/Pacific')

DatetimeIndex(['2010-01-01 12:00:00-08:00'], dtype='datetime64[ns, US/Pacific]', freq=None)

## [From timestamps to epoch](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#from-timestamps-to-epoch)

In [122]:
stamps = pd.date_range('2012-10-08 18:15:05', periods=4, freq='D')

In [123]:
(stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s')

Int64Index([1349720105, 1349806505, 1349892905, 1349979305], dtype='int64')
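
The conversion works by subtracting the epoch (midnight on 1970-01-01) and floor-dividing by the unit, here one second. The sketch below checks that feeding the resulting integers back through `to_datetime` with `unit='s'` restores the original timestamps; `epoch_seconds` and `restored` are illustrative names.

```python
import pandas as pd

stamps = pd.date_range('2012-10-08 18:15:05', periods=4, freq='D')

# Timestamps -> whole seconds since 1970-01-01.
epoch_seconds = (stamps - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')
print(epoch_seconds.tolist())
# [1349720105, 1349806505, 1349892905, 1349979305]

# Seconds since the epoch -> timestamps again.
restored = pd.to_datetime(epoch_seconds, unit='s')
print((restored == stamps).all())  # True
```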

## [Using the `origin` parameter](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#using-the-origin-parameter)

In [124]:
pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('1960-01-01'))

DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'], dtype='datetime64[ns]', freq=None)

In [125]:
pd.to_datetime([1, 2, 3], unit='D')

DatetimeIndex(['1970-01-02', '1970-01-03', '1970-01-04'], dtype='datetime64[ns]', freq=None)
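
With `unit='D'`, each number is a count of days added to `origin`; when `origin` is omitted, the Unix epoch 1970-01-01 is used. As a sketch, the same result can be built by adding time deltas to the origin by hand (the variable names are illustrative):

```python
import pandas as pd

origin = pd.Timestamp('1960-01-01')

# Let to_datetime apply the origin.
via_origin = pd.to_datetime([1, 2, 3], unit='D', origin=origin)

# Equivalent manual arithmetic: origin plus 1, 2, and 3 days.
via_addition = origin + pd.to_timedelta([1, 2, 3], unit='D')

print(via_origin[0])                         # 1960-01-02 00:00:00
print((via_origin == via_addition).all())    # True
```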

# [Generating ranges of timestamps](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#generating-ranges-of-timestamps)