**Resampling** refers to the process of converting a **time series** from **one frequency to another**.
* Aggregating higher-frequency data to a lower frequency is called downsampling, while converting lower-frequency data to a higher frequency is called upsampling.
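To make the two directions concrete, here is a minimal sketch on a hypothetical toy series (the 10-minute data and the variable names are assumptions for illustration only). It downsamples to 30-minute bins with a sum and upsamples to a 5-minute grid without filling.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series: one observation every 10 minutes.
idx = pd.date_range('2018-08-03', periods=6, freq='10min')
s = pd.Series(np.arange(6), index=idx)

# Downsampling: aggregate the 10-minute data into 30-minute bins.
down = s.resample('30min').sum()

# Upsampling: place the data on a 5-minute grid; the new slots are NaN.
up = s.resample('5min').asfreq()

print(down)
print(up)
```

Downsampling always needs an aggregation (here `sum`), whereas upsampling needs a decision about how to fill the newly created slots.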

> **resample**(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0, on=None, level=None)

> rule : string. The offset string or object representing the target conversion, for example 'M' or '5min'.

In [1]:
import numpy as np
import pandas as pd

In [2]:
rng = pd.date_range('2000-01-01', periods=100, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts[30:65]


2000-01-31    2.149271
2000-02-01   -0.405388
2000-02-02   -1.015168
2000-02-03   -0.980988
2000-02-04   -1.520588
2000-02-05   -0.912829
2000-02-06   -0.832280
2000-02-07    0.174565
2000-02-08    1.103264
2000-02-09    1.222306
2000-02-10   -0.321265
2000-02-11    1.259338
2000-02-12   -0.208210
2000-02-13   -0.340111
2000-02-14   -0.567574
2000-02-15    0.662511
2000-02-16   -0.830227
2000-02-17    0.603290
2000-02-18   -0.189446
2000-02-19   -0.351622
2000-02-20    0.602473
2000-02-21    2.417099
2000-02-22   -0.053327
2000-02-23   -0.621101
2000-02-24    2.109857
2000-02-25   -1.388519
2000-02-26    0.336166
2000-02-27    1.056932
2000-02-28   -0.887027
2000-02-29   -1.381348
2000-03-01   -0.959669
2000-03-02    0.219749
2000-03-03    0.148115
2000-03-04   -0.262271
2000-03-05    1.653134
Freq: D, dtype: float64

In [3]:
ts.resample('M').mean()


2000-01-31    0.110116
2000-02-29   -0.043421
2000-03-31   -0.008809
2000-04-30    0.144604
Freq: M, dtype: float64

In [4]:
ts.resample('M', kind='period').mean()

2000-01    0.560487
2000-02   -0.106448
2000-03    0.096216
2000-04    0.316283
Freq: M, dtype: float64
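As far as I know, the `kind` argument is deprecated in recent pandas releases. A version-independent way to get monthly means indexed by period is to group by the month each timestamp falls in. The sketch below assumes the same 100-day range as above; the seeded generator is an assumption added for reproducibility.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=100, freq='D')
# Seeded generator so the sketch is reproducible (the seed is arbitrary).
values = np.random.default_rng(0).standard_normal(len(rng))
ts = pd.Series(values, index=rng)

# Group by the calendar month of each timestamp, then average each group.
monthly = ts.groupby(ts.index.to_period('M')).mean()
print(monthly)
```

The result has one row per month (2000-01 through 2000-04), labelled by period rather than by month-end timestamp.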

### Downsampling

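Things to consider:
* Which side of each interval is closed (parameter: closed)
* How to label each aggregated bin, with the start or the end of the interval (parameter: label)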

In [6]:
rng = pd.date_range('2018-08-03', periods=12, freq='T')
ts = pd.Series(np.arange(12), index=rng)
ts

2018-08-03 00:00:00     0
2018-08-03 00:01:00     1
2018-08-03 00:02:00     2
2018-08-03 00:03:00     3
2018-08-03 00:04:00     4
2018-08-03 00:05:00     5
2018-08-03 00:06:00     6
2018-08-03 00:07:00     7
2018-08-03 00:08:00     8
2018-08-03 00:09:00     9
2018-08-03 00:10:00    10
2018-08-03 00:11:00    11
Freq: T, dtype: int32

`closed` sets which side of each bin interval is closed. The default is 'left' for all frequency offsets except 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W', which all default to 'right'. `label` follows the same defaults.
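The different default for week-based offsets can be checked with a small sketch. The daily series below is a hypothetical example (2018-08-03 is a Friday), and 'W' is taken to mean weeks ending on Sunday.

```python
import numpy as np
import pandas as pd

# Ten daily observations starting on Friday 2018-08-03.
days = pd.date_range('2018-08-03', periods=10, freq='D')
daily = pd.Series(np.arange(10), index=days)

# 'W' bins end on Sunday and default to closed='right', label='right',
# so Sunday 2018-08-05 falls in the bin labelled 2018-08-05.
weekly = daily.resample('W').sum()
print(weekly)
```

The first bin holds Friday through Sunday (0 + 1 + 2 = 3) and is labelled with its right edge, the Sunday; the second bin holds the following Monday through Sunday.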

In [6]:
# See Figure 11-3 for an illustration of minute-frequency data being resampled to five-minute frequency.


In [7]:
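'''By default the bins are left-labelled (label='left') and left-closed (closed='left').
The first interval is therefore [2018-08-03 00:00:00, 2018-08-03 00:05:00), so its sum is 0+1+2+3+4 = 10 and its label is 2018-08-03 00:00:00.'''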
ts.resample('5min').sum()

2018-08-03 00:00:00    10
2018-08-03 00:05:00    35
2018-08-03 00:10:00    21
Freq: 5T, dtype: int32

In [8]:
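'''The bins can be made right-closed (closed='right'); the label still defaults to the left edge (label='left').
The first interval is therefore (2018-08-02 23:55:00, 2018-08-03 00:00:00], so its sum is 0 and its label is 2018-08-02 23:55:00.'''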
ts.resample('5min', closed='right').sum()

2018-08-02 23:55:00     0
2018-08-03 00:00:00    15
2018-08-03 00:05:00    40
2018-08-03 00:10:00    11
Freq: 5T, dtype: int32

In [25]:
'''The bins can be made right-closed (closed='right') and right-labelled (label='right').
The first interval is therefore (2018-08-02 23:55:00, 2018-08-03 00:00:00], so its sum is 0 and its label is 2018-08-03 00:00:00.'''
ts.resample('5min', closed='right', label='right').sum()

2018-08-03 00:00:00     0
2018-08-03 00:05:00    15
2018-08-03 00:10:00    40
2018-08-03 00:15:00    11
Freq: 5T, dtype: int32
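To compare the combinations of `closed` and `label` side by side, the following sketch rebuilds the 12-minute series and aligns the four results in one table. The dictionary keys and the `summary` name are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2018-08-03', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

results = {}
for closed in ('left', 'right'):
    for label in ('left', 'right'):
        key = f'closed={closed}, label={label}'
        results[key] = ts.resample('5min', closed=closed, label=label).sum()

# Align the four results on the union of their bin labels; NaN marks
# labels that a given combination does not produce.
summary = pd.concat(results, axis=1).sort_index()
print(summary)
```

Changing `closed` changes which observations land in each bin (and therefore the sums), while changing `label` only shifts the timestamp attached to each bin.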

### Upsampling and Interpolation

In [9]:
frame = pd.DataFrame(np.random.randn(2, 4),
                     index=pd.date_range('1/1/2019', periods=2,
                                         freq='W-WED'),
                     columns=['Colorado', 'Texas', 'New York', 'Ohio'])
frame

            Colorado      Texas  New York       Ohio
2019-01-02  0.677029  -0.823687  0.831188  -0.060464
2019-01-09  0.289899  -1.510092  1.032170   0.893482


**DataFrame.asfreq**(freq, method=None, how=None, normalize=False, fill_value=None)
Convert TimeSeries to specified frequency.

Optionally provide filling method to pad/backfill missing values.

Returns the original data conformed to a new index with the specified frequency. resample is more appropriate if an operation, such as summarization, is necessary to represent the data at the new frequency.
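A minimal sketch of calling `asfreq` directly, assuming a small hypothetical weekly frame with fixed values (not the random data used in this section). It shows the `method` and `fill_value` options from the signature above.

```python
import pandas as pd

idx = pd.date_range('2019-01-02', periods=2, freq='W-WED')
weekly = pd.DataFrame({'Colorado': [1.0, 2.0], 'Texas': [3.0, 4.0]}, index=idx)

as_is = weekly.asfreq('D')                   # new rows are left as NaN
padded = weekly.asfreq('D', method='ffill')  # new rows copy the previous value
zeroed = weekly.asfreq('D', fill_value=0.0)  # new rows get a constant

print(as_is)
print(padded)
print(zeroed)
```

No aggregation happens here: the original rows are kept unchanged and only the index is conformed to the daily grid.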



In [21]:
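# When this weekly data is resampled to daily frequency, each group holds at most
# one value and the gaps between the weekly observations become missing values.
# Without applying any aggregation function, asfreq converts it to the higher frequency: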
df_daily = frame.resample('D').asfreq()
df_daily

             Colorado      Texas   New York       Ohio
2019-01-02   0.010861   0.803812   0.656674   0.132536
2019-01-03        NaN        NaN        NaN        NaN
2019-01-04        NaN        NaN        NaN        NaN
2019-01-05        NaN        NaN        NaN        NaN
2019-01-06        NaN        NaN        NaN        NaN
2019-01-07        NaN        NaN        NaN        NaN
2019-01-08        NaN        NaN        NaN        NaN
2019-01-09  -0.383513  -0.734336  -0.991644  -0.153661


In [10]:
# Fill the gaps forward with ffill()
frame.resample('D').ffill()

            Colorado      Texas  New York       Ohio
2019-01-02  0.677029  -0.823687  0.831188  -0.060464
2019-01-03  0.677029  -0.823687  0.831188  -0.060464
2019-01-04  0.677029  -0.823687  0.831188  -0.060464
2019-01-05  0.677029  -0.823687  0.831188  -0.060464
2019-01-06  0.677029  -0.823687  0.831188  -0.060464
2019-01-07  0.677029  -0.823687  0.831188  -0.060464
2019-01-08  0.677029  -0.823687  0.831188  -0.060464
2019-01-09  0.289899  -1.510092  1.032170   0.893482


In [11]:
# Fill forward at most two consecutive rows after each observation
frame.resample('D').ffill(limit=2)

            Colorado      Texas  New York       Ohio
2019-01-02  0.677029  -0.823687  0.831188  -0.060464
2019-01-03  0.677029  -0.823687  0.831188  -0.060464
2019-01-04  0.677029  -0.823687  0.831188  -0.060464
2019-01-05       NaN        NaN       NaN        NaN
2019-01-06       NaN        NaN       NaN        NaN
2019-01-07       NaN        NaN       NaN        NaN
2019-01-08       NaN        NaN       NaN        NaN
2019-01-09  0.289899  -1.510092  1.032170   0.893482


In [13]:
# The new dates need not coincide with the old ones: move the W-WED data onto Thursdays
frame.resample('W-THU').ffill()

             Colorado     Texas   New York       Ohio
2019-01-03  -0.072500  0.471344   2.058983  -0.218399
2019-01-10  -1.314942  0.192264  -1.508524   1.370278
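The examples above fill gaps by repeating values. Interpolation is another option when upsampling. The sketch below assumes a hypothetical two-point weekly series with fixed values and uses the resampler's `interpolate` method with linear interpolation.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2019-01-02', periods=2, freq='W-WED')
weekly = pd.Series([0.0, 7.0], index=idx)

# Upsample to daily frequency and draw a straight line between the
# two known observations instead of repeating the earlier one.
daily = weekly.resample('D').interpolate('linear')
print(daily)
```

With seven days between the two observations, each daily step increases by exactly one, which makes the linear behaviour easy to see.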


### Resampling with Periods

DataFrame.resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0, on=None, level=None)

In [14]:
frame = pd.DataFrame(np.random.randn(24, 4),
                     index=pd.period_range('1-2000', '12-2001',
                                           freq='M'),
                     columns=['Colorado', 'Texas', 'New York', 'Ohio'])
frame[:5]


          Colorado      Texas   New York       Ohio
2000-01   0.836243  -0.857108  -0.588585   1.785039
2000-02   0.946621   0.382012   1.135344   1.057585
2000-03   2.236473  -1.050085   0.825620   0.766250
2000-04  -0.164278   1.389776  -1.233399  -0.244984
2000-05   0.330096   0.030558   1.049785   1.948930


In [15]:
annual_frame = frame.resample('A-DEC').mean()
annual_frame

       Colorado      Texas   New York       Ohio
2000   0.420986   0.135564  -0.134891   0.474439
2001  -0.189044  -0.267136  -0.410124  -0.057843
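One way to sanity-check annual means like these is to group the monthly periods by calendar year. This is a sketch on a hypothetical single-column frame with known values, not the random frame above. It relies only on `PeriodIndex.year` and `groupby`.

```python
import numpy as np
import pandas as pd

months = pd.period_range('2000-01', '2001-12', freq='M')
frame = pd.DataFrame({'value': np.arange(24.0)}, index=months)

# Each calendar year collects its twelve monthly periods.
annual = frame.groupby(frame.index.year).mean()
print(annual)
```

The result is indexed by plain integer years rather than annual periods, so it serves as a cross-check rather than a drop-in replacement for the resampled frame.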


In [16]:
# Q-DEC: Quarterly, year ending in December
annual_frame.resample('Q-DEC').ffill()


         Colorado      Texas   New York       Ohio
2000Q1   0.420986   0.135564  -0.134891   0.474439
2000Q2   0.420986   0.135564  -0.134891   0.474439
2000Q3   0.420986   0.135564  -0.134891   0.474439
2000Q4   0.420986   0.135564  -0.134891   0.474439
2001Q1  -0.189044  -0.267136  -0.410124  -0.057843
2001Q2  -0.189044  -0.267136  -0.410124  -0.057843
2001Q3  -0.189044  -0.267136  -0.410124  -0.057843
2001Q4  -0.189044  -0.267136  -0.410124  -0.057843


In [17]:
# convention='end' places each annual value in the last quarter of its year
annual_frame.resample('Q-DEC', convention='end').ffill()

         Colorado      Texas   New York       Ohio
2000Q4   0.420986   0.135564  -0.134891   0.474439
2001Q1   0.420986   0.135564  -0.134891   0.474439
2001Q2   0.420986   0.135564  -0.134891   0.474439
2001Q3   0.420986   0.135564  -0.134891   0.474439
2001Q4  -0.189044  -0.267136  -0.410124  -0.057843
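The effect of `convention` can be pictured with `Period.asfreq`, which maps a low-frequency period to the first or last higher-frequency period it contains. The sketch below is only an illustration of that mapping for a single annual period, not a description of how `resample` is implemented. The 'Y' and 'Q' aliases are assumed to denote a year and quarters ending in December.

```python
import pandas as pd

year = pd.Period('2000', freq='Y')      # annual period ending in December

first = year.asfreq('Q', how='start')   # first quarter contained in the year
last = year.asfreq('Q', how='end')      # last quarter contained in the year
print(first, last)
```

With `convention='start'` the annual value lands on the first quarter (2000Q1) and forward filling covers the rest of the year, whereas with `convention='end'` it lands on 2000Q4, so the filled range starts there.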
