### 2. Resampling

* `pd.DataFrame.resample()`
* `pd.Series.resample(rule, axis = 0, label = "left", closed = "left", loffset)`
    - Downsampling
        - `.resample().<func>()`
            - `.resample().mean()`
            - `.resample().min()`
            - `.resample().max()`
            - `.resample().median()`
    - `.resample().apply(<func>)`
    - Upsampling
        - `.resample().asfreq()`
        - `.resample().ffill()`
        - `.resample().bfill()`

#### 2.1 Downsampling

In [3]:
import pandas as pd
import numpy as np
from datetime import datetime
rng = pd.date_range("1/1/2000", periods = 9, freq = "T")
ts = pd.Series(np.arange(9), index = rng)
print(ts)
print("-" * 25)
print("Label each bin by its left edge; sum over the 3 minutes starting at the label, label value included: ")
print("-" * 25)
print(ts.resample("3T").sum())
print("-" * 25)
print("Label each bin by its right edge; sum over the 3 minutes before the label, label value excluded: ")
print("-" * 25)
print(ts.resample("3T", label = "right").sum())
print("-" * 25)
print("Label each bin by its right edge and close it on the right; sum over the 3 minutes up to the label, label value included: ")
print("-" * 25)
print(ts.resample("3T", label = "right", closed = "right").sum())

print("-" * 25)
print("Resampling with a custom function:")
print("-" * 25)
def custom_resampler(array_like):
    return np.sum(array_like) + 5
print(ts.resample("3T").apply(custom_resampler))

2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int32
-------------------------
Label each bin by its left edge; sum over the 3 minutes starting at the label, label value included: 
-------------------------
2000-01-01 00:00:00     3
2000-01-01 00:03:00    12
2000-01-01 00:06:00    21
Freq: 3T, dtype: int32
-------------------------
Label each bin by its right edge; sum over the 3 minutes before the label, label value excluded: 
-------------------------
2000-01-01 00:03:00     3
2000-01-01 00:06:00    12
2000-01-01 00:09:00    21
Freq: 3T, dtype: int32
-------------------------
Label each bin by its right edge and close it on the right; sum over the 3 minutes up to the label, label value included: 
-------------------------
2000-01-01 00:00:00     0
2000-01-01 00:03:00     6
2000-01-01 00:06:00    15
2000-01-01 00:09:00    15
Freq: 3T, dtype: int32
-------------------------
Resampling with a custom function:
-------------------------
2000-01-01 00:00:00     8
2000-01-01 00:03:00  
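
`label` and `closed` are easy to confuse: `label` only decides which edge names a bin, while `closed` decides which edge belongs to it. The sketch below rebuilds the same nine-sample series and keeps the three results side by side. The variable names are illustrative, and `"min"` is used in place of `"T"` on the assumption that a newer pandas is installed (both spell the same frequency).

```python
import numpy as np
import pandas as pd

# Same nine one-minute samples as in the cell above.
idx = pd.date_range("2000-01-01", periods=9, freq="min")
ts_demo = pd.Series(np.arange(9), index=idx)

# Default: bins are [left, right) and are named after their left edge.
left_left = ts_demo.resample("3min").sum()

# label="right" renames the bins; membership is unchanged.
right_label = ts_demo.resample("3min", label="right").sum()

# closed="right" changes membership: bins become (left, right].
right_closed = ts_demo.resample("3min", label="right", closed="right").sum()

print(left_left.tolist())
print(right_label.tolist())
print(right_closed.tolist())
```

The first two results hold the same sums under different labels; only the third regroups the samples, which is why it gains an extra bin for the lone value at 00:00.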

In [40]:
df = pd.DataFrame({
    "price": [10, 11, 9, 13, 14, 18, 17, 19],
    "volume": [50, 60, 40, 100, 50, 100, 40, 50],
    "week_starting": pd.date_range("01/01/2018", periods = 8, freq = "W")
})
print(df)
print(df.resample("M", on = "week_starting").mean())

   price  volume week_starting
0     10      50    2018-01-07
1     11      60    2018-01-14
2      9      40    2018-01-21
3     13     100    2018-01-28
4     14      50    2018-02-04
5     18     100    2018-02-11
6     17      40    2018-02-18
7     19      50    2018-02-25
               price  volume
week_starting               
2018-01-31     10.75    62.5
2018-02-28     17.00    60.0
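
The `on` argument can be read as shorthand for moving a datetime column into the index before resampling. A minimal sketch under that reading, using a 14-day rule (an illustrative choice, not the monthly rule from the cell above) and hypothetical variable names:

```python
import pandas as pd

# Eight weekly rows, starting on the same Sunday as the example above.
df_weekly = pd.DataFrame({
    "price": [10, 11, 9, 13, 14, 18, 17, 19],
    "week_starting": pd.date_range("2018-01-07", periods=8, freq="7D"),
})

# Resample on a column directly ...
by_on = df_weekly.resample("14D", on="week_starting").mean()

# ... or set that column as the index first.
by_index = df_weekly.set_index("week_starting").resample("14D").mean()

print(by_on)
print(by_index)
```

Both calls pair up consecutive weeks, so each output row is the mean of two prices.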


In [45]:
days = pd.date_range("1/1/2000", periods = 4, freq = "D")
d2 = dict({
    "price": [10, 11, 9, 13, 14, 18, 17, 19],
    "volume": [50, 60, 40, 100, 50, 100, 40, 50]
})
df = pd.DataFrame(d2, index = pd.MultiIndex.from_product([days, ["morning", "afternoon"]]))
print(df)
print()
print(df.resample("D", level = 0).sum())

                      price  volume
2000-01-01 morning       10      50
           afternoon     11      60
2000-01-02 morning        9      40
           afternoon     13     100
2000-01-03 morning       14      50
           afternoon     18     100
2000-01-04 morning       17      40
           afternoon     19      50

            price  volume
2000-01-01     21     110
2000-01-02     22     140
2000-01-03     32     150
2000-01-04     36      90


#### 2.2 Upsampling

In [32]:
print(ts)
print()
print(ts.resample("30S").asfreq())
print()
print(ts.resample("30S").ffill())
print()
print(ts.resample("30S").bfill())

2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int32

2000-01-01 00:00:00    0.0
2000-01-01 00:00:30    NaN
2000-01-01 00:01:00    1.0
2000-01-01 00:01:30    NaN
2000-01-01 00:02:00    2.0
2000-01-01 00:02:30    NaN
2000-01-01 00:03:00    3.0
2000-01-01 00:03:30    NaN
2000-01-01 00:04:00    4.0
2000-01-01 00:04:30    NaN
2000-01-01 00:05:00    5.0
2000-01-01 00:05:30    NaN
2000-01-01 00:06:00    6.0
2000-01-01 00:06:30    NaN
2000-01-01 00:07:00    7.0
2000-01-01 00:07:30    NaN
2000-01-01 00:08:00    8.0
Freq: 30S, dtype: float64

2000-01-01 00:00:00    0
2000-01-01 00:00:30    0
2000-01-01 00:01:00    1
2000-01-01 00:01:30    1
2000-01-01 00:02:00    2
2000-01-01 00:02:30    2
2000-01-01 00:03:00    3
2000-01-01 00:03:30    3
2000-01-01 00:04:00    4
2000-01-01 00:04:30    4
2000-01-01 00:05
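
Upsampling creates new time slots, and the method called on the resampler decides what goes into them. The sketch below compares four options on a three-point series. The series and variable names are illustrative, and `interpolate()` is an addition that the cell above does not use.

```python
import pandas as pd

idx = pd.date_range("2000-01-01", periods=3, freq="min")
coarse = pd.Series([0.0, 10.0, 20.0], index=idx)

up = coarse.resample("30s")

as_is = up.asfreq()        # new slots stay NaN
forward = up.ffill()       # carry the previous observation forward
backward = up.bfill()      # pull the next observation backward
linear = up.interpolate()  # linear interpolation between observations

print(pd.DataFrame({
    "asfreq": as_is,
    "ffill": forward,
    "bfill": backward,
    "interpolate": linear,
}))
```

`asfreq()` is the only variant that leaves gaps; the other three fill them, so the choice depends on whether a missing reading should be repeated, anticipated, or estimated.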

In [37]:
s = pd.Series([1, 2], index = pd.period_range("2012-01-01", freq = "A", periods = 2))
print(s)
print()
print(s.resample("Q", convention = "start").asfreq())
print()
print(s.resample("Q", convention = "end").asfreq())

2012    1
2013    2
Freq: A-DEC, dtype: int64

2012Q1    1.0
2012Q2    NaN
2012Q3    NaN
2012Q4    NaN
2013Q1    2.0
2013Q2    NaN
2013Q3    NaN
2013Q4    NaN
Freq: Q-DEC, dtype: float64

2012Q4    1.0
2013Q1    NaN
2013Q2    NaN
2013Q3    NaN
2013Q4    2.0
Freq: Q-DEC, dtype: float64


#### 2.3 Sparse resampling

In [50]:
rng = pd.date_range("2014-1-1", 
                    periods = 100, 
                    freq = "D") + \
    pd.Timedelta("1s")
ts = pd.Series(range(100), index = rng)
print(ts)
print()
print(ts.resample("3T").sum())
print()

from functools import partial
from pandas.tseries.frequencies import to_offset
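# Named round_down so that the builtin round() is not shadowed.
def round_down(t, freq):
    # Floor timestamp t to a multiple of freq, so only bins that contain data are created.
    # pd.Timedelta(...) avoids the offset's .delta attribute, which newer pandas deprecates.
    step = pd.Timedelta(to_offset(freq)).value  # bin width in nanoseconds
    return pd.Timestamp((t.value // step) * step)
print(ts.groupby(partial(round_down, freq = "3T")).sum())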

2014-01-01 00:00:01     0
2014-01-02 00:00:01     1
2014-01-03 00:00:01     2
2014-01-04 00:00:01     3
2014-01-05 00:00:01     4
2014-01-06 00:00:01     5
2014-01-07 00:00:01     6
2014-01-08 00:00:01     7
2014-01-09 00:00:01     8
2014-01-10 00:00:01     9
2014-01-11 00:00:01    10
2014-01-12 00:00:01    11
2014-01-13 00:00:01    12
2014-01-14 00:00:01    13
2014-01-15 00:00:01    14
2014-01-16 00:00:01    15
2014-01-17 00:00:01    16
2014-01-18 00:00:01    17
2014-01-19 00:00:01    18
2014-01-20 00:00:01    19
2014-01-21 00:00:01    20
2014-01-22 00:00:01    21
2014-01-23 00:00:01    22
2014-01-24 00:00:01    23
2014-01-25 00:00:01    24
2014-01-26 00:00:01    25
2014-01-27 00:00:01    26
2014-01-28 00:00:01    27
2014-01-29 00:00:01    28
2014-01-30 00:00:01    29
                       ..
2014-03-12 00:00:01    70
2014-03-13 00:00:01    71
2014-03-14 00:00:01    72
2014-03-15 00:00:01    73
2014-03-16 00:00:01    74
2014-03-17 00:00:01    75
2014-03-18 00:00:01    76
2014-03-19 0
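
The point of the grouping trick is that `resample()` materialises every 3-minute bin between the first and last timestamp, while a `groupby` on floored timestamps only produces bins that contain data. A minimal sketch of the size difference, using `DatetimeIndex.floor` as a shorter stand-in for the hand-written rounding function (a substitution, not the cell's own method):

```python
import pandas as pd

# 100 daily samples, each one second past midnight.
idx = pd.date_range("2014-01-01", periods=100, freq="D") + pd.Timedelta("1s")
daily = pd.Series(range(100), index=idx)

# Dense: one row per 3-minute bin across the whole range, mostly zeros.
dense = daily.resample("3min").sum()

# Sparse: one row per bin that actually holds a sample.
sparse = daily.groupby(daily.index.floor("3min")).sum()

print(len(dense), len(sparse))
```

Both results carry the same total; the sparse one simply omits the empty bins.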

#### 2.4 Aggregation

In [62]:
df = pd.DataFrame(np.random.randn(1000, 3),
                  index = pd.date_range("1/1/2012", freq = "S", periods = 1000),
                  columns = ["A", "B", "C"])
print(df)
print()
print(df.resample("3T").mean())
print()
print(df.resample("3T")[["A"]].mean())
print()
print(df.resample("3T")[["A", "B"]].mean())
print()
print(df.resample("3T")[["A"]].agg([np.sum, np.mean, np.std]))
print()
print(df.resample("3T").agg([np.mean, np.std]))
print()
print(df.resample("3T").agg({
    "A": np.sum,
    "B": lambda x: np.std(x, ddof = 1)
}))
print()
print(df.resample("3T").agg({
    "A": "sum",
    "B": "std"
}))
print()
print(df.resample("3T").agg({
    "A": ["sum", "std"],
    "B": ["mean", "std"]
}))

                            A         B         C
2012-01-01 00:00:00  0.336515 -0.386290  1.717879
2012-01-01 00:00:01 -2.222708 -0.221846  0.887976
2012-01-01 00:00:02  1.471355 -2.353450  0.572891
2012-01-01 00:00:03  0.622124 -1.583003 -1.672274
2012-01-01 00:00:04  0.646684  0.446552 -1.230236
2012-01-01 00:00:05 -1.704984  1.567934 -0.691381
2012-01-01 00:00:06 -0.665108 -1.076047 -1.947010
2012-01-01 00:00:07  0.431403 -0.842100 -1.125320
2012-01-01 00:00:08  1.297766  0.463347 -0.751704
2012-01-01 00:00:09 -1.543089 -0.619325  1.033533
2012-01-01 00:00:10  0.888560 -1.509188 -0.275000
2012-01-01 00:00:11  0.460785  0.330112 -0.267047
2012-01-01 00:00:12 -0.727969 -1.433808 -0.158661
2012-01-01 00:00:13  0.174241  1.225670  0.218813
2012-01-01 00:00:14 -2.083256 -0.417204  0.504554
2012-01-01 00:00:15  0.816620 -0.210799  0.124784
2012-01-01 00:00:16 -1.559316 -0.246902 -0.271187
2012-01-01 00:00:17  0.611023 -0.506362 -1.103645
2012-01-01 00:00:18 -0.384813  0.552665 -0.410265


#### 2.5 Iterating over groups

In [46]:
small = pd.Series(
    range(6),
    index = pd.to_datetime(['2017-01-01T00:00:00',
                            '2017-01-01T00:30:00',
                            '2017-01-01T00:31:00',
                            '2017-01-01T01:00:00',
                            '2017-01-01T03:00:00',
                            '2017-01-01T03:05:00'])
)
print(small)
resampled = small.resample("H")
print()
print(resampled)
print()
for name, group in resampled:
    print("Group: ", name)
    print("-" * 27)
    print(group, end = "\n\n")

2017-01-01 00:00:00    0
2017-01-01 00:30:00    1
2017-01-01 00:31:00    2
2017-01-01 01:00:00    3
2017-01-01 03:00:00    4
2017-01-01 03:05:00    5
dtype: int64

DatetimeIndexResampler [freq=<Hour>, axis=0, closed=left, label=left, convention=start, base=0]

Group:  2017-01-01 00:00:00
---------------------------
2017-01-01 00:00:00    0
2017-01-01 00:30:00    1
2017-01-01 00:31:00    2
dtype: int64

Group:  2017-01-01 01:00:00
---------------------------
2017-01-01 01:00:00    3
dtype: int64

Group:  2017-01-01 02:00:00
---------------------------
Series([], dtype: int64)

Group:  2017-01-01 03:00:00
---------------------------
2017-01-01 03:00:00    4
2017-01-01 03:05:00    5
dtype: int64
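
As the 02:00 group above shows, iterating over a resampler also yields bins that hold no data. A small sketch that records the size of each bin and then filters out the empty ones (variable names are illustrative):

```python
import pandas as pd

small_demo = pd.Series(
    range(6),
    index=pd.to_datetime([
        "2017-01-01T00:00:00",
        "2017-01-01T00:30:00",
        "2017-01-01T00:31:00",
        "2017-01-01T01:00:00",
        "2017-01-01T03:00:00",
        "2017-01-01T03:05:00",
    ]),
)

# Map each hourly bin label to the number of samples it holds.
sizes = {name: len(group) for name, group in small_demo.resample("h")}

# Keep only the bins that contain at least one sample.
non_empty = [name for name, n in sizes.items() if n > 0]

print(sizes)
print(non_empty)
```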



### 3. groupby

In [17]:
df_re = pd.DataFrame({
    "A": [1] * 10 + [5] * 10,
    "B": np.arange(20)
})
print(df_re)
print("-" * 25)
print(df_re.groupby("A").rolling(4).B.mean())
print("-" * 25)
print(df_re.groupby("A").expanding().sum())

    A   B
0   1   0
1   1   1
2   1   2
3   1   3
4   1   4
5   1   5
6   1   6
7   1   7
8   1   8
9   1   9
10  5  10
11  5  11
12  5  12
13  5  13
14  5  14
15  5  15
16  5  16
17  5  17
18  5  18
19  5  19
-------------------------
A    
1  0      NaN
   1      NaN
   2      NaN
   3      1.5
   4      2.5
   5      3.5
   6      4.5
   7      5.5
   8      6.5
   9      7.5
5  10     NaN
   11     NaN
   12     NaN
   13    11.5
   14    12.5
   15    13.5
   16    14.5
   17    15.5
   18    16.5
   19    17.5
Name: B, dtype: float64
-------------------------
         A      B
A                
1 0    1.0    0.0
  1    2.0    1.0
  2    3.0    3.0
  3    4.0    6.0
  4    5.0   10.0
  5    6.0   15.0
  6    7.0   21.0
  7    8.0   28.0
  8    9.0   36.0
  9   10.0   45.0
5 10   5.0   10.0
  11  10.0   21.0
  12  15.0   33.0
  13  20.0   46.0
  14  25.0   60.0
  15  30.0   75.0
  16  35.0   91.0
  17  40.0  108.0
  18  45.0  126.0
  19  50.0  145.0
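
The NaN rows at the start of each group in the output above indicate that windows restart at every group boundary. The sketch below checks this on the same data, selecting column `B` before applying the window so that the grouping column is not aggregated (a variation on the cell above; names are illustrative):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"A": [1] * 10 + [5] * 10, "B": np.arange(20)})

# Rolling mean over 4 rows, computed separately inside each group of A.
rolled = frame.groupby("A")["B"].rolling(4).mean()

# Expanding (cumulative) sum, also restarted for each group.
expanded = frame.groupby("A")["B"].expanding().sum()

print(rolled.loc[(5, 10)])   # first row of group 5: not enough rows yet
print(rolled.loc[(5, 13)])   # first full window of group 5
print(expanded.loc[(5, 19)])
```

If the window ran across groups, row 10 would average rows 7 to 10; instead it is NaN because group 5 has only one row at that point.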


In [24]:
df_re = pd.DataFrame({
    "date": pd.date_range(start = "2016-01-01", periods = 4, freq = "W"),
    "group": [1, 1, 2, 2],
    "val": [5, 6, 7, 8]
}).set_index("date")
print(df_re)
print("-" * 25)
print(df_re.groupby("group").resample("1D").ffill())
print("-" * 25)
print(df_re.groupby("group").resample("1D").bfill())

            group  val
date                  
2016-01-03      1    5
2016-01-10      1    6
2016-01-17      2    7
2016-01-24      2    8
-------------------------
                  group  val
group date                  
1     2016-01-03      1    5
      2016-01-04      1    5
      2016-01-05      1    5
      2016-01-06      1    5
      2016-01-07      1    5
      2016-01-08      1    5
      2016-01-09      1    5
      2016-01-10      1    6
2     2016-01-17      2    7
      2016-01-18      2    7
      2016-01-19      2    7
      2016-01-20      2    7
      2016-01-21      2    7
      2016-01-22      2    7
      2016-01-23      2    7
      2016-01-24      2    8
-------------------------
                  group  val
group date                  
1     2016-01-03      1    5
      2016-01-04      1    6
      2016-01-05      1    6
      2016-01-06      1    6
      2016-01-07      1    6
      2016-01-08      1    6
      2016-01-09      1    6
      2016-01-10      1    
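
One way to read `groupby(...).resample(...)` is as "resample each group on its own, then stack the pieces". The sketch below does exactly that by hand for the `val` column. It is a hand-rolled illustration of the idea, not a description of how pandas implements it internally.

```python
import pandas as pd

frame = pd.DataFrame({
    "date": pd.date_range("2016-01-03", periods=4, freq="7D"),
    "group": [1, 1, 2, 2],
    "val": [5, 6, 7, 8],
}).set_index("date")

# Upsample each group separately to daily frequency, forward-filling the gaps.
pieces = {
    key: part["val"].resample("1D").ffill()
    for key, part in frame.groupby("group")
}

# Stack the per-group results under a (group, date) index.
filled = pd.concat(pieces, names=["group", "date"])

print(filled)
```

Each group spans only its own date range, so no rows appear between 2016-01-10 and 2016-01-17.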

### 4.

* `warnings`
    - `warn()`
    - `filterwarnings()`
        - `action = "error"`
        - `action = "ignore"`
        - `action = "always"`
        - `action = "default"`
        - `action = "module"`
        - `action = "once"`
    - `resetwarnings()`
    - `showwarning()`
    - `formatwarning()`
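
A minimal sketch of three of the `filterwarnings()` actions listed above. The `noisy()` helper is hypothetical, and `catch_warnings()` is used so that each filter is undone when its block ends.

```python
import warnings

def noisy():
    # Hypothetical helper that emits one warning per call.
    warnings.warn("this call is deprecated", DeprecationWarning)
    return 42

# action="always": every occurrence is reported; record=True collects them in a list.
with warnings.catch_warnings(record=True) as shown:
    warnings.filterwarnings("always")
    noisy()
    noisy()

# action="ignore": nothing is reported.
with warnings.catch_warnings(record=True) as hidden:
    warnings.filterwarnings("ignore")
    noisy()

# action="error": the warning is raised as an exception.
with warnings.catch_warnings():
    warnings.filterwarnings("error")
    try:
        noisy()
        raised = False
    except DeprecationWarning:
        raised = True

print(len(shown), len(hidden), raised)
```

Scoping the filter with `catch_warnings()` keeps a blanket `"ignore"` from silencing warnings in the rest of a notebook.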


#### 4.1 [Question](https://stackoverflow.com/questions/15297053/how-can-i-divide-single-values-of-a-dataframe-by-monthly-averages)

In [37]:
import numpy as np
import warnings
warnings.filterwarnings("ignore")

np.random.seed(93)
rng = pd.date_range(start = "2014-01-01", end = "2017-01-01", freq = "15T")
ts = pd.Series(np.random.randint(0, 2000, len(rng)), index = rng)

print(ts)
print(len(ts["2014-01-01"]))   # 96
print(ts["2014-01-01"].mean()) # 1056.2916666666667

print("-" * 50)
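grouper = pd.Grouper(freq = "1M")  # pd.TimeGrouper was removed in pandas 1.0; pd.Grouper replaces it
# Divide every value by the mean of its month. The result is kept in its own Series,
# because ts["normed"] = ... on a Series would not add a column.
normed = ts.groupby(grouper).transform(lambda x: x / x.mean())
print(normed)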

2014-01-01 00:00:00     421
2014-01-01 00:15:00    1371
2014-01-01 00:30:00    1412
2014-01-01 00:45:00    1176
2014-01-01 01:00:00     207
2014-01-01 01:15:00    1633
2014-01-01 01:30:00    1109
2014-01-01 01:45:00     560
2014-01-01 02:00:00     845
2014-01-01 02:15:00     237
2014-01-01 02:30:00     436
2014-01-01 02:45:00    1716
2014-01-01 03:00:00    1017
2014-01-01 03:15:00    1203
2014-01-01 03:30:00     917
2014-01-01 03:45:00    1073
2014-01-01 04:00:00    1866
2014-01-01 04:15:00     929
2014-01-01 04:30:00     520
2014-01-01 04:45:00     583
2014-01-01 05:00:00     464
2014-01-01 05:15:00    1240
2014-01-01 05:30:00     521
2014-01-01 05:45:00    1847
2014-01-01 06:00:00    1790
2014-01-01 06:15:00    1419
2014-01-01 06:30:00    1797
2014-01-01 06:45:00     428
2014-01-01 07:00:00     811
2014-01-01 07:15:00    1279
                       ... 
2016-12-31 16:45:00    1550
2016-12-31 17:00:00    1327
2016-12-31 17:15:00     724
2016-12-31 17:30:00     525
2016-12-31 17:45:00 
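
A small deterministic check of the same idea: after dividing each value by its monthly mean, every month should average to 1. The sketch groups on `index.to_period("M")` rather than a frequency-based grouper. That is a substitution chosen here to avoid month-end alias differences between pandas versions, and the data is illustrative.

```python
import numpy as np
import pandas as pd

# Hourly values covering January and February 2014.
idx = pd.date_range("2014-01-01", "2014-02-28 23:00", freq="h")
values = pd.Series(np.arange(1.0, len(idx) + 1), index=idx)

# One calendar-month label per row.
months = values.index.to_period("M")

# Divide each value by the mean of its own month.
normed = values.groupby(months).transform(lambda x: x / x.mean())

# Sanity check: the normalised values average to 1 within each month.
monthly_mean = normed.groupby(months).mean()
print(monthly_mean)
```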