<a href="https://colab.research.google.com/github/ykgw-daiki-nakamura/timeseries-resample/blob/main/python_resample.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

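# Intervals and Labels in Data Aggregation

When aggregating time-series data, two parameters often come up: the interval (which endpoint is closed) and the label.
This Jupyter notebook uses Python (pandas) to explain how each parameter is handled.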

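# Aggregation interval: closed (open vs. closed endpoints)

The `closed` parameter specifies which side of each interval is closed, that is, **which endpoint the interval includes**.

- `closed='right'`
  The interval includes its **right endpoint (end)** (right-closed, left-open).

- `closed='left'`
  The interval includes its **left endpoint (start)** (left-closed, right-open).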
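
As a minimal sketch of the bracket notation, `pd.Interval` can be used to check which endpoint each setting includes. The names `start`, `end`, and `membership` are illustrative and do not appear elsewhere in this notebook.

```python
import pandas as pd

start = pd.Timestamp("2025-01-01 00:00")
end = pd.Timestamp("2025-01-01 01:00")

# Same endpoints, different closed side
right_closed = pd.Interval(start, end, closed="right")  # (00:00, 01:00]
left_closed = pd.Interval(start, end, closed="left")    # [00:00, 01:00)

# For each setting: (is the start included?, is the end included?)
membership = {
    "right": (start in right_closed, end in right_closed),
    "left": (start in left_closed, end in left_closed),
}
print(membership)
```

The right-closed interval contains only its end point, and the left-closed interval contains only its start point.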
  
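## Example
Consider the interval from 00:00 to 01:00.

- With **right**, the interval is `(00:00, 01:00]`, so 01:00 is included.

- With **left**, the interval is `[00:00, 01:00)`, so 00:00 is included.

The table shows whether each data point belongs to the 00:00 to 01:00 interval (○ = included, × = not included).

| Data timestamp      | Interval (`'1H'`)     | `closed='right'` | `closed='left'` |
| ------------------- | --------------------- | ---------------- | --------------- |
| 2025-01-01 00:00:00 | 00:00 to 01:00        | ×                | ○               |
| 2025-01-01 01:00:00 | 00:00 to 01:00        | ○                | ×               |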
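
The effect of `closed` on an actual aggregation can be checked with a small, hypothetical three-point series (not the dataset used in the rest of this notebook). This sketch assumes hourly bins written as `'60min'` and keeps `label='left'` fixed so that only `closed` varies.

```python
import pandas as pd

# Hypothetical data: one point on each bin edge and one in between
idx = pd.to_datetime(["2025-01-01 00:00", "2025-01-01 00:30", "2025-01-01 01:00"])
s = pd.Series([1, 10, 100], index=idx)

# Bins [00:00, 01:00) and [01:00, 02:00)
left_sum = s.resample("60min", closed="left", label="left").sum()

# Bins (23:00, 00:00] and (00:00, 01:00]
right_sum = s.resample("60min", closed="right", label="left").sum()

print(left_sum)
print(right_sum)
```

With `closed='right'`, the 00:00 point falls into the preceding bin `(23:00, 00:00]`, so an extra bin labeled 2024-12-31 23:00 appears at the start of the result.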

In [None]:
# Generate a 1-minute time series starting at 2025/01/01 00:00 (UTC), with values counting up from 1

import pandas as pd

start_date = pd.to_datetime('2025/01/01 00:00')
freq = 'min'
periods = 600000

time_series_data = pd.DataFrame({
    'value': range(1, periods + 1)
}, index=pd.date_range(start=start_date, periods=periods, freq=freq, tz='UTC'))

print(time_series_data.head())
print(time_series_data.tail())


                           value
2025-01-01 00:00:00+00:00      1
2025-01-01 00:01:00+00:00      2
2025-01-01 00:02:00+00:00      3
2025-01-01 00:03:00+00:00      4
2025-01-01 00:04:00+00:00      5
                            value
2026-02-21 15:55:00+00:00  599996
2026-02-21 15:56:00+00:00  599997
2026-02-21 15:57:00+00:00  599998
2026-02-21 15:58:00+00:00  599999
2026-02-21 15:59:00+00:00  600000


In [None]:
# time_series_data, 30min, sum
# Case1. closed='left', label='left'
# Case2. closed='left', label='right'
# Case3. closed='right', label='left'
# Case4. closed='right', label='right'

# Case1. closed='left', label='left'
resampled_data_case1 = time_series_data.resample('30min', closed='left', label='left').sum()
print("\nCase 1: closed='left', label='left'")
print(resampled_data_case1.head())
print(resampled_data_case1.tail())

# Case2. closed='left', label='right'
resampled_data_case2 = time_series_data.resample('30min', closed='left', label='right').sum()
print("\nCase 2: closed='left', label='right'")
print(resampled_data_case2.head())
print(resampled_data_case2.tail())


# Case3. closed='right', label='left'
resampled_data_case3 = time_series_data.resample('30min', closed='right', label='left').sum()
print("\nCase 3: closed='right', label='left'")
print(resampled_data_case3.head())
print(resampled_data_case3.tail())


# Case4. closed='right', label='right'
resampled_data_case4 = time_series_data.resample('30min', closed='right', label='right').sum()
print("\nCase 4: closed='right', label='right'")
print(resampled_data_case4.head())
print(resampled_data_case4.tail())


Case 1: closed='left', label='left'
                           value
2025-01-01 00:00:00+00:00    465
2025-01-01 00:30:00+00:00   1365
2025-01-01 01:00:00+00:00   2265
2025-01-01 01:30:00+00:00   3165
2025-01-01 02:00:00+00:00   4065
                              value
2026-02-21 13:30:00+00:00  17995965
2026-02-21 14:00:00+00:00  17996865
2026-02-21 14:30:00+00:00  17997765
2026-02-21 15:00:00+00:00  17998665
2026-02-21 15:30:00+00:00  17999565

Case 2: closed='left', label='right'
                           value
2025-01-01 00:30:00+00:00    465
2025-01-01 01:00:00+00:00   1365
2025-01-01 01:30:00+00:00   2265
2025-01-01 02:00:00+00:00   3165
2025-01-01 02:30:00+00:00   4065
                              value
2026-02-21 14:00:00+00:00  17995965
2026-02-21 14:30:00+00:00  17996865
2026-02-21 15:00:00+00:00  17997765
2026-02-21 15:30:00+00:00  17998665
2026-02-21 16:00:00+00:00  17999565

Case 3: closed='right', label='left'
                           value
2024-12-31 23:30:00+00:00 

In [None]:
# time_series_data, day, sum
# Case1. closed='left', label='left'
# Case2. closed='left', label='right'
# Case3. closed='right', label='left'
# Case4. closed='right', label='right'

# Aggregation: day, sum

# Case1. closed='left', label='left'
resampled_data_case1_day = time_series_data.resample('D', closed='left', label='left').sum()
print("\nCase 1 (Day): closed='left', label='left'")
print(resampled_data_case1_day.head())
print(resampled_data_case1_day.tail())

# Case2. closed='left', label='right'
resampled_data_case2_day = time_series_data.resample('D', closed='left', label='right').sum()
print("\nCase 2 (Day): closed='left', label='right'")
print(resampled_data_case2_day.head())
print(resampled_data_case2_day.tail())

# Case3. closed='right', label='left'
resampled_data_case3_day = time_series_data.resample('D', closed='right', label='left').sum()
print("\nCase 3 (Day): closed='right', label='left'")
print(resampled_data_case3_day.head())
print(resampled_data_case3_day.tail())

# Case4. closed='right', label='right'
resampled_data_case4_day = time_series_data.resample('D', closed='right', label='right').sum()
print("\nCase 4 (Day): closed='right', label='right'")
print(resampled_data_case4_day.head())
print(resampled_data_case4_day.tail())


Case 1 (Day): closed='left', label='left'
                             value
2025-01-01 00:00:00+00:00  1037520
2025-01-02 00:00:00+00:00  3111120
2025-01-03 00:00:00+00:00  5184720
2025-01-04 00:00:00+00:00  7258320
2025-01-05 00:00:00+00:00  9331920
                               value
2026-02-17 00:00:00+00:00  855360720
2026-02-18 00:00:00+00:00  857434320
2026-02-19 00:00:00+00:00  859507920
2026-02-20 00:00:00+00:00  861581520
2026-02-21 00:00:00+00:00  575539680

Case 2 (Day): closed='left', label='right'
                             value
2025-01-02 00:00:00+00:00  1037520
2025-01-03 00:00:00+00:00  3111120
2025-01-04 00:00:00+00:00  5184720
2025-01-05 00:00:00+00:00  7258320
2025-01-06 00:00:00+00:00  9331920
                               value
2026-02-18 00:00:00+00:00  855360720
2026-02-19 00:00:00+00:00  857434320
2026-02-20 00:00:00+00:00  859507920
2026-02-21 00:00:00+00:00  861581520
2026-02-22 00:00:00+00:00  575539680

Case 3 (Day): closed='right', label='left'
     

In [None]:
# Convert the index to JST (Asia/Tokyo)

time_series_data_jst = time_series_data.tz_convert('Asia/Tokyo')

print(time_series_data_jst.head())
print(time_series_data_jst.tail())


                           value
2025-01-01 09:00:00+09:00      1
2025-01-01 09:01:00+09:00      2
2025-01-01 09:02:00+09:00      3
2025-01-01 09:03:00+09:00      4
2025-01-01 09:04:00+09:00      5
                            value
2026-02-22 00:55:00+09:00  599996
2026-02-22 00:56:00+09:00  599997
2026-02-22 00:57:00+09:00  599998
2026-02-22 00:58:00+09:00  599999
2026-02-22 00:59:00+09:00  600000


In [None]:
# time_series_data_jst, 30min, sum
# Case1. closed='left', label='left'
# Case2. closed='left', label='right'
# Case3. closed='right', label='left'
# Case4. closed='right', label='right'

# Case1. closed='left', label='left'
resampled_data_case1 = time_series_data_jst.resample('30min', closed='left', label='left').sum()
print("\nCase 1: closed='left', label='left'")
print(resampled_data_case1.head())
print(resampled_data_case1.tail())

# Case2. closed='left', label='right'
resampled_data_case2 = time_series_data_jst.resample('30min', closed='left', label='right').sum()
print("\nCase 2: closed='left', label='right'")
print(resampled_data_case2.head())
print(resampled_data_case2.tail())


# Case3. closed='right', label='left'
resampled_data_case3 = time_series_data_jst.resample('30min', closed='right', label='left').sum()
print("\nCase 3: closed='right', label='left'")
print(resampled_data_case3.head())
print(resampled_data_case3.tail())


# Case4. closed='right', label='right'
resampled_data_case4 = time_series_data_jst.resample('30min', closed='right', label='right').sum()
print("\nCase 4: closed='right', label='right'")
print(resampled_data_case4.head())
print(resampled_data_case4.tail())


Case 1: closed='left', label='left'
                           value
2025-01-01 09:00:00+09:00    465
2025-01-01 09:30:00+09:00   1365
2025-01-01 10:00:00+09:00   2265
2025-01-01 10:30:00+09:00   3165
2025-01-01 11:00:00+09:00   4065
                              value
2026-02-21 22:30:00+09:00  17995965
2026-02-21 23:00:00+09:00  17996865
2026-02-21 23:30:00+09:00  17997765
2026-02-22 00:00:00+09:00  17998665
2026-02-22 00:30:00+09:00  17999565

Case 2: closed='left', label='right'
                           value
2025-01-01 09:30:00+09:00    465
2025-01-01 10:00:00+09:00   1365
2025-01-01 10:30:00+09:00   2265
2025-01-01 11:00:00+09:00   3165
2025-01-01 11:30:00+09:00   4065
                              value
2026-02-21 23:00:00+09:00  17995965
2026-02-21 23:30:00+09:00  17996865
2026-02-22 00:00:00+09:00  17997765
2026-02-22 00:30:00+09:00  17998665
2026-02-22 01:00:00+09:00  17999565

Case 3: closed='right', label='left'
                           value
2025-01-01 08:30:00+09:00 

In [None]:
# time_series_data_jst, day, sum
# Case1. closed='left', label='left'
# Case2. closed='left', label='right'
# Case3. closed='right', label='left'
# Case4. closed='right', label='right'

# Aggregation: day, sum

# Case1. closed='left', label='left'
resampled_data_case1_day = time_series_data_jst.resample('D', closed='left', label='left').sum()
print("\nCase 1 (Day): closed='left', label='left'")
print(resampled_data_case1_day.head())
print(resampled_data_case1_day.tail())

# Case2. closed='left', label='right'
resampled_data_case2_day = time_series_data_jst.resample('D', closed='left', label='right').sum()
print("\nCase 2 (Day): closed='left', label='right'")
print(resampled_data_case2_day.head())
print(resampled_data_case2_day.tail())

# Case3. closed='right', label='left'
resampled_data_case3_day = time_series_data_jst.resample('D', closed='right', label='left').sum()
print("\nCase 3 (Day): closed='right', label='left'")
print(resampled_data_case3_day.head())
print(resampled_data_case3_day.tail())

# Case4. closed='right', label='right'
resampled_data_case4_day = time_series_data_jst.resample('D', closed='right', label='right').sum()
print("\nCase 4 (Day): closed='right', label='right'")
print(resampled_data_case4_day.head())
print(resampled_data_case4_day.tail())


Case 1 (Day): closed='left', label='left'
                             value
2025-01-01 00:00:00+09:00   405450
2025-01-02 00:00:00+09:00  2333520
2025-01-03 00:00:00+09:00  4407120
2025-01-04 00:00:00+09:00  6480720
2025-01-05 00:00:00+09:00  8554320
                               value
2026-02-18 00:00:00+09:00  856656720
2026-02-19 00:00:00+09:00  858730320
2026-02-20 00:00:00+09:00  860803920
2026-02-21 00:00:00+09:00  862877520
2026-02-22 00:00:00+09:00   35998230

Case 2 (Day): closed='left', label='right'
                             value
2025-01-02 00:00:00+09:00   405450
2025-01-03 00:00:00+09:00  2333520
2025-01-04 00:00:00+09:00  4407120
2025-01-05 00:00:00+09:00  6480720
2025-01-06 00:00:00+09:00  8554320
                               value
2026-02-19 00:00:00+09:00  856656720
2026-02-20 00:00:00+09:00  858730320
2026-02-21 00:00:00+09:00  860803920
2026-02-22 00:00:00+09:00  862877520
2026-02-23 00:00:00+09:00   35998230

Case 3 (Day): closed='right', label='left'
     

In [None]:
# time_series_data, month, sum
# Case1. closed='left', label='left'
# Case2. closed='left', label='right'
# Case3. closed='right', label='left'
# Case4. closed='right', label='right'
# Note: pandas 2.2+ deprecates the 'M' alias in favor of 'ME' (month end).

# Case1. closed='left', label='left'
resampled_data_case1_month = time_series_data.resample('M', closed='left', label='left').sum()
print("\nCase 1 (Month): closed='left', label='left'")
print(resampled_data_case1_month.head())
print(resampled_data_case1_month.tail())

# Case2. closed='left', label='right'
resampled_data_case2_month = time_series_data.resample('M', closed='left', label='right').sum()
print("\nCase 2 (Month): closed='left', label='right'")
print(resampled_data_case2_month.head())
print(resampled_data_case2_month.tail())

# Case3. closed='right', label='left'
resampled_data_case3_month = time_series_data.resample('M', closed='right', label='left').sum()
print("\nCase 3 (Month): closed='right', label='left'")
print(resampled_data_case3_month.head())
print(resampled_data_case3_month.tail())

# Case4. closed='right', label='right'
resampled_data_case4_month = time_series_data.resample('M', closed='right', label='right').sum()
print("\nCase 4 (Month): closed='right', label='right'")
print(resampled_data_case4_month.head())
print(resampled_data_case4_month.tail())


Case 1 (Month): closed='left', label='left'
                                value
2024-12-31 00:00:00+00:00   933141600
2025-01-31 00:00:00+00:00  2554695360
2025-02-28 00:00:00+00:00  4724719920
2025-03-31 00:00:00+00:00  6469653600
2025-04-30 00:00:00+00:00  8645897520
                                 value
2025-09-30 00:00:00+00:00  18480982320
2025-10-31 00:00:00+00:00  19782165600
2025-11-30 00:00:00+00:00  22402159920
2025-12-31 00:00:00+00:00  24394889520
2026-01-31 00:00:00+00:00  18233295600

Case 2 (Month): closed='left', label='right'
                                value
2025-01-31 00:00:00+00:00   933141600
2025-02-28 00:00:00+00:00  2554695360
2025-03-31 00:00:00+00:00  4724719920
2025-04-30 00:00:00+00:00  6469653600
2025-05-31 00:00:00+00:00  8645897520
                                 value
2025-10-31 00:00:00+00:00  18480982320
2025-11-30 00:00:00+00:00  19782165600
2025-12-31 00:00:00+00:00  22402159920
2026-01-31 00:00:00+00:00  24394889520
2026-02-28 00:00:00+00:0


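# Plot position: label

The `label` parameter specifies which side of an aggregated interval supplies the label (timestamp) that represents it.

It determines **which endpoint's timestamp is used as the label**; it does not change which data points fall into the interval.

- `label='right'`
  The timestamp of the interval's **right endpoint (end)** is used as the label.

- `label='left'`
  The timestamp of the interval's **left endpoint (start)** is used as the label.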

  
## Example
Consider data aggregated into hourly intervals starting at 00:00.

| Interval       | `label='right'`       | `label='left'`       |
|----------------|-----------------------|----------------------|
| 00:00 to 01:00 | 01:00                 | 00:00                |
| 01:00 to 02:00 | 02:00                 | 01:00                |
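
A minimal sketch of the same idea, assuming a hypothetical four-point series and hourly bins written as `'60min'`: `closed='left'` is held fixed, so the sums are identical and only the index labels differ.

```python
import pandas as pd

# Hypothetical data at 00:00, 00:30, 01:00, 01:30
idx = pd.date_range("2025-01-01 00:00", periods=4, freq="30min")
s = pd.Series([1, 2, 3, 4], index=idx)

# Same bins [00:00, 01:00) and [01:00, 02:00); only the label side changes
by_left = s.resample("60min", closed="left", label="left").sum()
by_right = s.resample("60min", closed="left", label="right").sum()

print(by_left)
print(by_right)
```

Both results hold the sums 3 and 7; `by_left` is indexed at 00:00 and 01:00, while `by_right` is indexed at 01:00 and 02:00.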

# Combining closed and label

| closed | label | Interval                         | Label (plot) position |
|--------|-------|----------------------------------|-----------------------|
| left   | left  | [00:00, 01:00) (excludes 01:00)  | 00:00                 |
| left   | right | [00:00, 01:00) (excludes 01:00)  | 01:00                 |
| right  | left  | (00:00, 01:00] (excludes 00:00)  | 00:00                 |
| right  | right | (00:00, 01:00] (excludes 00:00)  | 01:00                 |
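
The four combinations can be compared side by side with a short loop. This is only a sketch on a hypothetical three-point series with hourly bins written as `'60min'`; the `results` dictionary is an illustrative name, not something defined elsewhere in the notebook.

```python
import itertools

import pandas as pd

# Hypothetical data at 00:00, 00:30, 01:00
idx = pd.date_range("2025-01-01 00:00", periods=3, freq="30min")
s = pd.Series([1, 10, 100], index=idx)

results = {}
for closed, label in itertools.product(["left", "right"], repeat=2):
    out = s.resample("60min", closed=closed, label=label).sum()
    results[(closed, label)] = out
    print(f"closed={closed!r}, label={label!r}")
    print(out, end="\n\n")
```

`closed` decides the sums (11 and 100 for `'left'`, 1 and 110 for `'right'`), while `label` only shifts the timestamps attached to them by one bin width.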

