# Data Description
- Author: 方珮潔、黃書佑
- Date first created: 20251026
- Background: This table is the output of `extent-of-discount-rate-processed-code.ipynb`.
- Unit of observation: one discount event for one game (409 observations in total)
- Sample period: 20230501 ~ 20250831
- Known issues: 
    - Only games above a certain size are considered.
    - Non-seasonal promotions cannot be captured.
- Definition for each variable:
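    - `Date`: datetime, first day of the discount.
    - `GameID`: object, unique ID of the game.
    - `Type`: object, genre of the game.
    - `MultiPlayer`: int, whether the game is a multiplayer game.
    - `Publisher`: object, name of the game's publisher.
    - `ConstantDiscount`: int, whether the game's discount rate is constant. (Should be 0.)
    - `Final price`: float, price of the game at the time of the discount.
    - `OriginPrice`: float, original price of the game.
    - `DiscountRate`: float, size of the discount; a positive value means a price cut. (Should be greater than 0.)
    - `DiscountDuration`: number of days the discount lasts.
    - `DiscountFreq3M`: discount frequency of the game over the past three months.
    - `Age`: float, years since the game's release at the time of the discount.
    - `AccumulatedPositiveRate`: float, cumulative share of positive reviews for the game up to the time of the discount.
    - `SalePeriod`: int, whether the discount falls within a seasonal sale period.
    - `PlayerGrowthRate1W`: float, one-week moving-average growth rate of the player count.
    - `PlayerGrowthRate2W`: float, two-week moving-average growth rate of the player count.
    - `PlayerGrowthRate1M`: float, one-month moving-average growth rate of the player count.
    - `AvgFollowersGrowthRateDeviationRate1W`: float, how far the average daily follower growth rate over the past week deviates from its long-run average.
    - `AvgFollowersGrowthRateDeviationRate2W`: float, how far the average daily follower growth rate over the past two weeks deviates from its long-run average.
    - `AvgFollowersGrowthRateDeviationRate1M`: float, how far the average daily follower growth rate over the past month deviates from its long-run average.
    - `PositiveRateGrowthRate1W`: float, one-week growth rate of the positive review rate.
    - `PositiveRateGrowthRate2W`: float, two-week growth rate of the positive review rate.
    - `PositiveRateGrowthRate1M`: float, one-month growth rate of the positive review rate.
    - `DLC_since_last_discount`: int, number of new DLCs released up to the end of the previous discount.
    - `Sequel_since_last_discount`: int, number of new sequels released up to the end of the previous discount.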

In [21]:
import pandas as pd

## Change for Your Own Data

In [22]:
input_data_file = "/Users/peijiefang/Downloads/data analysis project/extent-of-discount-rate-rate-update-DE.csv"

In [23]:
df = pd.read_csv(input_data_file)

### Sample

In [24]:
df.head(10)

| | Date | GameID | Type | MultiPlayer | Publisher | ConstantDiscount | Final price | OriginPrice | DiscountRate | DiscountDuration | ... | PlayerGrowthRate2W | PlayerGrowthRate1M | AvgFollowersGrowthRateDeviationRate1W | AvgFollowersGrowthRateDeviationRate2W | AvgFollowersGrowthRateDeviationRate1M | PositiveRateGrowthRate1W | PositiveRateGrowthRate2W | PositiveRateGrowthRate1M | DLC_since_last_discount | Sequel_since_last_discount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-06-29 | 10 | Action | 1 | Valve | 0 | 18.0 | 186.0 | 0.903226 | 14 | ... | -0.032626 | -0.039192 | -0.292056 | -0.277988 | -0.215706 | 5.3e-05 | 1.5e-05 | 2.3e-05 | 0 | 0 |
| 1 | 2023-09-12 | 10 | Action | 1 | Valve | 0 | 18.0 | 186.0 | 0.903226 | 7 | ... | 0.031983 | 0.006604 | -0.434575 | -0.417947 | -0.407693 | 4e-06 | 3.1e-05 | -2.6e-05 | 0 | 0 |
| 2 | 2023-11-21 | 10 | Action | 1 | Valve | 0 | 18.0 | 186.0 | 0.903226 | 7 | ... | -0.030716 | 0.027451 | 0.482403 | 0.097192 | 0.050472 | -0.000146 | -0.000165 | -0.000152 | 0 | 0 |
| 3 | 2023-12-21 | 10 | Action | 1 | Valve | 0 | 19.0 | 186.0 | 0.897849 | 14 | ... | -0.032172 | -0.014952 | -0.092429 | -0.040663 | 0.235283 | 1.6e-05 | 1.4e-05 | -1.1e-05 | 0 | 0 |
| 4 | 2024-03-14 | 10 | Action | 1 | Valve | 0 | 18.0 | 186.0 | 0.903226 | 7 | ... | 0.045146 | 0.009495 | 0.107584 | 0.03312 | -0.043617 | 2e-06 | -2.2e-05 | -5e-06 | 0 | 0 |
| 5 | 2024-06-27 | 10 | Action | 1 | Valve | 0 | 18.0 | 186.0 | 0.903226 | 14 | ... | -0.06644 | 0.023478 | 0.042397 | -0.219353 | -0.239947 | -7e-06 | 1.2e-05 | -5.9e-05 | 0 | 0 |
| 6 | 2024-11-27 | 10 | Action | 1 | Valve | 0 | 37.0 | 186.0 | 0.801075 | 7 | ... | 0.029186 | 0.00638 | 0.287945 | 0.280904 | 0.13606 | -1.8e-05 | -1e-06 | -2.7e-05 | 0 | 0 |
| 7 | 2024-12-19 | 10 | Action | 1 | Valve | 0 | 37.0 | 186.0 | 0.801075 | 14 | ... | 0.038898 | 0.067424 | 0.034381 | 0.045228 | 0.375174 | -2.7e-05 | -1.9e-05 | 4.2e-05 | 0 | 0 |
| 8 | 2025-03-13 | 10 | Action | 1 | Valve | 0 | 37.0 | 186.0 | 0.801075 | 7 | ... | -0.083917 | -0.083422 | 0.021288 | 0.070347 | 0.076495 | 1.1e-05 | 1.8e-05 | -9e-06 | 0 | 0 |
| 9 | 2025-06-26 | 10 | Action | 1 | Valve | 0 | 37.0 | 186.0 | 0.801075 | 14 | ... | -0.099633 | -0.050041 | -0.103695 | -0.161308 | -0.125898 | -6e-06 | -1.6e-05 | -7.6e-05 | 0 | 0 |
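
In the sample rows above, `DiscountRate` appears to equal `1 - Final price / OriginPrice` (for example, 1 - 18/186 ≈ 0.903226). The source does not state this formula, so the sketch below treats it as an assumption and checks it on three rows copied from the sample. The tolerance `1e-6` is also an arbitrary choice, made because the displayed values are rounded.

```python
import numpy as np
import pandas as pd

# Three rows copied from the sample output above (GameID 10).
sample = pd.DataFrame({
    "Final price": [18.0, 19.0, 37.0],
    "OriginPrice": [186.0, 186.0, 186.0],
    "DiscountRate": [0.903226, 0.897849, 0.801075],
})

# Assumed relationship (inferred from the sample, not documented in the source).
implied_rate = 1 - sample["Final price"] / sample["OriginPrice"]

# The displayed values are rounded to six decimals, so compare with a tolerance.
rate_matches = np.isclose(implied_rate, sample["DiscountRate"], atol=1e-6)
assert rate_matches.all(), "DiscountRate does not match 1 - Final price / OriginPrice"
```

If the relationship is meant to hold for every row, the same comparison could be run with `df` in place of `sample`.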


### Summary Stats

In [25]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| GameID | 409.0 | 497223.716381 | 297538.620152 | 10.0 | 244850.0 | 477160.0 | 644930.0 | 1145360.0 |
| MultiPlayer | 409.0 | 0.503667 | 0.500599 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| ConstantDiscount | 409.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Final price | 409.0 | 220.880196 | 177.979242 | 0.0 | 119.0 | 160.0 | 278.0 | 959.0 |
| OriginPrice | 409.0 | 501.114914 | 326.580574 | 186.0 | 318.0 | 398.0 | 488.0 | 1599.0 |
| DiscountRate | 409.0 | 0.572639 | 0.193898 | 0.034301 | 0.5 | 0.600457 | 0.701005 | 1.0 |
| DiscountDuration | 409.0 | 11.222494 | 5.299545 | 1.0 | 7.0 | 11.0 | 14.0 | 32.0 |
| DiscountFreq3M | 409.0 | 2.701711 | 0.898813 | 1.0 | 2.0 | 3.0 | 3.0 | 6.0 |
| Age | 409.0 | 7.025569 | 3.988664 | 2.50137 | 4.723288 | 6.027397 | 7.961644 | 24.66575 |
| AccumulatedPositiveRate | 409.0 | 0.917436 | 0.065307 | 0.738856 | 0.885042 | 0.946994 | 0.967883 | 0.9926429 |


In [26]:
df.columns

Index(['Date', 'GameID', 'Type', 'MultiPlayer', 'Publisher',
       'ConstantDiscount', 'Final price', 'OriginPrice', 'DiscountRate',
       'DiscountDuration', 'DiscountFreq3M', 'Age', 'AccumulatedPositiveRate',
       'SalePeriod', 'PlayerGrowthRate1W', 'PlayerGrowthRate2W',
       'PlayerGrowthRate1M', 'AvgFollowersGrowthRateDeviationRate1W',
       'AvgFollowersGrowthRateDeviationRate2W',
       'AvgFollowersGrowthRateDeviationRate1M', 'PositiveRateGrowthRate1W',
       'PositiveRateGrowthRate2W', 'PositiveRateGrowthRate1M',
       'DLC_since_last_discount', 'Sequel_since_last_discount'],
      dtype='object')

### Assertions

In [27]:
# Write assertions here—conditions that should always hold in the data.
# Use them to demonstrate that you have carefully checked the data’s correctness.

#### Date range check

In [28]:
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

start_date = pd.to_datetime('2023-05-01')
end_date   = pd.to_datetime('2025-08-31')

assert df['Date'].notna().all(), "Date column has missing values!"
assert ((df['Date'] >= start_date) & (df['Date'] <= end_date)).all(), "Date is outside the specified range!"
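
Since `Date` is the first day of a discount, a related check is that no game has two rows with the same `Date`. The source does not state this uniqueness rule, so the sketch below treats it as an assumption and demonstrates it on a toy frame (`toy` is a hypothetical name) built from values in the sample output.

```python
import pandas as pd

# Toy rows in the same shape as the data; values copied from the sample output.
toy = pd.DataFrame({
    "GameID": [10, 10, 10],
    "Date": pd.to_datetime(["2023-06-29", "2023-09-12", "2023-11-21"]),
})

# Assumption: one row per (GameID, Date), i.e. per discount event.
# keep=False marks every member of a duplicated group, not just the repeats.
duplicate_rows = toy.duplicated(subset=["GameID", "Date"], keep=False)
assert not duplicate_rows.any(), "Duplicate (GameID, Date) rows found"
```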

#### Missing value check

In [29]:
df.isna().sum()

Date                                     0
GameID                                   0
Type                                     0
MultiPlayer                              0
Publisher                                0
ConstantDiscount                         0
Final price                              0
OriginPrice                              0
DiscountRate                             0
DiscountDuration                         0
DiscountFreq3M                           0
Age                                      0
AccumulatedPositiveRate                  0
SalePeriod                               0
PlayerGrowthRate1W                       0
PlayerGrowthRate2W                       0
PlayerGrowthRate1M                       0
AvgFollowersGrowthRateDeviationRate1W    0
AvgFollowersGrowthRateDeviationRate2W    0
AvgFollowersGrowthRateDeviationRate1M    0
PositiveRateGrowthRate1W                 0
PositiveRateGrowthRate2W                 0
PositiveRateGrowthRate1M                 0
DLC_since_l
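
The per-column counts above rely on visual inspection. One way to turn them into an assertion is sketched below; `assert_no_missing` is a hypothetical helper name, and the frame `complete` is a toy stand-in for `df`.

```python
import pandas as pd


def assert_no_missing(frame: pd.DataFrame) -> None:
    """Raise AssertionError listing every column that contains missing values."""
    missing_counts = frame.isna().sum()
    columns_with_missing = missing_counts[missing_counts > 0]
    assert columns_with_missing.empty, f"Missing values found:\n{columns_with_missing}"


# Toy stand-in for df, with values copied from the sample output.
complete = pd.DataFrame({
    "DiscountRate": [0.903226, 0.801075],
    "DiscountDuration": [14, 7],
})
assert_no_missing(complete)  # passes silently because no value is missing
```

Calling `assert_no_missing(df)` on the loaded data would fail with the offending column names if any value were missing.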

In [30]:
# For numeric range checks, see the summary stats.
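
The range conditions can also be written as explicit assertions instead of being read off the summary table. The sketch below runs on a toy frame (`toy`, a hypothetical name) with values copied from the sample output. Only the first two conditions are stated in the variable definitions; the others are assumptions that happen to agree with the summary stats.

```python
import pandas as pd

# Toy rows in the same shape as the data; values copied from the sample output.
toy = pd.DataFrame({
    "MultiPlayer": [1, 1],
    "ConstantDiscount": [0, 0],
    "Final price": [18.0, 37.0],
    "OriginPrice": [186.0, 186.0],
    "DiscountRate": [0.903226, 0.801075],
    "DiscountDuration": [14, 7],
})

checks = {
    # Stated in the variable definitions.
    "ConstantDiscount is 0": (toy["ConstantDiscount"] == 0).all(),
    "DiscountRate > 0": (toy["DiscountRate"] > 0).all(),
    # Assumptions, not stated in the source.
    "DiscountRate <= 1": (toy["DiscountRate"] <= 1).all(),
    "MultiPlayer is 0 or 1": toy["MultiPlayer"].isin([0, 1]).all(),
    "Final price <= OriginPrice": (toy["Final price"] <= toy["OriginPrice"]).all(),
    "DiscountDuration >= 1": (toy["DiscountDuration"] >= 1).all(),
}

# Collect the names of all failing checks so one run reports every problem.
failed = [name for name, passed in checks.items() if not passed]
assert not failed, f"Failed range checks: {failed}"
```

Collecting the results in a dictionary reports every violated condition at once, rather than stopping at the first failing `assert`.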