# Data Description
- Author: 方珮潔、黃書佑
- Date first created: 20251026
- Background: This table is the output of `extent-of-discount-rate-processed-code.ipynb`.
- Unit of observation: one discount event (a game on a given date); 473 observations in total.
- Sample period: 20230501 ~ 20250831
- Known issues:
    - Only games above a certain size are considered.
    - Non-seasonal promotions cannot be captured.
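- Definition for each variable:
    - `Date`: datetime, the date of the discount.
    - `GameID`: object, the game's unique ID.
    - `Type`: object, the game's genre.
    - `MultiPlayer`: int, whether the game is a multiplayer game.
    - `Publisher`: object, the name of the game's publisher.
    - `ConstantDiscount`: int, whether the game's discount rate is constant (should be 0).
    - `DiscountRate`: float, the size of the discount; a positive value means a price cut (should be greater than 0).
    - `Age`: float, years since the game's release at the time of the discount.
    - `AccumulatedPositiveRate`: float, the game's cumulative share of positive reviews up to the time of the discount.
    - `SalePeriod`: int, whether the discount falls within a seasonal sale period.
    - `PlayerGrowthRate1W`: float, one-week moving-average growth rate of the player count.
    - `PlayerGrowthRate2W`: float, two-week moving-average growth rate of the player count.
    - `PlayerGrowthRate1M`: float, one-month moving-average growth rate of the player count.
    - `FollowersGrowthRate1W`: float, one-week growth rate of the follower count.
    - `FollowersGrowthRate2W`: float, two-week growth rate of the follower count.
    - `FollowersGrowthRate1M`: float, one-month growth rate of the follower count.
    - `PositiveRateGrowthRate1W`: float, one-week growth rate of the positive review rate.
    - `PositiveRateGrowthRate2W`: float, two-week growth rate of the positive review rate.
    - `PositiveRateGrowthRate1M`: float, one-month growth rate of the positive review rate.
    - `DLC_since_last_discount`: int, number of new DLCs released since the previous discount.
    - `Sequel_since_last_discount`: int, number of new sequels released since the previous discount.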

In [9]:
import pandas as pd

## Change for Your Own Data

In [10]:
input_data_file = "/Users/peijiefang/Downloads/data analysis project/extent-of-discount-rate-DE.csv"

In [11]:
df = pd.read_csv(input_data_file)

### Sample

In [12]:
df.head(10)

| | Date | GameID | Type | MultiPlayer | Publisher | ConstantDiscount | DiscountRate | Age | AccumulatedPositiveRate | SalePeriod | ... | PlayerGrowthRate2W | PlayerGrowthRate1M | FollowersGrowthRate1W | FollowersGrowthRate2W | FollowersGrowthRate1M | PositiveRateGrowthRate1W | PositiveRateGrowthRate2W | PositiveRateGrowthRate1M | DLC_since_last_discount | Sequel_since_last_discount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-06-29 | 10 | Action | 1 | Valve | 0 | 0.903226 | 22.671233 | 0.975168 | 1 | ... | -0.033624 | -0.037584 | 0.000673 | 0.001362 | 0.003116 | 5.3e-05 | 1.5e-05 | 2.3e-05 | 0 | 0 |
| 1 | 2023-09-12 | 10 | Action | 1 | Valve | 0 | 0.903226 | 22.876712 | 0.97499 | 0 | ... | 0.022963 | 0.003837 | 0.000553 | 0.001087 | 0.002301 | 4e-06 | 3.1e-05 | -2.6e-05 | 0 | 0 |
| 2 | 2023-09-15 | 10 | Action | 1 | Valve | 0 | 0.897849 | 22.884932 | 0.974926 | 0 | ... | 0.021006 | 0.013577 | 0.001878 | 0.002493 | 0.003623 | -6.8e-05 | -3.5e-05 | -7.2e-05 | 0 | 0 |
| 3 | 2023-11-21 | 10 | Action | 1 | Valve | 0 | 0.903226 | 23.068493 | 0.974686 | 1 | ... | -0.039165 | 0.015256 | 0.001381 | 0.002082 | 0.004059 | -0.000146 | -0.000165 | -0.000152 | 0 | 0 |
| 4 | 2023-12-21 | 10 | Action | 1 | Valve | 0 | 0.897849 | 23.150685 | 0.974675 | 1 | ... | -0.033895 | -0.006214 | 0.00079 | 0.001743 | 0.004781 | 1.6e-05 | 1.4e-05 | -1.1e-05 | 0 | 0 |
| 5 | 2024-03-14 | 10 | Action | 1 | Valve | 0 | 0.903226 | 23.380822 | 0.974604 | 1 | ... | 0.044649 | 0.012697 | 0.000952 | 0.001814 | 0.003629 | 2e-06 | -2.2e-05 | -5e-06 | 0 | 0 |
| 6 | 2024-06-27 | 10 | Action | 1 | Valve | 0 | 0.903226 | 23.668493 | 0.974511 | 1 | ... | -0.063312 | 0.025032 | 0.000744 | 0.001476 | 0.002994 | -7e-06 | 1.2e-05 | -5.9e-05 | 0 | 0 |
| 7 | 2024-11-27 | 10 | Action | 1 | Valve | 0 | 0.801075 | 24.087671 | 0.974273 | 1 | ... | 0.028233 | 0.007734 | 0.001202 | 0.002417 | 0.004475 | -1.8e-05 | -1e-06 | -2.7e-05 | 0 | 0 |
| 8 | 2024-12-19 | 10 | Action | 1 | Valve | 0 | 0.801075 | 24.147945 | 0.974334 | 1 | ... | 0.044363 | 0.071397 | 0.00092 | 0.001795 | 0.005306 | -2.7e-05 | -1.9e-05 | 4.2e-05 | 0 | 0 |
| 9 | 2025-03-13 | 10 | Action | 1 | Valve | 0 | 0.801075 | 24.378082 | 0.97437 | 1 | ... | -0.092684 | -0.085772 | 0.000899 | 0.001986 | 0.004205 | 1.1e-05 | 1.8e-05 | -9e-06 | 0 | 0 |


### Summary Stats

In [13]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| GameID | 473.0 | 487668.033827 | 288815.480496 | 10.0 | 244850.0 | 431730.0 | 644930.0 | 1145360.0 |
| MultiPlayer | 473.0 | 0.46723 | 0.499453 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| ConstantDiscount | 473.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| DiscountRate | 473.0 | 0.568827 | 0.185389 | 0.034301 | 0.5 | 0.600287 | 0.700513 | 1.0 |
| Age | 473.0 | 7.204906 | 3.884397 | 2.50137 | 4.835616 | 6.315068 | 8.419178 | 24.66575 |
| AccumulatedPositiveRate | 473.0 | 0.886172 | 0.125249 | 0.428571 | 0.851377 | 0.938443 | 0.967536 | 0.9926429 |
| SalePeriod | 473.0 | 0.621564 | 0.48551 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| PlayerGrowthRate1W | 473.0 | 22.944477 | 314.977485 | -0.998183 | -0.081144 | -0.028214 | 0.017345 | 5921.714 |
| PlayerGrowthRate2W | 473.0 | 5.503044 | 80.009064 | -0.999841 | -0.138087 | -0.051697 | 0.028233 | 1566.812 |
| PlayerGrowthRate1M | 473.0 | 13.058917 | 234.840813 | -0.999799 | -0.148136 | -0.029892 | 0.075873 | 5055.833 |


In [14]:
df.columns

Index(['Date', 'GameID', 'Type', 'MultiPlayer', 'Publisher',
       'ConstantDiscount', 'DiscountRate', 'Age', 'AccumulatedPositiveRate',
       'SalePeriod', 'PlayerGrowthRate1W', 'PlayerGrowthRate2W',
       'PlayerGrowthRate1M', 'FollowersGrowthRate1W', 'FollowersGrowthRate2W',
       'FollowersGrowthRate1M', 'PositiveRateGrowthRate1W',
       'PositiveRateGrowthRate2W', 'PositiveRateGrowthRate1M',
       'DLC_since_last_discount', 'Sequel_since_last_discount'],
      dtype='object')
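
The column listing above can be compared against the variable definitions at the top of this notebook. The following is a minimal sketch, assuming the 21 documented names form the complete expected schema. `EXPECTED_COLUMNS`, `check_columns`, and the empty toy frame are hypothetical names used for illustration and are not part of the original notebook.

```python
import pandas as pd

# Column names as documented in the Data Description section.
EXPECTED_COLUMNS = [
    "Date", "GameID", "Type", "MultiPlayer", "Publisher",
    "ConstantDiscount", "DiscountRate", "Age", "AccumulatedPositiveRate",
    "SalePeriod", "PlayerGrowthRate1W", "PlayerGrowthRate2W",
    "PlayerGrowthRate1M", "FollowersGrowthRate1W", "FollowersGrowthRate2W",
    "FollowersGrowthRate1M", "PositiveRateGrowthRate1W",
    "PositiveRateGrowthRate2W", "PositiveRateGrowthRate1M",
    "DLC_since_last_discount", "Sequel_since_last_discount",
]


def check_columns(frame: pd.DataFrame, expected=EXPECTED_COLUMNS) -> None:
    """Raise AssertionError if documented columns are missing or extras appear."""
    missing = [c for c in expected if c not in frame.columns]
    extra = [c for c in frame.columns if c not in expected]
    assert not missing, f"Missing columns: {missing}"
    assert not extra, f"Undocumented columns: {extra}"


# Toy frame with the documented columns and no rows (illustration only).
toy = pd.DataFrame(columns=EXPECTED_COLUMNS)
check_columns(toy)
```

In the notebook itself, calling `check_columns(df)` would apply the same check to the loaded data.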

### Assertions

In [15]:
# Write assertions here—conditions that should always hold in the data.
# Use them to demonstrate that you have carefully checked the data’s correctness.

#### Date range check

In [16]:
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

start_date = pd.to_datetime('2023-05-01')
end_date   = pd.to_datetime('2025-08-31')

assert df['Date'].notna().all(), "Date column has missing or unparseable values!"
assert ((df['Date'] >= start_date) & (df['Date'] <= end_date)).all(), "Date is outside the sample period!"
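
Because `Date` is defined as the date of a given discount, a uniqueness check on (`GameID`, `Date`) could complement the range check. This is only a sketch under the assumption that a game has at most one row per date, which the original notebook does not state. `find_duplicate_keys` and the toy rows are made up for illustration.

```python
import pandas as pd


def find_duplicate_keys(frame: pd.DataFrame, keys=("GameID", "Date")) -> pd.DataFrame:
    """Return every row whose key combination appears more than once."""
    mask = frame.duplicated(subset=list(keys), keep=False)
    return frame[mask]


# Made-up rows for illustration; the last two share the same key.
toy = pd.DataFrame({
    "GameID": [10, 10, 20, 20],
    "Date": pd.to_datetime(
        ["2023-06-29", "2023-09-12", "2023-06-29", "2023-06-29"]
    ),
})
dupes = find_duplicate_keys(toy)
print(len(dupes))  # 2
```

On the real data, `assert find_duplicate_keys(df).empty` would fail only if the assumed key is not unique, in which case the returned rows show where to look.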

#### Missing value check

In [17]:
df.isna().sum()

Date                          0
GameID                        0
Type                          0
MultiPlayer                   0
Publisher                     0
ConstantDiscount              0
DiscountRate                  0
Age                           0
AccumulatedPositiveRate       0
SalePeriod                    0
PlayerGrowthRate1W            0
PlayerGrowthRate2W            0
PlayerGrowthRate1M            0
FollowersGrowthRate1W         0
FollowersGrowthRate2W         0
FollowersGrowthRate1M         0
PositiveRateGrowthRate1W      0
PositiveRateGrowthRate2W      0
PositiveRateGrowthRate1M      0
DLC_since_last_discount       0
Sequel_since_last_discount    0
dtype: int64
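
The counts above are all zero, but they are only displayed, not enforced. One way to turn the inspection into an assertion is sketched below; `assert_no_missing` and the toy frame are hypothetical and not part of the original notebook.

```python
import pandas as pd


def assert_no_missing(frame: pd.DataFrame) -> None:
    """Fail with the per-column counts if any column contains NaN/NaT."""
    counts = frame.isna().sum()
    offenders = counts[counts > 0]
    assert offenders.empty, f"Columns with missing values:\n{offenders}"


# Made-up rows for illustration only.
toy = pd.DataFrame({"GameID": [10, 20], "DiscountRate": [0.9, 0.5]})
assert_no_missing(toy)
```

Calling `assert_no_missing(df)` in the notebook would stop execution if a later version of the CSV introduced missing values.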

In [18]:
# For numeric range checks, see the summary stats above.
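
The ranges read off the summary stats can also be asserted explicitly. The sketch below encodes the two constraints stated in the variable definitions (`ConstantDiscount` is 0, `DiscountRate` is greater than 0). The remaining bounds are assumptions inferred from the variable meanings, not documented rules. `check_ranges` is a hypothetical helper, and the toy frame reuses values from two rows of the sample output.

```python
import pandas as pd


def check_ranges(frame: pd.DataFrame) -> None:
    """Assert documented constraints plus a few assumed bounds."""
    # Stated in the variable definitions.
    assert (frame["ConstantDiscount"] == 0).all(), "ConstantDiscount should be 0"
    assert (frame["DiscountRate"] > 0).all(), "DiscountRate should be positive"
    # Assumptions: not stated in the documentation.
    assert (frame["DiscountRate"] <= 1).all(), "DiscountRate above 100%"
    assert frame["MultiPlayer"].isin([0, 1]).all(), "MultiPlayer not binary"
    assert frame["SalePeriod"].isin([0, 1]).all(), "SalePeriod not binary"
    assert frame["AccumulatedPositiveRate"].between(0, 1).all(), \
        "AccumulatedPositiveRate outside [0, 1]"
    assert (frame["Age"] >= 0).all(), "Age is negative"


# Values taken from rows 0 and 7 of the sample output (selected columns only).
toy = pd.DataFrame({
    "MultiPlayer": [1, 1],
    "ConstantDiscount": [0, 0],
    "DiscountRate": [0.903226, 0.801075],
    "Age": [22.671233, 24.087671],
    "AccumulatedPositiveRate": [0.975168, 0.974273],
    "SalePeriod": [1, 1],
})
check_ranges(toy)
```

The growth-rate columns are left out on purpose: their maxima in the summary stats are several orders of magnitude above the medians, so any bound there would need a decision about how to treat outliers first.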