# Data Description
- Author: 方珮潔、黃書佑
- Date first created: 20251026
- Background: This table is the output of `discount-timing-processed-code.ipynb`.
- Unit of observation: one game on one day (a `GameID`-`Date` pair); 21,938 observations in total.
- Sample period: 20230501 ~ 20250430
- Known issues:
    - Only games above a certain size are included.
    - Non-seasonal promotions cannot be captured.
- Definition for each variable:
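    - `Date`: datetime; the date of the observation.
    - `GameID`: object; the game's unique ID.
    - `Type`: object; the game's genre.
    - `MultiPlayer`: int; 0/1 indicator of whether the game is multiplayer.
    - `Publisher`: object; the name of the game's publisher.
    - `ConstantDiscount`: int; 0/1 indicator of whether the game's discount depth is fixed.
    - `DiscountOrNot`: int; 0/1 indicator of whether the game is discounted on that day.
    - `Age`: float; years since the game's release, as of `Date`.
    - `AccumulatedPositiveRate`: float; the game's cumulative share of positive reviews, as of `Date`.
    - `SalePeriod`: int; 0/1 indicator of whether `Date` falls within a seasonal sale period.
    - `PlayerGrowthRate1W`: float; one-week moving-average growth rate of the player count.
    - `PlayerGrowthRate2W`: float; two-week moving-average growth rate of the player count.
    - `PlayerGrowthRate1M`: float; one-month moving-average growth rate of the player count.
    - `FollowersGrowthRate1W`: float; one-week growth rate of the follower count.
    - `FollowersGrowthRate2W`: float; two-week growth rate of the follower count.
    - `FollowersGrowthRate1M`: float; one-month growth rate of the follower count.
    - `PositiveRateGrowthRate1W`: float; one-week growth rate of the positive review rate.
    - `PositiveRateGrowthRate2W`: float; two-week growth rate of the positive review rate.
    - `PositiveRateGrowthRate1M`: float; one-month growth rate of the positive review rate.
    - `DaysSinceLastDLC`: int; days since the most recent DLC release (-1 means no DLC had been released before that date).
    - `DaysSinceLastSequel`: int; days since the most recent sequel release (-1 means no sequel had been released before that date).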
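
Because `DaysSinceLastDLC` and `DaysSinceLastSequel` use -1 as a sentinel rather than a real day count, averaging them directly would mix the sentinel into the result. The sketch below shows one possible way to handle this on toy rows; the toy values and the flag column names `HasPriorDLC` and `HasPriorSequel` are hypothetical and not part of the dataset.

```python
import pandas as pd

# Toy rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "GameID": ["10", "10", "20"],
    "DaysSinceLastDLC": [-1, 35, 120],
    "DaysSinceLastSequel": [-1, -1, 400],
})

# Hypothetical flag names: 1 if a prior release exists, 0 otherwise.
sentinel_cols = {
    "DaysSinceLastDLC": "HasPriorDLC",
    "DaysSinceLastSequel": "HasPriorSequel",
}

for col, flag in sentinel_cols.items():
    toy[flag] = (toy[col] != -1).astype(int)
    # Replace the -1 sentinel with NaN so it is skipped by mean(), etc.
    toy[col] = toy[col].where(toy[col] != -1)

print(toy)
```

Keeping a separate flag preserves the "no prior release" information after the sentinel is masked.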

In [14]:
import pandas as pd

## Change for Your Own Data

In [15]:
input_data_file = "/Users/peijiefang/Downloads/data analysis project/discount-timing-DE.csv"

In [16]:
df = pd.read_csv(input_data_file)
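
By default `pd.read_csv` infers `GameID` as an integer and leaves `Date` as text, which differs from the types listed in the variable definitions. A minimal sketch of loading with explicit types follows, using a small in-memory CSV as a stand-in for the real file; the toy rows and the name `toy_df` are illustrative assumptions.

```python
import io

import pandas as pd

# Toy CSV standing in for the real file; values are illustrative.
toy_csv = io.StringIO(
    "Date,GameID,Type,MultiPlayer\n"
    "2023-05-01,10,Action,1\n"
    "2023-05-02,10,Action,1\n"
)

toy_df = pd.read_csv(
    toy_csv,
    parse_dates=["Date"],   # parse Date as datetime on load
    dtype={"GameID": str},  # keep IDs as text rather than integers
)

print(toy_df.dtypes)
```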

### Sample

In [17]:
df.head(10)

| | Date | GameID | Type | MultiPlayer | Publisher | ConstantDiscount | DiscountOrNot | Age | AccumulatedPositiveRate | SalePeriod | ... | PlayerGrowthRate2W | PlayerGrowthRate1M | FollowersGrowthRate1W | FollowersGrowthRate2W | FollowersGrowthRate1M | PositiveRateGrowthRate1W | PositiveRateGrowthRate2W | PositiveRateGrowthRate1M | DaysSinceLastDLC | DaysSinceLastSequel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-05-01 | 10 | Action | 1 | Valve | 0 | 0 | 22.509589 | 0.975206 | 0 | ... | -0.055961 | -0.077896 | 0.000788 | 0.00171 | 0.003889 | 1.2e-05 | -1.4e-05 | 1.098178e-05 | -1 | -1 |
| 1 | 2023-05-02 | 10 | Action | 1 | Valve | 0 | 0 | 22.512329 | 0.975184 | 0 | ... | -0.052334 | -0.085461 | 0.000783 | 0.00173 | 0.003913 | -1e-05 | -3.6e-05 | -4.698912e-07 | -1 | -1 |
| 2 | 2023-05-03 | 10 | Action | 1 | Valve | 0 | 0 | 22.515068 | 0.975189 | 0 | ... | -0.052562 | -0.087459 | 0.000814 | 0.001735 | 0.003979 | -1.1e-05 | -4.1e-05 | -8.082766e-07 | -1 | -1 |
| 3 | 2023-05-04 | 10 | Action | 1 | Valve | 0 | 0 | 22.517808 | 0.975183 | 0 | ... | -0.053507 | -0.090651 | 0.000829 | 0.001719 | 0.004101 | -1.2e-05 | -5e-05 | -2.45082e-05 | -1 | -1 |
| 4 | 2023-05-05 | 10 | Action | 1 | Valve | 0 | 0 | 22.520548 | 0.97517 | 0 | ... | -0.04825 | -0.091779 | 0.000884 | 0.001674 | 0.003912 | -2.3e-05 | -5.3e-05 | -3.754777e-05 | -1 | -1 |
| 5 | 2023-05-06 | 10 | Action | 1 | Valve | 0 | 0 | 22.523288 | 0.97518 | 0 | ... | -0.039123 | -0.093656 | 0.000874 | 0.001648 | 0.00378 | -2e-05 | -2.4e-05 | -2.712693e-05 | -1 | -1 |
| 6 | 2023-05-07 | 10 | Action | 1 | Valve | 0 | 0 | 22.526027 | 0.97518 | 0 | ... | -0.034727 | -0.100339 | 0.000838 | 0.001678 | 0.003789 | -2.7e-05 | -2.9e-05 | -2.743809e-05 | -1 | -1 |
| 7 | 2023-05-08 | 10 | Action | 1 | Valve | 0 | 0 | 22.528767 | 0.975177 | 0 | ... | -0.032014 | -0.104849 | 0.000843 | 0.001633 | 0.003773 | -2.9e-05 | -1.7e-05 | -3.130728e-05 | -1 | -1 |
| 8 | 2023-05-09 | 10 | Action | 1 | Valve | 0 | 0 | 22.531507 | 0.975176 | 0 | ... | -0.032631 | -0.108857 | 0.000863 | 0.001647 | 0.003788 | -8e-06 | -1.8e-05 | -2.608948e-05 | -1 | -1 |
| 9 | 2023-05-10 | 10 | Action | 1 | Valve | 0 | 0 | 22.534247 | 0.975175 | 0 | ... | -0.033829 | -0.108362 | 0.000808 | 0.001622 | 0.003722 | -1.4e-05 | -2.5e-05 | -3.241372e-05 | -1 | -1 |


### Summary Stats

In [18]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| GameID | 21938.0 | 455795.908925 | 289421.45365 | 10.0 | 244850.0 | 423710.0 | 588650.0 | 1145360.0 |
| MultiPlayer | 21938.0 | 0.433312 | 0.495544 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| ConstantDiscount | 21938.0 | 0.200064 | 0.400057 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| DiscountOrNot | 21938.0 | 0.020968 | 0.143281 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| Age | 21938.0 | 7.531344 | 4.307168 | 2.389041 | 4.920548 | 6.323288 | 8.465753 | 24.50959 |
| AccumulatedPositiveRate | 21938.0 | 0.901283 | 0.119497 | 0.425325 | 0.884299 | 0.95284 | 0.971938 | 0.9929734 |
| SalePeriod | 21938.0 | 0.320221 | 0.466572 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| PlayerGrowthRate1W | 21938.0 | 34.823854 | 592.169355 | -0.999953 | -0.069164 | -0.014711 | 0.039745 | 24563.43 |
| PlayerGrowthRate2W | 21938.0 | 9.693256 | 216.482984 | -0.999902 | -0.095771 | -0.015546 | 0.070855 | 13870.45 |
| PlayerGrowthRate1M | 21938.0 | 0.694885 | 14.708579 | -0.999799 | -0.117624 | -0.009335 | 0.10166 | 902.6154 |


In [19]:
df.columns

Index(['Date', 'GameID', 'Type', 'MultiPlayer', 'Publisher',
       'ConstantDiscount', 'DiscountOrNot', 'Age', 'AccumulatedPositiveRate',
       'SalePeriod', 'PlayerGrowthRate1W', 'PlayerGrowthRate2W',
       'PlayerGrowthRate1M', 'FollowersGrowthRate1W', 'FollowersGrowthRate2W',
       'FollowersGrowthRate1M', 'PositiveRateGrowthRate1W',
       'PositiveRateGrowthRate2W', 'PositiveRateGrowthRate1M',
       'DaysSinceLastDLC', 'DaysSinceLastSequel'],
      dtype='object')

### Assertions

In [20]:
# Write assertions here: conditions that should always hold in the data.
# Use them to demonstrate that you have carefully checked the data's correctness.

#### Date range check

In [21]:
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

start_date = pd.to_datetime('2023-05-01')
end_date   = pd.to_datetime('2025-04-30')

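# errors='coerce' turns unparseable dates into NaT, so this also catches bad formats.
assert df['Date'].notna().all(), "Date column has missing or unparseable values!"
assert ((df['Date'] >= start_date) & (df['Date'] <= end_date)).all(), "Date is outside the expected range!"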
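
The sample rows show one row per game per day, so a complementary check is that no (`GameID`, `Date`) pair appears twice. The sketch below illustrates this on toy rows, assuming that pair is meant to identify a row uniquely; the toy values and the name `n_dupes` are illustrative.

```python
import pandas as pd

# Toy rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2023-05-01", "2023-05-02", "2023-05-01"]),
    "GameID": [10, 10, 20],
})

# Count rows that repeat an earlier (GameID, Date) combination.
n_dupes = int(toy.duplicated(subset=["GameID", "Date"]).sum())
assert n_dupes == 0, f"{n_dupes} duplicated (GameID, Date) rows"
```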

#### Missing value check

In [22]:
df.isna().sum()

Date                        0
GameID                      0
Type                        0
MultiPlayer                 0
Publisher                   0
ConstantDiscount            0
DiscountOrNot               0
Age                         0
AccumulatedPositiveRate     0
SalePeriod                  0
PlayerGrowthRate1W          0
PlayerGrowthRate2W          0
PlayerGrowthRate1M          0
FollowersGrowthRate1W       0
FollowersGrowthRate2W       0
FollowersGrowthRate1M       0
PositiveRateGrowthRate1W    0
PositiveRateGrowthRate2W    0
PositiveRateGrowthRate1M    0
DaysSinceLastDLC            0
DaysSinceLastSequel         0
dtype: int64
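
The output above is inspected by eye; the same condition can also be written as an assertion so that it fails loudly if the data changes. A minimal sketch on a toy frame (the toy values and the name `missing` are illustrative):

```python
import pandas as pd

# Toy frame for illustration only; not taken from the real dataset.
toy = pd.DataFrame({"Age": [22.5, 7.1], "SalePeriod": [0, 1]})

# Per-column count of missing values.
missing = toy.isna().sum()
assert (missing == 0).all(), (
    f"Columns with missing values: {missing[missing > 0].to_dict()}"
)
```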

In [23]:
# For numeric range checks, see the summary stats.
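
The ranges visible in the summary stats could also be stated as explicit assertions. The sketch below shows what that might look like on toy rows. The bounds are assumptions read off the variable definitions and the summary stats: indicators take only 0 or 1, rates lie in [0, 1], day counts are at least -1, and growth rates are at least -1 (which holds if they are computed as relative changes of non-negative counts).

```python
import pandas as pd

# Toy rows for illustration only; the real check would run on the full table.
toy = pd.DataFrame({
    "MultiPlayer": [1, 0],
    "ConstantDiscount": [0, 0],
    "DiscountOrNot": [0, 1],
    "SalePeriod": [0, 1],
    "Age": [22.509589, 2.389041],
    "AccumulatedPositiveRate": [0.975206, 0.425325],
    "PlayerGrowthRate1W": [-0.069164, 0.039745],
    "DaysSinceLastDLC": [-1, 35],
})

binary_cols = ["MultiPlayer", "ConstantDiscount", "DiscountOrNot", "SalePeriod"]
for col in binary_cols:
    assert toy[col].isin([0, 1]).all(), f"{col} has values other than 0/1"

assert (toy["Age"] >= 0).all(), "Age is negative"
assert toy["AccumulatedPositiveRate"].between(0, 1).all(), "Rate outside [0, 1]"
# Assumed lower bound: a relative change of a non-negative count cannot go below -1.
assert (toy["PlayerGrowthRate1W"] >= -1).all(), "Growth rate below -1"
# -1 is the documented sentinel for "no prior DLC".
assert (toy["DaysSinceLastDLC"] >= -1).all(), "Day count below -1"
```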