# Background

- **Author**: `<黃書佑>`
- **Created At**: `<2025-11-29>`
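- **Research Motivation and Context (why are we interested in the findings?):**
A logistic regression showed that three games deviate from the overall pattern, each in its own way:
    - GameID_548430 tends to discount outside seasonal sale periods
    - GameID_367520 tends to discount during seasonal sale periods
    - GameID_431960 tends not to discount during seasonal sale periods
We therefore want to understand what drives these differences.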
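- **Main Findings and Takeaways:**
    1. GameID_548430:
        - Data characteristics:
            - Type: multiplayer co-op shooter (MultiPlayer = 1, Tag: Dwarf/Co-op).
            - Price: high list price (original price 438) with large price swings (Std Dev ~143); the lowest price drops to 131.
            - Players: about 17,500 on average; the game depends on an active online population.
        - Hypothesized reasons:
            - Sustaining multiplayer activity: the game relies heavily on matchmaking queues and team play, so the experience degrades if the player base shrinks.
            - Avoiding the crowded seasonal market: during large seasonal sales (e.g., the Summer and Winter Sales), most AAA titles are discounted at once, which splits the attention available to multiplayer games. Discounting off-season may give the game more visibility on the Steam front page, bring in new players, and keep the servers active.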
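    2. GameID_367520:
        - Data characteristics:
            - Type: single-player Metroidvania (MultiPlayer = 0, Tag: Metroidvania).
            - Price: mid-range list price (original price 268) with moderate price swings; the lowest price is 134 (about -50%).
            - Players: about 8,500 on average; a long-selling single-player game.
        - Hypothesized reasons:
            - Backlog buying: for a very well-reviewed single-player game, players often buy now and play later. This kind of purchase is most common during large seasonal sales, when people tend to clear their wishlists in one go.
            - No need to sustain concurrent players: a single-player game is playable without a large online population, so it has no reason to avoid the big sales to compete for players the way a multiplayer game does. Discounting when traffic is highest is likely the most effective choice.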
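    3. GameID_431960:
        - Data characteristics:
            - Type: utility software/tool (Type: Mature/Utility).
            - Price: low list price (original price 120) and a very stable price (Std Dev only about 3.6; it stays at the original price most of the time, and the lowest price is 96).
            - Players: about 117,000 on average, and very stable.
        - Hypothesized reasons:
            - Not really a game: this is a tool rather than a conventional game. Users buy it only when they want to change their wallpaper or have built a new PC, and this inelastic demand is largely unaffected by seasonal promotions.
            - Low-price strategy: the original price of 120 is already within impulse-purchase range, so a discount likely adds little marginal sales. Since it sells steadily without discounts and has a large, stable user base, the developer has little incentive to join the discount competition.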
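- **Future Direction:**
    Scale up the sample and use cluster analysis to test whether these inferences generalize: the current study covers only three outlier games. Future work should extend the analysis to the whole dataset and test whether all high-priced multiplayer co-op games tend to discount off-season, and whether all low-priced utility software shows price rigidity.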
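
As a first step toward that check, the sketch below compares the average `SalePeriod` across game segments. It is a simple segment comparison rather than a full cluster analysis. The toy values are invented for illustration, the median-price cutoff for "high-priced" is an assumption, and `SalePeriod` is assumed to hold each game's share of discounts that fall in seasonal sales.

```python
import pandas as pd

# Hypothetical toy data: one row per game. Column names follow the notebook;
# the values are invented purely for illustration.
games = pd.DataFrame({
    "GameID": [1, 2, 3, 4, 5, 6],
    "MultiPlayer": [1, 1, 0, 0, 1, 0],
    "OriginPrice": [438, 400, 268, 120, 90, 350],
    "SalePeriod": [0.2, 0.3, 0.8, 0.7, 0.6, 0.9],  # assumed: share of discounts in seasonal sales
})

# Assumed cutoff: "high-priced" means above the sample median price.
price_cutoff = games["OriginPrice"].median()
games["HighPrice"] = games["OriginPrice"] > price_cutoff

# Mean seasonal share and number of games in each (MultiPlayer, HighPrice) segment.
segment_summary = (
    games.groupby(["MultiPlayer", "HighPrice"])["SalePeriod"]
    .agg(["mean", "count"])
    .reset_index()
)
print(segment_summary)
```

If the hypothesis holds on the real data, the high-priced multiplayer segment should show a clearly lower mean `SalePeriod` than the other segments, with enough games per segment to make the comparison meaningful.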

In [13]:
# Load packages here
import pandas as pd


In [14]:
# Load the files
df_extent = pd.read_csv('/Users/sofia/Downloads/extent-of-discount-rate-DE.csv')
df_all = pd.read_csv('/Users/sofia/Downloads/all_games_merged.csv')

In [15]:
# Target Game IDs
target_ids = [548430, 367520, 431960]

# Filter data
df_extent_filtered = df_extent[df_extent['GameID'].isin(target_ids)]
df_all_filtered = df_all[df_all['GameID'].isin(target_ids)]

# Get unique static attributes from extent file
static_cols = ['GameID', 'Type', 'MultiPlayer', 'Publisher', 'OriginPrice', 'ConstantDiscount']
# drop_duplicates keeps every distinct combination of these columns, so a game whose
# attributes change over time yields more than one row (not just its first occurrence)
game_info = df_extent_filtered[static_cols].drop_duplicates()

# Calculate some aggregate stats from extent file (e.g. average discount rate, frequency)
agg_stats = df_extent_filtered.groupby('GameID').agg({
    'DiscountRate': 'mean',
    'DiscountDuration': 'mean',
    'Age': 'max', # Max age to see how old it is
    'AccumulatedPositiveRate': 'mean',
    'SalePeriod': 'mean', # Proportion of time in sale period (might not be useful if we just want to know strategy)
    'DLC_since_last_discount': 'max',
    'Sequel_since_last_discount': 'max'
}).reset_index()

# Merge (pd.merge defaults to an inner join, so only GameIDs present in both frames are kept)
comparison = pd.merge(game_info, agg_stats, on='GameID')

# Check player stability from all_games file (std dev of players?)
player_stats = df_all_filtered.groupby('GameID')['Players'].agg(['mean', 'std']).reset_index()
player_stats['CV'] = player_stats['std'] / player_stats['mean'] # Coefficient of Variation

comparison = pd.merge(comparison, player_stats, on='GameID')

print(comparison)

# Also check specific "Type" values in all_games_merged if they differ
print("\nTypes in all_games_merged:")
print(df_all_filtered[['GameID', 'Type']].drop_duplicates())

   GameID   Type  MultiPlayer                Publisher  OriginPrice  \
0  548430  Dwarf            1  Coffee Stain Publishing        438.0   

   ConstantDiscount  DiscountRate  DiscountDuration       Age  \
0                 0      0.672646         15.571429  5.205479   

   AccumulatedPositiveRate  SalePeriod  DLC_since_last_discount  \
0                 0.971381    0.285714                        0   

   Sequel_since_last_discount      mean          std        CV  
0                           0  17528.96  6080.455871  0.346881  

Types in all_games_merged:
       GameID          Type
0      431960        Mature
3901   367520  Metroidvania
16570  548430         Dwarf
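
The cell above notes that the static attributes could be checked for uniqueness. One way to do that is sketched here on hypothetical toy rows (the values `A`, `P1`, and so on are invented): count the distinct values of each supposedly static column per game and list the games where any count exceeds one.

```python
import pandas as pd

# Hypothetical toy rows; game 2 changes Publisher over time.
toy = pd.DataFrame({
    "GameID": [1, 1, 2, 2],
    "Type": ["A", "A", "B", "B"],
    "Publisher": ["P1", "P1", "P2", "P3"],
})

static_cols = ["Type", "Publisher"]

# Number of distinct values per game for each supposedly static column.
distinct_counts = toy.groupby("GameID")[static_cols].nunique()

# Games where at least one static column takes more than one value.
inconsistent = distinct_counts[(distinct_counts > 1).any(axis=1)]
print(inconsistent)
```

An empty `inconsistent` frame would mean that `drop_duplicates()` returns exactly one row per game.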


In [17]:
# Extract features from all_games_merged.csv for the 3 games
df_targets = df_all[df_all['GameID'].isin(target_ids)]

# Static features (take the most common or first value)
game_profiles = df_targets.groupby('GameID').agg({
    'Type': lambda x: x.mode()[0] if not x.mode().empty else x.iloc[0],
    'MultiPlayer': 'max',
    'Publisher': lambda x: x.mode()[0] if not x.mode().empty else x.iloc[0],
    'OriginPrice': 'max',
    'ConstantDiscount': 'max',
    'ReleaseDate': 'max',
    'Players': ['mean', 'std'],
    'Followers': 'max',
    'Positive reviews': 'max'
}).reset_index()

# Flatten columns
game_profiles.columns = ['GameID', 'Type', 'MultiPlayer', 'Publisher', 'OriginPrice', 'ConstantDiscount', 'ReleaseDate', 'AvgPlayers', 'StdPlayers', 'MaxFollowers', 'MaxPosReviews']

# Calculate Age in years from ReleaseDate to a fixed reference date (2025-10-01), using 365-day years
game_profiles['ReleaseDate'] = pd.to_datetime(game_profiles['ReleaseDate'])
game_profiles['Age_Years'] = (pd.to_datetime('2025-10-01') - game_profiles['ReleaseDate']).dt.days / 365

print(game_profiles)

   GameID          Type  MultiPlayer                Publisher  OriginPrice  \
0  367520  Metroidvania            0              Team Cherry          268   
1  431960        Mature            0    Wallpaper Engine Team          120   
2  548430         Dwarf            1  Coffee Stain Publishing          438   

   ConstantDiscount ReleaseDate     AvgPlayers    StdPlayers  MaxFollowers  \
0                 1  2017-02-24    8501.313846   4673.344829        407806   
1                 1  2018-11-16  117471.807179  10256.429849       1373990   
2                 0  2020-05-13   17528.960000   6080.455871       2744924   

   MaxPosReviews  Age_Years  
0         425640   8.605479  
1         900581   6.879452  
2         341186   5.389041  
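
To put the three games' player stability on a common scale, the coefficient of variation (standard deviation divided by mean) can be computed for each of them. The sketch below copies `AvgPlayers` and `StdPlayers` from the output above rather than reloading the files.

```python
import pandas as pd

# Values copied from the game_profiles output above.
players = pd.DataFrame({
    "GameID": [367520, 431960, 548430],
    "AvgPlayers": [8501.313846, 117471.807179, 17528.960000],
    "StdPlayers": [4673.344829, 10256.429849, 6080.455871],
})

# Coefficient of variation: lower values mean a more stable player count.
players["CV"] = players["StdPlayers"] / players["AvgPlayers"]
print(players[["GameID", "CV"]].round(3))
```

GameID 431960 has by far the lowest ratio (about 0.09, against roughly 0.35 and 0.55 for the other two), consistent with its description as having a very stable user base.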


In [18]:
# Check price changes for the 3 games
price_stats = df_targets.groupby('GameID')['Final price'].agg(['min', 'max', 'mean', 'std']).reset_index()
print(price_stats)

# Check unique values of ConstantDiscount for each game
constant_discount_counts = df_targets.groupby('GameID')['ConstantDiscount'].value_counts()
print("\nConstantDiscount values:")
print(constant_discount_counts)

   GameID    min    max        mean         std
0  367520  134.0  398.0  253.050256   51.939751
1  431960   96.0  120.0  119.458462    3.566049
2  548430  131.0  438.0  325.340513  143.205338

ConstantDiscount values:
GameID  ConstantDiscount
367520  1                   975
431960  1                   975
548430  0                   975
Name: count, dtype: int64
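
The deepest discount each game has offered follows from the lowest `Final price` and the `OriginPrice`. The sketch below uses the values printed in the outputs above. The column names `MinFinalPrice` and `MaxDiscount` are introduced here for illustration.

```python
import pandas as pd

# OriginPrice from the game_profiles output; minimum final price from price_stats.
prices = pd.DataFrame({
    "GameID": [367520, 431960, 548430],
    "OriginPrice": [268, 120, 438],
    "MinFinalPrice": [134.0, 96.0, 131.0],
})

# Deepest discount as a fraction of the original price.
prices["MaxDiscount"] = 1 - prices["MinFinalPrice"] / prices["OriginPrice"]
print(prices[["GameID", "MaxDiscount"]].round(3))
```

This gives roughly 50% for GameID 367520, 20% for GameID 431960, and 70% for GameID 548430, which matches the price characteristics listed in the findings.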
