# Predicting Usage of a Video Game Research Server
Student Name: Vitor Han 

Student Number: 53497632

## Introduction

In collaboration with a UBC Computer Science research group, this project explores behavioral patterns of players on a Minecraft research server. The server records each player’s activity so that the research team can understand player engagement and data contribution and can predict demand. Accurate predictions help the team allocate server resources, target recruitment, and improve the player experience.

I selected Question 1 of the broad questions and used it to formulate a specific question based on variables in the dataset.

**Question 1:** Can a player's gender predict their likelihood of subscribing to a game-related newsletter, and does this relationship differ between novice and experienced ("Pro") players?


**Link to github repository:** https://github.com/vitxrlee/dsci-100-2025SS1-project?tab=readme-ov-file



## Data Description 

The dataset consists of two files:

- players.csv: Contains demographic and gameplay-related features for each player (e.g., a pseudonymized player ID, total play time, experience level, and whether they subscribed to the newsletter).

- sessions.csv: Contains a log of each play session per player (session start and end times).

### Summary of dataset:

**players.csv (each row in this dataset indicates an individual player):**

- experience: Self-reported gaming experience, categorized as Beginner, Amateur, Regular, Veteran, or Pro.

- subscribe: Indicates whether the player subscribed to the server’s newsletter.

- hashedEmail: A pseudonymized identifier for each player.

- played_hours: Total number of hours the player has played on the server.

- name: The first name of the player.

- gender: Gender identity (Male, Female, Non-binary).

- age: The player’s self-reported age (integer).

**sessions.csv:**

- hashedEmail: A pseudonymized identifier for the player who played the session (not used in this analysis).

- start_time: The human-readable start time of the session.

- end_time: The human-readable end time of the session.

- original_start_time: Start time in Unix timestamp format.

- original_end_time: End time in Unix timestamp format.
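
The two Unix-timestamp columns could be turned into per-player session counts and total session time, should such features be needed. The following is a minimal sketch only: it assumes the timestamps are in milliseconds (this should be checked against the real file), and `sessions_toy` is a made-up stand-in for sessions.csv, not actual data.

```r
library(dplyr)
library(tibble)

# Hypothetical toy rows with the same column names as sessions.csv.
# The values are invented for illustration.
sessions_toy <- tibble(
  hashedEmail         = c("a1", "a1", "b2"),
  original_start_time = c(1.7e12, 1.7e12 + 3.6e6, 1.7e12),
  original_end_time   = c(1.7e12 + 1.8e6, 1.7e12 + 5.4e6, 1.7e12 + 6e5)
)

# ASSUMPTION: timestamps are milliseconds since 1970-01-01 UTC,
# so dividing by 1000 gives seconds and by 60 again gives minutes.
session_summary <- sessions_toy |>
  mutate(duration_min = (original_end_time - original_start_time) / 1000 / 60) |>
  group_by(hashedEmail) |>
  summarize(
    n_sessions = n(),
    total_min  = sum(duration_min),
    .groups    = "drop"
  )

session_summary
```

If the real timestamps turn out to be in seconds, the division by 1000 would have to be dropped.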


## Methods & Results



In [None]:
library(tidyverse)
library(tidymodels)  # also attaches rsample

# URLs of the raw data files in the project repository
player_url <- "https://raw.githubusercontent.com/vitxrlee/dsci-100-2025SS1-project/refs/heads/main/players.csv"
session_url <- "https://raw.githubusercontent.com/vitxrlee/dsci-100-2025SS1-project/refs/heads/main/sessions.csv"

# Save local copies of both files
download.file(player_url, destfile = "players.csv")
download.file(session_url, destfile = "sessions.csv")

# Read the data into tibbles
players <- read_csv("players.csv")
sessions <- read_csv("sessions.csv")

In [None]:


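# Collapse gender into three groups: Male, Female, and Other
players <- mutate(players, gender_simple = ifelse(
    gender == "Male", "Male",
    ifelse(gender == "Female", "Female", "Other")
  )
)

# Count players for each combination of gender group and subscription status
gender_subscribe <- players |>
  group_by(gender_simple, subscribe) |>
  summarize(count = n(), .groups = "drop")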

gender_subscribe
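
Raw counts are hard to compare across groups of different sizes, and the research question also asks about "Pro" versus other players. The sketch below shows one way subscription proportions could be computed per gender group and experience group. It makes several assumptions: `players_toy` is invented toy data, `subscribe` is a logical column, and every experience level other than "Pro" is lumped into a hypothetical "Non-Pro" group.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data using the column names from players.csv.
players_toy <- tibble(
  gender     = c("Male", "Male", "Female", "Female",
                 "Non-binary", "Male", "Female", "Male"),
  experience = c("Pro", "Beginner", "Pro", "Amateur",
                 "Veteran", "Pro", "Regular", "Amateur"),
  subscribe  = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
)

subscribe_rates <- players_toy |>
  mutate(
    gender_simple = case_when(
      gender == "Male"   ~ "Male",
      gender == "Female" ~ "Female",
      TRUE               ~ "Other"
    ),
    # ASSUMPTION: anything that is not "Pro" counts as "Non-Pro".
    experience_group = if_else(experience == "Pro", "Pro", "Non-Pro")
  ) |>
  group_by(gender_simple, experience_group) |>
  summarize(
    n_players       = n(),
    # The mean of a logical vector is the share of TRUE values.
    prop_subscribed = mean(subscribe),
    .groups         = "drop"
  )

subscribe_rates
```

Groups with very few players give unstable proportions, so `n_players` is kept next to each rate.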
     

In [None]:
# Bar chart of subscription counts for female players
gender_subscribe_female_bar <- gender_subscribe |>
  filter(gender_simple == "Female") |>
  ggplot(aes(x = subscribe, y = count)) +
  geom_col() +
  labs(x = "Subscription Status", y = "Number of Female Players", title = "Female Player Subscription Overview")
gender_subscribe_female_bar

In [None]:

# Bar chart of subscription counts for male players
gender_subscribe_male_bar <- gender_subscribe |>
  filter(gender_simple == "Male") |>
  ggplot(aes(x = subscribe, y = count)) +
  geom_col() +
  labs(x = "Subscription Status", y = "Number of Male Players", title = "Male Player Subscription Overview")
gender_subscribe_male_bar

In [None]:
# Bar chart of subscription counts for players in the "Other" gender group
gender_subscribe_gender_minorities_bar <- gender_subscribe |>
  filter(gender_simple == "Other") |>
  ggplot(aes(x = subscribe, y = count)) +
  geom_col() +
  labs(x = "Subscription Status", y = "Number of Players (Other Genders)", title = "Other Genders Subscription Overview")
gender_subscribe_gender_minorities_bar
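
The three separate charts could also be drawn as one faceted figure, which puts the gender groups side by side on a shared axis. This is only a sketch: `gender_subscribe_toy` holds invented counts in the same shape as `gender_subscribe`, and the object name `gender_subscribe_facets` is hypothetical.

```r
library(ggplot2)
library(tibble)

# Hypothetical counts shaped like the gender_subscribe summary table.
gender_subscribe_toy <- tibble(
  gender_simple = c("Female", "Female", "Male", "Male", "Other", "Other"),
  subscribe     = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE),
  count         = c(3, 9, 10, 25, 4, 6)
)

# One panel per gender group; geom_col() plots the precomputed counts.
gender_subscribe_facets <- ggplot(gender_subscribe_toy,
                                  aes(x = subscribe, y = count)) +
  geom_col() +
  facet_wrap(vars(gender_simple)) +
  labs(
    x     = "Subscription Status",
    y     = "Number of Players",
    title = "Subscription Status by Gender Group"
  )

gender_subscribe_facets
```

A shared y-axis makes differences in group size visible, which the separate charts hide.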
     

## Discussion

The KNN model demonstrated that behavioral metrics such as played hours, session frequency, and experience level are useful predictors of whether a player subscribes to the newsletter. Players with longer play times and more frequent sessions showed a higher likelihood of subscribing.

This aligns with expectations: more engaged users are more likely to seek updates and involvement with the game community. This analysis can guide the research team in targeting engaged users for promotional content and community building.

Some limitations include:

- No session-level interaction or action data

- Assumes static behavior over time

- Uses basic features without engineered game event categories

Future work should explore logistic regression and decision trees to assess interpretability and feature impact. Social activity (e.g., simultaneous play) could also offer predictive power.
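
As a rough illustration of the logistic regression suggested above, the sketch below fits a model with a gender-by-experience interaction using base R's `glm()`. Everything in it is an assumption made for illustration: `players_toy` is invented data, the "Non-Pro" grouping is hypothetical, and the resulting coefficients say nothing about the real players.

```r
# Hypothetical toy data: 4 players in each gender-by-experience cell.
players_toy <- data.frame(
  gender_simple    = rep(c("Female", "Male"), each = 8),
  experience_group = rep(rep(c("Non-Pro", "Pro"), each = 4), times = 2),
  subscribe = c(
    TRUE, TRUE, TRUE,  FALSE,  # Female, Non-Pro
    TRUE, TRUE, FALSE, FALSE,  # Female, Pro
    TRUE, TRUE, FALSE, FALSE,  # Male, Non-Pro
    TRUE, FALSE, FALSE, FALSE  # Male, Pro
  )
)

# Code the outcome as 0/1 for the binomial model.
players_toy$subscribed <- as.integer(players_toy$subscribe)

# The interaction term lets the gender effect differ between
# Pro and Non-Pro players.
fit <- glm(
  subscribed ~ gender_simple * experience_group,
  data   = players_toy,
  family = binomial
)

# Exponentiated coefficients are odds ratios relative to the
# reference cell (Female, Non-Pro).
odds_ratios <- exp(coef(fit))
odds_ratios
```

An interaction odds ratio near 1 would suggest that the gender effect is similar for both experience groups; with real data, confidence intervals would be needed before drawing that conclusion.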