### Project Report
# Predicting Usage of a Video Game Research Server

By Victoria Zhou

## Introduction

Video games are a major part of modern entertainment and social interaction. As games grow more complex, understanding player behaviour helps improve design and infrastructure. Using data collected by a UBC Computer Science research group on its Minecraft research server, I analyzed whether certain player traits can predict newsletter subscription.

My guiding question is: Can experience level, gender, age, and total playtime predict whether a player subscribes to the game-related newsletter?

I used two datasets: one with individual player information and another with records of their play sessions.
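
The two datasets share a `hashedEmail` column, so session records can be summarized per player and attached to the player table. The following is a minimal sketch of that link, not the analysis code itself: the toy values and the `n_sessions` column name are assumptions made for illustration.

```r
library(dplyr)

# Toy stand-ins for the two datasets; all values are made up for illustration.
players <- tibble(
  hashedEmail = c("a1", "b2", "c3"),
  subscribe   = c(TRUE, FALSE, TRUE)
)
sessions <- tibble(
  hashedEmail = c("a1", "a1", "c3")
)

# Count session records per player (n_sessions is a hypothetical column name).
session_counts <- sessions |>
  count(hashedEmail, name = "n_sessions")

# Attach the counts to the player table; players with no sessions get 0.
players_with_sessions <- players |>
  left_join(session_counts, by = "hashedEmail") |>
  mutate(n_sessions = coalesce(n_sessions, 0L))

players_with_sessions
```

A left join keeps every player, including those who registered but never logged a session, which matters when the outcome of interest (`subscribe`) lives in the player table.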

### Players Dataset Summary

| Variable Name | Variable Type | Description |
| --- | ----------- | -----------|
| experience | factor | Level of gameplay experience (Beginner, Amateur, Regular, Pro, Veteran).|
| subscribe | logical | Whether the player subscribed to the newsletter (TRUE or FALSE).|
| hashedEmail | character | Hashed email identifier (used to anonymize individual players).|
| played_hours | numeric | Total number of hours the player has spent on the server.|
| name | character | Player’s in-game display name (not used in analysis).|
| gender | factor | Player’s self-reported gender (Male, Female, Non-binary, Two-Spirited, Agender, Prefer not to say, Other).|
| Age | integer | Player’s age in years.|

Summary Statistics & Key Insights
- Number of observations: 196 players
- Number of variables: 7
- Source: Minecraft research server
- Collection Method: Player information was collected at account registration; gameplay time was recorded during server use.
- Subscription: 144 players subscribed (73%)
- Age: Mean = 20.5, Median = 19, Range = 8–50 
- Playtime: Mean = 5.85 hours
- Most common experience level: Amateur (63 players)
- Most common gender: Male (124 players)
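
Summary figures like those above can be computed with `summarize()` and `count()`. The sketch below runs on a four-row toy table rather than the real data, so its numbers will not match the report; it assumes only the column names listed in the variable table.

```r
library(dplyr)

# Toy player table with made-up values; column names follow the table above.
players <- tibble(
  experience = c("Amateur", "Pro", "Amateur", "Veteran"),
  subscribe  = c(TRUE, TRUE, FALSE, TRUE),
  Age        = c(17L, 21L, NA, 25L)
)

# One-row summary: counts, subscription rate, and age statistics.
# na.rm = TRUE skips players whose Age is missing.
player_summary <- players |>
  summarize(
    n_players      = n(),
    n_subscribed   = sum(subscribe),
    pct_subscribed = 100 * mean(subscribe),
    mean_age       = mean(Age, na.rm = TRUE),
    median_age     = median(Age, na.rm = TRUE),
    n_missing_age  = sum(is.na(Age))
  )

# Frequency of each experience level, most common first.
experience_counts <- players |>
  count(experience, sort = TRUE)

player_summary
experience_counts
```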

Observations & Issues
- Missing values: Only in Age (2 cases)
- Limited behaviour tracking: No session or in-game activity metrics
- Sampling bias: Likely overrepresents younger users (median age = 19)

### Sessions Dataset Summary

| Variable Name | Variable Type | Description |
| --- | ----------- | -----------|
| hashedEmail | character | Player identifier to link with user-level data.|
| start_time | Date | Date the session started (DD/MM/YYYY).|
| end_time | Date | Date the session ended (DD/MM/YYYY).|
| original_start_time | numeric | Unix timestamp of start time (milliseconds).|
| original_end_time | numeric | Unix timestamp of end time (milliseconds).|
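
Because `original_start_time` and `original_end_time` are in milliseconds, they need to be divided by 1000 before R can treat them as seconds since the Unix epoch. A minimal sketch with made-up timestamp values, displaying the result in UTC (the choice of display time zone is an assumption):

```r
# Made-up millisecond timestamps for illustration.
original_start_time <- c(1712400000000, 1712403600000)

# as.POSIXct() expects seconds since 1970-01-01, so convert from milliseconds.
start_utc <- as.POSIXct(original_start_time / 1000,
                        origin = "1970-01-01", tz = "UTC")

format(start_utc, "%Y-%m-%d %H:%M:%S")
```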

Summary Statistics & Key Insights
- Number of observations: 1,535 session records
- Number of variables: 5
- Source: Minecraft research server 
- Collection Method: Collected automatically by the server each time a player logged in and out
- Most active player: Logged 310 sessions
- Session date range: From approximately March 2024 to August 2024

Observations & Issues
- Missing data: Two sessions are missing end_time.
- Session duration: Not provided directly; it must be calculated from the start and end timestamps.
- Time of day: Stored inside the start and end time strings rather than in its own column, so it must be parsed out to analyze peak usage hours.
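
The duration and time-of-day steps described above could be sketched with `lubridate` as follows. This assumes the strings look like `DD/MM/YYYY HH:MM`; the exact time format is not stated in the summary, so the parser may need adjusting. The rows are made up, and `duration_min` and `start_hour` are hypothetical column names.

```r
library(dplyr)
library(lubridate)

# Toy session records with made-up values; the last one has no end_time.
sessions <- tibble(
  start_time = c("30/06/2024 18:12", "17/06/2024 23:33", "25/07/2024 17:34"),
  end_time   = c("30/06/2024 18:24", "18/06/2024 00:03", NA)
)

sessions_parsed <- sessions |>
  mutate(
    start_dt     = dmy_hm(start_time),   # parse day/month/year hour:minute
    end_dt       = dmy_hm(end_time),
    # Session length in minutes; NA when end_time is missing.
    duration_min = as.numeric(difftime(end_dt, start_dt, units = "mins")),
    # Hour of day (0 to 23) in which the session started.
    start_hour   = hour(start_dt)
  )

sessions_parsed
```

Sessions with a missing `end_time` keep an `NA` duration here rather than being dropped, so the decision about how to handle them stays explicit.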

## Methods & Results

### Preliminary Exploratory Data Analysis

```r
library(tidyverse)
library(repr)
library(tidymodels)
options(repr.matrix.max.rows = 6)
source("cleanup.R")
```

ERROR: Error in file(filename, "r", encoding = encoding): cannot open the connection
