# Predicting Newsletter Subscription Based on Player Behaviour
## Introduction
**Background**: UBC's Pacific Laboratory for Artificial Intelligence (PLAI) research group runs a Minecraft server called PLAICraft to study player behaviour. The group wants to know which player traits and behaviours are associated with subscribing to a newsletter.

**Research Question**: Can we predict whether a player will subscribe to a newsletter based on their demographics and gameplay behaviour?

In [4]:
# Load libraries
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


## Data Loading

In [46]:
# Load data
players <- read_csv("data/players.csv")
sessions <- read_csv("data/sessions.csv")

[1mRows: [22m[34m196[39m [1mColumns: [22m[34m7[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m ","
[31mchr[39m (4): experience, hashedEmail, name, gender
[32mdbl[39m (2): played_hours, Age
[33mlgl[39m (1): subscribe

[36mℹ[39m Use `spec()` to retrieve the full column specification for this data.
[36mℹ[39m Specify the column types or set `show_col_types = FALSE` to quiet this message.
[1mRows: [22m[34m1535[39m [1mColumns: [22m[34m5[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m ","
[31mchr[39m (3): hashedEmail, start_time, end_time
[32mdbl[39m (2): original_start_time, original_end_time

[36mℹ[39m Use `spec()` to retrieve the full column specification for this data.
[36mℹ[39m Specify the column types or set `show_col_types = FALSE` to quiet this message.


## Data Wrangling

In [50]:
# Compute session duration in minutes.
# original_*_time are epoch timestamps in milliseconds, so divide by 1000 first.
# The units are set explicitly: a bare date-time subtraction lets R choose the
# difftime units automatically, which makes a fixed "/ 60" conversion unreliable.
sessions <- sessions |>
    mutate(duration_minutes = as.numeric(difftime(
        as_datetime(original_end_time / 1000),
        as_datetime(original_start_time / 1000),
        units = "mins")))

# Summarize sessions per player
session_summary <- sessions |>
    group_by(hashedEmail) |>
    summarize(total_sessions = n(),
             total_minutes_played = sum(duration_minutes, na.rm = TRUE),
             avg_session_duration = mean(duration_minutes, na.rm = TRUE))

# Merge with player data; players with no recorded sessions get zeros
players_merge <- players |>
    left_join(session_summary, by = "hashedEmail") |>
    mutate(
        total_sessions = replace_na(total_sessions, 0),
        total_minutes_played = replace_na(total_minutes_played, 0),
        avg_session_duration = replace_na(avg_session_duration, 0),
        total_hours_played = total_minutes_played / 60
    )

| experience | subscribe | hashedEmail | played_hours | name | gender | Age | total_sessions | total_minutes_played | avg_session_duration | total_hours_played |
|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<lgl>` | `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Pro | TRUE | f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d | 30.3 | Morgan | Male | 9 | 27 | 2000.0000 | 74.07407 | 33.333333 |
| Veteran | TRUE | f3c813577c458ba0dfef80996f8f32c93b6e8af1fa939732842f2312358a88e9 | 3.8 | Christian | Male | 17 | 3 | 166.6667 | 55.55556 | 2.777778 |
| Veteran | FALSE | b674dd7ee0d24096d1c019615ce4d12b20fcbff12d79d3c5a9d2118eb7ccbb28 | 0.0 | Blake | Male | 17 | 1 | 0.0000 | 0.00000 | 0.000000 |
| Amateur | TRUE | 23fe711e0e3b77f1da7aa221ab1192afe21648d47d2b4fa7a5a659ff443a0eb5 | 0.7 | Flora | Female | 21 | 1 | 0.0000 | 0.00000 | 0.000000 |
| Regular | TRUE | 7dc01f10bf20671ecfccdac23812b1b415acd42c2147cb0af4d48fcce2420f3e | 0.1 | Kylie | Male | 21 | 1 | 0.0000 | 0.00000 | 0.000000 |
| Amateur | TRUE | f58aad5996a435f16b0284a3b267f973f9af99e7a89bee0430055a44fa92f977 | 0.0 | Adrian | Female | 17 | 0 | 0.0000 | 0.00000 | 0.000000 |
| Regular | TRUE | 8e594b8953193b26f498db95a508b03c6fe1c24bb5251d392c18a0da9a722807 | 0.0 | Luna | Female | 19 | 0 | 0.0000 | 0.00000 | 0.000000 |
| Amateur | FALSE | 1d2371d8a35c8831034b25bda8764539ab7db0f63938696917c447128a2540dd | 0.0 | Emerson | Male | 21 | 1 | 0.0000 | 0.00000 | 0.000000 |
| Amateur | TRUE | 8b71f4d66a38389b7528bb38ba6eb71157733df7d1740371852a797ae97d82d1 | 0.1 | Natalie | Male | 17 | 1 | 0.0000 | 0.00000 | 0.000000 |
| Veteran | TRUE | bbe2d83de678f519c4b3daa7265e683b4fe2d814077f9094afd11d8f217039ec | 0.0 | Nyla | Female | 22 | 0 | 0.0000 | 0.00000 | 0.000000 |
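The wrangling steps above (compute duration, summarize per player, left-join, fill missing values) can be checked on a tiny example where the right answer is known by hand. The sketch below uses made-up data with the same column names; the `toy_*` objects and the `aaa111`/`bbb222` identifiers are hypothetical. It also shows why stating the `difftime` units explicitly is safer: subtracting two date-times directly returns a result whose units R selects automatically.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Made-up data (illustration only): two players, one with no sessions
toy_players <- tibble(hashedEmail = c("aaa111", "bbb222"))
toy_sessions <- tibble(
  hashedEmail = c("aaa111", "aaa111"),
  # Epoch timestamps in milliseconds
  original_start_time = c(1.0e12, 1.0e12 + 3.6e6),
  # First session ends 30 minutes after its start, second ends 60 minutes after
  original_end_time   = c(1.0e12 + 1.8e6, 1.0e12 + 7.2e6)
)

# Duration with explicit units, so the result is always in minutes
toy_sessions <- toy_sessions |>
  mutate(
    duration_minutes = as.numeric(difftime(
      as_datetime(original_end_time / 1000),
      as_datetime(original_start_time / 1000),
      units = "mins"
    ))
  )

# Per-player summary
toy_summary <- toy_sessions |>
  group_by(hashedEmail) |>
  summarize(
    total_sessions = n(),
    total_minutes_played = sum(duration_minutes, na.rm = TRUE)
  )

# Left join keeps players without sessions; their NA values become 0
toy_merged <- toy_players |>
  left_join(toy_summary, by = "hashedEmail") |>
  mutate(
    total_sessions = replace_na(total_sessions, 0L),
    total_minutes_played = replace_na(total_minutes_played, 0)
  )

print(toy_merged)

# Pitfall: a bare subtraction picks its own units (not necessarily seconds),
# so as.numeric() on it followed by "/ 60" can silently give the wrong scale.
auto_diff <- as_datetime(7200) - as_datetime(0)
print(units(auto_diff))
```

One design choice worth noting: filling a missing average with 0 treats "no sessions" the same as "sessions of zero length". Leaving it as `NA` is a reasonable alternative if later steps can handle missing values.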


In [None]:
# Join with player data
player_join <- 