# Rough Work for Individual Submission (all relevant things copied to the main document)



In [None]:
#attach packages to work with data
library(tidyverse)
library(repr)
library(tidymodels)
options(repr.matrix.max.rows = 6)
#source("cleanup.R")



## opening files, making them tidy


In [None]:
#open files for descriptive summaries
players_data = read_csv("players.csv")
sessions_data = read_csv("sessions.csv")

players_data
sessions_data



In [None]:
sessions_wrangled = sessions_data |>
    separate(start_time, into = c("start_date", "start_time"), sep = " ") |>
    mutate(start_date = as.Date(start_date)) |>
    separate(end_time, into = c("end_date", "end_time"), sep = " ") |>
    mutate(end_date = as.Date(end_date))

sessions_wrangled
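
One caveat on the date conversion above: `as.Date()` called without a `format` argument only tries the `%Y-%m-%d` and `%Y/%m/%d` layouts. If the timestamps in `sessions.csv` are written day-first (an assumption here, since the raw values are not shown in this notebook), they may be misread or rejected unless the format is given explicitly. A minimal sketch with a made-up timestamp:

```r
# Made-up example value; assumes a day-first "dd/mm/yyyy hh:mm" layout.
raw_time <- "30/06/2024 18:12"

# Split the date from the time at the space, as separate() does above.
parts <- strsplit(raw_time, " ", fixed = TRUE)[[1]]

# Supplying the format makes the day/month order unambiguous.
start_date <- as.Date(parts[1], format = "%d/%m/%Y")
start_date
```

If the file turns out to use year-first dates instead, the original `as.Date(start_date)` call is already sufficient.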


## Getting descriptive summaries of datasets


In [None]:
#players_data
players_summary = players_data |>
    summary()
players_summary

experience_types = unique(players_data$experience) # `$` extracts the experience column; unique() returns its distinct values
experience_types

gender_types = unique(players_data$gender)
gender_types


### Dataset Summary: players_data

The `players_data` dataset contains 196 observations (i.e., 196 players). There are seven different variables. Below are the variables' names, types, and meanings.

- `experience`: character variable, 5 categories ("Pro", "Veteran", "Amateur", "Regular", "Beginner") based on how much experience the player has with Minecraft.
- `subscribe`: logical variable, `TRUE` or `FALSE` depending on whether the player is currently subscribed to a game-related newsletter.
- `hashedEmail`: character variable returning the player's email, irreversibly transformed to a unique string (facilitating individualization while retaining privacy).
- `played_hours`: double-class variable corresponding to how many hours the player has spent playing the game.
- `name`: character variable containing the player's name (likely first name).
- `gender`: character variable, 7 categories ("Male", "Female", "Non-binary", "Prefer not to say", "Agender", "Two-Spirited", "Other"), self-reported by the player.
- `Age`: double-class variable returning the player's age in years (self-reported).

While summary statistics can be individually computed with commands like `mean()`, `min()`, or `max()`, in this case it is efficient to collect them all at once with `summary()`. There are three different variables in `players_data` for which summary statistics can be calculated (barring general statistics like `length`, `class`, and `mode`). Summary statistics have been rounded to two decimal places.

| Variable | min | 1st quartile | median | mean | 3rd quartile | max | NA's |
| -------- | --- | ------------ | ------ | ---- | ------------ | --- | ---- |
| `played_hours` (hours) | 0.00 | 0.00 | 0.10 | 5.85 | 0.60 | 223.10 | -- |
| `Age` (years) | 9.00 | 17.00 | 19.00 | 21.14 | 22.75 | 58.00 | 2 |

For `Age`, the NA count of 2 means that two players did not report their age.
The third variable in `players_data` with summary statistics is `subscribe`, which `summary()` reports as counts:

| Variable | TRUE | FALSE |
| -------- | ---- | ----- |
| `subscribe` | 144 | 52 |

That is, 144 of the 196 players (about 73%) are subscribed to a game-related newsletter and 52 are not.
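
The figures in the two tables above can be reproduced along these lines. This is a sketch on a small made-up data frame (`toy_players` and all of its values are illustrative, not the real data); the same calls would apply to the matching columns of `players_data`.

```r
# Illustrative stand-in for players_data; values are invented.
toy_players <- data.frame(
  played_hours = c(0, 0.1, 0.6, 3.2, 12.5),
  Age          = c(17, 19, NA, 22, 30),
  subscribe    = c(TRUE, TRUE, FALSE, TRUE, FALSE)
)

# summary() on a numeric column reports min, quartiles, median, mean, max,
# and an NA count when values are missing.
age_summary <- summary(toy_players$Age)
age_summary

# Computing the mean directly needs na.rm = TRUE to skip the missing age.
mean_age <- round(mean(toy_players$Age, na.rm = TRUE), 2)
mean_age

# For a logical column, table() counts the TRUE and FALSE values.
subscribe_counts <- table(toy_players$subscribe)
subscribe_counts
```

Calling `summary()` on the whole data frame, as done above, collects the same information for every column at once.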

There are no apparent issues in the data. There could be issues that I cannot see, for example in the `hashedEmail` variable. Values within variables (perhaps `name` and `hashedEmail`) could be mismatched, and I cannot see the "real" email name to infer if the values are assigned properly.
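
Some of these hidden issues can be partly checked. The sketch below, again on invented data (`toy_players`, `toy_sessions`, and the hash strings are hypothetical), tests whether each `hashedEmail` identifies exactly one player and whether every session belongs to a known player:

```r
# Invented stand-ins for players_data and sessions_data.
toy_players <- data.frame(
  hashedEmail = c("a1", "b2", "c3"),
  name        = c("P1", "P2", "P3")
)
toy_sessions <- data.frame(
  hashedEmail = c("a1", "a1", "c3", "z9")
)

# TRUE if any hash appears on more than one player row.
has_duplicate_hash <- anyDuplicated(toy_players$hashedEmail) > 0

# Hashes that occur in the sessions table but match no player row.
orphan_hashes <- setdiff(toy_sessions$hashedEmail, toy_players$hashedEmail)

has_duplicate_hash
orphan_hashes
```

Neither check can confirm that a `name` is paired with the correct email; that would require access to the unhashed addresses.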

The `players_data` data was collected by the Pacific Laboratory for Artificial Intelligence (PLAI) research group in the Department of Computer Science at UBC. PLAI has created PLAICraft (a vanilla survival Minecraft server) to record participant players' gameplay, speech, and key presses to advance AI, hopefully to assist NPC development in "normal" Minecraft.



In [None]:
sessions_data

sessions_summary = sessions_data |>
    summary()
sessions_summary


### Dataset Summary: sessions_data

The `sessions_data` dataset contains 1535 observations. There are five different variables. Below are the variables' names, types, and meanings.

- `hashedEmail`: character variable returning the player's email, irreversibly transformed into a unique string (facilitating individualization while retaining privacy).
- `start_time`: character variable detailing the date and time a player begins a session on PLAICraft.
- `end_time`: character variable detailing the date and time a player ends a session on PLAICraft.
- `original_start_time`: double-class variable, the same as `start_time` but recorded in 