## Minecraft Player Data Analysis Project

![Minecraft](https://upload.wikimedia.org/wikipedia/en/b/b6/Minecraft_2024_cover_art.png)

## Introduction
#### Background Information
A research group in Computer Science at UBC, led by Frank Wood, is collecting data about how people play video games. They record players' actions as they play on a Minecraft server.

#### Question 2
We would like to know which "kinds" of players are most likely to contribute a large amount of data so that we can target those players in our recruiting efforts.



## Players Dataset Overview:
This dataset describes the players on the Minecraft server, with one row per player. It contains each player's experience level, subscription status, total playtime, and demographics.
* Number of observations (rows): 196
* Number of variables (columns): 7


## Columns in the Players Data Set:
### Numerical (dbl):
* `played_hours` - The number of hours the player has played.
* `Age` - The age of the player.

### Character (chr):
* `hashedEmail` - The player's hashed email address, used for identification.
* `name` - The name of the player.
* `gender` - The player's gender (Male, Female, Non-binary, or Prefer not to say).
* `experience` - The player's experience level (Pro, Veteran, Regular, Amateur, or Beginner).

### Logical (lgl):
* `subscribe` - Whether the player is subscribed (either TRUE or FALSE).
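
Because `experience` and `gender` each take a small, fixed set of values, it can help to convert them from character to factor before summarising or plotting. The following is a minimal sketch on a made-up tibble: `toy_players` and its values are hypothetical, and the level order Beginner < Amateur < Regular < Veteran < Pro is an assumption, not something the data set states.

```r
library(dplyr)

# Hypothetical rows that mimic the column types described above
toy_players <- tibble(
  experience   = c("Pro", "Amateur", "Beginner", "Veteran"),
  subscribe    = c(TRUE, FALSE, TRUE, TRUE),
  played_hours = c(30.3, 0.7, 0.0, 3.8),
  gender       = c("Male", "Female", "Non-binary", "Male"),
  Age          = c(9, 21, 17, NA)
)

# Assumed ordering of experience levels (not confirmed by the data source)
experience_levels <- c("Beginner", "Amateur", "Regular", "Veteran", "Pro")

toy_players_typed <- toy_players |>
  mutate(
    experience = factor(experience, levels = experience_levels),
    gender = as.factor(gender)
  )

# Inspect the resulting factor levels
levels(toy_players_typed$experience)
```

The same `mutate()` call could be applied to the real `players` data frame once it has been loaded.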




## Sessions Dataset Overview:
This dataset tracks individual play sessions, with one row per session. Each row holds the player's hashed email and the session's start and end times, recorded twice: once in a human-readable format and once as a UNIX timestamp.
* Number of observations (rows): 1535
* Number of variables (columns): 5


## Columns in the Sessions Data Set:

### Character (chr):
* `hashedEmail` - The player's hashed (anonymized) email address, used for identification.
* `start_time` - Session start time (DD/MM/YYYY Hour:Minute).
* `end_time` - Session end time (DD/MM/YYYY Hour:Minute).

### Numerical (dbl):
* `original_start_time` - UNIX timestamp of the session start time.
* `original_end_time` - UNIX timestamp of the session end time.
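
Since `start_time` and `end_time` are stored as text, they need to be parsed before session lengths can be computed. The sketch below shows one way to do this with `lubridate::dmy_hm()`, which matches the DD/MM/YYYY Hour:Minute layout described above. The `toy_sessions` tibble and its values are hypothetical, and the times are assumed to be in UTC (the `dmy_hm()` default), which the data set does not confirm.

```r
library(dplyr)
library(lubridate)

# Hypothetical sessions in the DD/MM/YYYY Hour:Minute format
toy_sessions <- tibble(
  hashedEmail = c("a1", "a1", "b2"),
  start_time  = c("30/06/2024 18:12", "17/06/2024 23:33", "25/07/2024 17:34"),
  end_time    = c("30/06/2024 18:24", "17/06/2024 23:46", "25/07/2024 17:57")
)

toy_sessions_parsed <- toy_sessions |>
  mutate(
    # Parse the text columns into date-times (time zone assumed to be UTC)
    start_time = dmy_hm(start_time),
    end_time = dmy_hm(end_time),
    # Session length in minutes
    session_minutes = as.numeric(difftime(end_time, start_time, units = "mins"))
  )

toy_sessions_parsed
```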


## Methods and Results
* Describe the methods you used to perform your analysis from beginning to end that narrates the analysis code

In [None]:
library(tidyverse)
library(tidymodels)
set.seed(123)

In [None]:
# Download the players data set and preview the first rows
url <- "https://drive.google.com/uc?export=download&id=1jv3p3Ai0a1pNS-hk7csk1I9l21fYy-_t"
players <- read_csv(url)
head(players)

In [None]:
# Mean of each numeric variable, ignoring missing values
mean_values <- players |>
  summarise(
    mean_played_hours = mean(played_hours, na.rm = TRUE),
    mean_age = mean(Age, na.rm = TRUE)
  )
mean_values

# Minimum and maximum of each numeric variable, ignoring missing values
Summary_statistics_players <- players |>
  summarise(
    min_played_hours = min(played_hours, na.rm = TRUE),
    min_age = min(Age, na.rm = TRUE),
    max_played_hours = max(played_hours, na.rm = TRUE),
    max_age = max(Age, na.rm = TRUE)
  )
Summary_statistics_players



In [None]:
# Drop rows with a missing age or playtime
players_clean <- players |>
  drop_na(Age, played_hours)
head(players_clean)

# Keep players under 30 who played more than 0 and fewer than 10 hours
player_filtered <- players_clean |>
  filter(Age < 30, played_hours > 0, played_hours < 10)
head(player_filtered)

# Scatter plot of hours played against age for the filtered players
Minecraft_plot <- player_filtered |>
  ggplot(aes(x = Age, y = played_hours)) +
  geom_point() +
  xlab("Age of Players") +
  ylab("Hours of Minecraft Played") +
  theme(text = element_text(size = 13)) +
  ggtitle("Relationship Between Minecraft User Age and Hours Played")
Minecraft_plot



In [None]:
# Download the sessions data set and preview the first rows
url <- "https://drive.google.com/uc?export=download&id=1QRsHcWfUyvOWJpKgwFCw3csjRN5NL1L3"
sessions <- read_csv(url)
head(sessions)
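
With both tables loaded, one possible way to relate player "kinds" to the amount of data they contribute is to count sessions per player, join the counts to the player table on `hashedEmail`, and summarise by `experience`. The sketch below illustrates the idea on made-up data: `toy_players`, `toy_sessions`, and their values are hypothetical, and using session counts as a measure of contribution is an assumption rather than the method used in this report.

```r
library(dplyr)

# Hypothetical stand-ins for the players and sessions tables
toy_players <- tibble(
  hashedEmail  = c("a1", "b2", "c3", "d4"),
  experience   = c("Pro", "Amateur", "Amateur", "Beginner"),
  played_hours = c(30.3, 0.7, 2.1, 0.0)
)
toy_sessions <- tibble(
  hashedEmail = c("a1", "a1", "a1", "b2", "c3", "c3")
)

# Number of recorded sessions for each player
sessions_per_player <- toy_sessions |>
  count(hashedEmail, name = "n_sessions")

# Join the counts to the players and summarise by experience level
contribution_by_experience <- toy_players |>
  left_join(sessions_per_player, by = "hashedEmail") |>
  # Players with no recorded sessions get a count of zero
  mutate(n_sessions = coalesce(n_sessions, 0L)) |>
  group_by(experience) |>
  summarise(
    n_players = n(),
    mean_played_hours = mean(played_hours),
    total_sessions = sum(n_sessions)
  ) |>
  arrange(desc(total_sessions))

contribution_by_experience
```

Swapping `experience` for `gender` or `subscribe` in `group_by()` would give the same summary for a different grouping of players.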

## Discussion:
* summarize what you found
* discuss whether this is what you expected to find?
* discuss what impact could such findings have?
* discuss what future questions could this lead to?
