In [1]:
import pandas as pd
import altair as alt

### Data Description <br>
players (players.csv) is a dataset containing a list of all unique players, including data about each player. There are 196 observations and 9 variables. One issue I can see in the data is that two variables, 'individualId' and 'organizationName', are useless because they contain only missing values. The data in this dataset was collected from a registration form that each individual who signed up to play PlaiCraft filled out. <br>
    *Description of Each Variable:* <br>
>experience (object): self-reported level of experience playing Minecraft<br>
    subscribe (boolean): whether or not the individual agreed to receive emails<br>
    hashedEmail (object): hashed email address<br>
    played_hours (numerical -> float64): total hours played on PlaiCraft by each individual<br>
    name (object): name selected during registration<br>
    gender (object): gender of individual<br>
    age (numerical -> int64): age of individual<br>
    individualId (numerical -> float64): ID of each individual (all values missing, which is why pandas reads it as float64)<br>
    organizationName (numerical -> float64): organization name (all values missing, which is why pandas reads it as float64)<br>
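
The claim that 'individualId' and 'organizationName' contain only missing values can be checked directly before dropping them. A minimal sketch, using a small toy DataFrame with the same column names as a stand-in for the real `players` table (the three rows reuse values from the preview below):

```python
import numpy as np
import pandas as pd

# Toy stand-in for players.csv: same column names, a few values from the preview
players = pd.DataFrame({
    "experience": ["Pro", "Veteran", "Amateur"],
    "played_hours": [30.3, 3.8, 0.7],
    "individualId": [np.nan, np.nan, np.nan],
    "organizationName": [np.nan, np.nan, np.nan],
})

# True for each column in which every value is missing
mask = players.isna().all()
all_missing = mask[mask].index.tolist()
print(all_missing)  # ['individualId', 'organizationName']

# Drop the all-missing columns before any analysis
players_clean = players.drop(columns=all_missing)
print(players_clean.columns.tolist())  # ['experience', 'played_hours']
```

`players.dropna(axis="columns", how="all")` does the same drop in one call; the explicit mask is shown so the dropped column names can be printed and confirmed first.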

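sessions (sessions.csv) is a dataset containing a list of individual play sessions by each player, including data about each session. There are 1535 observations and 5 variables. One issue I can see in the data is that the values in original_start_time and original_end_time look identical. I first suspected this was only because the numbers are so large that the displayed values hide the differences. However, that does not seem to fully explain it: the values look like Unix timestamps in milliseconds (about 1.72e12 ms falls in mid-2024, matching start_time), and a 36-minute session such as row 3 (03:22 to 03:58) should differ by about 2.2 million ms, which would still be visible at pandas' default display precision of six decimals. The stored values may therefore themselves be rounded, so start_time and end_time are the safer columns for measuring session length. The data in this dataset was collected by tracking the time and activity of each play session. <br>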
    *Description of Each Variable:* <br>
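>hashedEmail (object): hashed email address, presumably the same identifier as in players.csv<br>
    start_time (object): date and time the play session started (DD/MM/YYYY HH:MM)<br>
    end_time (object): date and time the play session ended (DD/MM/YYYY HH:MM)<br>
    original_start_time (float64): start time as a large number, apparently a Unix timestamp in milliseconds<br>
    original_end_time (float64): end time in the same numeric form<br>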
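
Whether original_start_time and original_end_time are truly identical, rather than only displayed alike, can be tested by printing the floats in full and comparing them directly. A minimal sketch, using three values copied from the sessions preview below as a toy stand-in for the real `sessions` DataFrame. On these copied display values the check trivially reports 1.0; run on the real data, a share near 1.0 would mean the stored values themselves are equal.

```python
import pandas as pd

# Toy stand-in: three rows copied from the sessions preview (values as displayed)
sessions = pd.DataFrame({
    "original_start_time": [1.719770e12, 1.718670e12, 1.721880e12],
    "original_end_time": [1.719770e12, 1.718670e12, 1.721880e12],
})

# Print every stored digit instead of pandas' default scientific notation
with pd.option_context("display.float_format", "{:.0f}".format):
    print(sessions)

# Fraction of sessions whose numeric end time equals the numeric start time
share_equal = (sessions["original_end_time"] == sessions["original_start_time"]).mean()
print(share_equal)  # 1.0
```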

### Question <br>


We would like to know which "kinds" of players are most likely to contribute a large amount of data so that we can target those players in our recruiting efforts.

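*Response Variable:* played_hours (total hours played, used as a measure of how much data a player contributes)
*Explanatory Variables:* experience, gender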

### Exploratory Data Analysis and Visualization <br>

In [16]:
players = pd.read_csv('data/players.csv')
sessions = pd.read_csv('data/sessions.csv')
players

| | experience | subscribe | hashedEmail | played_hours | name | gender | age | individualId | organizationName |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Pro | True | f6daba428a5e19a3d47574858c13550499be23603422e6... | 30.3 | Morgan | Male | 9 | NaN | NaN |
| 1 | Veteran | True | f3c813577c458ba0dfef80996f8f32c93b6e8af1fa9397... | 3.8 | Christian | Male | 17 | NaN | NaN |
| 2 | Veteran | False | b674dd7ee0d24096d1c019615ce4d12b20fcbff12d79d3... | 0.0 | Blake | Male | 17 | NaN | NaN |
| 3 | Amateur | True | 23fe711e0e3b77f1da7aa221ab1192afe21648d47d2b4f... | 0.7 | Flora | Female | 21 | NaN | NaN |
| 4 | Regular | True | 7dc01f10bf20671ecfccdac23812b1b415acd42c2147cb... | 0.1 | Kylie | Male | 21 | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 191 | Amateur | True | b6e9e593b9ec51c5e335457341c324c34a2239531e1890... | 0.0 | Bailey | Female | 17 | NaN | NaN |
| 192 | Veteran | False | 71453e425f07d10da4fa2b349c83e73ccdf0fb3312f778... | 0.3 | Pascal | Male | 22 | NaN | NaN |
| 193 | Amateur | False | d572f391d452b76ea2d7e5e53a3d38bfd7499c7399db29... | 0.0 | Dylan | Prefer not to say | 17 | NaN | NaN |
| 194 | Amateur | False | f19e136ddde68f365afc860c725ccff54307dedd13968e... | 2.3 | Harlow | Male | 17 | NaN | NaN |


In [57]:
players_sort = players[['experience', 'played_hours']]\
    .groupby('experience')\
    .sum()\
    .reset_index()

    
players_sort

| | experience | played_hours |
|---|---|---|
| 0 | Amateur | 379.1 |
| 1 | Beginner | 43.7 |
| 2 | Pro | 36.4 |
| 3 | Regular | 655.5 |
| 4 | Veteran | 31.1 |


In [62]:


hours_experience = alt.Chart(players_sort).mark_bar().encode(
    x = alt.X('experience').title('Experience Playing Minecraft'),
    y = alt.Y('played_hours').title('Time Played (hours)'),
    color = alt.Color('experience')
)
hours_experience


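**Reflection on Graph (1):** <br>
    It is not safe to assume that individuals with the most Minecraft experience outside of PlaiCraft also play the most on PlaiCraft. The plot shows that individuals who consider themselves 'Amateur' or 'Regular' players account for the most total hours on PlaiCraft (655.5 hours for Regular and 379.1 for Amateur, versus 36.4 for Pro and 31.1 for Veteran). Because the bars are totals, they also depend on how many players are in each group, so per-player averages or medians are needed before concluding that an individual Amateur or Regular player plays more.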
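
Separating group size from individual play time only needs a few more aggregations in the same groupby. A minimal sketch on a small hypothetical sample (not the real PlaiCraft data), chosen to show how one heavy player can dominate a group's total while the typical player in that group plays very little:

```python
import pandas as pd

# Hypothetical sample: one heavy 'Regular' player and several light players
players = pd.DataFrame({
    "experience": ["Regular", "Regular", "Regular", "Amateur", "Amateur", "Pro"],
    "played_hours": [150.0, 0.5, 0.0, 10.0, 2.0, 30.3],
})

# Player count, total, mean and median hours for each experience level
summary = (
    players.groupby("experience")["played_hours"]
    .agg(n_players="count", total_hours="sum", mean_hours="mean", median_hours="median")
    .reset_index()
    .sort_values("total_hours", ascending=False)
)
print(summary)
```

In this toy sample 'Regular' has the largest total (150.5 hours) but a median of only 0.5 hours, which is the pattern the totals in Graph (1) cannot rule out on their own.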

In [68]:
players_sort1 = players[['gender', 'played_hours']]\
    .groupby('gender')\
    .sum()\
    .reset_index()


hours_gender = alt.Chart(players_sort1).mark_bar().encode(
    x = alt.X('gender').title('Gender'),
    y = alt.Y('played_hours').title('Time Played (hours)'),
    color = alt.Color('gender'),
)
hours_gender


**Reflection on Graph (2):** 
Players who identify as male or female account for the most total hours played on PlaiCraft. These are totals, so they also reflect how many players fall into each gender category.
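
One way to see whether a gender category contributes more hours than its head count alone would suggest is to compare its share of players with its share of hours. A minimal sketch on a small hypothetical sample (not the real PlaiCraft data):

```python
import pandas as pd

# Hypothetical sample using gender categories that appear in players.csv
players = pd.DataFrame({
    "gender": ["Male", "Male", "Male", "Female", "Female", "Prefer not to say"],
    "played_hours": [40.0, 5.0, 5.0, 30.0, 10.0, 10.0],
})

by_gender = players.groupby("gender")["played_hours"].agg(
    n_players="count", total_hours="sum"
)
by_gender["share_of_players"] = by_gender["n_players"] / by_gender["n_players"].sum()
by_gender["share_of_hours"] = by_gender["total_hours"] / by_gender["total_hours"].sum()
# Ratio above 1: the group plays more than its share of players would predict
by_gender["hours_to_players_ratio"] = by_gender["share_of_hours"] / by_gender["share_of_players"]
print(by_gender.round(2))
```

Here 'Male' has the largest total but a ratio of exactly 1.0, while 'Female' has a ratio of 1.2, so the group with the most total hours is not necessarily the group whose members play the most.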


In [22]:
sessions

| | hashedEmail | start_time | end_time | original_start_time | original_end_time |
|---|---|---|---|---|---|
| 0 | bfce39c89d6549f2bb94d8064d3ce69dc3d7e72b38f431... | 30/06/2024 18:12 | 30/06/2024 18:24 | 1.719770e+12 | 1.719770e+12 |
| 1 | 36d9cbb4c6bc0c1a6911436d2da0d09ec625e43e6552f5... | 17/06/2024 23:33 | 17/06/2024 23:46 | 1.718670e+12 | 1.718670e+12 |
| 2 | f8f5477f5a2e53616ae37421b1c660b971192bd8ff77e3... | 25/07/2024 17:34 | 25/07/2024 17:57 | 1.721930e+12 | 1.721930e+12 |
| 3 | bfce39c89d6549f2bb94d8064d3ce69dc3d7e72b38f431... | 25/07/2024 03:22 | 25/07/2024 03:58 | 1.721880e+12 | 1.721880e+12 |
| 4 | 36d9cbb4c6bc0c1a6911436d2da0d09ec625e43e6552f5... | 25/05/2024 16:01 | 25/05/2024 16:12 | 1.716650e+12 | 1.716650e+12 |
| ... | ... | ... | ... | ... | ... |
| 1530 | 36d9cbb4c6bc0c1a6911436d2da0d09ec625e43e6552f5... | 10/05/2024 23:01 | 10/05/2024 23:07 | 1.715380e+12 | 1.715380e+12 |
| 1531 | 7a4686586d290c67179275c7c3dfb4ea02f4d317d9ee0e... | 01/07/2024 04:08 | 01/07/2024 04:19 | 1.719810e+12 | 1.719810e+12 |
| 1532 | fd6563a4e0f6f4273580e5fedbd8dda64990447aea5a33... | 28/07/2024 15:36 | 28/07/2024 15:57 | 1.722180e+12 | 1.722180e+12 |
| 1533 | fd6563a4e0f6f4273580e5fedbd8dda64990447aea5a33... | 25/07/2024 06:15 | 25/07/2024 06:22 | 1.721890e+12 | 1.721890e+12 |


In [24]:
players.dtypes, sessions.dtypes

(experience           object
 subscribe              bool
 hashedEmail          object
 played_hours        float64
 name                 object
 gender               object
 age                   int64
 individualId        float64
 organizationName    float64
 dtype: object,
 hashedEmail             object
 start_time              object
 end_time                object
 original_start_time    float64
 original_end_time      float64
 dtype: object)
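
The dtypes output shows that start_time and end_time are stored as object (text), so they must be parsed before session lengths can be computed. A minimal sketch of that conversion and a per-player summary, assuming the DD/MM/YYYY HH:MM format seen in the sessions preview; the labels 'player_a' and 'player_b' are hypothetical stand-ins for hashedEmail values, and the times are copied from the preview:

```python
import pandas as pd

# Toy stand-in for sessions.csv; 'player_a'/'player_b' are hypothetical IDs
sessions = pd.DataFrame({
    "hashedEmail": ["player_a", "player_b", "player_a"],
    "start_time": ["30/06/2024 18:12", "17/06/2024 23:33", "25/07/2024 03:22"],
    "end_time": ["30/06/2024 18:24", "17/06/2024 23:46", "25/07/2024 03:58"],
})

# Parse the text columns into datetimes (day/month/year, 24-hour clock)
fmt = "%d/%m/%Y %H:%M"
sessions["start_time"] = pd.to_datetime(sessions["start_time"], format=fmt)
sessions["end_time"] = pd.to_datetime(sessions["end_time"], format=fmt)

# Session length in minutes
sessions["duration_min"] = (
    sessions["end_time"] - sessions["start_time"]
).dt.total_seconds() / 60

# Number of sessions and total minutes for each player
per_player = (
    sessions.groupby("hashedEmail")["duration_min"]
    .agg(n_sessions="count", total_minutes="sum")
    .reset_index()
)
print(sessions.dtypes)
print(per_player)
```

If hashedEmail identifies the same player in both files, the per-player table could then be attached to the player data with `players.merge(per_player, on="hashedEmail", how="left")`.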

### Methods and Plan