# Analysis of top 1000 HiveMC Bedwars players

### Content
+ Introduction: HiveMC and Bedwars
+ Data description and objectives
+ Data acquisition, manipulation and validation
+ Data analysis and visualization 
+ Conclusion

## 1. Introduction: HiveMC and Bedwars

HiveMC is an official Minecraft Java Edition server. Minecraft is a sandbox video game developed by Mojang Studios. The game was created by Markus "Notch" Persson in the Java programming language and released as a public alpha for personal computers in 2009 before its official release in November 2011, with Jens Bergensten taking over development. Minecraft Java Edition has since surpassed 30 million lifetime sales.

HiveMC, or The Hive, is a minigames server owned by Hive Games Limited. It was first registered in 2018 and focuses on providing players with fun games such as SkyWars, Bedwars, Hide and Seek, and DeathRun. Over 12 million unique Minecraft accounts have visited the server at least once.

Bedwars is a strategy and player-versus-player minigame in which you must protect your bed while trying to eliminate your opponents on islands in the sky. You can keep respawning while your bed is intact. Once your bed is destroyed, you can no longer respawn and are eliminated the next time you die. Before it was released on multiplayer servers, Bedwars was a custom game map, developed in 2012 for playing with a group of friends. At the time it was called "Rush" and was developed by Xisuma. It became very popular once it was released on GommeHD, a German minigames server, where the name was changed to "Bedwars".

As of 2020, Bedwars is one of the most popular Minecraft minigame modes, with an average of over 10,000 unique players per day. I also play Bedwars, and as of this date I rank 492nd among all 3.6 million Bedwars players on The Hive.

This project analyzes the top 1000 Bedwars players on The Hive.

Sources:
+ https://en.wikipedia.org/wiki/Minecraft
+ https://hivemc.com/bedwars

## 2. Data description and objectives

The data consists of player statistics, namely the indicators of a player's progress in the game. In this analysis we are interested in comparing those statistics across players. Later in the project we will also compare the countries the players come from.

Our analysis is based on data from 01.10.2020 obtained from the HiveMC API (https://api.hivemc.com/). We will additionally scrape each player's country. Below are the main attributes of the data that will be obtained, scraped, and used for our analysis:
+ Place - rank of the player based on game points
+ Points - number of game points* the player has earned
+ Games played - number of games the player participated in
+ Victories - number of wins the player has
+ Kills - total number of other players the player has killed
+ Deaths - total number of times the player died in game
+ Beds - number of beds the player personally destroyed
+ Teams eliminated - number of other teams eliminated
+ Winstreak - current number of wins in a row
+ Country - the player's country of origin

For this project, data analysis and visualization consists of 5 parts:
1. Analyze the kill/death ratio and win/loss ratio of the top 1000 players
2. Analyze the relationship between the variables and the player's place in the data set
3. Analyze the relationship between winstreaks and other variables
4. Propose a new ranking model based on average points per game
5. Analyze geographical tendencies among the players' countries of origin


*- Points can be obtained by killing players (5 points), breaking beds (50-80 points), and upgrading generators (5-15 points)
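
The ratios named in parts 1 and 4 can be derived directly from the attributes listed above. The snippet below is a minimal sketch on three rows copied from the top of the leaderboard. The helper `add_ratios` and the column names `KD`, `WL` and `PointsPerGame` are our own illustrative choices, and it assumes that every game that is not a victory counts as a loss.

```python
import pandas as pd

def add_ratios(df):
    """Return a copy of df with K/D, W/L and points-per-game columns (illustrative names)."""
    out = df.copy()
    out["KD"] = out["Kills"] / out["Deaths"]
    # Assumption: every game that is not a victory counts as a loss.
    losses = out["Games"] - out["Victories"]
    out["WL"] = out["Victories"] / losses
    out["PointsPerGame"] = out["Points"] / out["Games"]
    return out

# Three rows taken from the leaderboard, with raw (undivided) points.
sample = pd.DataFrame({
    "Username": ["StrafeYosef", "prinsese1", "Taivax"],
    "Points": [6197520, 5859040, 4392645],
    "Victories": [15683, 16208, 10855],
    "Games": [22251, 19877, 11433],
    "Kills": [152661, 95197, 91210],
    "Deaths": [19118, 35171, 12997],
})

# Re-rank the sample by average points per game instead of total points.
ranked = add_ratios(sample).sort_values("PointsPerGame", ascending=False)
print(ranked[["Username", "KD", "WL", "PointsPerGame"]].round(2))
```

On these three rows the ordering by points per game differs from the ordering by total points: Taivax, who has played the fewest games, moves ahead of the other two.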

## 3. Data acquisition, manipulation and validation

### 3.1. Data acquisition: Use of the HiveMC API and scraping the NameMC website
We use the requests and BeautifulSoup packages to get the data from the HiveMC API and NameMC. Getting the data takes 5 API requests and scraping 1000 pages. The algorithm works as follows: first we get the data for the first 200 players of the leaderboard from the HiveMC API as a JSON string and convert it into an object. Using the usernames, we then go through the corresponding 200 NameMC pages to scrape each player's country. Once obtained, all of the data is written to a CSV file for further use. The process is repeated for the remaining 4 API requests. The web scraping took about 4 hours.

Source of the scraped data: https://ru.namemc.com/
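
The batching logic described above can be sketched as follows. This is not the original scraping code: the actual API endpoint and the NameMC page layout are not reproduced here, so both lookups are passed in as functions. The names `collect_players`, `fetch_batch` and `lookup_country` are hypothetical, and the two stubs stand in for the real network calls so that the sketch runs offline.

```python
import csv
import io
import time

def collect_players(fetch_batch, lookup_country, out_file,
                    total=1000, batch_size=200, delay=0.0):
    """Fetch the leaderboard in batches and write one CSV row per player.

    fetch_batch(start, end) -> list of dicts with at least a "Username" key
    lookup_country(username) -> country name, or None when it is unknown
    Both callables are placeholders for the real API request and page scrape.
    """
    writer = None
    written = 0
    for start in range(0, total, batch_size):
        end = min(start + batch_size, total)
        for player in fetch_batch(start, end):
            row = dict(player)
            row["Country"] = lookup_country(row["Username"])
            if writer is None:
                # Create the writer lazily so the header matches the first row.
                writer = csv.DictWriter(out_file, fieldnames=list(row))
                writer.writeheader()
            writer.writerow(row)
            written += 1
            time.sleep(delay)  # pause between page requests
    return written

# Offline stubs (assumptions) so the sketch runs without network access.
def fake_batch(start, end):
    return [{"Place": i + 1, "Username": f"player{i + 1}"} for i in range(start, end)]

def fake_country(username):
    return None  # pretend that no profile lists a country

buffer = io.StringIO()
count = collect_players(fake_batch, fake_country, buffer)
print(count)
```

With `total=1000` and `batch_size=200` the outer loop runs five times, which matches the five API requests mentioned above. A non-zero `delay` would be the place to throttle requests to the scraped site.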

In [15]:
#import all the needed packages
import requests
from bs4 import BeautifulSoup
import json
import time
import csv
import numpy as np

The next part of the code is not run here, since it takes too much time and overwrites the existing file. You can run it yourself if you like. The code is also included in the GitHub repo.
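
One way to make such a long-running step safe to re-run is to skip it whenever the output file already exists. The snippet below is a small sketch of that guard, not part of the original notebook. `ensure_dataset` and `slow_scrape` are hypothetical names, and the demo writes to a temporary directory instead of the real `hive_players.csv`.

```python
import tempfile
from pathlib import Path

def ensure_dataset(csv_path, build):
    """Call build(path) only when the CSV does not exist yet; return the path."""
    path = Path(csv_path)
    if not path.exists():
        build(path)
    return path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "hive_players.csv"
    builds = []

    def slow_scrape(path):
        # Stand-in for the multi-hour scrape: records the call and writes a header.
        builds.append(path)
        path.write_text("Place,Username\n")

    ensure_dataset(target, slow_scrape)  # file is missing, so the scrape runs
    ensure_dataset(target, slow_scrape)  # file exists, so the scrape is skipped
    print(len(builds))
```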

In [59]:
#Now in the pandas form:
import pandas as pd

table = pd.read_csv('hive_players.csv')
table[:10]

Unnamed: 0,Place,Username,Points,Victories,Games,Kills,Deaths,Beds,Teams Eliminated,Winstreak,Country
0,1,StrafeYosef,6197520,15683,22251,152661,19118,22776,22080,1,Israel
1,2,prinsese1,5859040,16208,19877,95197,35171,16855,20461,0,
2,3,Lcea,5224840,14517,20880,127275,44404,16311,13458,4,Germany
3,4,HappyStateOfMind,4812445,13889,18162,79314,35140,6719,6660,28,Norway
4,5,Taivax,4392645,10855,11433,91210,12997,25977,23505,80,United Kingdom
5,6,Dragasdata,4286425,11087,15368,80086,28333,26353,20198,2,
6,7,xoLarry2Pro,4285105,10884,13333,95093,23854,14255,12560,24,France
7,8,Foony,4255195,11193,11382,78566,12488,14652,14969,420,Netherlands
8,9,CRYBL0CKER,4241915,11438,14361,90960,25247,13085,13814,29,South Africa
9,10,Kylic,4231945,11016,14150,101989,36658,14707,13627,29,United Kingdom


### 3.2. Data manipulation: cleaning and shaping
At this step we need to reshape our data a bit and assign appropriate column names:
+ Replace missing values with NaN
+ Change the type of some columns from string to int
+ Rescale some values
+ Determine how many values are missing

In [60]:
# Missing values are stored as the string "None", so we replace them with NaN
table.replace("None", np.nan, inplace = True)

# some columns with numeric values should be converted to int type
table = table.astype({"Points": "int", "Victories": "int", "Kills": "int", "Deaths": "int", "Beds": "int", "Teams Eliminated": "int", "Winstreak": "int"})
table.head(5)

Unnamed: 0,Place,Username,Points,Victories,Games,Kills,Deaths,Beds,Teams Eliminated,Winstreak,Country
0,1,StrafeYosef,6197520,15683,22251,152661,19118,22776,22080,1,Israel
1,2,prinsese1,5859040,16208,19877,95197,35171,16855,20461,0,
2,3,Lcea,5224840,14517,20880,127275,44404,16311,13458,4,Germany
3,4,HappyStateOfMind,4812445,13889,18162,79314,35140,6719,6660,28,Norway
4,5,Taivax,4392645,10855,11433,91210,12997,25977,23505,80,United Kingdom


In [61]:
table.tail(5)

Unnamed: 0,Place,Username,Points,Victories,Games,Kills,Deaths,Beds,Teams Eliminated,Winstreak,Country
995,996,ThunderGamingX,510225,1134,2513,12852,3238,3702,3184,3,
996,997,Kuadro,510135,1332,3428,10317,7530,1254,1023,0,
997,998,swordfish09304,509375,1347,2727,11513,8708,1377,1151,2,
998,999,LaMinecraftienne,509185,1434,2265,11413,5499,2792,1831,2,
999,1000,0hSora,508290,1112,2626,12072,11361,1818,1529,0,


In [62]:
# The number of points is large: more than 500 thousand for every player
# To make the column easier to read, divide by a thousand and round to 1 decimal place
table["Points"] = table["Points"].div(1000).round(1)
table.head(5)

Unnamed: 0,Place,Username,Points,Victories,Games,Kills,Deaths,Beds,Teams Eliminated,Winstreak,Country
0,1,StrafeYosef,6197.5,15683,22251,152661,19118,22776,22080,1,Israel
1,2,prinsese1,5859.0,16208,19877,95197,35171,16855,20461,0,
2,3,Lcea,5224.8,14517,20880,127275,44404,16311,13458,4,Germany
3,4,HappyStateOfMind,4812.4,13889,18162,79314,35140,6719,6660,28,Norway
4,5,Taivax,4392.6,10855,11433,91210,12997,25977,23505,80,United Kingdom


In [63]:
# Rename the Points column so that its name reflects the new unit
table.rename(columns = {'Points' : 'Points (in thousands)'}, inplace = True)
table.head(5)

Unnamed: 0,Place,Username,Points (in thousands),Victories,Games,Kills,Deaths,Beds,Teams Eliminated,Winstreak,Country
0,1,StrafeYosef,6197.5,15683,22251,152661,19118,22776,22080,1,Israel
1,2,prinsese1,5859.0,16208,19877,95197,35171,16855,20461,0,
2,3,Lcea,5224.8,14517,20880,127275,44404,16311,13458,4,Germany
3,4,HappyStateOfMind,4812.4,13889,18162,79314,35140,6719,6660,28,Norway
4,5,Taivax,4392.6,10855,11433,91210,12997,25977,23505,80,United Kingdom


In [57]:
# Let's determine how many values are missing
table_validation = pd.DataFrame()
table_validation["Columns"] = list(table.columns)
table_validation["Count"] = list(table.count())
table_validation[:]

Unnamed: 0,Columns,Count
0,Place,1000
1,Username,1000
2,Points (in thousands),1000
3,Victories,1000
4,Games,1000
5,Kills,1000
6,Deaths,1000
7,Beds,1000
8,Teams Eliminated,1000
9,Winstreak,1000


So the only missing values are the countries of some players, because those players either are not registered on NameMC or do not share their country. Entries with no country data make up the largest share of the data set. We will not use them in the geographical analysis later on.
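
The missing entries can also be counted directly, and the rows without a country can be set aside before the geographical analysis. The snippet below is a minimal sketch on four rows taken from the top of the table, two of which have no country. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Four leaderboard rows; prinsese1 and Dragasdata have no country listed.
sample = pd.DataFrame({
    "Username": ["StrafeYosef", "prinsese1", "Lcea", "Dragasdata"],
    "Country": ["Israel", np.nan, "Germany", np.nan],
})

missing_per_column = sample.isna().sum()         # number of NaN values in each column
missing_share = sample["Country"].isna().mean()  # fraction of players without a country
with_country = sample.dropna(subset=["Country"])  # rows usable for the geographical part

print(missing_per_column)
print(missing_share)
print(len(with_country))
```

`dropna(subset=["Country"])` returns a new DataFrame, so the full table stays available for the parts of the analysis that do not need the country.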