# Scraping Speedrun.com Minecraft: Java Edition 1.16+ Random Seed Glitchless 

### Data set description
The following dataset was scraped from https://www.speedrun.com/mc, from the Minecraft: Java Edition 1.16+ Random Seed Glitchless leaderboard. It contains the world's best approved speedruns of Minecraft: Java Edition in that category. The dataset covers 1100+ players and their best runs in this category.

<b>Important: </b>To run this scraper you need to download the ChromeDriver build that matches your version of Chrome from https://chromedriver.chromium.org/downloads and put the executable file in the same folder as your notebook.

Main attributes that we will get from the scrape:
+ Rank, the rank of the player based on the time of their run
+ Player, the username of the player
+ Real time, the time in which the speedrun was completed
+ In-game time, the runner's time excluding periods when the game was frozen due to technical issues
+ Version, the version of Minecraft used
+ Difficulty, the difficulty setting of the game
+ F3, a hotkey used to display various information about the game
+ Mods, the mods/modpacks used by the player
+ Date, how long ago the record was approved
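
The Real time and In-game time values are scraped as text such as `14m 39s 520ms` or `5h 22m 00s`, so they cannot be sorted or averaged directly. A minimal sketch of converting them to `timedelta` objects follows; the helper name `parse_run_time` is hypothetical, and the format (optional `h`, `m`, `s`, `ms` parts in that order) is an assumption based on the sample rows shown in this notebook.

```python
import re
from datetime import timedelta

# Assumed format: optional hours, minutes, seconds and milliseconds, in that order,
# e.g. "14m 39s 520ms" or "5h 22m 00s". The (?!s) keeps "520ms" from being read as minutes.
_TIME_RE = re.compile(
    r"^\s*(?:(?P<h>\d+)h)?\s*(?:(?P<m>\d+)m(?!s))?\s*(?:(?P<s>\d+)s)?\s*(?:(?P<ms>\d+)ms)?\s*$"
)


def parse_run_time(text: str) -> timedelta:
    """Convert a leaderboard time string into a timedelta (hypothetical helper)."""
    match = _TIME_RE.match(text)
    # Reject strings that do not match at all, or that contain no time part.
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unrecognised time format: {text!r}")
    # Missing parts count as zero.
    parts = {name: int(value) if value else 0
             for name, value in match.groupdict().items()}
    return timedelta(hours=parts["h"], minutes=parts["m"],
                     seconds=parts["s"], milliseconds=parts["ms"])
```

Calling `parse_run_time("17m 02s").total_seconds()` gives a plain number of seconds, which is usually easier to plot or compare than the original string.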

In [4]:
#importing all of the necessary packages
import requests
from bs4 import BeautifulSoup
import time
import csv #we will use that to save our data to csv file

In [5]:
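from selenium import webdriver #the website is dynamic, so we use selenium and the chrome driver to get the rendered html code
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import os

# Instantiate an Options object
# and add the "--headless" argument
opts = Options()
opts.add_argument("--headless")
# Optional: to use a portable Chrome instead of an installed one, uncomment the next line.
# binary_location must be set before the driver is created, otherwise it has no effect.
# opts.binary_location = os.path.join(os.getcwd(), 'GoogleChromePortable', 'GoogleChromePortable.exe')
chrome_driver = os.path.join(os.getcwd(), 'chromedriver.exe')

# Selenium 4 takes the driver path through a Service object instead of executable_path
driver = webdriver.Chrome(options=opts, service=Service(executable_path=chrome_driver))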

driver.get("https://www.speedrun.com/mc/full_game#Any_Glitchless")
start_time = time.time() 
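time.sleep(2) #give the page time to load its dynamic content
# Put the page source into a variable and create a BS object from it
soup_file = driver.page_source
soup = BeautifulSoup(soup_file, 'html.parser') #name the parser explicitly to avoid bs4's "no parser specified" warning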


driver.quit() 
print('Execution time:', round(time.time() - start_time, 3), 's') 
table = soup.find_all('table', class_ = 'reverse-padding-sides reverse-padding-bottom') #we will take the table we need


Execution time: 7.726 s
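
The cell above waits a fixed two seconds before reading the page source, which may be too short on a slow connection and longer than needed on a fast one. A minimal sketch of polling for a condition instead is shown below; `wait_until` is a hypothetical helper, not part of Selenium or of this notebook, and the default timeout and interval are arbitrary assumptions.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Call predicate() until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value; raises TimeoutError if the deadline is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)
```

In this notebook the predicate could be something like `lambda: driver.find_elements(By.TAG_NAME, 'table')`, with `By` imported from `selenium.webdriver.common.by`, called before `driver.quit()`. Selenium also ships its own `WebDriverWait` class in `selenium.webdriver.support.ui` for the same purpose.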


In [6]:
#now we will start writing to the csv file
#newline='' stops the csv module from writing blank lines between rows on Windows
#encoding='utf-8' lets non-ASCII characters be written directly, so no manual encode/decode step is needed
with open('mc_sr.csv', mode='w', newline='', encoding='utf-8') as mc_sr:
    fieldnames = ['Rank', 'Player', 'Real time', 'In-game time', 'Version', 'Difficulty', 'F3', 'Mods', 'Date'] #attributes
    writer = csv.DictWriter(mc_sr, fieldnames=fieldnames)
    writer.writeheader()

    rows = table[0].find_all('tr') #these are the rows

    for row in rows[1:]: #skip the first row as we don't need it
        cells = row.find_all('td')

        #cells[4] is not used
        writer.writerow({'Rank': cells[0].text,
                         'Player': cells[1].text[:len(cells[1].text) // 2], #keep only the first half of the cell text
                         'Real time': cells[2].text,
                         'In-game time': cells[3].text,
                         'Version': cells[5].text,
                         'Difficulty': cells[6].text,
                         'F3': cells[7].text,
                         'Mods': cells[8].text,
                         'Date': cells[9].text})
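
The loop above keeps only the first half of the Player cell's text. A minimal sketch of a more defensive variant follows, assuming the reason for the halving is that the cell text contains the username twice in a row (the notebook does not state this explicitly). `first_copy` is a hypothetical helper name.

```python
def first_copy(text: str) -> str:
    """Return one copy of a name when `text` is that name written twice in a row.

    Text that is not an exact repetition is returned unchanged, so a cell
    holding a single username is never cut in half.
    """
    half, remainder = divmod(len(text), 2)
    if remainder == 0 and half > 0 and text[:half] == text[half:]:
        return text[:half]
    return text
```

One caveat of this check: a username that is itself a repetition, such as `abab`, would still be shortened to `ab` even if it appears only once in the cell.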


In [7]:
import pandas as pd #to display dataset we got in pandas
ds = pd.read_csv('mc_sr.csv')
ds[:]

| | Rank | Player | Real time | In-game time | Version | Difficulty | F3 | Mods | Date |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1st | Couriway | 14m 39s 520ms | 14m 36s 500ms | 1.16.1 | Normal | F3 | JellySquid | 2 months ago |
| 1 | 2nd | Korbanoes | 15m 12s 970ms | 14m 56s 800ms | 1.16.1 | Normal | F3 | Vanilla | 3 months ago |
| 2 | 3rd | Dowsky | 15m 51s 040ms | 15m 44s 100ms | 1.16.1 | Easy | F3 | JellySquid | 4 weeks ago |
| 3 | 4th | zylenox | 16m 01s 020ms | 15m 54s 283ms | 1.16.1 | Easy | F3 | JellySquid | 1 week ago |
| 4 | 5th | Dylqn | 17m 02s | 16m 41s | 1.16.1 | Easy | F3 | JellySquid | 1 month ago |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1122 | 1123rd | Mumu_Didi | 5h 22m 00s | 5h 22m 00s | 1.16.1 | Normal | F3 | JellySquid | 5 months ago |
| 1123 | 1124th | ProfessorBiggy | 6h 15m 31s | 6h 14m 24s 023ms | 1.16.2 | Normal | F3 | Vanilla | 4 months ago |
| 1124 | 1125th | Ristain | 10h 24m 51s 018ms | 10h 22m 14s | 1.16.3 | Normal | F3 | Vanilla | 2 months ago |
| 1125 | 1126th | Myles_Away | 21h 11m 48s 030ms | 21h 12m 00s | 1.16.1 | Normal | F3 | Vanilla | 1 month ago |
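
The Rank column is stored as ordinal text (`1st`, `1123rd`), which sorts alphabetically rather than numerically. A minimal sketch of turning it into an integer is given below; `rank_to_int` is a hypothetical helper, and the format (digits followed by `st`, `nd`, `rd` or `th`) is an assumption based on the rows displayed above.

```python
import re

# Assumed format: one or more digits followed by an English ordinal suffix.
_RANK_RE = re.compile(r"^\s*(\d+)(?:st|nd|rd|th)\s*$")


def rank_to_int(rank: str) -> int:
    """Convert an ordinal such as '1123rd' into the integer 1123 (hypothetical helper)."""
    match = _RANK_RE.match(rank)
    if match is None:
        raise ValueError(f"unrecognised rank format: {rank!r}")
    return int(match.group(1))
```

With the DataFrame loaded above, this could be applied as `ds['Rank'].map(rank_to_int)`.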
