<div class="div_image pull-right">
    <div class = "image image_topic pull-right">
        <img src = "https://i.imgur.com/EGtMXKh.jpg?1">
    </div>
</div>

# <b>Capstone Project: Predicting Dota 2 Match Wins using Machine Learning and Recommender System (Webscraping)</b>


---

# <b>Background</b>

*Esports*

Esports, short for electronic sports, is a fast-growing form of competition with a large audience around the world. In 2017, the International Olympic Committee (IOC) recognised the growing popularity of esports and concluded that "Competitive esports could be considered as a sporting activity, and the players involved prepare and train with an intensity which may be comparable to athletes in traditional sports." As the legitimacy of esports grows, players around the world form teams and compete in high-stakes tournaments with large prize pools, similar to traditional sports.

*Dota 2*

One popular esports game is Dota 2, a multiplayer online battle arena (MOBA) video game developed by Valve. Dota 2 is played in matches between two teams of five players, with each team occupying and defending its own base on the map. Each of the ten players independently controls a powerful character known as a "hero", each with unique abilities and a distinct style of play. During a match, players collect experience points and items for their heroes in order to defeat the opposing team's heroes in player-versus-player combat. A team wins by being the first to destroy the other team's "Ancient", a large structure located within their base. Throughout the lifespan of the game, new heroes and items are added and existing ones rebalanced across game patches, which keeps the dynamics of the game in constant flux.

The International (TI) is the largest world championship tournament for Dota 2. It has been held annually since 2011 in different countries (with the exception of 2020, due to the COVID-19 pandemic). The prize pool is crowdfunded and has grown from US$1.6m in 2011 to US$40m in 2021, which indicates the interest in and popularity of the competition. At the time of this project, the 2022 TI is scheduled to take place in Singapore.

Given its huge following and stable competitive scene, Dota 2 has seen rising interest in predictive modelling and analysis. Hence, this project seeks to build a model that predicts the winner of a match before it starts, looking only at hero selection. It also seeks to build a recommender system that recommends the hero with the highest winning probability given the heroes already chosen.


----

# Importing Libraries


In [1]:
# Import Libraries
import requests
import pandas as pd
import time
from tqdm import tqdm
import json


# Warnings Suppressions
import warnings
warnings.filterwarnings('ignore')


In [2]:
# Set Pandas Options:

pd.set_option("display.max_columns", 500)
pd.set_option("display.max_rows", 500)
pd.set_option("display.max_colwidth", 100)


# <b>Data Collection</b>


The match data will be pulled from the OpenDota API (documented at https://docs.opendota.com/), which parses matches from Steam. We will collect ~500,000 matches from Patch 7.31d. Restricting the data to a single patch ensures that no balance changes affect the quality of our data.


In [3]:
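# Webscrape Function

match_dict = {}  # page number (as str) -> DataFrame of the matches on that page

url = "https://api.opendota.com/api/publicMatches"


def webscrape(match_id):
    """Page backwards through public matches, starting below match_id."""
    for i in tqdm(range(1, 10001)):  # each pull gives you 100 matches
        params = {'less_than_match_id': match_id}
        res = requests.get(url, params=params)
        data = res.json()
        time.sleep(0.8)  # pause between requests
        try:
            # continue the next pull from the last match id on this page
            match_id = data[-1]['match_id']
            match_dict[str(i)] = pd.DataFrame(data)
        except (KeyError, IndexError, TypeError):
            # error payload (a dict) or an empty page: retry from the same match id
            pass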
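
The paging logic above can be exercised without touching the network. The following is a minimal sketch, not the notebook's actual scraper: `collect_pages` and `fake_fetch` are hypothetical names, and the fetch function is passed in so that a stand-in can simulate both normal pages and an error payload. It advances the cursor using the smallest `match_id` on the page, which is an assumption made here so the logic does not depend on the order of the response.

```python
import time
from typing import Callable, List

import pandas as pd


def collect_pages(
    fetch_page: Callable[[int], object],
    start_match_id: int,
    n_pages: int,
    pause: float = 0.0,
) -> pd.DataFrame:
    """Walk backwards through match IDs, one page per call to fetch_page."""
    pages: List[pd.DataFrame] = []
    cursor = start_match_id
    for _ in range(n_pages):
        data = fetch_page(cursor)
        # Anything other than a non-empty list (e.g. an error dict) is skipped;
        # the cursor stays put, so the same page is requested again next time.
        if isinstance(data, list) and data:
            pages.append(pd.DataFrame(data))
            cursor = min(row["match_id"] for row in data)
        time.sleep(pause)
    if not pages:
        return pd.DataFrame()
    return pd.concat(pages, ignore_index=True)


# Hypothetical stand-in for the API: three matches per page,
# with an error payload returned on the second call.
calls = {"n": 0}


def fake_fetch(less_than_match_id: int):
    calls["n"] += 1
    if calls["n"] == 2:
        return {"error": "rate limit exceeded"}
    return [
        {"match_id": less_than_match_id - k, "radiant_win": k % 2 == 0}
        for k in range(1, 4)
    ]


demo = collect_pages(fake_fetch, start_match_id=100, n_pages=3)
print(demo["match_id"].tolist())
```

Because a failed pull still consumes one iteration, a run with errors returns fewer pages than `n_pages`; the same holds for the loop in `webscrape`.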



## Webscraping


In [4]:
# Webscrape the data
webscrape(6659248211)

100%|██████████| 10000/10000 [3:19:45<00:00,  1.20s/it] 


In [5]:
# Putting it into a dataframe
dota2_matches = pd.concat(match_dict.values(), ignore_index=True)
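
Since the pages are concatenated blindly, it may be worth checking that no match was collected twice. The sketch below uses two toy frames (made-up values, not scraped data) to show one way of counting and dropping repeated `match_id` values; the same two calls could be applied to `dota2_matches`.

```python
import pandas as pd

# Toy frames standing in for two scraped pages; match 102 appears in both.
page_a = pd.DataFrame({"match_id": [103, 102], "radiant_win": [True, False]})
page_b = pd.DataFrame({"match_id": [102, 101], "radiant_win": [False, True]})

combined = pd.concat([page_a, page_b], ignore_index=True)

# Count rows whose match_id has already appeared earlier in the frame.
n_duplicates = int(combined["match_id"].duplicated().sum())

# Keep the first occurrence of each match and rebuild a clean index.
deduplicated = combined.drop_duplicates(subset="match_id").reset_index(drop=True)

print(n_duplicates)
print(deduplicated["match_id"].tolist())
```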


In [6]:
dota2_matches.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 14 columns):
 #   Column         Non-Null Count    Dtype  
---  ------         --------------    -----  
 0   match_id       1000000 non-null  int64  
 1   match_seq_num  1000000 non-null  int64  
 2   radiant_win    1000000 non-null  bool   
 3   start_time     1000000 non-null  int64  
 4   duration       1000000 non-null  int64  
 5   avg_mmr        669449 non-null   float64
 6   num_mmr        669449 non-null   float64
 7   lobby_type     1000000 non-null  int64  
 8   game_mode      1000000 non-null  int64  
 9   avg_rank_tier  1000000 non-null  int64  
 10  num_rank_tier  1000000 non-null  int64  
 11  cluster        1000000 non-null  int64  
 12  radiant_team   1000000 non-null  object 
 13  dire_team      1000000 non-null  object 
dtypes: bool(1), float64(2), int64(9), object(2)
memory usage: 100.1+ MB
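
The summary above shows that `avg_mmr` and `num_mmr` are populated for only 669,449 of the 1,000,000 rows, so roughly a third of the matches have no MMR information. A quick way to get the missing share per column is sketched below on a toy frame with made-up values; on the real data the same expression would be `dota2_matches.isna().mean()`.

```python
import pandas as pd

# Toy frame with made-up values; half of the avg_mmr entries are missing.
toy = pd.DataFrame(
    {
        "avg_mmr": [3200.0, None, 2800.0, None],
        "duration": [1800, 2400, 1500, 2100],
    }
)

# isna() marks missing cells; the column mean is the fraction that is missing.
missing_share = toy.isna().mean()
print(missing_share)

# The share implied by the info() output above for avg_mmr.
avg_mmr_missing = 1 - 669449 / 1000000
print(round(avg_mmr_missing, 4))
```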


In [8]:
hero_url = 'https://api.opendota.com/api/heroes'

hero_res = requests.get(hero_url)

hero_data = hero_res.json()
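
The hero list is useful mainly as a lookup from numeric hero ID to display name. The sketch below builds such a lookup from a few hand-copied entries in the same shape as the API response; `sample_heroes`, `hero_names` and `name_team` are illustrative names, not part of the notebook.

```python
# A few entries in the same shape as the /heroes response (fields trimmed).
sample_heroes = [
    {"id": 1, "localized_name": "Anti-Mage", "primary_attr": "agi"},
    {"id": 2, "localized_name": "Axe", "primary_attr": "str"},
    {"id": 3, "localized_name": "Bane", "primary_attr": "int"},
]

# Map hero id -> display name for quick lookups.
hero_names = {hero["id"]: hero["localized_name"] for hero in sample_heroes}


def name_team(hero_ids):
    """Translate a list of hero ids into names, flagging ids that are not known."""
    return [hero_names.get(hero_id, f"unknown ({hero_id})") for hero_id in hero_ids]


print(name_team([2, 3, 99]))
```

With the full response, the same comprehension over `hero_data` would cover every hero.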

In [9]:
hero_data

[{'id': 1,
  'name': 'npc_dota_hero_antimage',
  'localized_name': 'Anti-Mage',
  'primary_attr': 'agi',
  'attack_type': 'Melee',
  'roles': ['Carry', 'Escape', 'Nuker'],
  'legs': 2},
 {'id': 2,
  'name': 'npc_dota_hero_axe',
  'localized_name': 'Axe',
  'primary_attr': 'str',
  'attack_type': 'Melee',
  'roles': ['Initiator', 'Durable', 'Disabler', 'Jungler', 'Carry'],
  'legs': 2},
 {'id': 3,
  'name': 'npc_dota_hero_bane',
  'localized_name': 'Bane',
  'primary_attr': 'int',
  'attack_type': 'Ranged',
  'roles': ['Support', 'Disabler', 'Nuker', 'Durable'],
  'legs': 4},
 {'id': 4,
  'name': 'npc_dota_hero_bloodseeker',
  'localized_name': 'Bloodseeker',
  'primary_attr': 'agi',
  'attack_type': 'Melee',
  'roles': ['Carry', 'Disabler', 'Jungler', 'Nuker', 'Initiator'],
  'legs': 2},
 {'id': 5,
  'name': 'npc_dota_hero_crystal_maiden',
  'localized_name': 'Crystal Maiden',
  'primary_attr': 'int',
  'attack_type': 'Ranged',
  'roles': ['Support', 'Disabler', 'Nuker', 'Jungler'],
  

---

## Data Dictionary

| Name          | Data type | Value(s)           | Description                                                                                                           |
| ------------- | --------- | ------------------ | --------------------------------------------------------------------------------------------------------------------- |
| match_id      | int       | -                  | Match identification number                                                                                           |
| match_seq_num | int       | -                  | Match sequence number                                                                                                 |
| radiant_win   | bool      | True, False        | True: Radiant team wins; False: Radiant team loses                                                                    |
| start_time    | int       | -                  | Start time of match                                                                                                   |
| duration      | int       | -                  | Match duration                                                                                                        |
| avg_mmr       | float     | -                  | Average MMR of players in the match. A higher value indicates more skilled players                                    |
| num_mmr       | float     | -                  | Number of players in the match who have their MMR public                                                              |
| lobby_type    | int       | 0, 7               | Type of match. 0: Normal Match; 7: Ranked Match                                                                       |
| game_mode     | int       | 2, 3, 4, 5, 16, 22 | Type of game. 2: Captain's Mode; 3: Random Draft; 4: Single Draft; 5: All Random; 16: Captain's Draft; 22: All Draft  |
| avg_rank_tier | int       | -                  | Average rank tier of players in the match. The first digit is the rank tier, the second digit is the number of stars  |
| num_rank_tier | int       | -                  | Number of players in the match who have their rank profile public                                                     |
| cluster       | int       | -                  | Server cluster where the match takes place                                                                            |
| radiant_team  | str       | -                  | String of hero IDs of the Radiant team                                                                                |
| dire_team     | str       | -                  | String of hero IDs of the Dire team                                                                                   |
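
Two of the columns above need decoding before they can be used: the team columns hold hero IDs packed into a string, and `avg_rank_tier` packs the rank tier and star count into one integer. The sketch below shows one way to unpack both. It assumes the hero IDs are comma-separated (e.g. `"1,2,3,4,5"`), which should be checked against the scraped data; `parse_team` and `split_rank_tier` are hypothetical helper names.

```python
from typing import List, Tuple


def parse_team(team: str) -> List[int]:
    """Split a team string into hero ids (assumes comma-separated ids)."""
    return [int(hero_id) for hero_id in team.split(",") if hero_id.strip()]


def split_rank_tier(avg_rank_tier: int) -> Tuple[int, int]:
    """Return (rank tier, stars): the tens digit and the units digit."""
    return divmod(avg_rank_tier, 10)


radiant = parse_team("1,2,3,4,5")
tier, stars = split_rank_tier(54)
print(radiant, tier, stars)
```

On the full dataset these helpers could be applied column-wise, for example with `dota2_matches["radiant_team"].map(parse_team)`.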


## Exporting to CSV File


In [7]:
# Export to csv file
dota2_matches.to_csv("dota2_dataset_uncleaned.csv", index=False)


In [11]:
# Export heroes list to JSON file

with open('heroes.json', 'w', encoding='utf-8') as outfile:
    json.dump(hero_data, outfile, ensure_ascii=False)