# Assignment 2: Web API and Data Storage

In this homework, you will work closely with data from a popular game called Dota 2. Here is an excerpt from Wikipedia about the game.


>Dota 2 is a free-to-play multiplayer online battle arena (MOBA) video game developed and published by Valve Corporation. The game is the stand-alone sequel to Defense of the Ancients (DotA), which was a community-created mod for Blizzard Entertainment's Warcraft III: Reign of Chaos and its expansion pack, The Frozen Throne. Dota 2 is played in matches between two teams that consist of five players, with both teams occupying their own separate base on the map. Each of the ten players independently control a powerful character, known as a "hero", that each feature unique abilities and different styles of play. During a match, a player and their team collects experience points and items for their heroes in order to fight through the opposing team's defenses. A team wins by being the first to destroy a large structure located in the opposing team's base, called the "Ancient".



## Part 1: Accessing the data

First, get an API key from Valve at https://steamcommunity.com/dev/apikey. Once you have a key, you can start requesting data. I recommend using the Python wrapper dota2api (https://dota2api.readthedocs.io/en/latest/index.html); follow the installation guide on that page. To test your installation, try running the following code.

```python
import dota2api
apikey = ""  # paste your Steam Web API key here
api = dota2api.Initialise(apikey)
match = api.get_match_details(match_id=1000193456)
```

```python
import json

import dota2api

# Read the API key from a local secret.json file, e.g. {"key": "..."},
# so the key is not hard-coded in the notebook
with open('secret.json', 'r') as json_data:
    d = json.load(json_data)

apikey = d['key']
api = dota2api.Initialise(apikey)
match = api.get_match_details(match_id=1000193456)
```

```python
match
```

```
{u'barracks_status_dire': 63,
 u'barracks_status_radiant': 63,
 u'cluster': 133,
 'cluster_name': u'Europe West',
 u'dire_captain': 87278757,
 u'dire_logo': 543025270456493033,
 u'dire_name': u'Team Secret',
 u'dire_score': 0,
 u'dire_team_complete': 1,
 u'dire_team_id': 1838315,
 u'duration': 1964,
 u'engine': 0,
 u'first_blood_time': 124,
 u'flags': 0,
 u'game_mode': 16,
 'game_mode_name': u'Captains Draft',
 u'human_players': 10,
 u'leagueid': 2140,
 'lobby_name': u'Practice',
 u'lobby_type': 1,
 u'match_id': 1000193456,
 u'match_seq_num': 895207201,
 u'negative_votes': 17,
 u'picks_bans': [{u'hero_id': 12, u'is_pick': False, u'order': 0, u'team': 1},
  {u'hero_id': 91, u'is_pick': False, u'order': 1, u'team': 0},
  {u'hero_id': 15, u'is_pick': False, u'order': 2, u'team': 1},
  {u'hero_id': 78, u'is_pick': False, u'order': 3, u'team': 0},
  {u'hero_id': 30, u'is_pick': False, u'order': 4, u'team': 1},
  {u'hero_id': 7, u'is_pick': False, u'order': 5, u'team': 0},
  {u'hero_id': 69,
```

Expected output for the match:

```
{u'barracks_status_dire': 63,
 u'barracks_status_radiant': 63,
 u'cluster': 133,
 'cluster_name': u'Europe West',
 u'dire_captain': 87278757,
 u'dire_logo': 543025270456493033,
 u'dire_name': u'Team Secret',
 u'dire_score': 0,
 u'dire_team_complete': 1,
 u'dire_team_id': 1838315,
 u'duration': 1964,
 u'engine': 0,
 u'first_blood_time': 124,
 u'flags': 0,
 u'game_mode': 16,
 'game_mode_name': u'Captains Draft',
 u'human_players': 10,
 u'leagueid': 2140,
 'lobby_name': u'Practice',
 u'lobby_type': 1,
 u'match_id': 1000193456,
 u'match_seq_num': 895207201,
 u'negative_votes': 17,
 u'picks_bans': [{u'hero_id': 12, u'is_pick': False, u'order': 0, u'team': 1},
  {u'hero_id': 91, u'is_pick': False, u'order': 1, u'team': 0},
  {u'hero_id': 15, u'is_pick': False, u'order': 2, u'team': 1},
  {u'hero_id': 78, u'is_pick': False, u'order': 3, u'team': 0},
  {u'hero_id': 30, u'is_pick': False, u'order': 4, u'team': 1},
  {u'hero_id': 7, u'is_pick': False, u'order': 5, u'team': 0},
  {u'hero_id': 69,
```

## Part 2: Crawling and storing the data

In this part, you will collect data from 1,000 Dota 2 matches and store it for later use.

How you collect the 1,000 matches is up to you. One piece of advice: don't make too many requests at once, or you risk getting banned. To be safe, add a short delay (sleep) between consecutive requests.
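The "sleep between requests" advice can be wrapped in a small helper. This is a minimal sketch: `fetch_all` and its `delay` default of one second are hypothetical choices, not part of dota2api. `fetch` is any one-argument callable, for example a lambda around `api.get_match_details`. Failed IDs are collected instead of aborting the whole crawl, so they can be retried later.

```python
import time


def fetch_all(ids, fetch, delay=1.0):
    """Call fetch(id) for each id, sleeping `delay` seconds between calls.

    Returns (results, failed): `results` maps id -> fetched value and
    `failed` lists the ids whose fetch raised an exception.
    """
    results = {}
    failed = []
    for i, item_id in enumerate(ids):
        if i > 0:
            time.sleep(delay)  # pause between consecutive requests
        try:
            results[item_id] = fetch(item_id)
        except Exception:  # e.g. network or API errors; keep crawling
            failed.append(item_id)
    return results, failed
```

Usage with the wrapper from Part 1 might look like `matches, failed = fetch_all(match_ids, lambda m: api.get_match_details(match_id=m))`.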

After you have obtained the data, save it as a pickle file or an SQLite database.
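For the SQLite option, one simple layout is a single table keyed by match ID with the full match stored as JSON text. This is a sketch under assumptions: each match is a JSON-serialisable dict with a `match_id` key (as in the output from Part 1), and the table name `matches` and the helper names are hypothetical.

```python
import json
import sqlite3


def save_matches(conn, matches):
    """Store each match dict as JSON text, keyed by its match_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS matches (match_id INTEGER PRIMARY KEY, data TEXT)"
    )
    # INSERT OR REPLACE makes re-running the crawl overwrite rather than fail
    conn.executemany(
        "INSERT OR REPLACE INTO matches (match_id, data) VALUES (?, ?)",
        [(m["match_id"], json.dumps(m)) for m in matches],
    )
    conn.commit()


def load_matches(conn):
    """Return all stored match dicts, ordered by match_id."""
    rows = conn.execute("SELECT data FROM matches ORDER BY match_id")
    return [json.loads(data) for (data,) in rows]
```

Open the database with `conn = sqlite3.connect('matches.db')`; storing JSON keeps the schema trivial at the cost of not being able to query individual fields in SQL.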

```python
# Get match IDs by scraping Dotabuff's esports match listing:
# https://www.dotabuff.com/esports/matches
import re
import time

import numpy as np
import requests
from bs4 import BeautifulSoup

base_url = 'https://www.dotabuff.com/esports/matches'
headers = {'user-agent': 'Mozilla/5.0'}
```

```python
# Scrape one page of match IDs (20 per page).
# Returns a list of match-ID strings in page order, without duplicates.
def get_match_ids(url, headers):
    pattern = re.compile(r'^/matches/(\d+)')
    r = requests.get(url=url, headers=headers)  # fetch the page passed in, not base_url
    r.raise_for_status()
    soup = BeautifulSoup(r.text, 'lxml')
    ids = []
    for link in soup.find_all('a', href=pattern):
        match_id = pattern.match(link['href']).group(1)
        if match_id not in ids:
            ids.append(match_id)
    return ids
```

```python
n = 50  # 50 pages x 20 IDs per page = 1000 match IDs
match_ids = []

for page in range(1, n + 1):
    match_ids.extend(get_match_ids(url=base_url + '?page=' + str(page), headers=headers))
    time.sleep(1)  # pause between consecutive requests to avoid being banned
```

```python
# Convert the scraped ID strings to ints, dropping duplicates
# (pages can overlap if new matches are posted mid-crawl)
match_ids = list(dict.fromkeys(int(m) for m in match_ids))
matches = []

for match_id in match_ids:
    matches.append(api.get_match_details(match_id=match_id))
    time.sleep(1)  # delay between consecutive API calls

print("-" * 30)
print("Done calling dota2api")
print("-" * 30)
```

```
------------------------------
Done calling dota2api
------------------------------
```


```python
# Store the list of match dicts in a pickle file
import pickle

with open('matches.p', 'wb') as f:
    pickle.dump(matches, f)
```

## Part 3: Loading the data and analysis

We begin this part by loading the data you stored in a pickle file earlier.


```python
# Fill code here
```
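Loading is the mirror of the `pickle.dump` call in Part 2: open the file in binary read mode and call `pickle.load`. A minimal sketch; `load_pickle` is a hypothetical helper name.

```python
import pickle


def load_pickle(path):
    """Load and return the object stored in a pickle file."""
    with open(path, 'rb') as f:  # 'rb': pickle files are binary
        return pickle.load(f)
```

For example, `matches = load_pickle('matches.p')`. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.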

### 3.1 Find average win rate for each faction

Report the win rates of Radiant and Dire as numerical values.

```python
# Fill code here
```
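Each match has exactly one winner, so the Dire win rate is one minus the Radiant win rate. This sketch assumes each match dict carries a boolean `radiant_win` key, as in the Steam GetMatchDetails response that dota2api returns (the truncated output above does not show it). `faction_win_rates` is a hypothetical name.

```python
def faction_win_rates(matches):
    """Return (radiant_rate, dire_rate) as fractions of matches won.

    Assumes each match dict has a boolean 'radiant_win' key.
    """
    if not matches:
        raise ValueError("no matches to summarise")
    radiant_wins = sum(1 for m in matches if m['radiant_win'])
    radiant_rate = radiant_wins / float(len(matches))
    return radiant_rate, 1.0 - radiant_rate
```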

### 3.2 Game duration

Plot a histogram to show the distribution of game duration. Also mark the **mean** and **median** on the plot.

```python
# Fill code here
```
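One way to mark the mean and median is to draw a vertical line at each with `Axes.axvline` on top of the histogram. A minimal sketch: the helper names are hypothetical, and matplotlib is imported inside the plotting function so the statistics helper can be used without it. The `duration` field is assumed to be in seconds (the example match above reports 1964).

```python
def mean_and_median(values):
    """Return (mean, median) of a non-empty sequence of numbers."""
    s = sorted(values)
    n = len(s)
    mean = sum(s) / float(n)
    mid = n // 2
    median = s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2.0
    return mean, median


def plot_hist_with_stats(values, xlabel, bins=30):
    """Histogram of `values` with vertical lines at the mean and median."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    mean, median = mean_and_median(values)
    fig, ax = plt.subplots()
    ax.hist(values, bins=bins, color='lightgray', edgecolor='black')
    ax.axvline(mean, color='red', linestyle='--', label='mean = %.1f' % mean)
    ax.axvline(median, color='blue', linestyle='-', label='median = %.1f' % median)
    ax.set_xlabel(xlabel)
    ax.set_ylabel('Number of matches')
    ax.legend()
    return fig, ax
```

For example: `plot_hist_with_stats([m['duration'] / 60.0 for m in matches], 'Game duration (minutes)')`.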

### 3.3 First-blood time

As in the previous part, plot a histogram to show the distribution of first-blood time. Also mark the **mean** and **median** on the plot.
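Here is a NumPy-based sketch of the same idea, converting `first_blood_time` (assumed to be in seconds, like `duration`) to minutes first so the axis is easier to read. The helper names are hypothetical; matplotlib is imported only inside the plotting function.

```python
import numpy as np


def first_blood_minutes(matches):
    """First-blood times in minutes, assuming 'first_blood_time' is in seconds."""
    return np.array([m['first_blood_time'] for m in matches], dtype=float) / 60.0


def plot_first_blood(matches, bins=20):
    """Histogram of first-blood times with mean and median marked."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    fb = first_blood_minutes(matches)
    plt.figure()
    plt.hist(fb, bins=bins)
    plt.axvline(np.mean(fb), color='red', linestyle='--', label='mean')
    plt.axvline(np.median(fb), color='green', linestyle=':', label='median')
    plt.xlabel('First-blood time (minutes)')
    plt.ylabel('Number of matches')
    plt.legend()
    plt.show()
```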

### 3.4 Hero popularity and win rate

Make a stacked horizontal bar chart that displays the number of games played by each hero and how many of them were victories. Sort the bars so that the most popular hero appears at the top.

```python
# Fill code here
```
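To decide whether a player won, combine their team with the match result. This sketch assumes the usual GetMatchDetails fields: a `players` list whose entries have `hero_id` and `player_slot` (slots below 128 are Radiant, 128 and above are Dire), plus the match-level `radiant_win`. The bars are stacked by drawing losses with `left=` set to the win counts, and `invert_yaxis` puts the most popular hero on top. All helper names are hypothetical.

```python
from collections import Counter


def player_won(player, match):
    """True if this player's team won. Slots below 128 are Radiant, 128+ are Dire."""
    is_radiant = player['player_slot'] < 128
    return is_radiant == match['radiant_win']


def hero_games_and_wins(matches):
    """Return [(hero_id, games, wins)] sorted by games played, most popular first."""
    games, wins = Counter(), Counter()
    for match in matches:
        for player in match['players']:
            games[player['hero_id']] += 1
            if player_won(player, match):
                wins[player['hero_id']] += 1
    return [(hero, n, wins[hero]) for hero, n in games.most_common()]


def plot_hero_popularity(rows):
    """Stacked horizontal bars: wins, then losses, per hero."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    labels = [str(hero) for hero, _, _ in rows]
    won = [w for _, _, w in rows]
    lost = [n - w for _, n, w in rows]
    fig, ax = plt.subplots(figsize=(8, max(4, 0.25 * len(rows))))
    ax.barh(labels, won, label='wins')
    ax.barh(labels, lost, left=won, label='losses')  # stack losses after wins
    ax.invert_yaxis()  # first row (most popular) at the top
    ax.set_xlabel('Games played')
    ax.legend()
    return fig, ax
```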

### 3.5 Kills/Deaths/Assists

Make a grouped horizontal bar chart that displays total kills, total deaths, and total assists for each hero on the same axes.

```python
# Fill code here
```
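A grouped bar chart places several thin bars side by side for each category. In this sketch each hero gets one unit of the y axis and the kills, deaths, and assists bars are shifted by `-h`, `0`, and `+h`. It assumes each entry in a match's `players` list has `hero_id`, `kills`, `deaths`, and `assists` fields; the helper names and the bar height `h` are hypothetical choices.

```python
from collections import defaultdict


def hero_kda_totals(matches):
    """Return {hero_id: [total_kills, total_deaths, total_assists]}."""
    totals = defaultdict(lambda: [0, 0, 0])
    for match in matches:
        for p in match['players']:
            t = totals[p['hero_id']]
            t[0] += p['kills']
            t[1] += p['deaths']
            t[2] += p['assists']
    return dict(totals)


def plot_kda(totals):
    """Grouped horizontal bars of kills, deaths, and assists per hero."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    heroes = sorted(totals)
    h = 0.27  # bar height; three bars per hero share one unit of the y axis
    fig, ax = plt.subplots(figsize=(8, max(4, 0.4 * len(heroes))))
    for offset, idx, name in [(-h, 0, 'kills'), (0, 1, 'deaths'), (h, 2, 'assists')]:
        ys = [i + offset for i in range(len(heroes))]
        ax.barh(ys, [totals[hero][idx] for hero in heroes], height=h, label=name)
    ax.set_yticks(range(len(heroes)))
    ax.set_yticklabels([str(hero) for hero in heroes])
    ax.set_xlabel('Total')
    ax.legend()
    return fig, ax
```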

### 3.6 Richest Heroes

Find the top 10 heroes with the highest gold per minute (GPM).

```python
# Fill code here
```
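"Highest GPM" is read here as the highest average `gold_per_min` across all games a hero appears in; that interpretation, the `min_games` filter, and the helper name are assumptions. The filter matters because a hero seen only once can top the list on a single lucky game.

```python
from collections import defaultdict


def top_gpm_heroes(matches, n=10, min_games=1):
    """Return [(hero_id, mean_gpm)] for the n heroes with the highest average GPM.

    Heroes seen in fewer than `min_games` games are skipped.
    """
    gpms = defaultdict(list)
    for match in matches:
        for p in match['players']:
            gpms[p['hero_id']].append(p['gold_per_min'])
    averages = [(hero, sum(v) / float(len(v)))
                for hero, v in gpms.items() if len(v) >= min_games]
    averages.sort(key=lambda pair: pair[1], reverse=True)  # richest first
    return averages[:n]
```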

### 3.7 Popular Items (overall)

Find the top 10 most frequently used items overall.

```python
# Fill code here
```
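A `Counter` over the six inventory slots gives item frequencies directly. This sketch assumes each player entry has `item_0` through `item_5` fields holding item IDs, with 0 meaning an empty slot; to my understanding these describe the inventory at the end of the match, so they approximate "used" items rather than every purchase. Mapping IDs to item names is left out, and the helper names are hypothetical.

```python
from collections import Counter

ITEM_SLOTS = ['item_%d' % i for i in range(6)]  # inventory slots item_0 .. item_5


def count_items(players):
    """Count item IDs across the players' six inventory slots, ignoring empty slots (ID 0)."""
    counts = Counter()
    for p in players:
        for slot in ITEM_SLOTS:
            item_id = p.get(slot, 0)
            if item_id:
                counts[item_id] += 1
    return counts


def top_items_overall(matches, n=10):
    """Return [(item_id, count)] for the n most common items across all players."""
    players = [p for m in matches for p in m['players']]
    return count_items(players).most_common(n)
```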

### 3.8 Popular Items (among winners)

Find the top 10 most frequently used items among players on the winning team.

```python
# Fill code here
```
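Restricting the count to winners only needs a per-player team check before counting. This sketch makes the same field assumptions as before (`player_slot` below 128 means Radiant, `radiant_win` gives the result, `item_0` to `item_5` hold item IDs with 0 for empty); `top_items_among_winners` is a hypothetical name.

```python
from collections import Counter


def top_items_among_winners(matches, n=10):
    """Most common item IDs held by players on the winning side."""
    counts = Counter()
    for m in matches:
        for p in m['players']:
            if (p['player_slot'] < 128) != m['radiant_win']:
                continue  # player is on the losing team
            for i in range(6):
                item_id = p.get('item_%d' % i, 0)
                if item_id:
                    counts[item_id] += 1
    return counts.most_common(n)
```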

### 3.9 Heroes and items

Pick three heroes of your choice and find the items most commonly purchased for each. Are the items similar or different between winners and losers? To find out, compute the items for winners and for losers separately and compare them.

```python
# Fill code here
```
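Splitting one hero's items by outcome gives two top-N lists to compare. To put a number on "similar or different", this sketch adds the Jaccard index (size of the intersection over size of the union) of the two item sets; that measure is my choice, not part of the assignment. It reuses the earlier field assumptions and treats end-of-match inventory (`item_0` to `item_5`) as a proxy for purchases. Helper names are hypothetical.

```python
from collections import Counter


def hero_items_by_outcome(matches, hero_id, n=5):
    """Return (winner_items, loser_items): the n most common item IDs on
    `hero_id` in games that hero's team won vs. lost."""
    won, lost = Counter(), Counter()
    for m in matches:
        for p in m['players']:
            if p['hero_id'] != hero_id:
                continue
            target = won if (p['player_slot'] < 128) == m['radiant_win'] else lost
            for i in range(6):
                item_id = p.get('item_%d' % i, 0)
                if item_id:
                    target[item_id] += 1
    return won.most_common(n), lost.most_common(n)


def jaccard(a, b):
    """Overlap of two item lists as |A & B| / |A | B| (1.0 means identical sets)."""
    a, b = set(a), set(b)
    return len(a & b) / float(len(a | b)) if (a or b) else 1.0
```

For a chosen hero, something like `w, l = hero_items_by_outcome(matches, hero_id)` followed by `jaccard([i for i, _ in w], [i for i, _ in l])` gives a single similarity score alongside the two lists.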

### 3.10 Your pick!

Come up with a question of your own and find the answer from the dataset.

STATE YOUR QUESTION HERE

```python
# Fill code here
```