# Steam-Hype

Reference: https://github.com/woctezuma/steam-hype

## Load data

### Steam

Reference: https://store.steampowered.com/search/?filter=popularwishlist

#### Download

In [11]:
import requests

def get_steam_url(page_no=1):
  url = 'https://store.steampowered.com/search/?filter=popularwishlist&page={}'
  return url.format(page_no)

def download_steam(num_pages=10):
  text_aggregate = ''

  # By default, we download 10 pages with 25 games per page, for a total of 250 games.
  for page_no in range(num_pages):
    r = requests.get(url=get_steam_url(1+page_no))
    if r.ok:
      text_aggregate += r.text
  
  return text_aggregate

#### Save to disk

In [12]:
from pathlib import Path

def get_steam_filename():
  return 'steam.txt'

def save_steam_to_disk():
  if not Path(get_steam_filename()).exists():
    text_aggregate = download_steam()
    if len(text_aggregate)>0:
      with open(get_steam_filename(), 'w') as f:
        f.write(text_aggregate)
  return

In [13]:
save_steam_to_disk()

#### Parse

In [14]:
def filter_steam_document(lines):
  return [l for l in lines if 'data-ds-appid' in l]

def parse_steam_app_id(line):
  element = next(e for e in line.split() if 'data-ds-appid' in e)
  return int(element.split('"')[1])

def load_steam_ranking():
  with open(get_steam_filename(), 'r') as f:
    d = f.readlines()
  steam_ranking = [parse_steam_app_id(l) for l in filter_steam_document(d)]
  print('[Steam] #apps = {}'.format(len(steam_ranking)))
  return steam_ranking

In [15]:
steam_ranking = load_steam_ranking()

[Steam] #apps = 250
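The split-based parser above assumes that `data-ds-appid` forms its own whitespace-separated token and holds a single integer. Below is a minimal regex-based sketch that assumes the same attribute format. It also copes with a comma-separated list of appIDs, which we assume can appear for bundles, by keeping only the first one. The sample lines are made up for illustration, and `parse_app_ids` is a hypothetical helper.

```python
import re

# Assumption: each search result carries an attribute like data-ds-appid="1091500".
APP_ID_PATTERN = re.compile(r'data-ds-appid="(\d+)')

def parse_app_ids(lines):
  app_ids = []
  for line in lines:
    match = APP_ID_PATTERN.search(line)
    if match:
      # Keep the first (or only) appID found in the attribute value.
      app_ids.append(int(match.group(1)))
  return app_ids

sample_lines = [
  '<a href="https://store.steampowered.com/app/1091500/" data-ds-appid="1091500" class="search_result_row">\n',
  '<div class="search_name">\n',
  '<a href="https://store.steampowered.com/sub/12345/" data-ds-appid="111,222" class="search_result_row">\n',
]
print(parse_app_ids(sample_lines))  # [1091500, 111]
```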


#### Check

In [16]:
for i in range(10):
  print('{:03d}) {}'.format(25*i+1, steam_ranking[25*i]))

001) 1091500
026) 668580
051) 1293160
076) 844980
101) 1220140
126) 1043810
151) 1341050
176) 1015890
201) 1120320
226) 429380


### SteamDB

Reference: https://steamdb.info/upcoming/?hype

#### Download

In [17]:
import requests

def get_steamdb_url():
  return 'https://steamdb.info/upcoming/?hype'

def get_headers():
  # To avoid status code 403 ("Forbidden"):
  return {'user-agent': 'my-app/0.0.1'}

def download_steamdb():
  text_aggregate = ''
  r = requests.get(url=get_steamdb_url(),
                   headers=get_headers())
  if r.ok:
    text_aggregate = r.text    
  return text_aggregate

#### Save to disk

In [18]:
from pathlib import Path

def get_steamdb_filename():
  return 'steamdb.txt'

def save_steamdb_to_disk():
  if not Path(get_steamdb_filename()).exists():
    text_aggregate = download_steamdb()
    if len(text_aggregate)>0:
      with open(get_steamdb_filename(), 'w') as f:
        f.write(text_aggregate)
  return

In [19]:
save_steamdb_to_disk()

#### Parse

In [20]:
def filter_steamdb_document(lines):
  return [l for l in lines if l.startswith('<a href="/app/')]

def parse_steamdb_app_id(line):
  return int(line.split('/')[2])

def load_steamdb_ranking():
  with open(get_steamdb_filename(), 'r') as f:
    d = f.readlines()
  steamdb_ranking = [parse_steamdb_app_id(l) for l in filter_steamdb_document(d)]
  print('[SteamDB] #apps = {}'.format(len(steamdb_ranking)))
  return steamdb_ranking

In [21]:
steamdb_ranking = load_steamdb_ranking()

[SteamDB] #apps = 250
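The SteamDB filter keeps only lines that *start* with `<a href="/app/`, so it would silently drop an indented line. Here is a sketch that tolerates leading whitespace. It assumes the links keep the `/app/<appID>/` shape. The sample lines are made up, and `parse_steamdb_app_ids_regex` is a hypothetical helper.

```python
import re

# Assumption: SteamDB rows link to "/app/<appID>/", possibly with indentation.
STEAMDB_LINK = re.compile(r'^\s*<a href="/app/(\d+)/')

def parse_steamdb_app_ids_regex(lines):
  app_ids = []
  for line in lines:
    match = STEAMDB_LINK.match(line)
    if match:
      app_ids.append(int(match.group(1)))
  return app_ids

sample_lines = [
  '<a href="/app/1091500/">\n',
  '    <a href="/app/1363080/">\n',
  '<a href="/sub/12345/">\n',
]
print(parse_steamdb_app_ids_regex(sample_lines))  # [1091500, 1363080]
```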


#### Check

In [22]:
for i in range(10):
  print('{:03d}) {}'.format(25*i+1, steamdb_ranking[25*i]))

001) 1091500
026) 1363080
051) 907650
076) 657580
101) 1325200
126) 1040200
151) 1105670
176) 750200
201) 1279240
226) 685560


## Rank-biased overlap

Paper:
> WEBBER, William, MOFFAT, Alistair, and ZOBEL, Justin. **A similarity measure for indefinite rankings**. ACM Transactions on Information Systems (TOIS), 2010, vol. 28, no. 4, p. 1-38. [PDF](http://w.codalism.com/research/papers/wmz10_tois.pdf)

Code: https://github.com/dlukes/rbo

### Install

In [23]:
!git clone https://github.com/dlukes/rbo.git
%cp rbo/rbo.py .

Cloning into 'rbo'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 42 (delta 1), reused 5 (delta 1), pack-reused 35
Unpacking objects: 100% (42/42), done.


### Run

In [31]:
from rbo import rbo

rbo_output = rbo(steam_ranking,
                 steamdb_ranking,
                 p=0.999)

rbo_lower_bound = rbo_output.min
rbo_residual = rbo_output.res
rbo_estimate = rbo_output.ext

print('Rank-biased overlap estimate: {:.4f}'.format(rbo_estimate))

Rank-biased overlap estimate: 0.7231
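For two lists S and T of equal length k, the paper's extrapolated estimate can be written in terms of the agreement A_d = |S[:d] ∩ T[:d]| / d at each depth d:

RBO_ext = A_k · p^k + (1 - p) · Σ_{d=1..k} p^(d-1) · A_d

Below is a minimal sketch of that formula. It assumes neither list contains duplicates. It is not the `rbo` package's code, but for our two 250-item lists we would expect it to land close to `rbo_output.ext`.

```python
def rbo_ext(s, t, p=0.9):
  """Extrapolated rank-biased overlap for two equal-length lists without duplicates."""
  if len(s) != len(t):
    raise ValueError('this sketch only handles lists of equal length')
  seen_s, seen_t = set(), set()
  overlap = 0          # |S[:d] ∩ T[:d]|
  weighted_sum = 0.0   # sum over d of p**(d-1) * A_d
  agreement = 0.0      # A_d at the current depth
  for d, (x, y) in enumerate(zip(s, t), start=1):
    # Both prefixes grow by one item: update the overlap incrementally.
    if x == y:
      overlap += 1
    else:
      overlap += (x in seen_t) + (y in seen_s)
    seen_s.add(x)
    seen_t.add(y)
    agreement = overlap / d
    weighted_sum += p ** (d - 1) * agreement
  k = len(s)
  return agreement * p ** k + (1 - p) * weighted_sum

print(rbo_ext(list('abc'), list('bac'), p=0.9))  # approximately 0.9
```

Identical lists give 1 and disjoint lists give 0. Tying the first two items costs more than a disagreement deeper in the list, because agreements near the top are weighted by larger powers of p.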


In [32]:
from rbo import average_overlap

reference_overlap = average_overlap(steam_ranking,
                                    steamdb_ranking)

print('Average overlap = {:.4f}'.format(reference_overlap))

Average overlap = 0.6802
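Average overlap is the unweighted counterpart of RBO: it is the mean of the agreements A_d over depths 1 to k. Below is a minimal sketch that assumes neither list contains duplicates. `rbo.average_overlap` may handle lists of different lengths differently, and the name `average_overlap_sketch` is hypothetical. It is chosen so that it does not shadow the imported function.

```python
def average_overlap_sketch(s, t, depth=None):
  """Mean agreement |S[:d] ∩ T[:d]| / d over d = 1..depth."""
  k = min(len(s), len(t)) if depth is None else depth
  if k <= 0:
    return 0.0
  total = 0.0
  for d in range(1, k + 1):
    total += len(set(s[:d]) & set(t[:d])) / d
  return total / k

print(average_overlap_sketch(list('abc'), list('bac')))  # (0 + 1 + 1) / 3
```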


## Rank-order correlation coefficients

Reference: https://en.wikipedia.org/wiki/Rank_correlation

### Install

In [33]:
%pip install scipy



### Convert rankings for scipy

In [35]:
app_ids = sorted(set(steam_ranking).union(steamdb_ranking))
print('#appIDs = {}'.format(len(app_ids)))

#appIDs = 316


In [62]:
import numpy as np

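def convert_ranking(ranking, app_ids, reverse_order=False):
  # Apps missing from `ranking` all share the same out-of-bound rank.
  # NB: `np.int` was removed from NumPy, so we use the builtin `int` as dtype.
  out_of_bound_rank = len(ranking) + 1
  v = np.full(len(app_ids), out_of_bound_rank, dtype=int)

  # Look up each appID's position once, rather than calling .index() in the loop.
  position = {app_id: i for i, app_id in enumerate(app_ids)}

  for rank, app_id in enumerate(ranking, start=1):
    v[position[app_id]] = rank

  if reverse_order:
    # For weighted tau, turn ranks into positive scores: better-ranked apps get larger values.
    v = out_of_bound_rank - v + 1

  return v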

In [63]:
# Examples
ranking = ["a", "c", "b", "d"]
references = 'abcd'

converted_ranking = convert_ranking(ranking, references, reverse_order=False)
print({k:v for (k,v) in zip(references, converted_ranking)})

converted_ranking_rev = convert_ranking(ranking, references, reverse_order=True)
print({k:v for (k,v) in zip(references, converted_ranking_rev)})

{'a': 1, 'b': 3, 'c': 2, 'd': 4}
{'a': 5, 'b': 3, 'c': 4, 'd': 2}


### Spearman rho coefficient

Reference: http://scipy.github.io/devdocs/generated/scipy.stats.spearmanr.html#scipy.stats.spearmanr

In [119]:
from scipy import stats

# NB: this pairs the i-th appID of each list and correlates the appID values
# themselves, which is not meaningful. Converted rankings are used below instead.
rho, p_value = stats.spearmanr(steam_ranking, 
                               steamdb_ranking)

print('Spearman rank-order correlation coefficient: {:.4f}'.format(rho))
print('p-value to test for non-correlation: {:.4f}'.format(p_value))

Spearman rank-order correlation coefficient: -0.0662
p-value to test for non-correlation: 0.2968


In [103]:
# Arrays of rankings
# NB: we would get the same rho with arrays of scores.

x = convert_ranking(steam_ranking, 
                    app_ids=app_ids)

y = convert_ranking(steamdb_ranking, 
                    app_ids=app_ids)

In [104]:
from scipy import stats

rho, p_value = stats.spearmanr(x, y)

print('Spearman rank-order correlation coefficient: {:.4f}'.format(rho))
print('p-value to test for non-correlation: {:.4f}'.format(p_value))

Spearman rank-order correlation coefficient: 0.4367
p-value to test for non-correlation: 0.0000
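Spearman's rho is the Pearson correlation of the two rank-transformed arrays. Tied values, here all apps missing from a list, receive their average rank. The small check below uses made-up data. It also illustrates the note that negating both arrays (ranks versus scores) leaves rho unchanged. `spearman_via_pearson` is a hypothetical helper.

```python
import numpy as np
from scipy import stats

# Made-up ranks for 6 apps; 5 is the shared out-of-bound rank of apps missing from a list.
toy_x = np.array([1, 2, 3, 4, 5, 5])
toy_y = np.array([2, 1, 5, 3, 4, 5])

def spearman_via_pearson(a, b):
  # rankdata assigns the average rank to ties, as spearmanr does.
  return np.corrcoef(stats.rankdata(a), stats.rankdata(b))[0, 1]

rho_manual = spearman_via_pearson(toy_x, toy_y)
rho_scipy, _ = stats.spearmanr(toy_x, toy_y)
rho_scores, _ = stats.spearmanr(-toy_x, -toy_y)  # ranks turned into scores

print('{:.4f} {:.4f} {:.4f}'.format(rho_manual, rho_scipy, rho_scores))
```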


### Kendall's tau coefficient

Reference: http://scipy.github.io/devdocs/generated/scipy.stats.kendalltau.html#scipy.stats.kendalltau

In [118]:
from scipy import stats

# NB: this pairs the i-th appID of each list and correlates the appID values
# themselves, which is not meaningful. Converted rankings are used below instead.
tau, p_value = stats.kendalltau(steam_ranking,
                                steamdb_ranking)

print('Kendall rank-order correlation coefficient: {:.4f}'.format(tau))
print('p-value to test for non-correlation: {:.4f}'.format(p_value))

Kendall rank-order correlation coefficient: -0.0441
p-value to test for non-correlation: 0.2988


In [93]:
# Arrays of rankings
# NB: we would get the same tau with arrays of scores.

x = convert_ranking(steam_ranking, 
                    app_ids=app_ids)

y = convert_ranking(steamdb_ranking, 
                    app_ids=app_ids)

In [94]:
from scipy import stats

tau, p_value = stats.kendalltau(x, y)

print('Kendall rank-order correlation coefficient: {:.4f}'.format(tau))
print('p-value to test for non-correlation: {:.4f}'.format(p_value))

Kendall rank-order correlation coefficient: 0.3295
p-value to test for non-correlation: 0.0000
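By default, `scipy.stats.kendalltau` computes the tau-b variant, which corrects for ties. The naive O(n²) sketch below uses the same kind of made-up ranks and makes the counting of concordant and discordant pairs explicit. It is meant for illustration only. `kendall_tau_b` is a hypothetical helper.

```python
import itertools
import math
from scipy import stats

def kendall_tau_b(a, b):
  """Naive tau-b: (C - D) / sqrt((n0 - n1) * (n0 - n2))."""
  concordant = discordant = ties_a = ties_b = 0
  for i, j in itertools.combinations(range(len(a)), 2):
    da = (a[i] > a[j]) - (a[i] < a[j])  # sign of a[i] - a[j]
    db = (b[i] > b[j]) - (b[i] < b[j])  # sign of b[i] - b[j]
    ties_a += (da == 0)  # n1: pairs tied in a (including pairs tied in both)
    ties_b += (db == 0)  # n2: pairs tied in b
    if da * db > 0:
      concordant += 1
    elif da * db < 0:
      discordant += 1
  n0 = len(a) * (len(a) - 1) // 2
  return (concordant - discordant) / math.sqrt((n0 - ties_a) * (n0 - ties_b))

toy_x = [1, 2, 3, 4, 5, 5]
toy_y = [2, 1, 5, 3, 4, 5]

tau_manual = kendall_tau_b(toy_x, toy_y)
tau_scipy, _ = stats.kendalltau(toy_x, toy_y)
print('{:.4f} {:.4f}'.format(tau_manual, tau_scipy))  # 0.5000 0.5000
```

Here there are 10 concordant pairs, 3 discordant pairs, and one tie in each array out of 15 pairs, so tau-b = 7 / 14 = 0.5.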


### Weighted Kendall's tau coefficient

Reference: http://scipy.github.io/devdocs/generated/scipy.stats.weightedtau.html#scipy.stats.weightedtau

In [101]:
# Arrays of scores
# NB: it is important NOT to feed arrays of rankings for the weighted tau!

x = convert_ranking(steam_ranking, 
                    app_ids=app_ids, 
                    reverse_order=True)

y = convert_ranking(steamdb_ranking, 
                    app_ids=app_ids, 
                    reverse_order=True)

# **Caveat**: the order is reversed because of the following comment in the doc:
#
# > Note that if you are computing the weighted τ on arrays of ranks, rather than
# > of scores (i.e., a larger value implies a lower rank) you must negate the 
# > ranks, so that elements of higher rank are associated with a larger value.
#
# Reference: http://scipy.github.io/devdocs/generated/scipy.stats.weightedtau.html#scipy.stats.weightedtau

In [97]:
from scipy import stats

weighted_tau, p_value = stats.weightedtau(x, y)

print('Weighted Kendall rank-order correlation coefficient: {:.4f}'.format(weighted_tau))
# NB: scipy does not compute a p-value for weighted tau, so this is always nan.
print('p-value to test for non-correlation: {:.4f}'.format(p_value))

Weighted Kendall rank-order correlation coefficient: 0.6404
p-value to test for non-correlation: nan
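To see why the caveat matters, here is a small sketch on made-up data. With scipy's default hyperbolic weigher, weighted tau gives more weight to pairs involving the items with the largest values. In the example, two rankings each swap one adjacent pair, so they get the same Kendall's tau. Weighted tau penalizes the swap among the best items more, provided the ranks are negated. Fed raw ranks, it emphasizes the wrong end of the list. `wtau` is a hypothetical helper.

```python
import numpy as np
from scipy import stats

# Made-up ranks (1 = best) for 5 items, and two rankings that each swap one adjacent pair.
reference = np.array([1, 2, 3, 4, 5])
swap_top = np.array([2, 1, 3, 4, 5])     # disagrees on the two best items
swap_bottom = np.array([1, 2, 3, 5, 4])  # disagrees on the two worst items

def wtau(a, b):
  tau, _ = stats.weightedtau(a, b)
  return tau

# Plain Kendall's tau cannot tell the two swaps apart (both give 0.8).
tau_top, _ = stats.kendalltau(reference, swap_top)
tau_bottom, _ = stats.kendalltau(reference, swap_bottom)

# Negated ranks act as scores: a swap among the best items costs more.
wt_top_scores = wtau(-reference, -swap_top)
wt_bottom_scores = wtau(-reference, -swap_bottom)

# Raw ranks: larger values are treated as "better", so the emphasis flips.
wt_top_ranks = wtau(reference, swap_top)
wt_bottom_ranks = wtau(reference, swap_bottom)

print('Kendall tau:          top {:.4f}, bottom {:.4f}'.format(tau_top, tau_bottom))
print('weighted tau, scores: top {:.4f}, bottom {:.4f}'.format(wt_top_scores, wt_bottom_scores))
print('weighted tau, ranks:  top {:.4f}, bottom {:.4f}'.format(wt_top_ranks, wt_bottom_ranks))
```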
