# Social Computing/Social Gaming - Summer 2022

# Exercise Sheet 3: Collaborative Filtering with Steam Games

In this exercise, we will build a collaborative filtering recommender system using data we gather from Steam. We will use your friends list to get information about owned games for each ID, and the time each game was played.

Usually, collaborative filtering is based on some sort of rating to determine the similarity between users. However, for games, the enjoyment and a rating do not always match. Additionally, only about 10% of players actually rate the games they play, which would make for a very incomplete dataset. Therefore, the playtime will be used instead of a rating system. This has the added benefit that playtime is usually the most authentic metric of enjoyment, as players are very unlikely to spend much time on a game they don't enjoy.

## Task 3.1: Obtaining the data


**1.** Your first task is to **gather the data** needed to create the recommender system. **Create a data structure** that holds the needed information for each player and game. To do this, **open the URL** with the given `request()` function, **read** the json response and retrieve your games library and playtime. Then **save** the games into a dictionary with `key=name` and `values=playtime`. **Do not add** games with 0 playtime to this dictionary.


**Notes:** 
- You have three different options to solve this exercise. You can either:
    - Use your own Steam profile (strongly recommended)
    - Use the provided default Steam account (in case you do not own a Steam profile)
    - Use the provided .json file (in case you do not have a Steam profile and the default Steam account becomes overcrowded)
- your choice will not affect your grade in any way
- You cannot obtain a list from your profile with the Steam API unless your profile is set to public. 
- Upon executing the code below, you will notice that a lot of profiles "`couldnt decode`". These are private or deleted profiles and it is totally fine to get this message.


**Hints**:
- In case you wish to use your own Steam profile, but are afraid to share your personal [key](https://steamcommunity.com/dev/apikey) [1] and id, please be informed that you can delete them **after** solving the tasks and before submitting your solutions. The outputs will be saved in the Jupyter Notebook.
- To obtain the games a user owns, use this: `games = data['response']['games']`. This returns a list of games, including the playtime (in minutes) which can be retrieved like this: `playtime = game['playtime_forever']`, where game refers to an item from the list of games. 

Execute the following code cell to install the needed library for this exercise.

In [1]:
!pip install mlxtend



In [1]:
# Use this if you want to work with the default IDs
import requests
import urllib
import pandas as pd
import json
from urllib.request import Request, urlopen
from pandas.io.json import json_normalize
from requests.exceptions import HTTPError

# You can replace these values with your own ID and API key
key = "CB35B8F8DCE9135DDAA3B0328FCE0103"
id = "76561198329838242"
url = "http://api.steampowered.com/ISteamUser/GetPlayerSummaries/v0002/?key="+key+"&steamids="+id
r = requests.get(url)
data = r.json()

# Get friendslist
# This is just a template. In order to get your personalized list, you need to change the id and key above.
request = Request("http://api.steampowered.com/ISteamUser/GetFriendList/v0001/?key="+key+"&steamid="+id+"&relationship=friend")
response = urlopen(request)
elevations = response.read()
data = json.loads(elevations)
friendslist = data['friendslist']
friends = friendslist['friends']

# Get all friends
friendids = []
tempIDs = []
for friend in friends:
    friendids.append(friend['steamid'])
    
print(len(friendids), "ok")

# Get friends of friends
x = 0

while x < len(friendids):
    friendID = friendids[x]
    request = Request("http://api.steampowered.com/ISteamUser/GetFriendList/v0001/?key="+key+"&steamid="+friendID+"&relationship=friend")
    try:
        response = urlopen(request)    
    except urllib.error.HTTPError as e:
        print('401')
    elevations = response.read()
    try:
        data = json.loads(elevations)
    except json.JSONDecodeError:
        print("couldn't decode")
    friendslist = data['friendslist']
    friends = friendslist['friends']

    friendidsNew = []
    for friend in friends:
        friendidsNew.append(friend['steamid'])
        
    tempIDs += friendidsNew
    x += 1

friendids += tempIDs
friendids = list(dict.fromkeys(friendids))
friendids = list(set(friendids))
print(len(friendids))


58 ok
401
couldn't decode
401
couldn't decode
401
couldn't decode
401
couldn't decode
401
couldn't decode
401
couldn't decode
401
couldn't decode
401
couldn't decode
401
couldn't decode
401
couldn't decode
401
couldn't decode
401
couldn't decode
5227


In [2]:
# Trim the list of IDs to reasonable values:
if len(friendids)>250:
    friendids = friendids[:250]
print(len(friendids))

users_gamedicts = {} # The dictionary containing all information for every ID

gamedict = {} # A dict containing information for one player

# Get owned games of friendslist:
request = Request("http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/?key="+key+"&steamid="+id+"&include_appinfo=1&format=json")
# TODO:
# Open the URL, read the json response and retrieve your games library and playtime
response = urlopen(request)
data = json.loads(response.read())
# Save the games into a dictionary with key=name and values=playtime
for value in data['response']['games']:
    if value['playtime_forever'] > 0:
        gamedict[value['name']] = value['playtime_forever']
print(gamedict)
# Hint 1: You can obtain the games a user owns with data['response']['games']
# Hint 2: You can retrieve their playtime with game['playtime_forever']

# Add the dictionary to the users_gamedict
users_gamedicts[id] = gamedict

# Do the same for your friends and their friends
for friendID in friendids:
    request = Request("http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/?key="+key+"&steamid="+friendID+"&include_appinfo=1&format=json")
    response = urlopen(request)
    data = json.loads(response.read())
    gamedict_new = {}
    if 'games' in data['response'].keys():
        for value in data['response']['games']:
            if value['playtime_forever'] > 0:
                gamedict_new[value['name']] = value['playtime_forever']
        print(gamedict_new)
        users_gamedicts[friendID] = gamedict_new

250
{'Stronghold Kingdoms': 5299, 'Path of Exile': 16896, 'Europa Universalis IV': 167277, 'Titan Quest Anniversary Edition': 13323, 'Black Desert': 3697, 'Crusader Kings II': 6046}
{'Counter-Strike: Global Offensive': 1, 'Dota Underlords': 32}
{}
{'Left 4 Dead': 265, 'Dead Space': 477, 'The Last Remnant': 143, 'Command and Conquer: Red Alert 3 - Uprising': 17, 'Global Agenda': 3865, "Recettear: An Item Shop's Tale": 1223, 'Dragon Age: Origins - Ultimate Edition': 3976, 'Divinity II - The Dragon Knight Saga': 40, 'Magicka': 335, 'BIT.TRIP RUNNER': 58, 'Warhammer 40,000: Dawn of War II - Retribution': 224, 'Terraria': 10602, 'Bastion': 4117, 'Dungeon Defenders': 3964, 'Orcs Must Die!': 917, 'Avernum: Escape From the Pit': 5, 'Warlock - Master of the Arcane': 28, 'Crysis 2 Maximum Edition': 396, 'Psychonauts': 13, 'LIMBO': 357, 'Amnesia: The Dark Descent': 212, 'Superbrothers: Sword & Sworcery EP': 40, 'Braid': 46, 'Super Meat Boy': 12, "Lone Survivor: The Director's Cut": 107, 'DARK SOU

## Task 3.2: Association rule mining

Before we start with the "real" recommender system, let us take a look at a more general form of recommending items using association rules.

The concept of association rule mining is rather simple: Looking at an itemset, one tries to find dependencies between items that could "belong together". A common example would be buying food at the store: If, for example, meat and salt are bought together often, but meat without salt not that often, it is assumed that there is a connection between those two. For games, if it was found that most of the users who own the demo version of a game also own the full version of that game, it would be a reasonable assumption that these users liked the demo and therefore bought the full version.


Let us first cover the mathematical basis for association rules. The most important metrics used are **support**,  **confidence** and **lift**. The first is defined as the amount of times an item occurs in the itemset divided by the total number of items in the set; the second is defined as the support of a list of items [x,y,...] divided by the support of x. Lift is a measure describing the correlation between items. Written down mathematically:

$$supp(x)= \frac{len(x)}{len(n)}$$

$$conf(x=>y) = \frac{supp(x,y)}{supp(x)}$$

$$lift(x=>y) = \frac{P(x \cap y)}{P(x) * P(y)}$$



It is important to note that support refers to an item or a list of items, while confidence refers to a rule. Also note that a lift of 1 means that x and y occur independently of each other, while a lift greater 1 means a positive correlation.


**1.** Your task here is to first **convert** the dictionary you created into a list of lists as this is the input required for the algorithm to work. Then, **print out** the most frequent items using the `min_support` attribute. Finally, **print out** the association rules and **play around with the threshold value** to get a reasonable amount of rules. 

**Hint:** Play around with the threshold values until you get a reasonable amount (4-30) rows as output.

**2.** **Discuss your results** and try to answer the following questions: 
- What kind of recommendations can be made?
- What does a confidence of 1.0 mean and is it meaningful for recommending games? 
- Can you spot a correlation between the games with the highest support and the rules with the highest confidence? How does this affect the lift?  

**Hint:** There is a high chance that games such as "Counter-Strike: Global Offensive" appear very often, you should at least have two different games in the antecedents and consequents column to make meaningful conclusions.

In [5]:
gamesofallusers = []

# TODO 1: Convert the gamedict to a list of lists:
for user in users_gamedicts.values():
    user_games_list=[]
    for user_game in user.keys():
        user_games_list.append(user_game)
    gamesofallusers.append(user_games_list)
print(gamesofallusers)
# It should look something like this:
'''
[
    [
    'Path of Exile',
    'Europa Universalis IV',
    'Titan Quest Anniversary Edition',
    'Black Desert Online',
    'Crusader Kings II'
    ],
    [
    'Counter-Strike',
    'Day of Defeat',
    'Deathmatch Classic',
    'Ricochet'
    ]
]
''' 
# Each list within this list represents the games of one user
    
    
# Remove common Steam entries that are not games:
for game in gamesofallusers:
    if 'Dota 2 Test' in game:
        game.remove('Dota 2 Test')
    if 'True Sight' in game:
        game.remove('True Sight')
    if 'True Sight: Episode 1' in game:
        game.remove('True Sight: Episode 1')
    if 'True Sight: Episode 2' in game:
        game.remove('True Sight: Episode 2')
    if 'True Sight: Episode 3' in game:
        game.remove('True Sight: Episode 3')
    if 'True Sight: The Kiev Major Grand Finals' in game:
        game.remove('True Sight: The Kiev Major Grand Finals')
    if 'True Sight: The International 2017' in game:
        game.remove('True Sight: The International 2017')
    if 'True Sight: The International 2018 Finals' in game:
        game.remove('True Sight: The International 2018 Finals')
        

[['Stronghold Kingdoms', 'Path of Exile', 'Europa Universalis IV', 'Titan Quest Anniversary Edition', 'Black Desert', 'Crusader Kings II'], ['Counter-Strike: Global Offensive', 'Dota Underlords'], [], ['Left 4 Dead', 'Dead Space', 'The Last Remnant', 'Command and Conquer: Red Alert 3 - Uprising', 'Global Agenda', "Recettear: An Item Shop's Tale", 'Dragon Age: Origins - Ultimate Edition', 'Divinity II - The Dragon Knight Saga', 'Magicka', 'BIT.TRIP RUNNER', 'Warhammer 40,000: Dawn of War II - Retribution', 'Terraria', 'Bastion', 'Dungeon Defenders', 'Orcs Must Die!', 'Avernum: Escape From the Pit', 'Warlock - Master of the Arcane', 'Crysis 2 Maximum Edition', 'Psychonauts', 'LIMBO', 'Amnesia: The Dark Descent', 'Superbrothers: Sword & Sworcery EP', 'Braid', 'Super Meat Boy', "Lone Survivor: The Director's Cut", 'DARK SOULS™: Prepare To Die Edition', 'Awesomenauts', 'FTL: Faster Than Light', 'Castle Crashers', 'Torchlight II', 'Worms Revolution', 'Chivalry: Medieval Warfare', 'Darksiders

In [11]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

te = TransactionEncoder()
# TODO 2: Tinker around with the values
te_ary = te.fit(gamesofallusers).transform(gamesofallusers)
df = pd.DataFrame(te_ary, columns=te.columns_)
frequent_itemsets = apriori(df, min_support=0.4, use_colnames=True)

frequent_itemsets

Unnamed: 0,support,itemsets
0,0.407895,(Among Us)
1,0.407895,(Apex Legends)
2,0.789474,(Counter-Strike: Global Offensive)
3,0.486842,(PUBG: BATTLEGROUNDS)
4,0.421053,(Warframe)
5,0.407895,"(Apex Legends, Counter-Strike: Global Offensive)"
6,0.486842,"(PUBG: BATTLEGROUNDS, Counter-Strike: Global O..."
7,0.421053,"(Counter-Strike: Global Offensive, Warframe)"


In [12]:
from mlxtend.frequent_patterns import association_rules

# TODO 2: Play around with the treshold value
association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Apex Legends),(Counter-Strike: Global Offensive),0.407895,0.789474,0.407895,1.0,1.266667,0.085873,inf
1,(Counter-Strike: Global Offensive),(Apex Legends),0.789474,0.407895,0.407895,0.516667,1.266667,0.085873,1.225045
2,(PUBG: BATTLEGROUNDS),(Counter-Strike: Global Offensive),0.486842,0.789474,0.486842,1.0,1.266667,0.102493,inf
3,(Counter-Strike: Global Offensive),(PUBG: BATTLEGROUNDS),0.789474,0.486842,0.486842,0.616667,1.266667,0.102493,1.338673
4,(Counter-Strike: Global Offensive),(Warframe),0.789474,0.421053,0.421053,0.533333,1.266667,0.088643,1.240602
5,(Warframe),(Counter-Strike: Global Offensive),0.421053,0.789474,0.421053,1.0,1.266667,0.088643,inf


**TODO 2: Write your observations here**
1. First we can observe that from the confidence association rules the highest recommendation for FPS or action games like PUBG, Paladins, Path of Exile, Terreria ... is counter-strike: Global offensive with confidence >= 0.8
2. Second counter-strike: Global offensive is in the other hand not the best recommendation for players who play those because the confidence is like less then 0.6
3. Confidence with value 1 can show the user players both of games. In our case PUBG -> Counter-Strike: Global has confidence 1 but not the opposite. It means if you find a player who plays PUBG it means directly he plays or played Counter-Strike.
4. In our case Counter-Strike:Global Offensive has the highest support value, and it is at the same time the most recommended game for players. This makes actually sense because the highest support means the game is played or can be playable by most of the players. So, it has higher chance of being recommended and a consequent of most of the antecedents.
5. the lift is generally used to measure how much often the andecedents and consequents appear. The rule with highest confidence need not have high lift ( rules with Counter-Strike:global offensive have less lift the others with less confidence)

## Task 3.3: The Recommender System: Similarity Score


Finally, it is time to build the recommender system. 

**1.** The first thing to do is to **implement a similarity score** that will be used to predict a user's playtime of an unowned game. We implement a similarity score between two users by taking the relative distance between two players. We use the following formula:

$$d(u, v) = \sum_{i~\in~common~games} \frac{|r_{u,i} - r_{v,i}|}{r_{v,i}}$$ 

Where $u$ and $v$ are users and $r_{u,i}$ is the playtime of user $u$ for game $i$. 

You can then return the similarity with  
$$ w_{u,v} = \frac{1}{1 + d(u, v)} $$

**Notes:** 
- If no common games exist return 0.

**a) Implement similarity scores:** Besides the given similarity score, we want to explore how other measurements behave. Hence, we will implement the euclidean distance and cosine similarity. The scores can be selected by setting the respective variable on `True`.

In [13]:
from math import sqrt
    
def calculate_similarity(user1ID, user2ID, given=True, euclidean=False, cosine=False):
    common_games = []
    user1games = users_gamedicts[user1ID]
    user2games = users_gamedicts.get(user2ID,user1games)
    common_games = list(set(user1games).intersection(user2games))
    if len(common_games) == 0:
        return 0
    differences = list(set(user1games).difference(user2games))
    if euclidean:
        d = 0
        for game in common_games:
            d+= (user1games[game] - user2games[game]) * (user1games[game] - user2games[game])
        return  1 / (1 + sqrt(d))
    if given:
        d = 0
        for game in common_games:
            d += abs(user1games[game] - user2games[game]) / user2games[game]
        return 1 / 1 + d
    if cosine:
        sumuv = 0
        sumsqr1 = 0
        sumsqr2 = 0
        for game in common_games:
            sumuv+= user1games[game] * user2games[game]
            sumsqr1+= user1games[game] * user1games[game]
            sumsqr2+= user2games[game] * user2games[game]
        return sumuv / (sqrt(sumsqr1) * sqrt(sumsqr2))
    return 0
    
    # TODO: Calculate the similarity score between two friends based on their common games:

## Task 3.4: Recommender System: Predict ratings

With the similarity score calculated, we can now predict a user's playtime for games they don't own.

**1.** First, we **create a set of all games**, but we **delete** all games that are owned by less than 3 players. The reason is simple: If only 1 or 2 players own a game, it is impossible to derive a meaningful prediction since there is not enough data. 

The predicted playtime for a game works analogous to the predicted rating of a movie/item in a conventional collaborative filtering recommender system:

$$r_{u,i} = \frac{\sum_{v \in N_i(u)} w_{u,v}r_{v,i}}{\sum_{v \in N_i(u)} w_{u,v}}$$

where 
- $r_{u,i}$ is the estimated recommendation of item $i$ for target user $u$. 
- $N_i(u)$ is the set of similar users to target user $u$ for the designated item $i$. 
- $w_{u,v}$ is the similarity score between users $u$ and $v$ (used as a weighting factor).  

**Notes:** 
- In our case, we use playtime as a recommendation measure and the set $N_i(u)$ consists of user $u$ friends list and friends of friends list. In our scenario, we do not need the index $i$ as our friends list does not change between games.
- Keep in mind that we have already taken out the games with a playtime of 0. In this case, they are considered "unowned" and not taken into account in this exercise.

In [14]:
# List of all games that are owned by at least 1 person
allGames = []
for user in gamesofallusers:
    for game in user:
        allGames.append(game)

# TODO : Create a list of games owned by at least 3 people

gamesOwnedByThreeAl = [element for element in list(set(allGames)) if allGames.count(element) >=3]
print('Number of unique games played by >=3 ', len(gamesOwnedByThreeAl))
# TODO: Find out which games you do not own out of all games because we are only interested in recommendations for games that we do not own
def difference(allGames, yourGames):
    return [element for element in allGames if element not in yourGames]


# TODO: Predict ratings based on the formula above for each unowned game
# use 'given', 'euclidean' and 'cosine' to switch between measurements
def predict_ratings(given=True, euclidean=False, cosine=False):
    similarity_scores = {}
    notownedgames = difference(gamesOwnedByThreeAl, users_gamedicts[id])
    for game in notownedgames:
        sum_numerator=0
        sum_denomerator=0
        for user in users_gamedicts:
            if user != id and game in users_gamedicts[user].keys():
                similarity = calculate_similarity(id, user,given, euclidean, cosine)
                sum_numerator += similarity * users_gamedicts[user][game]
                sum_denomerator += similarity
        if sum_denomerator != 0:
            similarity_scores[game] = sum_numerator / sum_denomerator
    return similarity_scores

    '''Hint: Iterate over all unowned games and for each game calculate a rating based
        on your friends playtime and similarity score '''

Number of unique games played by >=3  667


## Task 3.5: Recommender System: Discussion

**1.** **Sort** the predicted ratings by estimated playtime (highest first) and **print out** the top 8 predictions for you (or the default user if you are using the default ID). 

**2.** **Discuss** the difference in recommendations between the collaborative filtering approach and the association rule approach. Would you consider one more accurate than the other? Why/why not?

**3.** **Discuss** the differences in the similarity scores.

In [20]:
# TODO:
ratings = predict_ratings(True, False, False)
ratings = sorted(ratings.items(), key= lambda kv:(kv[1], kv[0]), reverse=True)
ratings[:8]

[('Counter-Strike: Global Offensive', 81626.53919165889),
 ('Cookie Clicker', 70862.14335553562),
 ('War Thunder', 66512.33924125918),
 ("Tom Clancy's Rainbow Six Siege", 57304.83746800814),
 ('DRAGON BALL FighterZ', 31976.4996858439),
 ('Arma 3', 24919.213598241873),
 ('Warframe', 24361.654614791976),
 ('Rust', 22929.1188943186)]

In [21]:
# The Collaborative filtering and the association rules are not that much similar. Counter-strike has been recommended by our recommender system because it has the highest support. But considering collaborative filtering is more accurate then than the other would be correct because the recommender system (collaborative filtering) uses the playtime of game besides the association rules are based on the general whole dataset we have been given and not only for specific user.

In [22]:
# The Similarity Scores are different, giving the GIVEN way is not like cosinus or euclidian distance. Each method has its own geometrical way of distance calculation. Recommendation can differ from a way to another. For counter-strike for example we see that it is the most recommended game for this user that is because most of his friends play it and his games are similar to it and considering ofcourse the playtimes . The less the games are recommended the less players own it in the friends list.


**TODO: Write your observations here**
Top Recommendations: Counter-Strike, Cookie Clicker, War Thunder, Tom Clancy's Rainbow Six Siege,DRAGON BALL FighterZ
Collaborative filtering is more accurate the association rules because it takes the playtimes into consideration and picks up top n recommended games.
The associative rules generate recommendation based on the whole data and all users not only one specific user.

## References

[1] https://steamcommunity.com/dev/apikey