# Game Recommender System

A simple recommendation engine built on Steam's preprocessed dataset.<br />
This implementation uses content-based filtering, which does not consider language, available platforms, popularity or ratings when recommending games to a user.<br />

### Import relevant libraries

```python
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
```

### Helper functions

```python
# function to get the game name from a given id (the id equals the row index)
def get_name_from_id(game_id):
    return df[df.id == game_id]["name"].values[0]
```

```python
# function to get the id (row index) of a game from its name
def get_id_from_name(name):
    return df[df.name == name]["id"].values[0]
```

```python
# function to get the rows of df whose id is in the given list of ids
def get_df_from_id(similar_games):
    return df.loc[df['id'].isin(similar_games)]
```

```python
# function to sort a games DataFrame by number of positive ratings (highest first)
def sort_by_ratings(input_df):
    return input_df.sort_values(by=['positive_ratings'], ascending=False)
```

```python
# function to keep only English-language games
def filter_by_english(input_df):
    return input_df[input_df.english == 1]
```

```python
# function to keep games whose price lies between low_price and high_price (inclusive)
def filter_by_price(input_df, low_price, high_price):
    return input_df[(input_df.price >= low_price) & (input_df.price <= high_price)]
```
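Filters like these can be chained when each one returns a DataFrame rather than a single row. Below is a minimal sketch on a small made-up table. The four games and their values are invented for illustration; only the column names follow steam.csv.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror steam.csv
games = pd.DataFrame({
    "name": ["Game A", "Game B", "Game C", "Game D"],
    "english": [1, 1, 0, 1],
    "price": [3.99, 19.99, 4.99, 0.0],
    "positive_ratings": [120, 900, 5000, 450],
})

def filter_by_english(input_df):
    # keep English-language rows only
    return input_df[input_df.english == 1]

def filter_by_price(input_df, low_price, high_price):
    # Series.between is inclusive on both ends by default
    return input_df[input_df.price.between(low_price, high_price)]

def sort_by_ratings(input_df):
    # most positive ratings first
    return input_df.sort_values(by="positive_ratings", ascending=False)

result = sort_by_ratings(filter_by_price(filter_by_english(games), 0, 10))
print(result["name"].tolist())  # ['Game D', 'Game A']
```

Because every step takes and returns a DataFrame, the order of the filters can be changed freely. The sort is applied last so that it only touches the rows that survive the filters.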

### Step 1: Read CSV File

```python
df = pd.read_csv("dataset/steam.csv")
```

```python
# create an id column based on the row index
df["id"] = df.index
df.head()
```

| | appid | name | release_date | english | developer | publisher | platforms | required_age | categories | genres | steamspy_tags | achievements | positive_ratings | negative_ratings | average_playtime | median_playtime | owners | price | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | Counter-Strike | 2000-11-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Online Multi-Player;Local Multi-P... | Action | Action;FPS;Multiplayer | 0 | 124534 | 3339 | 17612 | 317 | 10000000-20000000 | 7.19 | 0 |
| 1 | 20 | Team Fortress Classic | 1999-04-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Online Multi-Player;Local Multi-P... | Action | Action;FPS;Multiplayer | 0 | 3318 | 633 | 277 | 62 | 5000000-10000000 | 3.99 | 1 |
| 2 | 30 | Day of Defeat | 2003-05-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Valve Anti-Cheat enabled | Action | FPS;World War II;Multiplayer | 0 | 3416 | 398 | 187 | 34 | 5000000-10000000 | 3.99 | 2 |
| 3 | 40 | Deathmatch Classic | 2001-06-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Online Multi-Player;Local Multi-P... | Action | Action;FPS;Multiplayer | 0 | 1273 | 267 | 258 | 184 | 5000000-10000000 | 3.99 | 3 |
| 4 | 50 | Half-Life: Opposing Force | 1999-11-01 | 1 | Gearbox Software | Valve | windows;mac;linux | 0 | Single-player;Multi-player;Valve Anti-Cheat en... | Action | FPS;Action;Sci-fi | 0 | 5250 | 288 | 624 | 415 | 5000000-10000000 | 3.99 | 4 |


### Step 2: Select features
Here I decided to use developer, categories, genres and steamspy_tags as the main features.

*Note: we could also merge with steam_description_data.csv and use short_description as an additional feature.*

```python
features = ['developer', 'categories', 'genres', 'steamspy_tags']
```

### Step 3: Combine features
Create a column in the DataFrame that combines all the selected features.

```python
def combine_features(row):
    try:
        return row['developer'] + " " + row['categories'] + " " + row['genres'] + " " + row['steamspy_tags']
    except TypeError:
        # a missing value (NaN) cannot be concatenated with a string
        print("Error: ", row)
        return ""
```

Apply the function to all the rows in the DataFrame:

```python
df['combined_features'] = df.apply(combine_features, axis=1)
df['combined_features'].head()
```

```
0    Valve Multi-player;Online Multi-Player;Local M...
1    Valve Multi-player;Online Multi-Player;Local M...
2    Valve Multi-player;Valve Anti-Cheat enabled Ac...
3    Valve Multi-player;Online Multi-Player;Local M...
4    Gearbox Software Single-player;Multi-player;Va...
Name: combined_features, dtype: object
```
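Plain `+` concatenation fails with a `TypeError` if any of the four fields is missing (NaN). One alternative is to fill missing values with an empty string and then join the columns listed in *features*. The sketch below takes that approach. The helper name `combine_features_safe` and the toy rows are hypothetical.

```python
import pandas as pd

features = ['developer', 'categories', 'genres', 'steamspy_tags']

def combine_features_safe(frame, columns):
    # Hypothetical helper: NaN/None become "" so joining never fails
    return frame[columns].fillna("").astype(str).apply(" ".join, axis=1)

# Toy rows; the second one has no developer
toy = pd.DataFrame({
    "developer": ["Valve", None],
    "categories": ["Single-player", "Multi-player"],
    "genres": ["Action", "Indie"],
    "steamspy_tags": ["Puzzle;Sci-fi", "Strategy"],
})
combined = combine_features_safe(toy, features).tolist()
print(combined)
# ['Valve Single-player Action Puzzle;Sci-fi', ' Multi-player Indie Strategy']
```

The leading space in the second string does not matter: `CountVectorizer` ignores it when tokenizing.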

### Step 4: Create count matrix

```python
cv = CountVectorizer()
count_matrix = cv.fit_transform(df['combined_features'])
```

### Step 5: Compute the Cosine Similarity
Compute the cosine similarity between every pair of rows in the count matrix. We use "Portal" as the sample game here.

```python
cosine_sim = cosine_similarity(count_matrix)
# use a sample game name
game_user_likes = "Portal"
```

```python
cosine_sim[0]
```

```
array([1.        , 1.        , 0.76409318, ..., 0.39393939, 0.11980846,
       0.11980846])
```
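The cosine similarity between two count vectors a and b is (a · b) / (‖a‖ ‖b‖). The sketch below uses a made-up 3×3 count matrix to check a hand-written version of that formula against `cosine_similarity`. It also shows that passing a single row as the first argument returns only that game's scores. This avoids building the full n×n matrix when only one sample game is needed. The helper name `cosine_row` is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# made-up counts: 3 games x 3 vocabulary terms
counts = csr_matrix(np.array([
    [1, 1, 0],
    [1, 0, 1],
    [0, 2, 2],
]))

def cosine_row(matrix, i):
    # cos(a, b) = a.b / (||a|| * ||b||), from row i to every row
    dense = matrix.toarray().astype(float)
    norms = np.linalg.norm(dense, axis=1)
    return dense @ dense[i] / (norms * norms[i])

full = cosine_similarity(counts)                 # n x n, every pair
one_row = cosine_similarity(counts[0], counts)   # 1 x n, only game 0
print(cosine_row(counts, 0))                     # approximately [1.0, 0.5, 0.5]
```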

### Step 6: Get index of the game
Using the *get_id_from_name* function defined above, we get the id of the game from its name.

```python
game_id = get_id_from_name(game_user_likes)
game_id
```

```
17
```
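*get_id_from_name* raises an `IndexError` when the name is not in the dataset, because `.values[0]` is taken from an empty selection. The sketch below shows a lookup that returns `None` instead. The name `find_game_id` and the two-row toy table are hypothetical.

```python
import pandas as pd

def find_game_id(frame, name):
    """Return the id of the first game called `name`, or None if there is none."""
    matches = frame.loc[frame["name"] == name, "id"]
    # taking .values[0] of an empty selection would raise IndexError
    if matches.empty:
        return None
    return int(matches.iloc[0])

games = pd.DataFrame({"name": ["Portal", "Portal 2"], "id": [17, 23]})
print(find_game_id(games, "Portal"))    # 17
print(find_game_id(games, "Portal 3"))  # None
```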

### Step 7: Get similar game list
Here we want a list of similar games in descending order of similarity score. To do this, we first create a list of tuples that pairs each score with its corresponding game id (the id column).

For example, suppose we have a list of similarity scores like this:<br />
**[1, 0.8, 0.2, 0.5]**<br />

We transform this into a list of tuples, so that each score is paired with its corresponding row index. After the transformation, the list looks like this:<br />
**[(0, 1), (1, 0.8), (2, 0.2), (3, 0.5)]**<br />
where 0, 1, 2 and 3 are game ids, each followed by its similarity score.

Next, we sort the list by similarity score in descending order, so the example list becomes:<br />
**[(0, 1), (1, 0.8), (3, 0.5), (2, 0.2)]**
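The two transformations in the example map directly onto `enumerate` and `sorted`, using the same toy scores:

```python
scores = [1, 0.8, 0.2, 0.5]

# pair each score with its position (the game id)
pairs = list(enumerate(scores))
print(pairs)   # [(0, 1), (1, 0.8), (2, 0.2), (3, 0.5)]

# sort by the score (second element), highest first
ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
print(ranked)  # [(0, 1), (1, 0.8), (3, 0.5), (2, 0.2)]
```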

```python
similar_games = list(enumerate(cosine_sim[game_id]))
```

Next, we sort the list from highest to lowest similarity score and show the first 11 items.<br />
The first item has a similarity score of 1 (up to floating-point rounding). It represents the sample game itself ("Portal").

```python
sorted_similar_games = sorted(similar_games, key=lambda x: x[1], reverse=True)
sorted_similar_games[0:11]
```

```
[(17, 1.0000000000000007),
 (16, 0.7504787743864563),
 (18, 0.7504787743864563),
 (934, 0.7313103409735261),
 (9, 0.6567360733292693),
 (23, 0.6531972647421809),
 (1974, 0.6158402871356009),
 (20, 0.6048431391510366),
 (1360, 0.5879447357921311),
 (3621, 0.5855168737932973),
 (5026, 0.5819143739626463)]
```

### Step 8: Print similar games
Now that we have a sorted list of similar games, we can print the first 20 to recommend to the user.<br />
We use the *get_name_from_id* function to look up each game's name from its id.<br />
To avoid printing the sample game itself ("Portal"), we start from the second item in the *sorted_similar_games* list.

```python
# skip position 0 (the sample game itself) and print the next 20 games
for game in sorted_similar_games[1:21]:
    print(get_name_from_id(game[0]))
```

```
Half-Life 2: Episode One
Half-Life 2: Episode Two
Amnesia: The Dark Descent
Half-Life 2
Portal 2
Escape Goat
Left 4 Dead
The Witness
The Forgotten Ones
Tadpole Treble
Left 4 Dead 2
Nurbits
Black Mesa
Clustertruck
Bad Rats Show
Entropy Rising
Apollo4x
Half Past Disaster
Boom Island
A Collection of Bad Moments
```
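Skipping the first position assumes the sample game is always ranked first. That assumption can fail when several games tie at a score of 1; the *cosine_sim[0]* output in Step 5 starts with two 1s. Python's `sorted` is stable even with `reverse=True`, so tied games stay in id order. For some sample games, the first item would then be a different game, and the sample game itself would be printed. The sketch below excludes the sample game by id instead of by position. The function name `top_k_similar` and the toy matrix are hypothetical.

```python
import numpy as np

def top_k_similar(sim_matrix, game_id, k=20):
    """Return (id, score) pairs for the k most similar games, excluding game_id itself."""
    scores = sim_matrix[game_id]
    # negating gives descending order; kind="stable" keeps tied games in id order
    order = np.argsort(-scores, kind="stable")
    # drop the sample game by id, wherever the tie placed it
    order = order[order != game_id]
    return [(int(i), float(scores[i])) for i in order[:k]]

# toy similarity matrix in which games 0 and 1 are identical
sim = np.array([
    [1.0, 1.0, 0.5],
    [1.0, 1.0, 0.5],
    [0.5, 0.5, 1.0],
])
print(top_k_similar(sim, 1, k=2))  # [(0, 1.0), (2, 0.5)]
```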


### Step 9: Save similar game list and other information in a DataFrame
Now we can get a list of names of the similar games, but what if we want to see other information such as developer and price? To get that information back, we can merge the unsorted list of tuples from Step 7 with the original data, then keep only the 20 games that appear at the top of *sorted_similar_games*. The *get_df_from_id* helper can also select rows from a list of ids.

First, we transform the *similar_games* tuples into a pandas DataFrame named *similar_games_short*. Then we merge it with the original dataset *df* and name the result *similar_games_full*. This will be our final DataFrame, containing all the information we need for further transformation.

```python
similar_games_short = pd.DataFrame(similar_games, columns=['id', 'similarity_score'])
similar_games_full = df.merge(similar_games_short, on='id')

# remove the appid column, which we don't need for now
del similar_games_full['appid']
similar_games_full.head()
```

| | name | release_date | english | developer | publisher | platforms | required_age | categories | genres | steamspy_tags | achievements | positive_ratings | negative_ratings | average_playtime | median_playtime | owners | price | id | combined_features | similarity_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Counter-Strike | 2000-11-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Online Multi-Player;Local Multi-P... | Action | Action;FPS;Multiplayer | 0 | 124534 | 3339 | 17612 | 317 | 10000000-20000000 | 7.19 | 0 | Valve Multi-player;Online Multi-Player;Local M... | 0.234509 |
| 1 | Team Fortress Classic | 1999-04-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Online Multi-Player;Local Multi-P... | Action | Action;FPS;Multiplayer | 0 | 3318 | 633 | 277 | 62 | 5000000-10000000 | 3.99 | 1 | Valve Multi-player;Online Multi-Player;Local M... | 0.234509 |
| 2 | Day of Defeat | 2003-05-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Valve Anti-Cheat enabled | Action | FPS;World War II;Multiplayer | 0 | 3416 | 398 | 187 | 34 | 5000000-10000000 | 3.99 | 2 | Valve Multi-player;Valve Anti-Cheat enabled Ac... | 0.198762 |
| 3 | Deathmatch Classic | 2001-06-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Online Multi-Player;Local Multi-P... | Action | Action;FPS;Multiplayer | 0 | 1273 | 267 | 258 | 184 | 5000000-10000000 | 3.99 | 3 | Valve Multi-player;Online Multi-Player;Local M... | 0.234509 |
| 4 | Half-Life: Opposing Force | 1999-11-01 | 1 | Gearbox Software | Valve | windows;mac;linux | 0 | Single-player;Multi-player;Valve Anti-Cheat en... | Action | FPS;Action;Sci-fi | 0 | 5250 | 288 | 624 | 415 | 5000000-10000000 | 3.99 | 4 | Gearbox Software Single-player;Multi-player;Va... | 0.264906 |


Now that we have all the information in one DataFrame, we can sort it by *similarity_score*. This gives a DataFrame of the 20 most similar games, with "Portal" itself in the first row.

```python
# the 21 highest scores: "Portal" itself plus its 20 most similar games
similar_games_full = similar_games_full.nlargest(21, 'similarity_score')
similar_games_20 = similar_games_full.reset_index()
similar_games_20
```

| | index | name | release_date | english | developer | publisher | platforms | required_age | categories | genres | ... | achievements | positive_ratings | negative_ratings | average_playtime | median_playtime | owners | price | id | combined_features | similarity_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17 | Portal | 2007-10-10 | 1 | Valve | Valve | windows;mac;linux | 0 | Single-player;Steam Achievements;Captions avai... | Action | ... | 15 | 51801 | 1080 | 288 | 137 | 10000000-20000000 | 7.19 | 17 | Valve Single-player;Steam Achievements;Caption... | 1.0 |
| 1 | 16 | Half-Life 2: Episode One | 2006-06-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Single-player;Steam Achievements;Captions avai... | Action | ... | 13 | 7908 | 517 | 281 | 184 | 5000000-10000000 | 5.79 | 16 | Valve Single-player;Steam Achievements;Caption... | 0.750479 |
| 2 | 18 | Half-Life 2: Episode Two | 2007-10-10 | 1 | Valve | Valve | windows;mac;linux | 0 | Single-player;Steam Achievements;Captions avai... | Action | ... | 22 | 13902 | 696 | 354 | 301 | 5000000-10000000 | 5.79 | 18 | Valve Single-player;Steam Achievements;Caption... | 0.750479 |
| 3 | 934 | Amnesia: The Dark Descent | 2010-09-08 | 1 | Frictional Games | Frictional Games | windows;mac;linux | 0 | Single-player;Steam Achievements;Full controll... | Action;Adventure;Indie | ... | 18 | 20222 | 1199 | 173 | 41 | 2000000-5000000 | 14.99 | 934 | Frictional Games Single-player;Steam Achieveme... | 0.73131 |
| 4 | 9 | Half-Life 2 | 2004-11-16 | 1 | Valve | Valve | windows;mac;linux | 0 | Single-player;Steam Achievements;Steam Trading... | Action | ... | 33 | 67902 | 2419 | 691 | 402 | 10000000-20000000 | 7.19 | 9 | Valve Single-player;Steam Achievements;Steam T... | 0.656736 |
| 5 | 23 | Portal 2 | 2011-04-18 | 1 | Valve | Valve | windows;mac;linux | 0 | Single-player;Co-op;Steam Achievements;Full co... | Action;Adventure | ... | 51 | 138220 | 1891 | 1102 | 520 | 10000000-20000000 | 7.19 | 23 | Valve Single-player;Co-op;Steam Achievements;F... | 0.653197 |
| 6 | 1974 | Escape Goat | 2013-10-09 | 1 | MagicalTimeBean | MagicalTimeBean | windows;mac;linux | 0 | Single-player;Steam Achievements;Full controll... | Action;Indie | ... | 16 | 566 | 15 | 0 | 0 | 20000-50000 | 3.99 | 1974 | MagicalTimeBean Single-player;Steam Achievemen... | 0.61584 |
| 7 | 20 | Left 4 Dead | 2008-11-17 | 1 | Valve | Valve | windows;mac | 0 | Single-player;Multi-player;Co-op;Steam Achieve... | Action | ... | 73 | 17951 | 948 | 897 | 278 | 5000000-10000000 | 7.19 | 20 | Valve Single-player;Multi-player;Co-op;Steam A... | 0.604843 |
| 8 | 1360 | The Witness | 2016-01-26 | 1 | Thekla, Inc. | Thekla, Inc. | windows;mac | 0 | Single-player;Steam Achievements;Captions avai... | Adventure;Indie | ... | 2 | 6572 | 1282 | 630 | 674 | 500000-1000000 | 29.99 | 1360 | Thekla, Inc. Single-player;Steam Achievements;... | 0.587945 |
| 9 | 3621 | The Forgotten Ones | 2014-07-17 | 1 | Bernt Andreas Eide | Bernt Andreas Eide | windows;mac;linux | 0 | Single-player;Steam Achievements;Captions avai... | Action;Adventure;Free to Play;Indie | ... | 12 | 903 | 601 | 24 | 24 | 500000-1000000 | 0.0 | 3621 | Bernt Andreas Eide Single-player;Steam Achieve... | 0.585517 |
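Because the *id* column equals the row index, the merge can be replaced by attaching the similarity row directly as a new column. The sketch below assumes a default `RangeIndex`, like the one *df* gets from `pd.read_csv`. It also drops the sample game by id, so exactly k recommendations remain. The function name `top_k_frame` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def top_k_frame(frame, sim_row, game_id, k=20):
    # assumes row i of frame corresponds to score sim_row[i] (default RangeIndex)
    scored = frame.assign(similarity_score=sim_row)
    # remove the sample game itself by id
    scored = scored[scored["id"] != game_id]
    return scored.nlargest(k, "similarity_score").reset_index(drop=True)

games = pd.DataFrame({"name": ["A", "B", "C", "D"]})
games["id"] = games.index
sim_row = np.array([0.3, 1.0, 0.9, 0.6])  # made-up scores for sample game B (id 1)

top = top_k_frame(games, sim_row, game_id=1, k=2)
print(top[["name", "similarity_score"]])
```

`nlargest` only keeps the top k rows, so the full DataFrame does not need to be sorted.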


### Conclusion, Findings and Future Implementations
Here we have printed the first 20 recommended games for the user, based on each game's developer, categories, genres and Steam tags. However, with "Portal" as the sample game, its sequel "Portal 2" does not show up as the most similar game; it ranks fifth. This is likely because "Portal 2" has more categories and Steam tags attached, and other games have features that match "Portal"'s more closely. Cosine similarity divides by the length of each count vector. Every tag that "Portal 2" has but "Portal" lacks therefore lowers its score, even if all of "Portal"'s tags also appear on "Portal 2".
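The toy illustration below shows this length effect. It uses made-up tag vectors, not the real "Portal" and "Portal 2" rows. A game that has all three of the query's tags plus three extra ones scores lower than a game that has only two of the query's tags and nothing else.

```python
import numpy as np

def cosine(a, b):
    # cos(a, b) = a.b / (||a|| * ||b||)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([1, 1, 1, 0, 0, 0])     # three tags
superset = np.array([1, 1, 1, 1, 1, 1])  # the same three tags plus three extra ones
subset = np.array([1, 1, 0, 0, 0, 0])    # only two of the query's tags

print(round(cosine(query, superset), 4))  # 0.7071
print(round(cosine(query, subset), 4))    # 0.8165
```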

#### Future improvements/experiments could include:
- Give each selected feature a different weight and see how that affects recommendations within the same game series
- Sort recommendations by popularity (number of positive ratings)
- Give user options to filter language and available platforms
- Filter games given a certain price range
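For the first idea, one possible approach (not the one used above) is to vectorize each feature separately, scale each block by a weight, and stack the blocks before computing cosine similarity. The sketch below follows that approach and uses `scipy.sparse.hstack` to join the per-feature count matrices. The function name `weighted_similarity`, the weights, and the toy rows are hypothetical.

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def weighted_similarity(frame, weights):
    """Cosine similarity over per-feature count matrices, each scaled by its weight."""
    blocks = []
    for column, weight in weights.items():
        # a separate vocabulary per feature: "action" in genres and in tags stay distinct
        matrix = CountVectorizer().fit_transform(frame[column].fillna(""))
        blocks.append(matrix * weight)
    return cosine_similarity(hstack(blocks).tocsr())

toy = pd.DataFrame({
    "developer": ["Valve", "Valve", "Indie Co"],
    "genres": ["Action", "Puzzle", "Puzzle"],
})
sim_equal = weighted_similarity(toy, {"developer": 1.0, "genres": 1.0})
sim_weighted = weighted_similarity(toy, {"developer": 2.0, "genres": 1.0})
print(round(sim_equal[0, 1], 4), round(sim_weighted[0, 1], 4))  # 0.5 0.8
```

With equal weights, the two toy Valve games score 0.5. Doubling the developer weight raises that to 0.8, because the shared developer now dominates the vector.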