# Building a Boardgame Recommender System for BoardGameGeek users

## Part 3: Web app deployment and online evaluations
*by Timothy Tan*

---

> In this final part of our trilogy, we build a web app using Flask, HTML and CSS. The app is hosted on an AWS EC2 instance which, because of our large pickled files, had to be upgraded to one with 16 GB of RAM and 4 virtual CPUs. The ultimate aim is to deploy a functioning app that generates recommendations using 3 different algorithms: SVD with 50 latent factors (SVD50), non-negative matrix factorization with Weighted Alternating Least Squares (ALS), and cosine similarity.

### Developing the Flask app

> In the Flask environment, we have a controller.py file, which contains the routes and the logic flow from one HTML page to the next. The recommendation functions for each algorithm are also stored there for quick processing and delivery. Our predictions and the item-item similarity matrix are loaded from pickled files stored directly on the EC2 instance. The app's files are kept in a GitHub repository and synced automatically between the AWS server and GitHub by a bash script scheduled via crontab.
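
> As a rough illustration of what such recommendation functions might look like, here is a minimal sketch. The function names, the toy arrays and the scoring rule for the item-item case (a similarity-weighted sum of the user's own ratings) are assumptions made for this example, not the actual contents of controller.py.

```python
import numpy as np

def top_n_from_predictions(predicted, already_rated, n=20):
    """Return indices of the n highest-scoring games the user has not rated yet."""
    rated = np.asarray(already_rated, dtype=bool)
    scores = np.asarray(predicted, dtype=float).copy()
    scores[rated] = -np.inf                      # never recommend rated games
    order = np.argsort(-scores, kind="stable")   # highest score first
    n_available = int((~rated).sum())
    return order[:min(n, n_available)].tolist()

def top_n_from_similarity(sim, user_ratings, n=20):
    """Item-item variant: score each game by similarity to the games the user rated.

    Assumption: a rating of 0 means 'not rated'.
    """
    ratings = np.asarray(user_ratings, dtype=float)
    rated = ratings > 0
    scores = np.asarray(sim, dtype=float)[:, rated] @ ratings[rated]
    return top_n_from_predictions(scores, rated, n)

# Toy data (hypothetical): 4 games for the prediction case, 3 for the similarity case
predicted_row = [7.1, 8.3, 6.0, 9.2]
rated_mask = [False, False, False, True]
picks_predictions = top_n_from_predictions(predicted_row, rated_mask, n=2)

toy_sim = [[1.0, 0.9, 0.1],
           [0.9, 1.0, 0.2],
           [0.1, 0.2, 1.0]]
picks_similarity = top_n_from_similarity(toy_sim, [8, 0, 0], n=2)
```

> In the real app, the predictions row and the similarity matrix would come from the pickled files rather than from inline lists.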

> The flow for a potential respondent is as follows: they first enter their BoardGameGeek.com username into a text input field. The first list of 20 games is then displayed. They rate the list in a binary format (liked or not), which takes them to the next page, where the method used to generate the list is briefly described. Clicking on 'New List' brings them to the next list of 20 games. The lists are always generated in the same sequence: SVD50, ALS and finally Cosine Similarity.

> In doing so, I am effectively running an experiment on live subjects to see which model produces the lists that best resonate with them and help them decide which game to put on their gaming table next.
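
> The fixed sequence and the binary rating described above could be captured along these lines. This is only a sketch: the labels `svd`, `als` and `cos` and the `username,algo,liked` row layout match the results file analyzed below, but the helper names and the way the app actually writes its rows are assumptions.

```python
import csv
import io

# Order in which the three lists are shown to every respondent
ALGO_SEQUENCE = ["svd", "als", "cos"]

def next_algo(current):
    """Return the algorithm shown after `current`, or None after the last list."""
    position = ALGO_SEQUENCE.index(current)
    if position + 1 < len(ALGO_SEQUENCE):
        return ALGO_SEQUENCE[position + 1]
    return None

def record_rating(handle, username, algo, liked):
    """Write one 'username,algo,liked' row to an open text handle."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow([username, algo, int(bool(liked))])

# Simulate one respondent rating all three lists (in-memory buffer instead of a file)
buffer = io.StringIO()
algo = ALGO_SEQUENCE[0]
answers = {"svd": True, "als": False, "cos": True}  # hypothetical answers
while algo is not None:
    record_rating(buffer, "example_user", algo, answers[algo])
    algo = next_algo(algo)

logged = buffer.getvalue()
```

> Using the `csv` module rather than plain string concatenation means that a username containing a comma would be quoted instead of breaking the row.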

### Analyzing the online evaluation results

In [45]:
import pandas as pd
import numpy as np


In [46]:
#Read comma-separated txt file into a dataframe
data = pd.read_csv('result.txt', sep=",", header=None)
data.columns = ['username', 'algo', 'liked']

In [47]:
data.head()

| | username | algo | liked |
|---|---|---|---|
| 0 | passthedynamite | svd | 1 |
| 1 | passthedynamite | als | 1 |
| 2 | passthedynamite | cos | 1 |
| 3 | manueld | svd | 1 |
| 4 | DobbelB | svd | 0 |


In [48]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 565 entries, 0 to 564
Data columns (total 3 columns):
username    565 non-null object
algo        565 non-null object
liked       565 non-null int64
dtypes: int64(1), object(2)
memory usage: 13.3+ KB


In [49]:
#Total number of unique usernames
unique_users = data['username'].nunique()
unique_users

240

In [50]:
dupes = data[data.duplicated(['username', 'algo'], keep='last')]
dupes

| | username | algo | liked |
|---|---|---|---|
| 249 | R0land1199 | svd | 1 |
| 310 | AndySzy | als | 0 |
| 371 | Pepe potamo | cos | 1 |
| 415 | Poserdisposer | cos | 1 |
| 420 | neoMarcos | svd | 1 |
| 473 | Zepheus | als | 1 |
| 478 | dandp3 | cos | 0 |


> Duplicates probably occurred when respondents pressed back in the browser after rating and then rated the same list again. I'll take the latest rating as truth.

In [51]:
#Drop duplicates
data.drop(dupes.index, inplace=True)

In [52]:
data.reset_index(inplace=True, drop=True)

In [53]:
data.shape

(558, 3)

In [54]:
#Groupby username to count the total number of lists each respondent liked.
liked = data.groupby('username')['liked'].sum()

In [55]:
#Number of users that liked at least 1 list
liked_one = liked[liked > 0].count()
liked_one

179

In [56]:
percentage_liked = float(liked_one)/unique_users
percentage_liked

0.7458333333333333

> This value evaluates the usefulness of the app itself. Roughly three quarters of respondents (74.6%) liked at least one of the lists, so most of them can use the app to find new games that they would probably like.

In [57]:
#Using groupby again to count the number of lists each respondent actually rated.
count = data.groupby(['username']).count()

In [58]:
count.head()

| username | algo | liked |
|---|---|---|
| 123ABC | 1 | 1 |
| 3davoli | 3 | 3 |
| Abiezer Coppe | 3 | 3 |
| Adamvic | 1 | 1 |
| Albireo | 1 | 1 |


In [59]:
#Number of unique usernames that rated all 3 lists
all_3 = int((count['algo'] == 3).sum())
all_3

142

In [60]:
percentage_all_3 = float(all_3)/unique_users
percentage_all_3

0.5916666666666667

> This value evaluates the UX/UI component of the web app. Slightly more than half of respondents (59%) actually saw and rated all 3 lists. I had to revise the front page to include new instructions asking users to rate all 3 lists before exiting, in a bid to increase the number of users who complete the 'survey'.

In [61]:
#Store users that rated all 3 lists, as we will use them to evaluate the models' efficacy
df = count[count['algo'] == 3]

In [62]:
df.head()

| username | algo | liked |
|---|---|---|
| 3davoli | 3 | 3 |
| Abiezer Coppe | 3 | 3 |
| Cadila | 3 | 3 |
| Calcapone | 3 | 3 |
| Carthoris | 3 | 3 |


In [63]:
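#Extract the indices of rows in the main dataframe whose user did not rate all 3 lists,
#then use these indices to drop those users from the main dataframe.
#np.where returns positions; they match the index labels here because of the earlier reset_index.
#Note: the loop visits a username once per row, so a user with 2 ratings has their indices
#appended twice. The repeats are harmless for data.drop, but len(drop_list) overstates
#the number of distinct rows dropped.
drop_list = []
for name in data['username']:
    if name not in df.index:
        drop_list.extend(np.where(data['username'] == name)[0].tolist())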


In [64]:
len(drop_list)

200

In [65]:
df_final = data.drop(drop_list)
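
> The same filtering can be expressed without an explicit loop. The sketch below uses a small made-up dataframe rather than the real results, and the variable names are illustrative only; it keeps the rows of users who rated all 3 lists by combining `groupby` with `isin`.

```python
import pandas as pd

# Toy stand-in for the results dataframe (hypothetical users)
toy = pd.DataFrame({
    "username": ["ann", "ann", "ann", "bob", "cy", "cy"],
    "algo": ["svd", "als", "cos", "svd", "svd", "als"],
    "liked": [1, 0, 1, 0, 1, 1],
})

# Number of distinct lists each user rated
lists_rated = toy.groupby("username")["algo"].nunique()

# Users who rated all 3 lists
complete_users = lists_rated[lists_rated == 3].index

# Keep only the rows belonging to those users
toy_final = toy[toy["username"].isin(complete_users)]
```

> Because this selects rows to keep instead of building a list of labels to drop, it does not depend on the index having been reset beforehand.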

In [66]:
df_final.head()

| | username | algo | liked |
|---|---|---|---|
| 0 | passthedynamite | svd | 1 |
| 1 | passthedynamite | als | 1 |
| 2 | passthedynamite | cos | 1 |
| 4 | DobbelB | svd | 0 |
| 5 | DobbelB | als | 1 |


In [67]:
#Perform a pivot table to get the results into the shape I need for algorithm comparison
df_pivot = pd.pivot_table(df_final, values='liked', index='username', columns='algo')
df_pivot.head()

| username | als | cos | svd |
|---|---|---|---|
| 3davoli | 0 | 1 | 1 |
| Abiezer Coppe | 0 | 0 | 0 |
| Cadila | 1 | 0 | 1 |
| Calcapone | 1 | 1 | 0 |
| Carthoris | 0 | 0 | 0 |


In [68]:
mean = df_pivot.mean(axis=0)

In [69]:
#Percentage that liked the lists generated from each model
pd.DataFrame(mean)

| algo | 0 |
|---|---|
| als | 0.535211 |
| cos | 0.676056 |
| svd | 0.570423 |


> These values indicate the efficacy of each list in helping a BGG user decide on their next game purchase or playthrough. SVD50 (57.0%) is slightly more effective than ALS (53.5%), but the lists generated through cosine similarity (67.6%) outshine them both. This is highly surprising given that cosine similarity had returned a higher RMSE than either of the other 2 algorithms.

> Users who rated all 3 lists gave feedback in a forum on the BGG website that the Cosine Similarity list was the 'most creative' and 'recommended unexpected games to me'.

> Another observation from respondents' comments was that the SVD50 algo had weird inclusions of very low-ranked games. I surmise that this could be due to how the algorithm simply creates a low-rank representation of the ratings matrix, which is highly dependent on the number of latent factors chosen. These latent factors are often unexplainable, rendering matrix factorization methods a bit of a 'black box' when it comes to explaining their recommendations. I suspect that by overspecifying the number of latent factors, the low-ranked games are grouped together with the high-ranked games on some unexplained dimensions, resulting in miscalculations in the predicted ratings.
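
> To make the dependence on the number of latent factors concrete, here is a minimal sketch of a rank-k approximation using plain `numpy.linalg.svd` on a small, dense, made-up ratings matrix. It is not the SVD50 pipeline from the earlier parts; the matrix and the function name are assumptions for illustration.

```python
import numpy as np

def low_rank_approximation(ratings, k):
    """Reconstruct `ratings` using only the k largest singular values."""
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Toy ratings (hypothetical): 4 users x 4 games with two clear taste groups
ratings = np.array([
    [8, 7, 2, 1],
    [9, 8, 1, 2],
    [2, 1, 9, 8],
    [1, 2, 8, 9],
], dtype=float)

# Reconstruction error for an increasing number of latent factors
errors = {}
for k in (1, 2, 4):
    approx = low_rank_approximation(ratings, k)
    errors[k] = float(np.sqrt(np.mean((ratings - approx) ** 2)))
    print(f"k={k}: reconstruction RMSE = {errors[k]:.4f}")
```

> With more factors the reconstruction tracks the observed ratings ever more closely, noise included, which is one way an overspecified factor count might end up producing odd predictions for games the user has not rated.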

> In general, the cosine similarity list appears to be the most serendipitous, surfacing games that some users had seen before but had not taken notice of. This has led some of them to take a closer look at these recommendations, which fulfils what this project sought to achieve from the outset.

### Further work
> This app was developed within 4 days and, due to the need to deploy it quickly to gather the online evaluations, not all ideas were implemented:
- Storage of the input data should be in a database so as to reduce the storage and processing cost involved in handling pickled files. For reference, the 3 pickled files are 1.7GB each in size.
- I was trying to figure out how to create a final multiple choice question for the user to select the best list out of the 3 but could not do so in time.

> Some potential extensions to this project include:
- Extending the game list beyond the current 1807 games to include games with fewer ratings.
- Evaluating with rank metrics, e.g. Mean Average Precision @ K.
- Tweaking the Flask routing and algorithms to access real-time ratings through an API call, in order to provide recommendations that are more current.
- Assessing a computationally less expensive approach to user-user cosine similarity. User-user similarity helps in generating serendipity, and seeing how well the item-item approach does at providing unexpected recommendations, it would be interesting to see how different the user-user approach would be. One way to alleviate the high computational cost would be to perform unsupervised learning first to identify nearest neighbors before calculating similarity scores.
- Exploring group recommendations. Board games are mostly played by 2 or more people, and a recommender that takes into account the preferences of all parties involved might be well regarded.
- Including content-based filtering to reduce the cold start problem for new games.
- Creating context-aware recommendations, possibly in the form of a chat bot that, instead of recommending new games, recommends games to bring to the next game night by asking you a series of questions.
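
> As a pointer for the rank-metric idea in the list above, here is a minimal sketch of Mean Average Precision @ K. It follows one common convention (dividing by the smaller of K and the number of relevant items) and assumes each recommendation list contains no repeated games; the function names and toy data are illustrative only.

```python
def average_precision_at_k(recommended, relevant, k):
    """Average precision of the first k recommendations for a single user."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank   # precision at this rank
    return score / min(len(relevant), k)

def mean_average_precision_at_k(all_recommended, all_relevant, k):
    """Mean of the per-user average precision values."""
    pairs = list(zip(all_recommended, all_relevant))
    if not pairs:
        return 0.0
    return sum(average_precision_at_k(rec, rel, k) for rec, rel in pairs) / len(pairs)

# Toy example (hypothetical game ids) for two users
recommended_lists = [["a", "b", "c", "d"], ["x", "y", "z"]]
relevant_sets = [{"a", "c"}, {"z"}]
map_at_3 = mean_average_precision_at_k(recommended_lists, relevant_sets, k=3)
```

> Unlike the binary 'liked the list' measure used in this evaluation, a metric of this kind rewards placing the games a user cares about near the top of the list.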