## 4. Board Game Recommendation: Board Game Features
## Overview

To enhance the recommendation system, several models were built, each using a different set of features. The modelling process takes the results of the topic modelling into account and builds recommendations from board game features such as category, mechanic, designer, artist, and publisher. After the keywords for each feature set are combined, a TF-IDF vectorizer creates the feature matrix, from which cosine similarity scores are calculated and used to make recommendations.
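As a rough illustration of how TF-IDF weights and cosine similarity interact, here is a minimal pure-Python sketch on three made-up descriptions. The game names and texts are hypothetical, and the weighting (raw term count times a smoothed IDF, followed by L2 normalisation) is meant to mirror scikit-learn's default settings as I understand them; it is not the code used in this notebook.

```python
import math
from collections import Counter

# Hypothetical toy descriptions (already cleaned and lower-cased)
docs = {
    "Game A": "cooperative disease cure team",
    "Game B": "cooperative disease outbreak team",
    "Game C": "medieval tile city road",
}

def tfidf_vectors(texts):
    """Return one L2-normalised TF-IDF dict per text, using a smoothed IDF."""
    tokenised = [t.split() for t in texts]
    n_docs = len(tokenised)
    # Document frequency: in how many texts each term occurs
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors

def cosine(u, v):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())

vecs = tfidf_vectors(list(docs.values()))
sim_ab = cosine(vecs[0], vecs[1])  # share three of four terms
sim_ac = cosine(vecs[0], vecs[2])  # share no terms
print(f"A vs B: {sim_ab:.3f}")
print(f"A vs C: {sim_ac:.3f}")
```

Games that share rare terms score close to 1, while games with no terms in common score exactly 0.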

In [None]:
# Importing datasets from folder
from google.colab import drive
drive.mount("/content/drive/")

Mounted at /content/drive/


In [None]:
# File location variable 
file = '/content/drive/My Drive/Capstone/'

In [None]:
import pandas as pd
import numpy as np
import nltk
import re
import string

from bs4 import BeautifulSoup  
from nltk.corpus import stopwords 
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

from scipy import sparse
from sklearn.metrics.pairwise import pairwise_distances,linear_kernel


In [None]:
bg_con = pd.read_csv(file + 'boardgames_content.csv')
bg_con.head(2)

Unnamed: 0,id,name,year,rank,bayes average,users rated,url,type,thumbnail,image,alternate,description,minplayers,maxplayers,suggested_num_players,suggested_playerage,suggested_language_dependence,playingtime,minplaytime,maxplaytime,minage,boardgamecategory,boardgamemechanic,boardgamefamily,boardgameexpansion,boardgameimplementation,boardgamedesigner,boardgameartist,boardgamepublisher,average,Board Game Rank,Strategy Game Rank,Family Game Rank,stddev,median,owned,trading,wanting,wishing,numcomments,numweights,averageweight,boardgameintegration,boardgamecompilation,Party Game Rank,Abstract Game Rank,Thematic Rank,War Game Rank,Customizable Rank,Children's Game Rank,RPG Item Rank,Accessory Rank,Video Game Rank,Amiga Rank,Commodore 64 Rank,Arcade Rank,Atari ST Rank
0,30549,Pandemic,2008,91,7.518,96186,/boardgame/30549/pandemic,boardgame,https://cf.geekdo-images.com/thumb/img/HEKrtpT...,https://cf.geekdo-images.com/original/img/j-pf...,"['EPIZOotic', 'Pandemia', 'Pandemia 10 Anivers...","In Pandemic, several virulent diseases have br...",2,4,"[OrderedDict([('@numplayers', '1'), ('result',...","[OrderedDict([('@value', '2'), ('@numvotes', '...","[OrderedDict([('@level', '16'), ('@value', 'No...",45,45,45,8,['Medical'],"['Action Points', 'Cooperative Game', 'Hand Ma...","['Game: Pandemic', 'Medical: Diseases', 'Occup...",['Pandemic: Gen Con 2016 Promos – Z-Force Team...,"['Pandemic Legacy: Season 0', 'Pandemic Legacy...",['Matt Leacock'],"['Josh Cappel', 'Christian Hanisch', 'Régis Mo...","['Z-Man Games, Inc.', '(Unknown)', 'Albi', 'As...",7.61567,91,104.0,10.0,1.32632,0,144727,2191,640,8571,15778,5232,2.4148,,,,,,,,,,,,,,,
1,822,Carcassonne,2000,173,7.311,96181,/boardgame/822/carcassonne,boardgame,https://cf.geekdo-images.com/thumb/img/kqE4YJS...,https://cf.geekdo-images.com/original/img/o4p6...,"['Carcassonne Jubilee Edition', 'Carcassonne: ...",Carcassonne is a tile-placement game in which ...,2,5,"[OrderedDict([('@numplayers', '1'), ('result',...","[OrderedDict([('@value', '2'), ('@numvotes', '...","[OrderedDict([('@level', '41'), ('@value', 'No...",45,30,45,7,"['City Building', 'Medieval', 'Territory Build...","['Area Majority / Influence', 'Map Addition', ...","['Components: Black meeples', 'Components: Blu...","['20 Jahre Darmstadt Spielt', 'Apothecaries An...","['The Ark of the Covenant', 'Carcassonne: Amaz...",['Klaus-Jürgen Wrede'],"['Doris Matthäus', 'Anne Pätzke', 'Chris Quill...","['Hans im Glück', '999 Games', 'Albi', 'Bard C...",7.41884,173,,34.0,1.30369,0,140066,1587,539,6286,17720,7304,1.9158,['Carcassonne: Wheel of Fortune'],"['Carcassonne Big Box', 'Carcassonne Big Box 2...",,,,,,,,,,,,,


## Recommendation based on Cosine Similarity Score

In [None]:
# Function for recommendations

def recommendations(title, tfidf_matrix):
    # Map each boardgame name to its row position, keeping the first row for duplicated names
    bg_n = pd.Series(bg_con.index, index=bg_con['name'])
    bg_n = bg_n[~bg_n.index.duplicated(keep='first')]

    # Get the index of the boardgame that matches the title input
    idx = bg_n[title]

    # Compute the cosine similarity between the searched boardgame and every boardgame.
    # TfidfVectorizer L2-normalises each row by default, so the dot product equals the cosine similarity.
    # Only one row is computed, which avoids building the full n x n similarity matrix.
    cosine_sim = linear_kernel(tfidf_matrix[idx], tfidf_matrix).ravel()

    # Pair each boardgame index with its score and sort by score, highest first
    cs_score = sorted(enumerate(cosine_sim), key=lambda x: x[1], reverse=True)

    print('*********************************************************************')
    print(f'Top 5 Similar Boardgames with {title}: ')
    print(' ')

    # Keep the 5 most similar boardgames, skipping the first entry (the searched boardgame itself)
    cs_score = cs_score[1:6]

    # Get the boardgame indices
    boardgames = [j for j, _ in cs_score]

    # Return the top 5 most similar boardgames with details
    return bg_con[['name','url']].iloc[boardgames]
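
The function above drops the first entry of the sorted list on the assumption that it is the searched game. If another game has identical features it also scores 1.0, and the searched game may then not be first. A minimal sketch of a more explicit selection follows; `top_k_similar` and the scores are hypothetical and not part of the notebook.

```python
def top_k_similar(scores, query_index, k=5):
    """Return the k (index, score) pairs with the highest scores, excluding the query itself."""
    candidates = [(i, s) for i, s in enumerate(scores) if i != query_index]
    # sorted() is stable, so tied scores keep their original row order
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical similarity scores of game 2 against six games;
# game 0 has identical features, so it also scores 1.0
scores = [1.0, 0.1, 1.0, 0.4, 0.0, 0.7]
top3 = top_k_similar(scores, query_index=2, k=3)
print(top3)  # [(0, 1.0), (5, 0.7), (3, 0.4)]
```

With a positional slice such as `[1:4]`, the same scores would return game 2 itself and drop game 0, because game 0 sorts ahead of it.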

## Recommendation based on game description

Recommend board games whose descriptions are similar to that of the searched game.

In [None]:
# Fill na with empty string if any
bg_con['description'] = bg_con['description'].fillna(' ')

In [None]:
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [None]:
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.


True

In [None]:
# Clean the description before vectorizing

def cleaning(text):

    text_bs = BeautifulSoup(text, 'html.parser').get_text()    # 1. Remove HTML (parser named explicitly).
    letters = re.sub("[^a-zA-Z]", " ", text_bs)                # 2. Replace non-letters with spaces.
    words = letters.lower().split()                            # 3. Convert to lower case, split into individual words.
    stops = set(stopwords.words('english'))
    meaningful_words = [w for w in words if w not in stops]    # 4. Remove stopwords.
    lemmatizer = WordNetLemmatizer()                           # 5. Lemmatize.
    meaningful_words_lemmatized = [lemmatizer.lemmatize(w) for w in meaningful_words]

    return " ".join(meaningful_words_lemmatized)               # 6. Join the words back into one space-separated string.
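
To make steps 2 to 4 concrete, here is a small standard-library sketch of the same idea. The stop-word set and sample sentences are hypothetical stand-ins (NLTK's English list is much longer), and HTML removal and lemmatization are left out.

```python
import re

# Hypothetical stop-word list, far smaller than NLTK's English list
STOPS = {"in", "the", "a", "of", "and", "have"}

def basic_clean(text):
    letters = re.sub(r"[^a-zA-Z]", " ", text)   # non-letters become spaces
    words = letters.lower().split()             # lower-case and tokenise
    return " ".join(w for w in words if w not in STOPS)

cleaned = basic_clean("In Pandemic, 4 diseases have broken out!")
print(cleaned)                       # pandemic diseases broken out

# Accented letters are outside a-z, so they are replaced as well
accented = basic_clean("Klaus-Jürgen")
print(accented)                      # klaus j rgen
```

The second call shows a side effect of the `[^a-zA-Z]` pattern: digits and accented characters are dropped along with punctuation, which can split non-English words in the descriptions.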

In [None]:
# Apply the cleaning function to every description
bg_con['description clean'] = bg_con['description'].apply(cleaning)
bg_con.head(2)

Unnamed: 0,id,name,year,rank,bayes average,users rated,url,type,thumbnail,image,alternate,description,minplayers,maxplayers,suggested_num_players,suggested_playerage,suggested_language_dependence,playingtime,minplaytime,maxplaytime,minage,boardgamecategory,boardgamemechanic,boardgamefamily,boardgameexpansion,boardgameimplementation,boardgamedesigner,boardgameartist,boardgamepublisher,average,Board Game Rank,Strategy Game Rank,Family Game Rank,stddev,median,owned,trading,wanting,wishing,numcomments,numweights,averageweight,boardgameintegration,boardgamecompilation,Party Game Rank,Abstract Game Rank,Thematic Rank,War Game Rank,Customizable Rank,Children's Game Rank,RPG Item Rank,Accessory Rank,Video Game Rank,Amiga Rank,Commodore 64 Rank,Arcade Rank,Atari ST Rank,description clean
0,30549,Pandemic,2008,91,7.518,96186,/boardgame/30549/pandemic,boardgame,https://cf.geekdo-images.com/thumb/img/HEKrtpT...,https://cf.geekdo-images.com/original/img/j-pf...,"['EPIZOotic', 'Pandemia', 'Pandemia 10 Anivers...","In Pandemic, several virulent diseases have br...",2,4,"[OrderedDict([('@numplayers', '1'), ('result',...","[OrderedDict([('@value', '2'), ('@numvotes', '...","[OrderedDict([('@level', '16'), ('@value', 'No...",45,45,45,8,['Medical'],"['Action Points', 'Cooperative Game', 'Hand Ma...","['Game: Pandemic', 'Medical: Diseases', 'Occup...",['Pandemic: Gen Con 2016 Promos – Z-Force Team...,"['Pandemic Legacy: Season 0', 'Pandemic Legacy...",['Matt Leacock'],"['Josh Cappel', 'Christian Hanisch', 'Régis Mo...","['Z-Man Games, Inc.', '(Unknown)', 'Albi', 'As...",7.61567,91,104.0,10.0,1.32632,0,144727,2191,640,8571,15778,5232,2.4148,,,,,,,,,,,,,,,,pandemic several virulent disease broken simul...
1,822,Carcassonne,2000,173,7.311,96181,/boardgame/822/carcassonne,boardgame,https://cf.geekdo-images.com/thumb/img/kqE4YJS...,https://cf.geekdo-images.com/original/img/o4p6...,"['Carcassonne Jubilee Edition', 'Carcassonne: ...",Carcassonne is a tile-placement game in which ...,2,5,"[OrderedDict([('@numplayers', '1'), ('result',...","[OrderedDict([('@value', '2'), ('@numvotes', '...","[OrderedDict([('@level', '41'), ('@value', 'No...",45,30,45,7,"['City Building', 'Medieval', 'Territory Build...","['Area Majority / Influence', 'Map Addition', ...","['Components: Black meeples', 'Components: Blu...","['20 Jahre Darmstadt Spielt', 'Apothecaries An...","['The Ark of the Covenant', 'Carcassonne: Amaz...",['Klaus-Jürgen Wrede'],"['Doris Matthäus', 'Anne Pätzke', 'Chris Quill...","['Hans im Glück', '999 Games', 'Albi', 'Bard C...",7.41884,173,,34.0,1.30369,0,140066,1587,539,6286,17720,7304,1.9158,['Carcassonne: Wheel of Fortune'],"['Carcassonne Big Box', 'Carcassonne Big Box 2...",,,,,,,,,,,,,,carcassonne tile placement game player draw pl...


### TF-IDF matrix for board game description

In [None]:
# Define the TF-IDF vectorizer: remove English stop words and use unigrams and bigrams
tfidf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), stop_words='english')

# Fit and transform the data to get the matrix
description = tfidf.fit_transform(bg_con['description clean'])
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
tfidf_feature_names = tfidf.get_feature_names_out().tolist()

# Print out some features name and matrix shape
print(f'Example feature: {tfidf_feature_names[0:5]}')
print(f'Matrix shape: {description.shape}')

Example feature: ['aa', 'aa anti', 'aa battery', 'aa combat', 'aa good']
Matrix shape: (19230, 995063)
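
Setting `ngram_range=(1, 2)` adds every pair of adjacent words as a feature, which is one reason the vocabulary is so much larger than the number of games. The sketch below counts unigrams and bigrams for two made-up descriptions. It ignores the stop-word filtering and tokenisation rules that `TfidfVectorizer` applies, and the `ngrams` helper is hypothetical.

```python
def ngrams(tokens, n_min=1, n_max=2):
    """Return all contiguous n-grams of length n_min..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]

# Hypothetical cleaned descriptions
docs = [
    "player draw tile place tile",
    "player place worker collect resource",
]

unigram_vocab = {g for d in docs for g in ngrams(d.split(), 1, 1)}
full_vocab = {g for d in docs for g in ngrams(d.split(), 1, 2)}
print(len(unigram_vocab), len(full_vocab))  # 7 15
```

Even here, bigrams more than double the vocabulary. If the matrix becomes too large to work with, the `min_df` or `max_features` parameters of `TfidfVectorizer` are one way to cap it.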


In [None]:
recommendations('Pandemic', description)

*********************************************************************
Top 5 Similar Boardgames with Pandemic: 
 


Unnamed: 0,name,url
29,Pandemic Legacy: Season 1,/boardgame/161936/pandemic-legacy-season-1
333,Pandemic: The Cure,/boardgame/150658/pandemic-cure
999,Pandemic: Contagion,/boardgame/157789/pandemic-contagion
6396,Side Effects,/boardgame/230765/side-effects
6556,Pandemic: Hot Zone – North America,/boardgame/301919/pandemic-hot-zone-north-america


## Recommendation based on boardgame category and boardgame mechanic

In [None]:
# Select the feature columns of interest; copy so later assignments do not act on a slice of bg_con
features_col = ['name','boardgamecategory','boardgamemechanic']
bg_f = bg_con[features_col].copy()
bg_f.head()

Unnamed: 0,name,boardgamecategory,boardgamemechanic
0,Pandemic,['Medical'],"['Action Points', 'Cooperative Game', 'Hand Ma..."
1,Carcassonne,"['City Building', 'Medieval', 'Territory Build...","['Area Majority / Influence', 'Map Addition', ..."
2,Catan,"['Economic', 'Negotiation']","['Dice Rolling', 'Hexagon Grid', 'Income', 'Mo..."
3,7 Wonders,"['Ancient', 'Card Game', 'City Building', 'Civ...","['Card Drafting', 'Drafting', 'Hand Management..."
4,Dominion,"['Card Game', 'Medieval']","['Deck, Bag, and Pool Building', 'Delayed Purc..."


In [None]:
# Fill na with empty string if any
bg_f['boardgamecategory'] = bg_f['boardgamecategory'].fillna(' ')
bg_f['boardgamemechanic'] = bg_f['boardgamemechanic'].fillna(' ')



In [None]:
# Cleaning of words
# To keep multi-word terms together, join the words of each list entry into a single token

# Each cell holds a list written as a string rather than a real list, so it is cleaned with string operations
cleaned = []
for words in bg_f['boardgamecategory']:
  comma = words.replace(",", "comma")
  text = re.sub(r'[^\w\s]','',comma)
  joinword = text.replace(' ','')
  cleaned_word = joinword.replace('comma',' ')
  cleaned.append(cleaned_word.lower())
  
bg_f['boardgamecategory'] = cleaned



In [None]:
# Cleaning of words

cleaned2 = []
for words in bg_f['boardgamemechanic']:
  comma = words.replace(",", "comma")
  text = re.sub(r'[^\w\s]','',comma)
  joinword = text.replace(' ','')
  cleaned_word = joinword.replace('comma',' ')
  cleaned2.append(cleaned_word.lower())

bg_f['boardgamemechanic'] = cleaned2



In [None]:
bg_f.head()

Unnamed: 0,name,boardgamecategory,boardgamemechanic
0,Pandemic,medical,actionpoints cooperativegame handmanagement po...
1,Carcassonne,citybuilding medieval territorybuilding,areamajorityinfluence mapaddition tileplacement
2,Catan,economic negotiation,dicerolling hexagongrid income modularboard ne...
3,7 Wonders,ancient cardgame citybuilding civilization eco...,carddrafting drafting handmanagement setcollec...
4,Dominion,cardgame medieval,deck bag andpoolbuilding delayedpurchase handm...
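
The last row shows a side effect of splitting on commas: an entry that itself contains commas, such as 'Deck, Bag, and Pool Building', ends up as three tokens (`deck bag andpoolbuilding`). One possible alternative, sketched below, is to parse the stringified list with `ast.literal_eval` so that each entry stays one token. The `tokens_from_list_string` helper and the sample string are illustrative only; adopting it would change the features and the results shown in this notebook.

```python
import ast
import re

def tokens_from_list_string(raw):
    """Parse a stringified Python list and turn each entry into one lower-case token."""
    if not raw.strip():
        return ""                      # missing values were filled with a blank string
    entries = ast.literal_eval(raw)    # safely evaluates the list literal
    # \W strips punctuation and spaces inside an entry, keeping letters and digits
    return " ".join(re.sub(r"\W", "", entry).lower() for entry in entries)

raw = "['Deck, Bag, and Pool Building', 'Hand Management']"
tokens = tokens_from_list_string(raw)
print(tokens)  # deckbagandpoolbuilding handmanagement
```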


In [None]:
bg_f['words'] = bg_f[['boardgamecategory', 'boardgamemechanic']].agg(' '.join, axis=1)
bg_f.head()



Unnamed: 0,name,boardgamecategory,boardgamemechanic,words
0,Pandemic,medical,actionpoints cooperativegame handmanagement po...,medical actionpoints cooperativegame handmanag...
1,Carcassonne,citybuilding medieval territorybuilding,areamajorityinfluence mapaddition tileplacement,citybuilding medieval territorybuilding areama...
2,Catan,economic negotiation,dicerolling hexagongrid income modularboard ne...,economic negotiation dicerolling hexagongrid i...
3,7 Wonders,ancient cardgame citybuilding civilization eco...,carddrafting drafting handmanagement setcollec...,ancient cardgame citybuilding civilization eco...
4,Dominion,cardgame medieval,deck bag andpoolbuilding delayedpurchase handm...,cardgame medieval deck bag andpoolbuilding del...


### TF-IDF matrix for bag of words (board game category & board game mechanic)

In [None]:
# Define the TF-IDF vectorizer: remove English stop words and use unigrams and bigrams
tfidf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), stop_words='english')

# Fit and transform the data to get the matrix
category_mechanic = tfidf.fit_transform(bg_f['words'])
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
tfidf_feature_names = tfidf.get_feature_names_out().tolist()

# Print out some features name and matrix shape
print(f'Example feature: {tfidf_feature_names[0:5]}')
print(f'Matrix shape: {category_mechanic.shape}')

Example feature: ['abstractstrategy', 'abstractstrategy actiondexterity', 'abstractstrategy actionpoints', 'abstractstrategy actionqueue', 'abstractstrategy adventure']
Matrix shape: (19230, 6760)


In [None]:
recommendations('Pandemic', category_mechanic)

*********************************************************************
Top 5 Similar Boardgames with Pandemic: 
 


Unnamed: 0,name,url
6556,Pandemic: Hot Zone – North America,/boardgame/301919/pandemic-hot-zone-north-america
399,Pandemic: Iberia,/boardgame/198928/pandemic-iberia
29,Pandemic Legacy: Season 1,/boardgame/161936/pandemic-legacy-season-1
437,Pandemic: Reign of Cthulhu,/boardgame/192153/pandemic-reign-cthulhu
572,Keltis,/boardgame/34585/keltis


## Recommendation based on boardgame designer, artist, and publisher

In [None]:
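# Select the feature columns of interest; copy so later assignments do not act on a slice of bg_con
features_col = ['name','boardgamedesigner','boardgameartist','boardgamepublisher']
bg_f2 = bg_con[features_col].copy()
bg_f2.head()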

Unnamed: 0,name,boardgamedesigner,boardgameartist,boardgamepublisher
0,Pandemic,['Matt Leacock'],"['Josh Cappel', 'Christian Hanisch', 'Régis Mo...","['Z-Man Games, Inc.', '(Unknown)', 'Albi', 'As..."
1,Carcassonne,['Klaus-Jürgen Wrede'],"['Doris Matthäus', 'Anne Pätzke', 'Chris Quill...","['Hans im Glück', '999 Games', 'Albi', 'Bard C..."
2,Catan,['Klaus Teuber'],"['Volkan Baga', 'Tanja Donner', 'Pete Fenlon',...","['KOSMOS', '999 Games', 'Albi', 'Astrel Games'..."
3,7 Wonders,['Antoine Bauza'],"['Antoine Bauza', 'Miguel Coimbra']","['Repos Production', 'ADC Blackfire Entertainm..."
4,Dominion,['Donald X. Vaccarino'],"['Matthias Catrein', 'Julien Delval', 'Tomasz ...","['Rio Grande Games', '999 Games', 'Albi', 'Bar..."


In [None]:
# Fill na with empty string if any
bg_f2['boardgamedesigner'] = bg_f2['boardgamedesigner'].fillna(' ')
bg_f2['boardgameartist'] = bg_f2['boardgameartist'].fillna(' ')
bg_f2['boardgamepublisher'] = bg_f2['boardgamepublisher'].fillna(' ')



In [None]:
# Cleaning of words

cleanedd = []
for words in bg_f2['boardgamedesigner']:
  comma = words.replace(",", "comma")
  text = re.sub(r'[^\w\s]','',comma)
  joinword = text.replace(' ','')
  cleaned_word = joinword.replace('comma',' ')
  cleanedd.append(cleaned_word.lower())
  
bg_f2['boardgamedesigner'] = cleanedd



In [None]:
# Cleaning of words

cleaneda = []
for words in bg_f2['boardgameartist']:
  comma = words.replace(",", "comma")
  text = re.sub(r'[^\w\s]','',comma)
  joinword = text.replace(' ','')
  cleaned_word = joinword.replace('comma',' ')
  cleaneda.append(cleaned_word.lower())
  
bg_f2['boardgameartist'] = cleaneda



In [None]:
# Cleaning of words

cleanedp = []
for words in bg_f2['boardgamepublisher']:
  comma = words.replace(",", "comma")
  text = re.sub(r'[^\w\s]','',comma)
  joinword = text.replace(' ','')
  cleaned_word = joinword.replace('comma',' ')
  cleanedp.append(cleaned_word.lower())
  
bg_f2['boardgamepublisher'] = cleanedp



In [None]:
bg_f2.head()

Unnamed: 0,name,boardgamedesigner,boardgameartist,boardgamepublisher
0,Pandemic,mattleacock,joshcappel christianhanisch régismoulun chrisq...,zmangames inc unknown albi asmodee asmodeeital...
1,Carcassonne,klausjürgenwrede,dorismatthäus annepätzke chrisquilliams klausj...,hansimglück 999games albi bardcentrumgier berg...
2,Catan,klausteuber,volkanbaga tanjadonner petefenlon jasonhawkins...,kosmos 999games albi astrelgames bergsalaenigm...
3,7 Wonders,antoinebauza,antoinebauza miguelcoimbra,reposproduction adcblackfireentertainment asmo...
4,Dominion,donaldxvaccarino,matthiascatrein juliendelval tomaszjedruszek r...,riograndegames 999games albi bardcentrumgier c...


In [None]:
bg_f2['words'] = bg_f2[['boardgamedesigner', 'boardgameartist','boardgamepublisher']].agg(' '.join, axis=1)
bg_f2.head()



Unnamed: 0,name,boardgamedesigner,boardgameartist,boardgamepublisher,words
0,Pandemic,mattleacock,joshcappel christianhanisch régismoulun chrisq...,zmangames inc unknown albi asmodee asmodeeital...,mattleacock joshcappel christianhanisch régism...
1,Carcassonne,klausjürgenwrede,dorismatthäus annepätzke chrisquilliams klausj...,hansimglück 999games albi bardcentrumgier berg...,klausjürgenwrede dorismatthäus annepätzke chri...
2,Catan,klausteuber,volkanbaga tanjadonner petefenlon jasonhawkins...,kosmos 999games albi astrelgames bergsalaenigm...,klausteuber volkanbaga tanjadonner petefenlon ...
3,7 Wonders,antoinebauza,antoinebauza miguelcoimbra,reposproduction adcblackfireentertainment asmo...,antoinebauza antoinebauza miguelcoimbra reposp...
4,Dominion,donaldxvaccarino,matthiascatrein juliendelval tomaszjedruszek r...,riograndegames 999games albi bardcentrumgier c...,donaldxvaccarino matthiascatrein juliendelval ...


### TF-IDF matrix for bag of words (board game artist, designer, and publisher)

In [None]:
# Define the TF-IDF vectorizer: remove English stop words and use unigrams and bigrams
tfidf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), stop_words='english')

# Fit and transform the data to get the matrix
artist_designer_publisher = tfidf.fit_transform(bg_f2['words'])
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
tfidf_feature_names = tfidf.get_feature_names_out().tolist()

# Print out some features name and matrix shape
print(f'Example feature: {tfidf_feature_names[0:5]}')
print(f'Matrix shape: {artist_designer_publisher.shape}')

Example feature: ['0hrarttechnology', '101codinganddesign', '101codinganddesign editionspielwiese', '123gameséditions', '123gameséditions actiongt']
Matrix shape: (19230, 80301)


In [None]:
recommendations('Pandemic', artist_designer_publisher)

*********************************************************************
Top 5 Similar Boardgames with Pandemic: 
 


Unnamed: 0,name,url
29,Pandemic Legacy: Season 1,/boardgame/161936/pandemic-legacy-season-1
3,7 Wonders,/boardgame/68448/7-wonders
13,Splendor,/boardgame/148228/splendor
18,Dixit,/boardgame/39856/dixit
285,Pandemic Legacy: Season 2,/boardgame/221107/pandemic-legacy-season-2


## General Recommendation

Recommend similar games based on general information that combines the features used above: the cleaned description, category, mechanic, designer, artist, and publisher.

In [None]:
# Copy so that new columns are not assigned to a slice of bg_con
bg_gen = bg_con[['name','description clean']].copy()
bg_gen.head(2)

Unnamed: 0,name,description clean
0,Pandemic,pandemic several virulent disease broken simul...
1,Carcassonne,carcassonne tile placement game player draw pl...


In [None]:
bg_gen['words cm'] = bg_f['words']
bg_gen['words dap'] = bg_f2['words']
bg_gen.head()



Unnamed: 0,name,description clean,words cm,words dap
0,Pandemic,pandemic several virulent disease broken simul...,medical actionpoints cooperativegame handmanag...,mattleacock joshcappel christianhanisch régism...
1,Carcassonne,carcassonne tile placement game player draw pl...,citybuilding medieval territorybuilding areama...,klausjürgenwrede dorismatthäus annepätzke chri...
2,Catan,catan formerly settler catan player try domina...,economic negotiation dicerolling hexagongrid i...,klausteuber volkanbaga tanjadonner petefenlon ...
3,7 Wonders,leader one great city ancient world gather res...,ancient cardgame citybuilding civilization eco...,antoinebauza antoinebauza miguelcoimbra reposp...
4,Dominion,monarch like parent ruler small pleasant kingd...,cardgame medieval deck bag andpoolbuilding del...,donaldxvaccarino matthiascatrein juliendelval ...


In [None]:
bg_gen['words'] = bg_gen[['description clean', 'words cm','words dap']].agg(' '.join, axis=1)
bg_gen.head()



Unnamed: 0,name,description clean,words cm,words dap,words
0,Pandemic,pandemic several virulent disease broken simul...,medical actionpoints cooperativegame handmanag...,mattleacock joshcappel christianhanisch régism...,pandemic several virulent disease broken simul...
1,Carcassonne,carcassonne tile placement game player draw pl...,citybuilding medieval territorybuilding areama...,klausjürgenwrede dorismatthäus annepätzke chri...,carcassonne tile placement game player draw pl...
2,Catan,catan formerly settler catan player try domina...,economic negotiation dicerolling hexagongrid i...,klausteuber volkanbaga tanjadonner petefenlon ...,catan formerly settler catan player try domina...
3,7 Wonders,leader one great city ancient world gather res...,ancient cardgame citybuilding civilization eco...,antoinebauza antoinebauza miguelcoimbra reposp...,leader one great city ancient world gather res...
4,Dominion,monarch like parent ruler small pleasant kingd...,cardgame medieval deck bag andpoolbuilding del...,donaldxvaccarino matthiascatrein juliendelval ...,monarch like parent ruler small pleasant kingd...


### TF-IDF matrix for bag of words

In [None]:
# Define the TF-IDF vectorizer: remove English stop words and use unigrams and bigrams
tfidf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), stop_words='english')

# Fit and transform the data to get the matrix
boardgames_info = tfidf.fit_transform(bg_gen['words'])
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
tfidf_feature_names = tfidf.get_feature_names_out().tolist()

# Print out some features name and matrix shape
print(f'Example feature: {tfidf_feature_names[0:5]}')
print(f'Matrix shape: {boardgames_info.shape}')

Example feature: ['0hrarttechnology', '101codinganddesign', '101codinganddesign editionspielwiese', '123gameséditions', '123gameséditions actiongt']
Matrix shape: (19230, 1103516)


In [None]:
recommendations('Pandemic', boardgames_info)

*********************************************************************
Top 5 Similar Boardgames with Pandemic: 
 


Unnamed: 0,name,url
29,Pandemic Legacy: Season 1,/boardgame/161936/pandemic-legacy-season-1
333,Pandemic: The Cure,/boardgame/150658/pandemic-cure
999,Pandemic: Contagion,/boardgame/157789/pandemic-contagion
6556,Pandemic: Hot Zone – North America,/boardgame/301919/pandemic-hot-zone-north-america
6396,Side Effects,/boardgame/230765/side-effects
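
One quick way to compare the four models is to measure how much their top-5 lists for 'Pandemic' overlap. The sketch below uses the row indices printed in the outputs above and a hypothetical `jaccard` helper. It describes this single query only and says nothing about overall recommendation quality.

```python
# Row indices of the top-5 lists printed above for 'Pandemic'
top5 = {
    "description": [29, 333, 999, 6396, 6556],
    "category_mechanic": [6556, 399, 29, 437, 572],
    "artist_designer_publisher": [29, 3, 13, 18, 285],
    "general": [29, 333, 999, 6556, 6396],
}

def jaccard(a, b):
    """Size of the intersection divided by the size of the union of two lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

overlap = {
    name: jaccard(top5["general"], top5[name])
    for name in ("description", "category_mechanic", "artist_designer_publisher")
}
for name, score in overlap.items():
    print(f"general vs {name}: {score:.2f}")
# general vs description: 1.00
# general vs category_mechanic: 0.25
# general vs artist_designer_publisher: 0.11
```

For this query the general model returns the same five games as the description model, only in a different order. That suggests the description text may dominate the combined bag of words, plausibly because it contributes most of the vocabulary.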



## Limitation & Recommendation

#### Deploy the models
- Due to time constraints, the models have not yet been deployed. The next step for this project is to deploy them and build the recommendation system into a website for users.

#### User profile data for collaborative filtering
- To build a more customised recommendation system, the team should collect more user ratings across different games. These ratings can be used to build user profiles and a recommendation system that suits each user's preferences.
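
As a rough idea of what such a system could look like, here is a minimal user-based collaborative filtering sketch in pure Python. The users and ratings are entirely hypothetical, and cosine similarity over raw ratings is only one of several possible choices (mean-centred ratings are a common alternative).

```python
import math

# Hypothetical user ratings (user -> {game: rating on a 1-10 scale})
ratings = {
    "user_a": {"Pandemic": 9, "Carcassonne": 7, "Catan": 6},
    "user_b": {"Pandemic": 8, "Carcassonne": 6, "Dominion": 9},
    "user_c": {"Catan": 9, "Dominion": 4, "Carcassonne": 8},
}

def user_similarity(u, v):
    """Cosine similarity between two users over the games both have rated."""
    shared = set(ratings[u]) & set(ratings[v])
    if not shared:
        return 0.0
    dot = sum(ratings[u][g] * ratings[v][g] for g in shared)
    norm_u = math.sqrt(sum(ratings[u][g] ** 2 for g in shared))
    norm_v = math.sqrt(sum(ratings[v][g] ** 2 for g in shared))
    return dot / (norm_u * norm_v)

def predict(user, game):
    """Similarity-weighted average of other users' ratings for a game."""
    pairs = [
        (user_similarity(user, other), r[game])
        for other, r in ratings.items()
        if other != user and game in r
    ]
    total = sum(sim for sim, _ in pairs)
    return sum(sim * rating for sim, rating in pairs) / total if total else None

# user_a has not rated Dominion; estimate it from user_b and user_c
prediction = predict("user_a", "Dominion")
print(round(prediction, 2))
```

The predicted rating always lies between the lowest and highest rating the other users gave the game, weighted towards the users most similar to the target user.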

#### Board game features improvements
- The topic modelling produced findings that can be shared with board game publishers to help them improve their games, potentially increasing their user base and sales.