![board game](board_game.jpg)

## Executive Summary

In the realm of board games, every roll of the dice unfolds a story of strategy, excitement, and camaraderie. Our journey through the vast landscape of board games begins with a dataset encompassing over 20,000 ranked board games from BoardGameGeek (BGG). Through rigorous analysis, I've unearthed valuable insights and crafted a content-based game recommender system to guide your next game night.

The odyssey commences with a keen eye on anomalies in critical features, prompting meticulous data cleaning. From eliminating rows with zero popularity to trimming extreme values, each step ensures the dataset's integrity. A new feature, 'subjective_popularity,' becomes the North Star, blending user ratings, BGG rank, average rating, and rating ratio with precision weights.

Navigating through temporal shifts, I've excluded games from the distant past and the uncertain present to refine our recommendations. The exploration extends to age requirements, players' count, and playtime, aligning the dataset with a discerning eye for quality recommendations.

Imputation strategies, from zero-player adjustments to predicting missing 'Owned Users' values, add finesse to our dataset. The correlation dance among playtime, age, complexity, and ratings reveals the subtle nuances of board game dynamics.

The tale unfolds with intriguing observations. A paradoxical trend emerges where user engagement declines, but game ratings soar after 2015, hinting at a shift in game creation motives. The impact of the number of players on average ratings remains negligible, and games tailored for adolescents shine as unexpected gems.

In conclusion, our content-based recommender system, fueled by the robust cosine similarity, stands ready to curate your next gaming adventure. From the classics to hidden gems, let data be your guide in the enchanting world of board games. Roll the dice, make your move, and let the games begin!

# Unraveling the Board Game Enigma: Data-Driven Recommendations

In the world of board games, there's a game for every occasion and a playstyle for every group. As board game enthusiasts, we've all faced the delightful yet perplexing dilemma of choosing the perfect game to play on game night. That's where data analysis becomes the key to unlocking the mysteries of board game popularity and player preferences. In this submission, I present an in-depth exploration of a comprehensive dataset containing information on over 20,000 ranked board games from BoardGameGeek (BGG). The dataset, meticulously assembled in February 2021, unveils the diverse landscape of board games, from classic titles to modern gems.

As a data analyst, I've dived headfirst into this treasure trove of board game statistics, revealing fascinating patterns and insights that can guide our game night decisions. I've also delved into the intriguing relationship between game attributes, such as player count, playtime, and complexity, and how they influence a game's popularity. With this analysis, I aim to provide you with data-driven game recommendations and a deeper understanding of what makes a board game a hit among players.

So, whether you're a board game aficionado seeking fresh insights or a casual gamer searching for the ideal game for your next gathering, this submission is your guide to data-driven board game adventures. 

Let's embark on this journey of discovery and make every game night an unforgettable experience!


In [21]:
# First, I import some libraries and modules for data manipulation, visualization, and machine learning.
import pandas as pd
import numpy as np
import missingno as msno
from sklearn.preprocessing import OneHotEncoder,RobustScaler,MinMaxScaler
import plotly.express as px
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
import plotly.graph_objects as go
# I set some options to display the data in a better way.
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.float_format', lambda x: '%.2f' % x)

# I load the data from a CSV file into a pandas dataframe. The data contains various information about the board games, such as the name, the year published, the number of players, the play time, the minimum age, the user ratings, the BGG rank, the complexity, the mechanics, the domains, and the number of owners.
boardgame = pd.read_csv('data/bgg_data.csv',dtype='O')
boardgame

,ID,Name,Year Published,Min Players,Max Players,Play Time,Min Age,Users Rated,Rating Average,BGG Rank,Complexity Average,Owned Users,Mechanics,Domains
0,174430,Gloomhaven,2017,1,4,120,14,42055,8.79,1,3.86,68323,"Action Queue, Action Retrieval, Campaign / Bat...","Strategy Games, Thematic Games"
1,161936,Pandemic Legacy: Season 1,2015,2,4,60,13,41643,8.61,2,2.84,65294,"Action Points, Cooperative Game, Hand Manageme...","Strategy Games, Thematic Games"
2,224517,Brass: Birmingham,2018,2,4,120,14,19217,8.66,3,3.91,28785,"Hand Management, Income, Loans, Market, Networ...",Strategy Games
3,167791,Terraforming Mars,2016,1,5,120,12,64864,8.43,4,3.24,87099,"Card Drafting, Drafting, End Game Bonuses, Han...",Strategy Games
4,233078,Twilight Imperium: Fourth Edition,2017,3,6,480,14,13468,8.7,5,4.22,16831,"Action Drafting, Area Majority / Influence, Ar...","Strategy Games, Thematic Games"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20338,16398,War,0,2,2,30,4,1340,2.28,20340,0.01,427,,Children's Games
20339,7316,Bingo,1530,2,99,60,5,2154,2.85,20341,1.05,1533,"Betting and Bluffing, Bingo, Pattern Recognition",Party Games
20340,5048,Candy Land,1949,2,4,30,3,4006,3.18,20342,1.08,5788,Roll / Spin and Move,Children's Games
20341,5432,Chutes and Ladders,-200,2,6,30,3,3783,2.86,20343,1.02,4400,"Dice Rolling, Grid Movement, Race, Roll / Spin...",Children's Games


In [22]:
boardgame.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20343 entries, 0 to 20342
Data columns (total 14 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   ID                  20327 non-null  object
 1   Name                20343 non-null  object
 2   Year Published      20342 non-null  object
 3   Min Players         20343 non-null  object
 4   Max Players         20343 non-null  object
 5   Play Time           20343 non-null  object
 6   Min Age             20343 non-null  object
 7   Users Rated         20343 non-null  object
 8   Rating Average      20343 non-null  object
 9   BGG Rank            20343 non-null  object
 10  Complexity Average  20343 non-null  object
 11  Owned Users         20320 non-null  object
 12  Mechanics           18745 non-null  object
 13  Domains             10184 non-null  object
dtypes: object(14)
memory usage: 2.2+ MB


In [23]:
# I identify the categorical and numerical columns in the data, as they will be used for different purposes.
cat_cols = ['Name','Mechanics','Domains']
num_cols = ['Year Published','Min Players','Max Players','Play Time','Min Age','Users Rated','BGG Rank','Owned Users','Complexity Average','Rating Average']

In [24]:
# I convert some of the columns to the appropriate data types, such as float, to avoid errors and inconsistencies.
boardgame[num_cols] = boardgame[num_cols].astype(float)

# I use some descriptive statistics to understand the distribution and trends of the numerical features.
boardgame.describe(percentiles=[0.01,0.05,0.15,0.25,0.5,0.75,0.95,0.99]).T
# I notice that there are some anomalies in the features like year_published, min_players, max_players, play_time, and min_age.

,count,mean,std,min,1%,5%,15%,25%,50%,75%,95%,99%,max
Year Published,20342.0,1984.25,214.0,-3500.0,859.58,1975.0,1992.0,2001.0,2011.0,2016.0,2019.0,2020.0,2022.0
Min Players,20343.0,2.02,0.69,0.0,1.0,1.0,1.0,2.0,2.0,2.0,3.0,4.0,10.0
Max Players,20343.0,5.67,15.23,0.0,1.0,2.0,2.0,4.0,4.0,6.0,10.0,30.58,999.0
Play Time,20343.0,91.29,545.45,0.0,0.0,10.0,20.0,30.0,45.0,90.0,240.0,600.0,60000.0
Min Age,20343.0,9.6,3.65,0.0,0.0,0.0,7.0,8.0,10.0,12.0,14.0,17.0,25.0
Users Rated,20343.0,840.97,3511.56,30.0,30.0,33.0,43.0,55.0,120.0,385.0,3158.9,14433.7,102214.0
Rating Average,20343.0,6.4,0.96,0.03,3.91,4.8,5.48,5.81,6.43,7.03,7.88,8.49,9.58
BGG Rank,20343.0,10172.89,5872.83,1.0,204.42,1018.1,3053.3,5087.5,10173.0,15258.5,19326.9,20140.58,20344.0
Complexity Average,20343.0,1.64,1.14,0.0,0.0,0.01,0.02,1.08,1.67,2.48,3.5,4.18,4.93
Owned Users,20320.0,1408.46,5040.18,0.0,37.19,65.0,105.0,146.0,309.0,864.0,5240.4,20826.18,155312.0


In [25]:
# Only one record has a value of zero in 'Owned Users', so I discard it.
boardgame = boardgame[boardgame['Owned Users'] !=0]

# I create a new feature called 'subjective_popularity' to measure the popularity of the board games based on a combination of users rated, BGG rank, average rating, and rating ratio. I use the MinMaxScaler to scale the features between 1 and 5, and I use the log transformation to reduce the skewness. I assign different weights to the features based on their importance.
def calculate_popularity():
    # feature_range is passed as a tuple, which is the form scikit-learn documents for MinMaxScaler.
    users_rated = MinMaxScaler(feature_range=(1, 5)).fit_transform(np.log(boardgame['Users Rated'].values).reshape(-1,1)).flatten()
    # A lower BGG rank is better, so the rank is scaled to [-5, -1] and then negated.
    bgg_rank = MinMaxScaler(feature_range=(-5, -1)).fit_transform(boardgame['BGG Rank'].values.reshape(-1,1)).flatten()*-1
    avg_rating = MinMaxScaler(feature_range=(1, 5)).fit_transform(boardgame['Rating Average'].values.reshape(-1,1)).flatten()
    rating_rat = MinMaxScaler(feature_range=(1, 5)).fit_transform(np.log((boardgame['Users Rated'] / boardgame['Owned Users']).values).reshape(-1,1)).flatten()

    boardgame['subjective_popularity'] = users_rated * 0.4 + bgg_rank * 0.3 + avg_rating * 0.2 + rating_rat * 0.1

calculate_popularity()

# I use the quantile function to find a reasonable threshold for defining "popular" games. Based on the percentiles, I choose the 80th percentile, which is 3.25.
boardgame['subjective_popularity'].quantile(0.80) # 3.25



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



3.2449004729146287
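The weighting scheme behind 'subjective_popularity' can be illustrated without pandas or scikit-learn. The following is a minimal sketch under stated assumptions: the helper names `min_max_scale` and `popularity_scores` are hypothetical, and the toy numbers are invented for illustration rather than taken from the dataset. It mirrors the steps described above: log-transform the skewed counts, rescale every signal to the range 1 to 5, flip the rank so that a better rank scores higher, and blend with the weights 0.4, 0.3, 0.2, and 0.1.

```python
import math

def min_max_scale(values, low=1.0, high=5.0):
    """Linearly rescale values to the range [low, high]."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # All values identical: place them at the midpoint of the range.
        return [(low + high) / 2 for _ in values]
    return [low + (v - v_min) * (high - low) / span for v in values]

def popularity_scores(users_rated, bgg_rank, rating_avg, owned_users):
    """Weighted blend of four scaled signals (weights 0.4 / 0.3 / 0.2 / 0.1)."""
    rated = min_max_scale([math.log(u) for u in users_rated])
    # A lower rank number is better, so the negated rank is scaled.
    rank = min_max_scale([-r for r in bgg_rank])
    rating = min_max_scale(rating_avg)
    ratio = min_max_scale([math.log(u / o) for u, o in zip(users_rated, owned_users)])
    return [0.4 * a + 0.3 * b + 0.2 * c + 0.1 * d
            for a, b, c, d in zip(rated, rank, rating, ratio)]

# Toy rows (hypothetical numbers, not taken from the dataset).
scores = popularity_scores(
    users_rated=[40000, 1000, 50],
    bgg_rank=[1, 500, 20000],
    rating_avg=[8.8, 7.0, 5.5],
    owned_users=[68000, 2000, 200],
)
print([round(s, 2) for s in scores])
```

Because every component lies between 1 and 5 and the weights sum to 1, the blended score also lies between 1 and 5, which is why a cutoff such as 3.25 can be read on the same scale.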

# Analysis of the Features with Apparent Anomalies

## Year Published

In [26]:
# I plot the cumulative users rated number by published year. I see that there is a spark point around 1995, where the number of users rated increases significantly.
fig1 = px.line(boardgame.groupby(["Year Published"])["Users Rated"].sum().cummax(),range_x=[1900,2023],title="Cumulative Users Rated number by published year")

point_to_mark = {
    'Year Published': 1995,
    'value': 227242
}
fig1.add_trace(go.Scatter(x=[point_to_mark['Year Published']], y=[point_to_mark['value']], mode='markers+text',
                         marker=dict(size=10, color='red'), text='Spark Point',textposition='top center'))
fig1.show()

# I decide to exclude the games that are published before 1995, because they are not popular enough and I don't want to recommend them.

# I plot the users rated numbers by published years. I see that the games published after 2020 have very low numbers of users rated, which might indicate that the data was collected in early 2021.
fig2= px.bar(boardgame.groupby(["Year Published"])["Users Rated"].sum(),range_x=[1995,2023],title= "Users Rated numbers by published years" )

fig2.show()

# I decide to exclude the games published after 2020, because they are not reliable enough and I don't want to recommend them.

# I filter the data to keep only the games published between 1995 and 2020.
boardgame = boardgame[(boardgame["Year Published"]>=1995) & (boardgame["Year Published"]<=2020)]


## Min Players

In [27]:
# I plot the subjective popularity of the games with zero minimum players. I see that there are few rows and the popularity of these games seems low. So I decide to eliminate these rows except the popular ones.
px.scatter(boardgame[boardgame["Min Players"]==0]['subjective_popularity'].reset_index(drop=True),color=boardgame[boardgame["Min Players"]==0]['subjective_popularity']>3.25,title='Subjective Popularities of min players=0').show()

boardgame = boardgame[~((boardgame["Min Players"]==0) & (boardgame['subjective_popularity'] < 3.25))]

# I impute the zero values in the 'Min Players' feature with one, because every game must have at least one player.
boardgame.loc[boardgame["Min Players"]==0,"Min Players"] =1

## Max Players

In [28]:
# I plot the subjective popularity of the games with zero maximum players. I see that there are few rows and the popularity of these games seems low. So I decide to eliminate these rows except the popular ones.
px.scatter(boardgame[boardgame["Max Players"]==0]['subjective_popularity'].reset_index(drop=True),color=boardgame[boardgame["Max Players"]==0]['subjective_popularity']>3.25,title='Subjective Popularities of max players=0').show()

boardgame = boardgame[~((boardgame["Max Players"]==0) & (boardgame['subjective_popularity'] < 3.25))]

# I impute the zero values in the 'Max Players' feature with the corresponding values in the 'Min Players' feature, because the maximum number of players should be greater than or equal to the minimum number of players.
boardgame.loc[boardgame["Max Players"]==0,"Max Players"] = boardgame.loc[boardgame["Max Players"]==0,"Min Players"]

# I see that there are some extreme values in the 'Max Players' feature compared to the majority. So I decide to trim these values by using the 99th percentile as a cutoff point.
boardgame.loc[(boardgame["Max Players"]>36),'Max Players']= 36 # 36 is the 99th percentile of the max players feature.

## Play Time

In [29]:
boardgame[boardgame["Play Time"]==0].describe(percentiles=[0.01,0.05,0.15,0.25,0.5,0.75,0.95,0.99]).T
# After analyzing, I found that only a few rows had a play time of 0, and these games seemed to have lower popularity. I decided to remove these rows to clean up the dataset.

boardgame = boardgame[boardgame["Play Time"]!=0]

# Next, I looked at games with a play time under 10 minutes and a subjective popularity below 3.25. I used 10 minutes as the threshold, which represents the 1st percentile of the play time feature. Games falling into this category were filtered out to refine the dataset.

boardgame = boardgame[~((boardgame["Play Time"]<10) & (boardgame['subjective_popularity']<3.25))] 


# I also examined games with playtime greater than 540 minutes and subjective popularity less than 3.25. 540 minutes was chosen as the threshold, representing the 99th percentile of the playtime feature. Again, games meeting these criteria were removed to improve our dataset.

boardgame = boardgame[~((boardgame["Play Time"]>540) & (boardgame['subjective_popularity'] < 3.25))] 


# To ensure data consistency, I raised play time values below 10 minutes to exactly 10 minutes. Similarly, I capped play time values above 540 minutes at 540 minutes.

boardgame.loc[boardgame["Play Time"]<10,'Play Time'] = 10
boardgame.loc[boardgame["Play Time"]>540,'Play Time'] = 540
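The cutoffs used in this section (10 and 540 minutes for play time, 36 for the maximum number of players) are hard-coded percentile values. As a rough sketch of the same idea using only the standard library, the snippet below derives percentile bounds and clips a list to fixed limits. The function names `percentile_bounds` and `clip` are hypothetical, the play times are invented, and the interpolation rule of `statistics.quantiles` may differ slightly from the one pandas uses.

```python
from statistics import quantiles

def percentile_bounds(values, lower_pct=1, upper_pct=99):
    """Return the (lower_pct, upper_pct) percentile cut points of values."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return cuts[lower_pct - 1], cuts[upper_pct - 1]

def clip(values, lower, upper):
    """Raise values below lower to lower and cap values above upper at upper."""
    return [min(max(v, lower), upper) for v in values]

# Invented play times in minutes, including two extreme values.
play_times = [1, 5, 15, 30, 45, 60, 90, 120, 600, 60000]

low_cut, high_cut = percentile_bounds(play_times)
clipped = clip(play_times, lower=10, upper=540)
print(clipped)
```

In pandas, the capping step itself can also be written with `Series.clip(lower=..., upper=...)`.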

## Min Age

In [30]:
# I've calculated the percentiles for the 'Min Age' feature and found that 17 is the 99th percentile, while 0 represents the 1st percentile.

boardgame[boardgame['Min Age']>17].describe(percentiles=[0.01,0.05,0.15,0.25,0.5,0.75,0.95,0.99]).T
boardgame[boardgame['Min Age']==0].describe(percentiles=[0.01,0.05,0.15,0.25,0.5,0.75,0.95,0.99]).T

# I review the summary statistics of games with a minimum age requirement greater than 17. Similarly, I examine games with a minimum age requirement of 0.

# I decide that I want to clean the dataset by eliminating rows with unpopular games, specifically those with a minimum age requirement equal to zero or greater than 17. To do this, I filter out these rows based on these criteria.

boardgame = boardgame[~(((boardgame['Min Age']>17) | (boardgame['Min Age']==0)) & (boardgame['subjective_popularity']<3.25))]


# To maintain data consistency, I set the minimum age for games with a requirement of 0 to 1. This ensures that all games in the dataset have a valid minimum age requirement.

boardgame.loc[boardgame['Min Age']==0,"Min Age"] = 1


# Dealing with Missing Values

In [31]:
# First, we'll use our detective skills to identify the locations of missing values.
print(boardgame.isnull().sum())  # A quick scan reveals missing values in 'Owned Users' for numeric columns.


cols_with_missed_values = boardgame.isnull().sum()[boardgame.isnull().sum() > 0].index

# Now, let's dive deeper into the scenes where these values are missing. Each column with missing values has its own story to tell.
for col in cols_with_missed_values:
    print(col + '\n')
    print(boardgame[boardgame[col].isnull()].describe().T)
    print('\n\n')

# As we strive to create the ultimate recommender system, we must make some decisions.

# We can discard the rows with missing values that appear insignificant in relation to the counts of owned users and users who have provided ratings.


# Let's start by eliminating the rows with null IDs. These mysterious entities seem to be unpopular and niche games, not worthy of our recommendation spotlight.
boardgame = boardgame[~boardgame['ID'].isnull()]


# I predict the missing 'Owned Users' values with linear regression, since there is a strong correlation between the 'Users Rated' and 'Owned Users' columns.
lr_user_data = boardgame[~boardgame['Owned Users'].isnull()][['Users Rated','Owned Users']]
owned_users_pred = LinearRegression().fit(lr_user_data['Users Rated'].values.reshape(-1,1),lr_user_data['Owned Users'].values.reshape(-1,1)).predict(boardgame[boardgame['Owned Users'].isnull()]['Users Rated'].values.reshape(-1,1))
boardgame.loc[boardgame['Owned Users'].isnull(),'Owned Users'] = owned_users_pred.flatten()

# Four rows now have a null subjective popularity, so I recalculate the popularity score.
calculate_popularity()

# The numeric missing values are now handled, so I cast the count-like columns to int and keep the two average columns as float.
boardgame[num_cols[:-2]] = boardgame[num_cols[:-2]].astype(int)
boardgame[num_cols[-2:]] = boardgame[num_cols[-2:]].astype(float)



ID                         11
Name                        0
Year Published              0
Min Players                 0
Max Players                 0
Play Time                   0
Min Age                     0
Users Rated                 0
Rating Average              0
BGG Rank                    0
Complexity Average          0
Owned Users                15
Mechanics                1042
Domains                  7949
subjective_popularity      15
dtype: int64
ID

                       count     mean     std      min      25%      50%      75%      max
Year Published         11.00  2012.00    6.07  1999.00  2009.50  2014.00  2015.50  2020.00
Min Players            11.00     2.09    0.54     1.00     2.00     2.00     2.00     3.00
Max Players            11.00     5.18    1.60     2.00     4.00     6.00     6.00     8.00
Play Time              11.00    52.27   39.14    15.00    25.00    40.00    67.50   120.00
Min Age                11.00    11.27    2.72     8.00    10.00    12.00    12
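The regression-based imputation of 'Owned Users' can be sketched with a plain least-squares line. This is a minimal illustration under stated assumptions, not the notebook's implementation: `fit_line` and `impute_owned` are hypothetical helpers, missing values are represented by `None`, and the toy numbers are invented so that the fitted line is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def impute_owned(users_rated, owned_users):
    """Fill None entries in owned_users with predictions from users_rated."""
    known = [(u, o) for u, o in zip(users_rated, owned_users) if o is not None]
    slope, intercept = fit_line([u for u, _ in known], [o for _, o in known])
    return [o if o is not None else slope * u + intercept
            for u, o in zip(users_rated, owned_users)]

# Invented values that lie exactly on owned = 2 * rated + 10.
users_rated = [100, 200, 300, 400]
owned_users = [210, 410, None, 810]

filled = impute_owned(users_rated, owned_users)
print(filled)
```

The line is fitted only on rows where the target is known, and only the missing entries are overwritten, which is the same split the notebook applies with `isnull()`.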

# Correlations

In [32]:
corr = boardgame[num_cols].corr()
corr_df = corr[(corr.abs() > 0.30) & (corr != 1)]  # keep only the notable correlation values

px.imshow(corr_df).show()

############# Some insights from the correlation matrix ################
# 1) There is a moderate positive correlation between play time and min age.
# 2) There is a moderate positive correlation between rating average and both play time and min age.
# 3) There is a moderate positive correlation between complexity average and play time, min age, and rating average.
# 4) There is a strong negative correlation between BGG rank and rating average (as expected).
# 5) There is a strong positive correlation between owned users and users rated (as expected).
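The filtering rule applied to the correlation matrix (keep coefficients whose absolute value exceeds 0.30) can be shown on a few invented columns. This is a small sketch with hypothetical helper names (`pearson`, `notable_pairs`); the numbers are made up and only chosen so that one pair is strongly correlated and the others are not.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

def notable_pairs(columns, threshold=0.30):
    """Return {(col_a, col_b): r} for every pair with |r| above threshold."""
    names = list(columns)
    pairs = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                pairs[(a, b)] = r
    return pairs

# Invented toy columns (not taken from the dataset).
toy = {
    "Rating Average": [8.8, 8.0, 7.0, 6.0, 5.0],
    "BGG Rank": [1, 50, 900, 5000, 15000],
    "Min Players": [2, 1, 2, 1, 2],
}
pairs = notable_pairs(toy)
print(pairs)
```

On this toy data only the rating and rank pair survives the threshold, with a strongly negative coefficient, which matches the direction of insight 4.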

# Data Visualization

In [33]:
for col in num_cols:
    if col != 'Year Published':
        px.scatter(boardgame[[col,'Year Published']].groupby(['Year Published']).median().reset_index(),y=col,x='Year Published').show()

# In these graphs, what caught my attention most is that the median number of users who rated or own a game has decreased steadily for games published since 2015, while the rating average and subjective popularity have increased. This may mean that since 2015 designers have been creating games not for a general audience but for a smaller, more passionate one.



# Impact of the number of players on the average rating

# I create an 'expected_player_num' feature as the rounded mean of the minimum and maximum player counts.
boardgame['expected_player_num'] = ((boardgame['Min Players']+boardgame['Max Players'])/2).round()
num_cols.append("expected_player_num")

fig = px.violin(boardgame, x='expected_player_num', y='Rating Average', color='expected_player_num', title = 'Average Rating by Expected Number of Players')
fig.show()

# This plot demonstrates that there is no significant difference in the average rating with respect to the expected number of players

# New Categorical Columns

In [34]:
# I create some new categorical columns: an age profile, a popularity class, and a complexity class.

boardgame.loc[boardgame['Min Age']<8,'age_profil'] = 'Toddlers'
boardgame.loc[(boardgame['Min Age']<12) & (boardgame['age_profil'].isnull()),'age_profil'] = 'Kids'
boardgame.loc[(boardgame['Min Age']<16) & (boardgame['age_profil'].isnull()),'age_profil'] = 'Adolescents'
boardgame.loc[(boardgame['Min Age']<30) & (boardgame['age_profil'].isnull()),'age_profil'] = 'Youth/Adults'


boardgame['Popularity_Class']=pd.qcut(boardgame['subjective_popularity'],q=5,labels=['lowest popularity','low popularity','moderate popularity','high popularity','highest popularity'])

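# The top bin extends to 5 so that games with a complexity average above 4 are also classified instead of becoming NaN.
boardgame["Complexity_Class"]=pd.cut(boardgame["Complexity Average"],bins=[-1,1,2,3,5],labels=["lowest complexity","low complexity","moderate complex","highest complex"])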

comp_class_cols =pd.get_dummies(boardgame["Complexity_Class"],prefix='COMP_',dtype=int).columns

boardgame[comp_class_cols] = pd.get_dummies(boardgame["Complexity_Class"],prefix='COMP_',dtype=int)

fig = px.bar(boardgame.groupby(["age_profil"]).agg({"subjective_popularity":"mean"}))
fig.show()
# According to this graph, games aimed at adolescents (minimum age 12 to 15) have a higher mean subjective popularity than the other age groups.
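The age profile assignment above is a threshold lookup: below 8 is 'Toddlers', below 12 is 'Kids', below 16 is 'Adolescents', and anything else is 'Youth/Adults'. As a compact sketch of the same rule, the standard-library `bisect` module can map a minimum age to its label. The names `AGE_EDGES`, `AGE_LABELS`, and `age_profile` are hypothetical, and unlike the notebook code this sketch has no upper limit of 30 on the last group.

```python
from bisect import bisect_right

AGE_EDGES = [8, 12, 16]  # exclusive upper bounds of the first three groups
AGE_LABELS = ["Toddlers", "Kids", "Adolescents", "Youth/Adults"]

def age_profile(min_age):
    """Map a minimum age to its age profile label."""
    # bisect_right counts how many edges are <= min_age, which is the group index.
    return AGE_LABELS[bisect_right(AGE_EDGES, min_age)]

for age in (7, 8, 12, 15, 16):
    print(age, age_profile(age))
```

A minimum age that equals an edge falls into the older group, so 12 is already 'Adolescents' and 16 is 'Youth/Adults', consistent with the strict `<` comparisons in the notebook.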

# Let's transform our columns with natural language into a digital format.

In [35]:

boardgame.loc[boardgame['Mechanics'].isnull(),"Mechanics"] = 'unknown'
boardgame.loc[boardgame['Domains'].isnull(),'Domains'] = 'unknown'

condition1= (boardgame['Mechanics']=="unknown") & (boardgame['Domains']=="unknown") & (boardgame['Popularity_Class'].isin(["lowest popularity","low popularity","moderate popularity"]))

boardgame = boardgame[~condition1]


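def word2numerical(df,cat_col,trim =True):
    # Multi-hot encode a comma-separated text column: one 0/1 column per distinct value, suffixed with the first three letters of cat_col in upper case.
    # With trim=True, only the values whose frequency is above the 95th percentile of all value frequencies are kept.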

    temp_data = df[cat_col].str.split(',',expand=True)
    temp_data=temp_data.applymap(lambda x : str(x).strip())
    temp_data =  temp_data.fillna(np.nan).applymap(lambda x : np.nan if (str(x) in ['None','nan']) else x)
    result_rows = []

    for _, row in temp_data.iterrows():
        result_row = {}
        for col in row.dropna():
            result_row[col] = 1
        result_rows.append(result_row)

    temp_data_edited = pd.DataFrame(result_rows).fillna(0).astype(int)

    if trim:
        important_features = pd.DataFrame(temp_data_edited.sum().sort_values(ascending=False)).reset_index()

        important_features  =important_features[important_features[0]>important_features[0].quantile(0.95)]['index']

        temp_data_edited_imp = temp_data_edited[important_features]
        
        temp_data_edited_imp.columns  = [name+'_'+cat_col[0:3].upper() for name in temp_data_edited_imp.columns]

        return temp_data_edited_imp

    else:
        temp_data_edited.columns  = [name+'_'+cat_col[0:3].upper() for name in temp_data_edited.columns]
        return temp_data_edited
boardgame = pd.concat([boardgame.reset_index(drop=True),word2numerical(boardgame,'Mechanics',False).reset_index(drop=True)],axis=1)

boardgame = pd.concat([boardgame.reset_index(drop=True),word2numerical(boardgame,'Domains',False).reset_index(drop=True)],axis=1)

content_cols = [col_name for col_name in boardgame if ('_MEC' in col_name) or ('_DOM' in col_name)] 

similarity_cols = content_cols+ list(comp_class_cols)
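The transformation performed by `word2numerical` is a multi-hot encoding: each comma-separated value becomes its own 0/1 column. The following is a minimal standard-library sketch of that idea, with a hypothetical function name `multi_hot` and invented input rows; it omits the column suffixing and the optional frequency trimming.

```python
def multi_hot(rows, sep=","):
    """Turn comma-separated strings into a vocabulary and a 0/1 matrix."""
    tag_lists = [[t.strip() for t in row.split(sep) if t.strip()] for row in rows]
    vocab = sorted({tag for tags in tag_lists for tag in tags})
    matrix = []
    for tags in tag_lists:
        present = set(tags)
        matrix.append([1 if tag in present else 0 for tag in vocab])
    return vocab, matrix

# Invented example rows in the style of the 'Mechanics' column.
rows = [
    "Hand Management, Income, Loans",
    "Dice Rolling, Hand Management",
    "unknown",
]
vocab, matrix = multi_hot(rows)
print(vocab)
for line in matrix:
    print(line)
```

With pandas, a comparable result can be obtained from `Series.str.get_dummies(sep=',')`, although surrounding whitespace then has to be stripped separately.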


# Ready for a cosine similarity...

In [36]:
boardgame =boardgame.set_index('ID',drop=True)

subset = boardgame.loc[:, similarity_cols]

cosine_matrix = cosine_similarity(subset)

cosine_sim_df = pd.DataFrame(cosine_matrix , index=boardgame.index, columns=boardgame.index)
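For 0/1 feature vectors such as the mechanic, domain, and complexity columns, cosine similarity is the number of shared features divided by the geometric mean of the two feature counts. The sketch below computes it by hand for three invented games; the function name `cosine` and the vectors are hypothetical, and returning 0 for an all-zero vector is a choice made for this sketch.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equally long numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # No features at all: treat the similarity as zero.
        return 0.0
    return dot / (norm_u * norm_v)

# Invented multi-hot vectors over four hypothetical features.
game_a = [1, 1, 0, 1]
game_b = [1, 1, 0, 0]
game_c = [0, 0, 1, 0]

print(round(cosine(game_a, game_b), 3))  # two shared features
print(cosine(game_a, game_c))            # no shared features
```

Because the score depends only on the overlap of features, a game with many mechanics is not automatically more similar to everything; its larger norm lowers its score against games that share only a few of them.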

# Behold, introducing the dazzling Game Whisperer v1.0 – Prepare for an extraordinary gaming experience!

In [37]:
def game_recommender(ids, rec_num=10):
    # Stack the similarity columns of all seed games into one Series indexed by game ID.
    sims = pd.concat([cosine_sim_df[game] for game in ids])
    sims = sims.sort_values(ascending=False)
    # Remove the seed games themselves, wherever they appear.
    sims = sims[~sims.index.isin(ids)]
    # A game can be similar to several seeds; keep only its highest score.
    sims = sims[~sims.index.duplicated(keep='first')]
    return boardgame.loc[sims.index[:rec_num]]


Let's say I really like Brass: Birmingham (ID 224517), Dune: Imperium (ID 316554), and A Feast for Odin (ID 177736).

Which games would the recommender suggest we play next?

In [46]:
game_recommender(['224517','316554','177736']).iloc[:,0:10]

ID,Name,Year Published,Min Players,Max Players,Play Time,Min Age,Users Rated,Rating Average,BGG Rank,Complexity Average
176980,Helionox: The Last Sunset,2015,1,4,60,13,265,6.92,5025,2.38
199042,Harry Potter: Hogwarts Battle,2016,2,4,60,11,11960,7.46,290,2.08
40531,Cosmic Encounter,2000,2,4,60,12,1275,6.41,3010,2.3
843,Circus Minimus,2000,2,7,60,12,214,6.31,7148,2.14
189453,Victorian Masterminds,2019,2,4,60,14,1005,6.93,2280,2.21
28720,Brass: Lancashire,2007,2,4,120,14,19400,8.17,19,3.86
159109,XenoShyft: Onslaught,2015,1,4,60,13,3336,6.96,1068,2.66
65901,Age of Industry,2010,2,5,120,13,2857,7.37,707,3.36
104575,Steam Torpedo: First Contact,2011,2,2,40,13,415,6.23,6038,2.43
202819,Gnomopolis,2018,1,4,45,10,305,7.52,4054,2.63
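When several favourite games are given, the same candidate can be similar to more than one of them. One way to merge the lists, shown here as a sketch rather than as the notebook's method, is to keep each candidate's best score across all seeds and exclude the seeds themselves. The function name `recommend`, the nested-dictionary layout, and the similarity values are all hypothetical.

```python
def recommend(similarity, seeds, top_n=10):
    """Rank candidates by their highest similarity to any seed game."""
    seed_set = set(seeds)
    best = {}
    for seed in seeds:
        for candidate, score in similarity[seed].items():
            if candidate in seed_set:
                continue  # never recommend a game the user already named
            if score > best.get(candidate, float("-inf")):
                best[candidate] = score
    # Sort by score (descending), breaking ties by candidate name.
    return sorted(best, key=lambda c: (-best[c], c))[:top_n]

# Invented similarity table: similarity[seed][candidate].
similarity = {
    "A": {"A": 1.0, "B": 0.2, "C": 0.9, "D": 0.4},
    "B": {"A": 0.2, "B": 1.0, "C": 0.5, "D": 0.7},
}
picks = recommend(similarity, seeds=["A", "B"], top_n=2)
print(picks)
```

Taking the maximum favours games that are a close match to at least one favourite; averaging the scores instead would favour games that are moderately similar to all of them.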
