# Recommendation Service Experiment

#### Goal
Create a microservice that recommends 4-5 other students to a given student, tailored to that student, and that ensures the resulting squad is a good fit.
First we consider what makes someone a "good" match. For this we will ask students to select a list of **Interests** (such as music, sports, fintech) and a list of 
**Skills** (such as C++, React, ML). A good squad can be considered a group of students with similar interests yet diverse skills.
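
To make "similar interests yet diverse skills" concrete, here is a minimal sketch of one way a pair of students could be scored. It is an assumption for illustration, not the service's actual method: the helper names `jaccard` and `pair_fit`, and the choice of multiplying interest overlap by skill non-overlap, are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two collections: |intersection| / |union|, or 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_fit(interests_a, skills_a, interests_b, skills_b):
    """Hypothetical fit score: high when interests overlap and skills differ."""
    return jaccard(interests_a, interests_b) * (1.0 - jaccard(skills_a, skills_b))


# Same interests in both comparisons; only the second student's skills change.
fit_diverse = pair_fit({"music", "fintech"}, {"C++", "ML"},
                       {"music", "fintech", "sports"}, {"React"})
fit_same_skills = pair_fit({"music", "fintech"}, {"C++", "ML"},
                           {"music", "fintech", "sports"}, {"C++", "ML"})

print(fit_diverse, fit_same_skills)
```

Under this assumed scoring, two students with identical skills get a fit of 0 no matter how well their interests match, which is one way to push a squad toward varied skills.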

#### Problem statement
A student can choose any number of interests or skills, so we need a way to measure similarity between two students. Many combinations of matching interests and varying skills are possible, and our algorithm needs to pick the "top 5" relative to the target student.

#### Approach
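Encode each student's list of interests/skills as a feature vector in which the presence of a category is represented by 1 and its absence by 0. Each student then becomes an n-dimensional vector, and the angle between two vectors measures how alike they are: a smaller angle means more similarity and vice versa. Calculating the angle itself isn't necessary. For 0/1 vectors the dot product counts the categories two students share, and once scaled by the vector lengths it equals the cosine of the angle, so a dot product can serve as the similarity score.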
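
As a small illustration of the encoding step, the sketch below turns a student's selected interests into a 0/1 vector and takes a dot product. The shortened category list, the `encode` helper, and the students `alice` and `bob` are made up for this example.

```python
import numpy as np

# Hypothetical, shortened category list; the real dataset has 17 interest columns.
CATEGORIES = ["webdev", "ai", "robotics", "music", "fintech"]


def encode(selected, categories=CATEGORIES):
    """Return a 0/1 vector with a 1 for every category the student selected."""
    chosen = set(selected)
    return np.array([1 if c in chosen else 0 for c in categories])


alice = encode(["ai", "robotics", "music"])
bob = encode(["robotics", "music", "fintech"])

# For 0/1 vectors the dot product is the number of categories both students picked.
shared = int(np.dot(alice, bob))
print(alice, bob, shared)
```

Here the two students share robotics and music, so the dot product is 2.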

In [2]:
# The following example just shows getting the top 10 most similar students based on matching interests
import pandas as pd
import numpy as np

# Load the seeded dataset of roughly 300 randomly generated users
dataset = pd.read_csv("data_v2.csv")
dataset.head()

Unnamed: 0,name,webdev,ai,robotics,nlp,bot,fintech,music,iot,biomedical,education,environment,fitness,vr,blockchain,gaming,hardware,research
0,Emma,0,0,1,0,1,0,0,1,0,0,0,1,1,1,0,1,0
1,Noah,0,1,1,1,1,0,0,0,1,0,1,1,0,1,0,1,0
2,William,0,1,1,1,1,0,1,1,1,0,1,0,0,1,1,0,1
3,James,1,1,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0
4,Oliver,0,1,1,0,0,1,1,0,0,0,1,1,0,0,0,0,0


In [3]:
# Each row represents a student; each interest column is 1 if the student has that interest and 0 if not
dataset.iloc[0]

name           Emma
webdev            0
ai                0
robotics          1
nlp               0
bot               1
fintech           0
music             0
iot               1
biomedical        0
education         0
environment       0
fitness           1
vr                1
blockchain        1
gaming            0
hardware          1
research          0
Name: 0, dtype: object

In [4]:
# Convert Emma's row (a Series) to a NumPy array and drop the leading name to get her feature vector
emma_np = dataset.iloc[0].to_numpy()[1:]

In [5]:
# Count how many interests Emma has
totalInterests = len(emma_np[emma_np == 1])
totalInterests

7

In [6]:
%%time
# Loop over the roughly 300 users and score each one against our target user, in this case Emma.
# The score is the percentage of Emma's interests that the other user shares, from 0 to 100; higher means more similar.
def getScore(userTuple):
    return userTuple[1]

match = []
score_list = []
for index, row in dataset.iterrows():
    interest_vec = row.to_numpy()[1:,]
    sim_score = np.dot(emma_np, interest_vec) / totalInterests * 100
    score_list.append((row['name'], sim_score))
    if sim_score == 100.0:
        match.append(index)
#     print(row['name'], ': ', sim_score)

score_list.sort(key=getScore, reverse=True)
recommendation_result = score_list[0:11] # Top 10 most similar students + Emma herself

Wall time: 53 ms
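
The row-by-row loop above can also be written as a single matrix-vector product. The sketch below is one possible vectorized form, not a measured replacement: it rebuilds the five rows shown by `dataset.head()` inline so that it runs without `data_v2.csv`, and the names `students`, `ranking` and `top_k` are assumptions for this example.

```python
import numpy as np
import pandas as pd

columns = ["webdev", "ai", "robotics", "nlp", "bot", "fintech", "music", "iot",
           "biomedical", "education", "environment", "fitness", "vr",
           "blockchain", "gaming", "hardware", "research"]

# The first five rows of the dataset, copied from the head() output.
rows = {
    "Emma":    [0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0],
    "Noah":    [0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0],
    "William": [0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1],
    "James":   [1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
    "Oliver":  [0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
}
students = pd.DataFrame.from_dict(rows, orient="index", columns=columns)

features = students.to_numpy()            # shape: (students, interests)
target = students.loc["Emma"].to_numpy()  # Emma's feature vector

# One dot product per row, scaled to the percentage of Emma's interests shared.
scores = features @ target / target.sum() * 100

# Drop Emma herself, then keep the k best-scoring students.
ranking = pd.Series(scores, index=students.index).drop("Emma")
top_k = ranking.sort_values(ascending=False).head(2)
print(top_k)
```

On these five rows Noah comes out on top at about 71.43, which agrees with his score in the full run.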


In [7]:
# The students whose interests best match Emma's (Emma herself is included and trivially scores 100)
recommendation_result

[('Emma', 100.0),
 ('Jude', 100.0),
 ('Maximiliano', 100.0),
 ('Daxton', 100.0),
 ('Grace', 85.71428571428571),
 ('Juan', 85.71428571428571),
 ('Bryce', 85.71428571428571),
 ('Andres', 85.71428571428571),
 ('Karson', 85.71428571428571),
 ('Noah', 71.42857142857143),
 ('Daniel', 71.42857142857143)]

In [10]:
# Looking at the top 3 matches more closely, we can see the algo selected users whose interests cover Emma's (not necessarily exactly)
dataset.loc[dataset['name'].isin(['Emma', 'Jude', 'Maximiliano', 'Daxton'])]

Unnamed: 0,name,webdev,ai,robotics,nlp,bot,fintech,music,iot,biomedical,education,environment,fitness,vr,blockchain,gaming,hardware,research
0,Emma,0,0,1,0,1,0,0,1,0,0,0,1,1,1,0,1,0
148,Jude,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,0
258,Maximiliano,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1
287,Daxton,1,0,1,1,1,0,1,1,1,0,0,1,1,1,0,1,0
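
All three top matches score 100 because the score only checks how many of Emma's interests are covered; the extra interests the other students have are not penalized. One possible refinement, which is not part of the experiment above, is to divide the dot product by both vector lengths, giving the cosine of the angle between the vectors. The sketch below applies that to the rows shown in the table; the `cosine` helper is an assumption for illustration.

```python
import numpy as np

# Feature vectors copied from the table above.
emma = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0]
others = {
    "Jude":        [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "Maximiliano": [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "Daxton":      [1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0],
}


def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


cosine_scores = {name: cosine(emma, vec) for name, vec in others.items()}

# Rank from most to least similar under the normalized score.
ranked = sorted(cosine_scores, key=cosine_scores.get, reverse=True)
print(ranked, cosine_scores)
```

Under this normalized score the three students are no longer tied: Daxton, who has the fewest interests beyond Emma's, ranks first, and Jude, who selected 15 of the 17 interests, ranks last.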
