# KNN Recommender 

👉 K-Nearest-Neighbors (KNN) models are usually used to make predictions, but they can also be used simply to find the points in a dataset that are closest to a given point.

👨🏻‍🏫 In this recap, we will use a KNN model to create a basic music recommender system.

In [1]:
import pandas as pd

url = 'https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/ML_spotify_data.csv'

# Using pandas, load the data from the provided URL
df = pd.read_csv(url)
df.head()

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
0,We're For The Dark - Remastered 2010,['Badfinger'],22,0.678,0.559,0.432,0,3,0.0727,-12.696,0.0334,117.674
1,Sixty Years On - Piano Demo,['Elton John'],25,0.456,0.259,0.368,0,6,0.156,-10.692,0.028,143.783
2,Got to Find Another Way,['The Guess Who'],21,0.433,0.833,0.724,0,0,0.17,-9.803,0.0378,84.341
3,Feelin' Alright - Live At The Fillmore East/1970,['Joe Cocker'],22,0.436,0.87,0.914,0,5,0.855,-6.955,0.061,174.005
4,Caravan - Take 7,['Van Morrison'],23,0.669,0.564,0.412,0,7,0.401,-13.095,0.0679,78.716


🎯 Let's find songs that are "similar" to Queen's legendary *Another One Bites the Dust*.

In [2]:
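# Example: browse the songs of a single artist
# (the `artists` column stores each artist list as a string)
df[df.artists == "['Eminem']"]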

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
5688,My Name Is - Instrumental,['Eminem'],41,0.797,0.398,0.427,0,0,0.0699,-6.799,0.133,171.056
5750,Just Don't Give A Fuck - Instrumental,['Eminem'],41,0.642,0.54,0.585,0,4,0.208,-6.551,0.0568,85.419
5784,Just Don't Give A Fuck - A Cappella,['Eminem'],40,0.634,0.814,0.476,1,7,0.135,-9.517,0.556,81.315
6607,Curtains Up,['Eminem'],44,0.306,0.0269,0.724,1,9,0.935,-18.799,0.0897,114.923
6849,The Real Slim Shady,['Eminem'],41,0.937,0.771,0.615,0,5,0.0602,-6.521,0.061,104.482
6946,Criminal,['Eminem'],44,0.804,0.554,0.724,1,11,0.287,-6.161,0.271,101.951
7932,Not Afraid,['Eminem'],41,0.852,0.65,0.955,0,0,0.356,-1.206,0.275,114.64
8403,So Much Better,['Eminem'],53,0.719,0.616,0.858,1,10,0.628,-1.366,0.25,84.497
8518,Evil Twin,['Eminem'],51,0.648,0.783,0.91,1,4,0.0834,-4.762,0.363,83.091
9265,Believe,['Eminem'],61,0.884,0.11,0.492,1,1,0.359,-5.343,0.112,130.072


In [3]:
queen_song = df.iloc[4295:4296] # Another One Bites the Dust - Queen

queen_song

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
4295,Another One Bites The Dust - Live at Wembley '86,['Queen'],29,0.534,0.114,0.984,0,4,0.982,-5.058,0.297,115.991


## 1. Calculating the distances

👇 First, fit a KNN model on the dataset so that it can compute the distances between observations.  
Since we only care about how similar the songs' features are, it doesn't matter which target the model is fitted on.

In [4]:
df.dtypes

name             object
artists          object
popularity        int64
danceability    float64
valence         float64
energy          float64
explicit          int64
key               int64
liveness        float64
loudness        float64
speechiness     float64
tempo           float64
dtype: object

In [5]:
# Scale the data
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsRegressor, NearestNeighbors

X = df.drop(columns=['name', 'artists'])

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Fit a KNN model on the scaled data

# SUPERVISED LEARNING (requires a target)
# Using a KNeighborsRegressor on a dummy target (e.g. popularity)
y = df.popularity
supervised_model = KNeighborsRegressor()
supervised_model.fit(X_scaled, y)

# UNSUPERVISED LEARNING (no target) => BETTER here, because we don't care about the target
# Using the NearestNeighbors unsupervised learner
unsupervised_model = NearestNeighbors()
unsupervised_model.fit(X_scaled)

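Check out the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html#sklearn.neighbors.KNeighborsRegressor.kneighbors) of the `kneighbors` method, which returns the distances to, and the indices of, the nearest neighbors of a point.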
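To see why the features are scaled before fitting, here is a minimal pure-Python sketch of what happens conceptually: min-max scaling followed by a Euclidean nearest-neighbor search (Euclidean is the default metric of `NearestNeighbors`). The toy songs and their values are made up for illustration, and `min_max_scale` and `nearest` are hypothetical helpers, not scikit-learn functions.

```python
import math

# Toy rows: (tempo in BPM, danceability). Values are invented for illustration.
songs = {
    "query": (116.0, 0.53),
    "a": (121.0, 0.95),  # close tempo, very different danceability
    "b": (126.0, 0.55),  # slightly different tempo, same danceability
    "c": (60.0, 0.10),   # far away on both features
}

def min_max_scale(rows):
    """Rescale every column to [0, 1], as MinMaxScaler does by default."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def nearest(name, table):
    """Return the other song names sorted by Euclidean distance to `name`."""
    others = [n for n in table if n != name]
    return sorted(others, key=lambda n: math.dist(table[name], table[n]))

scaled = dict(zip(songs, min_max_scale(list(songs.values()))))

raw_order = nearest("query", songs)      # tempo dominates the distance
scaled_order = nearest("query", scaled)  # both features weigh equally
print(raw_order, scaled_order)
```

Without scaling, tempo (tens of BPM) swamps danceability (between 0 and 1), so song `a` looks closest. After scaling, song `b` becomes the nearest neighbor because both features contribute on the same scale.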

## 2. Passing the new point

👇 You can now pass a new point to the KNN model and find its closest neighbors.

In [6]:
queen_song

Unnamed: 0,name,artists,popularity,danceability,valence,energy,explicit,key,liveness,loudness,speechiness,tempo
4295,Another One Bites The Dust - Live at Wembley '86,['Queen'],29,0.534,0.114,0.984,0,4,0.982,-5.058,0.297,115.991


In [7]:
# First, clean up and scale the queen_song
queen_song_clean = queen_song.drop(columns=['name', 'artists'])
scaled_queen_song = scaler.transform(queen_song_clean)

# Find the closest neighbors to the queen_song
distances, indices = unsupervised_model.kneighbors(X=scaled_queen_song, n_neighbors=3)
similar_songs = pd.DataFrame({ "distance": distances[0], "indices": indices[0] }).set_index('indices')
similar_songs

indices,distance
4295,0.0
1164,0.359992
1761,0.367962


In [8]:
# Look up the names and artists of the similar songs
similar_songs_info = df.loc[indices[0], ['artists', 'name']]
# Merge with the distances to see everything in one DataFrame
similar_songs_info.merge(similar_songs, left_index=True, right_index=True)

Unnamed: 0,artists,name,distance
4295,['Queen'],Another One Bites The Dust - Live at Wembley '86,0.0
1164,['Wings'],"Hi, Hi, Hi - Live / Remastered",0.359992
1761,['The Damned'],Liar,0.367962
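The closest hit in the table above is the query song itself, at distance 0, because it is part of the fitted dataset. One possible way to get `k` genuinely different songs is to ask for `k + 1` neighbors and filter out the query row. The sketch below uses synthetic data; `X_demo`, `query_pos` and `k` are hypothetical names standing in for `X_scaled`, the position of the query song, and the number of recommendations wanted.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_demo = rng.random((50, 4))  # synthetic stand-in for X_scaled
query_pos = 7                 # position of the query row in X_demo
k = 3                         # number of recommendations wanted

model = NearestNeighbors().fit(X_demo)

# Ask for one extra neighbor, since the query itself will be among the hits
distances, indices = model.kneighbors(X_demo[[query_pos]], n_neighbors=k + 1)

# Filter out the query row explicitly instead of blindly dropping the first hit
keep = indices[0] != query_pos
neighbor_idx = indices[0][keep][:k]
neighbor_dist = distances[0][keep][:k]
print(neighbor_idx, neighbor_dist)
```

Filtering on the position is a little safer than slicing off the first result: if the dataset contained an exact duplicate of the query, the duplicate could be returned first instead.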


## 3. Making a playlist!

👇 Make a playlist with 10 songs based on Queen's *Another One Bites the Dust*, sorted by increasing tempo.

In [9]:
scaled_queen_song

array([[0.3372093 , 0.54158215, 0.11445783, 0.984     , 0.        ,
        0.36363636, 0.982     , 0.89961194, 0.31034483, 0.51680873]])

In [10]:
distances, indices = unsupervised_model.kneighbors(X=scaled_queen_song, n_neighbors=10)

In [11]:
playlist = pd.DataFrame({ "distance": distances[0], "indices": indices[0] }).set_index('indices')
playlist_info = df.loc[indices[0], ['artists', 'name']]
playlist_info.merge(playlist, left_index=True, right_index=True)

Unnamed: 0,artists,name,distance
4295,['Queen'],Another One Bites The Dust - Live at Wembley '86,0.0
1164,['Wings'],"Hi, Hi, Hi - Live / Remastered",0.359992
1761,['The Damned'],Liar,0.367962
8607,['Nitro Fun'],Cheat Codes,0.369083
704,['Bob Dylan'],"It Ain't Me, Babe - Live at LA Forum, Inglewoo...",0.378462
1211,['Rainbow'],A Light In The Black,0.397099
3307,['Butthole Surfers'],Graveyard,0.406598
2233,['Rush'],YYZ - Live In Canada / 1980,0.408352
2705,['U2'],A Sort Of Homecoming - Live,0.422604
1614,['Cheap Trick'],"Clock Strikes Ten - Live at Nippon Budokan, To...",0.427053
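The playlist above is ordered by distance to the Queen song. To order it by increasing tempo, as the exercise asks, one option is to select the `tempo` column as well and call `sort_values`. The sketch below uses a tiny made-up DataFrame; `df_demo`, `neighbor_idx` and `neighbor_dist` are hypothetical stand-ins for `df`, `indices[0]` and `distances[0]`.

```python
import pandas as pd

# Hypothetical stand-in for `df` (values invented for illustration)
df_demo = pd.DataFrame(
    {
        "name": ["s0", "s1", "s2", "s3", "s4"],
        "artists": ["a0", "a1", "a2", "a3", "a4"],
        "tempo": [120.0, 95.5, 140.2, 88.0, 101.3],
    }
)

# Hypothetical stand-ins for indices[0] and distances[0] from kneighbors
neighbor_idx = [2, 0, 4]
neighbor_dist = [0.0, 0.31, 0.42]

playlist_demo = (
    df_demo.loc[neighbor_idx, ["artists", "name", "tempo"]]
    .assign(distance=neighbor_dist)  # same order as neighbor_idx
    .sort_values("tempo")            # increasing tempo
)
print(playlist_demo)
```

Here `.loc` works with the positions returned by `kneighbors` only because the DataFrame keeps its default 0..n-1 index; with a custom index, `.iloc` would be the safer choice.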
