# SONG RECOMMENDATION FOR GIVEN SPOTIFY SONGS

In this repository, I built a recommendation system for 42,305 Spotify songs based on their genre, mode, and duration. <br>
I used NMF from `sklearn.decomposition` to do that, <br>
and I preprocessed the data with the `normalize` function of `sklearn.preprocessing`. <br>
I also replaced NA values with `SimpleImputer` from `sklearn.impute`. <br>
The source of the data is: https://www.kaggle.com/mrmorj/dataset-of-songs-in-spotify

#### IMPORTING NECESSARY LIBRARIES

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


#### IMPORTING OUR DATASET

In [2]:
spoti = pd.read_csv("C:\\Users\\talfi\\python\\ML\\datasets\\dirty\\spoti\\genres_v2.csv", encoding='utf-8', quotechar='"')
spoti.head(3)


Unnamed: 0.1,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,...,id,uri,track_href,analysis_url,duration_ms,time_signature,genre,song_name,Unnamed: 0,title
0,0.831,0.814,2,-7.364,1,0.42,0.0598,0.0134,0.0556,0.389,...,2Vc6NJ9PW9gD9q343XFRKx,spotify:track:2Vc6NJ9PW9gD9q343XFRKx,https://api.spotify.com/v1/tracks/2Vc6NJ9PW9gD...,https://api.spotify.com/v1/audio-analysis/2Vc6...,124539,4,Dark Trap,Mercury: Retrograde,,
1,0.719,0.493,8,-7.23,1,0.0794,0.401,0.0,0.118,0.124,...,7pgJBLVz5VmnL7uGHmRj6p,spotify:track:7pgJBLVz5VmnL7uGHmRj6p,https://api.spotify.com/v1/tracks/7pgJBLVz5Vmn...,https://api.spotify.com/v1/audio-analysis/7pgJ...,224427,4,Dark Trap,Pathology,,
2,0.85,0.893,5,-4.783,1,0.0623,0.0138,4e-06,0.372,0.0391,...,0vSWgAlfpye0WCGeNmuNhy,spotify:track:0vSWgAlfpye0WCGeNmuNhy,https://api.spotify.com/v1/tracks/0vSWgAlfpye0...,https://api.spotify.com/v1/audio-analysis/0vSW...,98821,4,Dark Trap,Symbiote,,


In [3]:
spoti.shape

(42305, 22)

#### CREATING song_name 

- The `song_name` column will serve as our target variable here, i.e. the label we use to look up a song and to name the recommendations. Let's create a pandas Series that contains only the `song_name` column and name it **song_name**.

In [4]:
song_name = spoti["song_name"]

In [5]:
print(song_name.shape)
print(song_name.isnull().values.any())

(42305,)
True


In [6]:
song_name = song_name.values.reshape(-1,1)

In [7]:
song_name.shape

(42305, 1)

#### REPLACING NA VALUES

- Let's replace the NA values in our target variable (**song_name**) with the mode, i.e. the most frequent value.
- For object dtypes, `SimpleImputer` only supports `strategy='most_frequent'` and `strategy='constant'`; we use `most_frequent` here.
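
As a minimal sketch of what `most_frequent` does on an object column, here is a toy example. The four-row array is made up for illustration and is not taken from the dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy object-dtype column with one missing entry (illustrative values only)
toy_names = np.array(
    [["Pathology"], ["Symbiote"], ["Pathology"], [np.nan]], dtype=object
)

toy_imputer = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
toy_filled = toy_imputer.fit_transform(toy_names)

# The missing entry is replaced by the most frequent value in the column
print(toy_filled.ravel())
```

Note that every missing name ends up with the same label, so those rows can no longer be told apart by name.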

In [8]:
from sklearn.impute import SimpleImputer
imr = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
imr = imr.fit(song_name)
imputed_data = imr.transform(song_name)
song_name = pd.DataFrame(imputed_data)
song_name = song_name.rename(columns={0:"Song-Names"})
song_name.head(3)

Unnamed: 0,Song-Names
0,Mercury: Retrograde
1,Pathology
2,Symbiote


#### CREATING CORE
Let's create a DataFrame that contains the genre, mode, and duration_ms of each song. <br>
Let's name it core.

In [9]:
core = spoti[["genre","mode","duration_ms"]]
print(core.dtypes)
print(core.head())

genre          object
mode            int64
duration_ms     int64
dtype: object
       genre  mode  duration_ms
0  Dark Trap     1       124539
1  Dark Trap     1       224427
2  Dark Trap     1        98821
3  Dark Trap     1       123661
4  Dark Trap     1       123298


In [10]:
core.dtypes

genre          object
mode            int64
duration_ms     int64
dtype: object

Hmm, it looks like the dtype of genre is object and it contains string values. Let's encode them as integers to make them convenient for the ML algorithm.

In [14]:
core["genre"].value_counts()

Underground Rap    5875
Dark Trap          4578
Hiphop             3028
trance             2999
trap               2987
techhouse          2975
dnb                2966
psytrance          2961
techno             2956
hardstyle          2936
RnB                2099
Trap Metal         1956
Rap                1848
Emo                1680
Pop                 461
Name: genre, dtype: int64

In [15]:
core = core.replace({"genre":{"Underground Rap":0, "Dark Trap":1, "Hiphop":2, "trance":3, "trap":4, "techhouse":5, "dnb":6, "psytrance": 7, "techno":8, "hardstyle":9, "RnB":10, "Trap Metal":11, "Rap":12, "Emo":13, "Pop":14}})
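
The mapping above is written by hand. As an alternative sketch (not what this notebook does), pandas can derive integer codes automatically with `pd.factorize`. The codes follow the order of first appearance, so they will generally differ from the hand-written mapping. The small Series below is made up for illustration.

```python
import pandas as pd

# Toy genre column (illustrative values only)
toy_genres = pd.Series(["Dark Trap", "Pop", "Dark Trap", "Emo"], name="genre")

# codes: one integer per row; uniques: the label behind each integer
toy_codes, toy_uniques = pd.factorize(toy_genres)

print(toy_codes)
print(list(toy_uniques))
```

Keeping `toy_uniques` around makes it possible to translate the codes back to genre names later.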

In [16]:
from sklearn.decomposition import NMF
nmf = NMF(n_components = 6)
nmf_features = nmf.fit_transform(core)
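
For reference, NMF factorizes a non-negative matrix `X` into two non-negative matrices `W` (returned by `fit_transform`) and `H` (stored in `components_`), so that `X` is approximately `W @ H`. The sketch below only shows the shapes on a random toy matrix; `init`, `random_state`, and `max_iter` are assumptions chosen for reproducibility, not settings from this notebook.

```python
import numpy as np
from sklearn.decomposition import NMF

# Random non-negative toy matrix: 20 samples, 3 features (illustrative only)
rng = np.random.default_rng(0)
X_toy = rng.random((20, 3))

toy_nmf = NMF(n_components=2, init="random", random_state=0, max_iter=1000)
W = toy_nmf.fit_transform(X_toy)   # per-sample features, shape (20, 2)
H = toy_nmf.components_            # per-component weights, shape (2, 3)

print(W.shape, H.shape)
```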



#### PREPROCESSING

`normalize()` scales samples individually to unit norm. Each sample (i.e. each row of the data matrix) with at least one non-zero component is rescaled independently of the other samples so that its norm (l1, l2 or max) equals one. It works with both dense NumPy arrays and scipy.sparse matrices.
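
To make this concrete, here is a minimal NumPy sketch of L2 row normalization, which is the default norm of `normalize()`. The toy matrix is made up, and all-zero rows are left unchanged, in line with the description above.

```python
import numpy as np

# Toy matrix: the second row is all zeros (illustrative values only)
X_rows = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])

# L2 norm of each row, kept as a column so it broadcasts over the rows
row_norms = np.linalg.norm(X_rows, axis=1, keepdims=True)
row_norms[row_norms == 0] = 1.0  # avoid dividing all-zero rows by zero

X_unit = X_rows / row_norms
print(X_unit)
```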

In [17]:
from sklearn.preprocessing import normalize
norm_features = normalize(nmf_features)
current_music = norm_features[23,:]
similarities = norm_features.dot(current_music)
similarities

array([0.98028166, 0.98550881, 0.99565498, ..., 0.98838386, 0.98137231,
       0.97706737])
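
Because every row has unit L2 norm, the dot product of two rows equals their cosine similarity, which is why a plain `.dot()` is enough here. A small sketch with made-up unit vectors, ranking the rows against one query row:

```python
import numpy as np

# Four toy feature rows, each already of unit length (illustrative only)
toy_features = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]])
toy_query = toy_features[1]

# Cosine similarity of every row with the query row
toy_sims = toy_features @ toy_query

# Indices of the 3 most similar rows, best first (the query itself comes first)
toy_top = np.argsort(toy_sims)[::-1][:3]
print(toy_sims, toy_top)
```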

#### LAST TOUCHES AND CREATING current_music() function

In [18]:
df = pd.DataFrame(norm_features)
x = df.join(song_name)
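# average the feature columns per song name so that Song-Names becomes the index of df
df = pd.pivot_table(x, values=[0,1,2,3,4,5], index="Song-Names")
def current_music(value):
    print("Top 5 recommendations for given music are:")
    value = df.loc[value]            # feature row of the requested song
    similarities = df.dot(value)     # similarity of every song to it
    print(similarities.nlargest(5))  # the 5 highest scores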

In [19]:
current_music("Missed Calls - Remix")

Top 5 recommendations for given music are:
Song-Names
Missed Calls - Remix                                    1.000000
Uptown Funk (feat. Bruno Mars)                          0.999988
Gazzillion Ear (feat. Thom Yorke) - Thom Yorke Remix    0.999988
Ancestral                                               0.999987
Pathways                                                0.999981
dtype: float64
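
In the output above, the first hit is the queried song itself with a similarity of 1.0. As a possible variation, here is a sketch of a hypothetical helper, `recommend`, that drops the query song and returns the result instead of printing it. It assumes a DataFrame with unique song names as the index, like `df` above; the toy data is made up.

```python
import pandas as pd

def recommend(features, song, k=5):
    """Return the k songs most similar to `song`, excluding `song` itself.

    Assumes `features` is indexed by unique song names (hypothetical helper).
    """
    query = features.loc[song]
    sims = features.dot(query)
    return sims.drop(index=song).nlargest(k)

# Toy feature table with unit-length rows (illustrative values only)
toy_df = pd.DataFrame(
    [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]],
    index=["song_a", "song_b", "song_c"],
)
toy_result = recommend(toy_df, "song_a", k=2)
print(toy_result)
```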
