# Music Recommendation System 

#### Dataset Source:
Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. 
The Million Song Dataset. In Proceedings of the 12th International Society
for Music Information Retrieval Conference (ISMIR 2011), 2011.

In [1]:
# importing the required libraries
from tqdm import tqdm
import pandas as pd
import warnings
from IPython.display import clear_output
warnings.filterwarnings("ignore")

In [2]:
#Reading the triplets which contain the song id, user id, and listening count
triplets = pd.read_csv(r'msdchallenge/kaggle_visible_evaluation_triplets.txt', sep = "\t", header = None)
triplets.columns = ['user_id','song_id','listen_count']
triplets

Unnamed: 0,user_id,song_id,listen_count
0,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOBONKR12A58A7A7E0,1
1,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOEGIYH12A6D4FC0E3,1
2,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOFLJQZ12A6D4FADA6,1
3,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOHTKMO12AB01843B0,1
4,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SODQZCY12A6D4F9D11,1
...,...,...,...
1450928,5e650759ebf89012044c6d52121eeada8b0ec814,SOVLNXV12A6D4F706E,1
1450929,5e650759ebf89012044c6d52121eeada8b0ec814,SOVDSJC12A58A7A271,2
1450930,5e650759ebf89012044c6d52121eeada8b0ec814,SOBRHVR12A8C133F35,2
1450931,5e650759ebf89012044c6d52121eeada8b0ec814,SOMGVYU12A8C1314FF,2


In [3]:
#Reading the songs metadata 
songs_metadata = pd.read_csv(r'msdchallenge/unique_tracks.txt',sep='<SEP>', header = None, engine = 'python')
songs_metadata.columns = ['track_id', 'song_id', 'artist_name', 'song_title']
songs_metadata

Unnamed: 0,track_id,song_id,artist_name,song_title
0,TRMMMYQ128F932D901,SOQMMHC12AB0180CB8,Faster Pussy cat,Silent Night
1,TRMMMKD128F425225D,SOVFVAK12A8C1350D9,Karkkiautomaatti,Tanssi vaan
2,TRMMMRX128F93187D9,SOGTUKN12AB017F4F1,Hudson Mohawke,No One Could Ever
3,TRMMMCH128F425532C,SOBNYVR12A8C13558C,Yerba Brava,Si Vos Querés
4,TRMMMWA128F426B589,SOHSBXH12A8C13B0DF,Der Mystic,Tangle Of Aspens
...,...,...,...,...
999995,TRYYYUS12903CD2DF0,SOTXAME12AB018F136,Kiko Navarro,O Samba Da Vida
999996,TRYYYJO128F426DA37,SOXQYIQ12A8C137FBB,Kuldeep Manak,Jago Chhadeo
999997,TRYYYMG128F4260ECA,SOHODZI12A8C137BB3,Gabriel Le Mar,Novemba
999998,TRYYYDJ128F9310A21,SOLXGOR12A81C21EB7,Elude,Faraday


In [4]:
#Merging the two dataframes to create the input for the recommendation system
songs_df = pd.merge(triplets, songs_metadata.drop_duplicates(['song_id']), on = "song_id", how = "left")
songs_df.head()

Unnamed: 0,user_id,song_id,listen_count,track_id,artist_name,song_title
0,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOBONKR12A58A7A7E0,1,TRAEHHJ12903CF492F,Dwight Yoakam,You're The One
1,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOEGIYH12A6D4FC0E3,1,TRLGMFJ128F4217DBE,Barry Tuckwell/Academy of St Martin-in-the-Fie...,Horn Concerto No. 4 in E flat K495: II. Romanc...
2,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOFLJQZ12A6D4FADA6,1,TRTNDNE128F1486812,Cartola,Tive Sim
3,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOHTKMO12AB01843B0,1,TRASTUE128F930D488,Lonnie Gordon,Catch You Baby (Steve Pitron & Max Sanna Radio...
4,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SODQZCY12A6D4F9D11,1,TRFPLWO128F1486B9E,Miguel Calo,El Cuatrero


In [5]:
# Deleting the triplets and songs_metadata dataframes which are no longer used
del triplets
del songs_metadata

In [6]:
print("Total no of user-song combinations:",len(songs_df))

Total no of user-song combinations: 1450933


In [7]:
#Grouping songs by song_id 
song_grouped = pd.DataFrame(songs_df.groupby('song_id')['listen_count'].count())
song_grouped

Unnamed: 0_level_0,listen_count
song_id,Unnamed: 1_level_1
SOAAAFI12A6D4F9C66,2
SOAAAGK12AB0189572,1
SOAAAGQ12A8C1420C8,33
SOAAAMT12AB018C9C4,1
SOAAAQN12AB01856D3,2
...,...
SOZZZHM12A8C140DEF,8
SOZZZKJ12A6D4FBF66,1
SOZZZPV12A8C1444B5,45
SOZZZRV12A8C1361F1,1


In [8]:
#Merging the song title and artist name into a single column
songs_df['song'] = songs_df['song_title'].map(str) + " - " + songs_df['artist_name']
songs_df

Unnamed: 0,user_id,song_id,listen_count,track_id,artist_name,song_title,song
0,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOBONKR12A58A7A7E0,1,TRAEHHJ12903CF492F,Dwight Yoakam,You're The One,You're The One - Dwight Yoakam
1,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOEGIYH12A6D4FC0E3,1,TRLGMFJ128F4217DBE,Barry Tuckwell/Academy of St Martin-in-the-Fie...,Horn Concerto No. 4 in E flat K495: II. Romanc...,Horn Concerto No. 4 in E flat K495: II. Romanc...
2,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOFLJQZ12A6D4FADA6,1,TRTNDNE128F1486812,Cartola,Tive Sim,Tive Sim - Cartola
3,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOHTKMO12AB01843B0,1,TRASTUE128F930D488,Lonnie Gordon,Catch You Baby (Steve Pitron & Max Sanna Radio...,Catch You Baby (Steve Pitron & Max Sanna Radio...
4,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SODQZCY12A6D4F9D11,1,TRFPLWO128F1486B9E,Miguel Calo,El Cuatrero,El Cuatrero - Miguel Calo
...,...,...,...,...,...,...,...
1450928,5e650759ebf89012044c6d52121eeada8b0ec814,SOVLNXV12A6D4F706E,1,TRPLXFD128F1454961,Mos Def,Ms. Fat Booty,Ms. Fat Booty - Mos Def
1450929,5e650759ebf89012044c6d52121eeada8b0ec814,SOVDSJC12A58A7A271,2,TRRBUQL12903CCE501,Sam Cooke,Ain't Misbehavin,Ain't Misbehavin - Sam Cooke
1450930,5e650759ebf89012044c6d52121eeada8b0ec814,SOBRHVR12A8C133F35,2,TRHNKAU128F9300856,Southside Spinners,Luvstruck,Luvstruck - Southside Spinners
1450931,5e650759ebf89012044c6d52121eeada8b0ec814,SOMGVYU12A8C1314FF,2,TRDULYN128F4248762,J. Karjalainen & Mustat Lasit,Sinisten tähtien alla,Sinisten tähtien alla - J. Karjalainen & Musta...


In [9]:
#Dropping the columns which are not required
songs_df.drop('song_title', axis=1,inplace=True)
songs_df.drop('artist_name', axis=1,inplace=True)
songs_df.drop('track_id', axis=1,inplace=True)
songs_df

Unnamed: 0,user_id,song_id,listen_count,song
0,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOBONKR12A58A7A7E0,1,You're The One - Dwight Yoakam
1,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOEGIYH12A6D4FC0E3,1,Horn Concerto No. 4 in E flat K495: II. Romanc...
2,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOFLJQZ12A6D4FADA6,1,Tive Sim - Cartola
3,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SOHTKMO12AB01843B0,1,Catch You Baby (Steve Pitron & Max Sanna Radio...
4,fd50c4007b68a3737fe052d5a4f78ce8aa117f3d,SODQZCY12A6D4F9D11,1,El Cuatrero - Miguel Calo
...,...,...,...,...
1450928,5e650759ebf89012044c6d52121eeada8b0ec814,SOVLNXV12A6D4F706E,1,Ms. Fat Booty - Mos Def
1450929,5e650759ebf89012044c6d52121eeada8b0ec814,SOVDSJC12A58A7A271,2,Ain't Misbehavin - Sam Cooke
1450930,5e650759ebf89012044c6d52121eeada8b0ec814,SOBRHVR12A8C133F35,2,Luvstruck - Southside Spinners
1450931,5e650759ebf89012044c6d52121eeada8b0ec814,SOMGVYU12A8C1314FF,2,Sinisten tähtien alla - J. Karjalainen & Musta...


In [10]:
songs_grouped_df = songs_df.groupby('song_id')['listen_count'].sum()
index_values = songs_grouped_df[songs_grouped_df <=150].index

In [11]:
songs_df = songs_df[songs_df.song_id.isin(index_values) == False]

In [12]:
songs_df = songs_df[songs_df.listen_count>=2]
songs_df

Unnamed: 0,user_id,song_id,listen_count,song
40,841b2394ae3a9febbd6b06497b4a8ee8eb24b7f8,SOXPJVO12A6D4FCC69,2,I Heard It's The Softest Thing Ever (Album Ver...
41,841b2394ae3a9febbd6b06497b4a8ee8eb24b7f8,SOQMFWG12AB0186AD8,2,A Party Song (The Walk of Shame) - All Time Low
48,841b2394ae3a9febbd6b06497b4a8ee8eb24b7f8,SOEIKRK12AB017D6E0,2,Have Faith In Me - A Day To Remember
60,91b8fac7dc5e03f6cfaf6e2aa7171f14a8354d62,SOMCWAZ12A67ADBCE3,2,In The Waiting Line - Zero 7
62,91b8fac7dc5e03f6cfaf6e2aa7171f14a8354d62,SOUDGEV12A8C135FC9,10,Big Yellow Taxi - Counting Crows / Vanessa Car...
...,...,...,...,...
1450927,5e650759ebf89012044c6d52121eeada8b0ec814,SOPUCYA12A8C13A694,3,Canada - Five Iron Frenzy
1450929,5e650759ebf89012044c6d52121eeada8b0ec814,SOVDSJC12A58A7A271,2,Ain't Misbehavin - Sam Cooke
1450930,5e650759ebf89012044c6d52121eeada8b0ec814,SOBRHVR12A8C133F35,2,Luvstruck - Southside Spinners
1450931,5e650759ebf89012044c6d52121eeada8b0ec814,SOMGVYU12A8C1314FF,2,Sinisten tähtien alla - J. Karjalainen & Musta...


In [13]:
songs_df.astype({'listen_count': 'int32'},{'song_id':'str'}).dtypes

user_id         object
song_id         object
listen_count     int32
song            object
dtype: object

In [14]:
chunk_size = 20000
chunks = [x for x in range(0, songs_df.shape[0], chunk_size)]

In [15]:
for i in range(0, len(chunks) - 1):
    print(chunks[i], chunks[i + 1] - 1)

0 19999
20000 39999
40000 59999
60000 79999
80000 99999
100000 119999
120000 139999
140000 159999
160000 179999
180000 199999
200000 219999
220000 239999
240000 259999
260000 279999
280000 299999


In [16]:
chunk_df = songs_df.iloc[ chunks[0]:chunks[1] - 1]
pivot_df = pd.pivot_table(chunk_df, index = 'user_id', values = 'listen_count', columns = 'song_id')
for i in tqdm(range(1, len(chunks) - 1)):
    chunk_df = songs_df.iloc[ chunks[i]:chunks[i + 1] - 1]
    pivot_chunk = pd.pivot_table(chunk_df, index = 'user_id', values = 'listen_count', columns = 'song_id') 
    pivot_df = pd.concat([pivot_chunk,pivot_df])

100%|███████████████████████████████████████████| 14/14 [00:33<00:00,  2.41s/it]


In [17]:
pivot_df

song_id,SOAACSG12AB018DC80,SOAAFYH12A8C13717A,SOAAGYY12A6D4F705E,SOAAKPM12A58A77210,SOAAROC12A6D4FA420,SOAATLI12A8C13E319,SOAAVUV12AB0186646,SOAAWEE12A6D4FBEC8,SOABHYV12A6D4F6D0F,SOABJBU12A8C13F63F,...,SODAMPH12A6D4F84F6,SOSNWNR12A6D4FDAF8,SOEZTQZ12A6D4F5820,SOGKTNS12A58A7C887,SOACTJP12A67ADE925,SOLEKZN12AF72A5261,SOQHDAV12A670206FE,SOFNMRS12AB017B420,SOHPOMX12A58A7D506,SOLZPLS12A6D4F8C0D
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
000b22f91d4992dba3a80025493059f972a7850e,,,,,,,,,,,...,,,,,,,,,,
00209c99d83b405d47fe87f6761dbf7d259ca856,,,,,,,,,,,...,,,,,,,,,,
002598ddfaf779558c8a2ca04cba63bce8544711,,,,,,,,,,,...,,,,,,,,,,
0027bd60fea07d48fa336a979f9fa439bebb44fb,,,,,,,,,,,...,,,,,,,,,,
002d1f1a39282b2875c08587379d9acf41866d57,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ff9e31adc2f001172ca982fcc4fcca07ec04370c,,,,,,,,,,,...,,,,,,,,,,
ffb046d6935a3b5ad4673d45ae277630ca03ddc6,,,,,,,,,,,...,,,,,,,,,,
ffc683d098d818421f22363b150a19fd7d307764,,,,,,,,,,,...,,,,,,,,,,
ffca2f5069c381b20171dc14404b2e5e1141692b,,,,,,,,,,,...,,,,,,,,,,


In [18]:
def music_recommender():
    try:
        print("Welcome to the Music Recommendation System\n")
        song_listened = input("Enter the song to get recommendations for : ")
        recommendation_count = int(input("Enter the number of recommendations you want (1-10) : "))
        print("Hold on...Fetching your recommendations")
        song_listened_id = songs_df.loc[songs_df['song'] == song_listened, 'song_id'].iloc[0]
        predictor_song_ratings = pivot_df[song_listened_id]
        predictor_song_ratings[predictor_song_ratings>= 1]
        similar_songs = pivot_df.corrwith(predictor_song_ratings)
        corr_listened_song = pd.DataFrame(similar_songs, columns = ['pearsonR'])
        corr_listened_song.dropna(inplace = True)
        predictor_corr_summary =corr_listened_song.join(song_grouped['listen_count'])
        predictor_corr_summary = predictor_corr_summary.sort_values('pearsonR', ascending = False)
        final_recommended_songs = predictor_corr_summary[predictor_corr_summary.pearsonR < 0.9999]
        final_recommended_songs.sort_values('pearsonR', ascending = False)
        final_recommended_songs = final_recommended_songs.reset_index()
        song_df_one = songs_df.drop(['listen_count'], axis=1)
        similar_songs = pd.merge(final_recommended_songs, song_df_one.drop_duplicates(["song_id"]), on="song_id", how="left")
        similar_songs = similar_songs.sort_values('pearsonR', ascending = False)
        print("Here are your %d recommendations" %recommendation_count )
        return similar_songs.head(recommendation_count)
        #return similar_songs.head(recommendation_count).to_excel("output.xlsx")
    except(IndexError):
        retry_recommender = input("No recommendations found. Please check the song name you have entered. Try again? (y/n)")
        if(retry_recommender == 'y'):
            clear_output()
            return music_recommender()
        else:
            print("See you soon")
    except(ValueError):
        retry_recommender = input("Please enter a number from 1-10. Try again? (y/n)")
        if(retry_recommender == 'y'):
            clear_output()
            return music_recommender()
        else:
            print("See you soon")

In [19]:
music_recommender()

Welcome to the Music Recommendation System

Enter the song to get recommendations for : Makes Me Wonder - Maroon 5
Enter the number of recommendations you want (1-10) : 10
Hold on...Fetching your recommendations
Here are your 10 recommendations


Unnamed: 0,song_id,pearsonR,listen_count,user_id,song
0,SOLFXKT12AB017E3E0,0.997609,2725,d54fbfb9fa57462a9326cef2c97694f58ae3d295,Fireflies - Charttraxx Karaoke
1,SOOXLKF12A6D4F594A,0.996514,134,235bc8aaa16dc62f71575768156df7ed4a81e5f3,Harder To Breathe - Maroon 5
2,SOWKQYL12AB0183B15,0.996217,856,07ea22de4e9c0bb1c9e0a428367874742bf838f8,Whatcha Say - Jason Derulo
3,SOFRQTD12A81C233C0,0.99021,5043,c732f882aa8d6db3bfaf8037d6418f27d3e07fc8,Sehr kosmisch - Harmonia
4,SOUSMXX12AB0185C24,0.989743,2260,da7bc0ec91a21a54f0b209bcc9ec5b4b49613a68,OMG - Usher featuring will.i.am
5,SOBADEB12AB018275F,0.981981,954,5efe1a751136b6a8bf5147a80f8e98917b9c82be,Imma Be - Black Eyed Peas
6,SOQDMKM12A58A7A9B2,0.979864,61,2173349af60cca7bdede34a2f30a2a6282f0cfaf,Nothing Lasts Forever - Maroon 5
7,SOURJIK12A8C138182,0.973247,75,951bf65709ec6ad34ed960fb5f532eaefba25f5e,Not Falling Apart - Maroon 5
8,SOSBGZJ12A8C134285,0.958948,94,699f1bc6550e275499cbaaed4b936b0259ab549d,Better That We Break - Maroon 5
9,SOPXFGP12A8C13FA9F,0.92968,150,07b6f3bb448e3ce6a7fb310fc945452f42015693,Sunday Morning - Maroon 5


## Sample songs to check recommendations for:

Makes Me Wonder - Maroon 5

Imma Be - Black Eyed Peas

Crawling (Album Version) - Linkin Park

Hey_ Soul Sister - Train

You Belong With Me - Taylor Swift