![SpotifyxRVB](./Spotify%20x%20RVB%20-%20Logo.png)

# <a><center> Spotify x Reverberation Radio - The essence of Cool </center></a>
<b>Author: Ulysse Zampogna</b>

The project sets out to reveal the secrets of the best radio on earth (according to me): Reverberation. This radio is a collective of artists based in Austin, Texas, regularly featuring guest appearances from all over the world. It releases one show per week. The music selection is exquisite and covers an eclectic range of music history.

But what makes this radio so special? Let's compare it with a few popular playlists on Spotify and try to define Reverberation's essence of Cool.

This Data Science project is broken down into 3 notebooks:
 - [Part 1](https://github.com/uzampogn/Spotify-x-Reverberation-The-essence-of-cool) uses the [Spotipy API](https://spotipy.readthedocs.io/en/2.19.0/) to collect data from a dozen playlists.
 - [Part 2](https://github.com/uzampogn/Spotify-x-Reverberation-The-essence-of-cool) performs an exploratory data analysis to unveil the secrets of each playlist.
 - [Part 3](https://github.com/uzampogn/Spotify-x-Reverberation-The-essence-of-cool) is a series of classifiers ([Naïve Bayes](https://scikit-learn.org/stable/modules/naive_bayes.html), [Random Forest](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) & [XGBoost](https://xgboost.readthedocs.io/en/stable/)) which take any track and predict the best-matching playlist. Let's see if a given track could enter the exquisite Reverberation playlist.

#### Useful Resources
- https://spotipy.readthedocs.io/en/2.19.0/

---
# Part 1: Data Collection

This first kernel uses the Spotipy API to collect the tracklists and track features of the Reverberation playlist as well as a dozen other playlists for comparison purposes. These extra playlists are selected among the most popular playlists for 2021.

Let's collect this data!

### Table of contents

   1. <a href='#1'>Credentials settings</a>
   2. <a href='#2'>Playlists extraction</a>
   3. <a href='#3'>Track IDs extraction</a>
   4. <a href='#4'>Track features extraction</a>
   5. <a href='#5'>Data export</a>
   6. <a href='#6'>Final checks</a>
   
 ---

#### Settings

In [149]:
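import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd
import time
import seaborn as sns
import matplotlib.pyplot as plt

pd.set_option('display.max_rows', None)
%matplotlib inline
sns.set(style="ticks", context="poster")
#on matplotlib >= 3.6 this style is named "seaborn-v0_8-poster"
plt.style.use("seaborn-poster")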
%autosave 30

Autosaving every 30 seconds


---
### 1. Credentials settings <a name="1"></a>

In [2]:
client_id = 'YOUR_CLIENT_ID'  #ENTER CLIENT_ID
client_secret = 'YOUR_CLIENT_SECRET'  #ENTER CLIENT_SECRET_KEY

client_credentials_manager = SpotifyClientCredentials(client_id, client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
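
Pasting secrets into a notebook makes them easy to leak when the notebook is shared. Here is a minimal sketch of an alternative, assuming the credentials live in the environment variables `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` (the names spotipy itself looks for). `load_credentials` is a hypothetical helper, not part of spotipy.

```python
import os


def load_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    client_id = env.get("SPOTIPY_CLIENT_ID")
    client_secret = env.get("SPOTIPY_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET first")
    return client_id, client_secret
```

The returned pair can then be passed to `SpotifyClientCredentials` exactly as in the cell above.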

---
### 2. Playlists extraction<a name='2'></a>

First, we will extract a dozen playlists, including Reverberation. The others will serve as a benchmark during EDA and as negative samples for the classification algorithm.

In [3]:
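#get all tracks for a given playlist
#note: recent spotipy releases deprecate user_playlist_tracks in favour of playlist_items
def get_playlist_tracks(username,playlist_id):
    #the first call returns one page of items plus a 'next' link
    results = sp.user_playlist_tracks(username,playlist_id)
    tracks = results['items']
    #follow the 'next' links until the last page, where 'next' is None
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    return tracks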
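
The function above relies on Spotify's paging objects: each response carries an `items` list and a `next` link that is empty on the last page. The sketch below reproduces that loop against a stand-in client so it runs without network access. `FakeClient` and `collect_items` are hypothetical names, and `next` holds a page index here instead of a URL.

```python
class FakeClient:
    """Hypothetical stand-in for a spotipy client, serving pre-built pages."""

    def __init__(self, pages):
        self._pages = pages

    def first_page(self):
        return self._pages[0]

    def next(self, page):
        # Plays the role of sp.next(): return the page that `next` points to.
        return self._pages[page["next"]]


def collect_items(client):
    """Accumulate `items` from every page until `next` is empty."""
    page = client.first_page()
    items = list(page["items"])
    while page["next"]:
        page = client.next(page)
        items.extend(page["items"])
    return items


# Three pages; the last one has no `next`, which ends the loop.
pages = [
    {"items": ["a", "b"], "next": 1},
    {"items": ["c", "d"], "next": 2},
    {"items": ["e"], "next": None},
]
all_items = collect_items(FakeClient(pages))
print(all_items)
```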

In [4]:
#hand picked list of public playlist on Spotify
playlists = [['''Today's Top Hits''','spotify','37i9dQZF1DXcBWIGoYBM5M'],
 ['Your Favorite CoffeeHouse','spotify','37i9dQZF1DX6ziVCJnEm59'],
 ['RapCaviar','spotify','37i9dQZF1DX0XUsuxWHRQd'],
 ['Viva Latino','spotify','37i9dQZF1DX10zKzsJ2jva'],
 ['Hot Country','spotify','37i9dQZF1DX1lVhptIYRda'],
 ['New Music Friday','spotify','37i9dQZF1DX4JAvHpjipBk'],
 ['Peaceful Piano','spotify','37i9dQZF1DXcBWIGoYBM5M'], #NOTE: same ID as Today's Top Hits above, to be double-checked
 ['Are & Be','spotify','37i9dQZF1DX4SBhb3fqCJd'],
 ['Mint','spotify','37i9dQZF1DX4dyzvuaRJ0n'],
 ['Just Hits','spotify','37i9dQZF1DXcRXFNfZr7Tp'],
 ['Soft Pop Hits','spotify','37i9dQZF1DWTwnEm1IYyoj'],
 ['All Out 2000s','spotify','37i9dQZF1DX4o1oenSJRJd'],
 ['Reverberation','h0572wwbcwym536rga31czi84','4puKKbk5xNIpmtYUI2OYzt']]
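
With a hand-maintained list, a copy-pasted ID is easy to miss: the entries for Today's Top Hits and Peaceful Piano above carry the same ID. Here is a small sanity check that could be run before extraction. `find_duplicate_ids` is a hypothetical helper, and `sample` is an abbreviated copy of the list above.

```python
from collections import defaultdict


def find_duplicate_ids(playlists):
    """Map each playlist ID used more than once to the names that share it."""
    names_by_id = defaultdict(list)
    for name, _owner, playlist_id in playlists:
        names_by_id[playlist_id].append(name)
    return {pid: names for pid, names in names_by_id.items() if len(names) > 1}


sample = [
    ["Today's Top Hits", "spotify", "37i9dQZF1DXcBWIGoYBM5M"],
    ["RapCaviar", "spotify", "37i9dQZF1DX0XUsuxWHRQd"],
    ["Peaceful Piano", "spotify", "37i9dQZF1DXcBWIGoYBM5M"],
]
duplicates = find_duplicate_ids(sample)
print(duplicates)
```

An empty result means every playlist in the list points to a distinct ID.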

In [5]:
#playlist extraction
pl_info = []
for pl in playlists:
    pl_name=pl[0]
    pl_track_list = get_playlist_tracks(pl[1],pl[2])
    pl_info.append([pl_name,pl_track_list])
    time.sleep(5)
    print('Playlist {} extracted'.format(pl_name))

Playlist Today's Top Hits extracted
Playlist Your Favorite CoffeeHouse extracted
Playlist RapCaviar extracted
Playlist Viva Latino extracted
Playlist Hot Country extracted
Playlist New Music Friday extracted
Playlist Peaceful Piano extracted
Playlist Are & Be extracted
Playlist Mint extracted
Playlist Just Hits extracted
Playlist Soft Pop Hits extracted
Playlist All Out 2000s extracted
Playlist Reverberation extracted


In [6]:
#Inspect first element
#pl_info[0][1][0]

---
### 3. Track_ids extraction <a name='3'></a>

Further processing is needed to extract the track_id from each playlist item.

In [7]:
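#Define function to extract track ids
def getTrackID(playlist_items):
    track_ids = []
    for item in playlist_items:
        #skip items without a usable id (e.g. removed or local tracks)
        track = item.get('track')
        if track and track.get('id'):
            track_ids.append(track['id'])
    return track_ids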

In [8]:
#Instantiate variables
pl_ids = []
tot_pl = 0
tot_ids = 0
positive_class = 0

#extract each track_id for each playlist
for pl in pl_info:
    ids = getTrackID(pl[1])
    pl_ids.append([pl[0],ids])
    tot_pl += 1
    tot_ids += len(ids)
    if pl[0]=='Reverberation':
        positive_class = len(ids)
    print('{} - # of collected ids: {}\n'.format(pl[0],len(ids)))
    #no API call happens in this loop, so no pause is needed here
print('Total playlists collected: {}'.format(tot_pl))
print('Total ids collected: {}'.format(tot_ids))
print('Total positive class (Reverberation): {}, or {:.2%}'.format(positive_class,positive_class/tot_ids))
print('Total negative class (Other Playlist): {}'.format(tot_ids-positive_class))

Today's Top Hits - # of collected ids: 50

Your Favorite CoffeeHouse - # of collected ids: 125

RapCaviar - # of collected ids: 50

Viva Latino - # of collected ids: 50

Hot Country - # of collected ids: 52

New Music Friday - # of collected ids: 100

Peaceful Piano - # of collected ids: 50

Are & Be - # of collected ids: 50

Mint - # of collected ids: 100

Just Hits - # of collected ids: 87

Soft Pop Hits - # of collected ids: 100

All Out 2000s - # of collected ids: 150

Reverberation - # of collected ids: 1602

Total playlists collected: 13
Total ids collected: 2566
Total positive class (Reverberation): 1602, or 62.43%
Total negative class (Other Playlist): 964
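
The class shares printed above come down to simple arithmetic on the per-playlist counts. As a quick cross-check, the sketch below recomputes them from the numbers in the output (the dictionary is retyped by hand from that output).

```python
# Per-playlist counts copied from the output above.
counts = {
    "Today's Top Hits": 50,
    "Your Favorite CoffeeHouse": 125,
    "RapCaviar": 50,
    "Viva Latino": 50,
    "Hot Country": 52,
    "New Music Friday": 100,
    "Peaceful Piano": 50,
    "Are & Be": 50,
    "Mint": 100,
    "Just Hits": 87,
    "Soft Pop Hits": 100,
    "All Out 2000s": 150,
    "Reverberation": 1602,
}

total = sum(counts.values())
positive = counts["Reverberation"]
negative = total - positive
share = positive / total  # fraction of tracks in the positive class

print(total, positive, negative, "{:.2%}".format(share))
```

Reverberation alone accounts for well over half of the tracks, so the two classes are imbalanced in favour of the positive one.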


---
### 4. Track features extraction <a name='4'></a>

Now that we have a clean list of track ids, we can use sp.track, sp.audio_features and sp.artist from spotipy to collect detailed information on each track.

In [9]:
#Define function to extract track features
#Resource: https://developer.spotify.com/documentation/web-api/reference/#/
def getTrackFeatures(id,playlist):
    meta = sp.track(id)
    features = sp.audio_features(id)
    artist_id = meta['album']['artists'][0]['id']
    artist_features = sp.artist(artist_id)
    
    #metadata
    name = meta['name']
    album = meta['album']['name']
    artist = meta['album']['artists'][0]['name']
    genres = artist_features['genres']
    release_date = meta['album']['release_date']
    duration_ms = meta['duration_ms']
    popularity = meta['popularity']
    
    #feature from the data
    acousticness = features[0]['acousticness']
    danceability = features[0]['danceability']
    energy = features[0]['energy']
    valence = features[0]['valence']
    instrumentalness = features[0]['instrumentalness']
    liveness = features[0]['liveness']
    loudness = features[0]['loudness']
    speechiness = features[0]['speechiness']
    tempo = features[0]['tempo']
    time_signature = features[0]['time_signature']
    mode = features[0]['mode']
    type = features[0]['type']
    
    track = [playlist,id,name,album,artist,genres,release_date,popularity,acousticness,danceability,energy,valence,instrumentalness,liveness,loudness,speechiness,tempo,time_signature,duration_ms,mode,type]
    return track

In [10]:
#extract track features
tracks = []
for pl_i,pl in enumerate(pl_ids):
    playlist = pl[0]
    ids = pl[1]
    for track_id in ids:
        track = getTrackFeatures(track_id,playlist)
        tracks.append(track)
        time.sleep(.5)
    print('Playlist {} - {}: Features extraction completed'.format(pl_i,playlist))

Playlist 0 - Today's Top Hits: Features extraction completed
Playlist 1 - Your Favorite CoffeeHouse: Features extraction completed
Playlist 2 - RapCaviar: Features extraction completed
Playlist 3 - Viva Latino: Features extraction completed
Playlist 4 - Hot Country: Features extraction completed
Playlist 5 - New Music Friday: Features extraction completed
Playlist 6 - Peaceful Piano: Features extraction completed
Playlist 7 - Are & Be: Features extraction completed
Playlist 8 - Mint: Features extraction completed
Playlist 9 - Just Hits: Features extraction completed
Playlist 10 - Soft Pop Hits: Features extraction completed
Playlist 11 - All Out 2000s: Features extraction completed
Playlist 12 - Reverberation: Features extraction completed
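
The loop above makes three API calls per track (sp.track, sp.audio_features and sp.artist), which adds up over 2,566 tracks. One possible speed-up is to send ids in batches. As far as I know, `sp.audio_features` accepts a list of up to 100 track ids and `sp.tracks` up to 50, but treat those limits as assumptions to check against the spotipy documentation. The sketch below shows only the batching step. `chunked` is a hypothetical helper and the ids are placeholders.

```python
def chunked(seq, size):
    """Yield consecutive slices of `seq` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


track_ids = ["id{}".format(i) for i in range(250)]  # placeholder ids, not real tracks
batches = list(chunked(track_ids, 100))
print([len(batch) for batch in batches])

# Assumed usage with the client from section 1 (not executed here):
# for batch in chunked(ids, 100):
#     features = sp.audio_features(batch)
```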


---
### 5. Export Data <a name='5'></a>

In [11]:
#Export data in csv file
track_columns=['playlist','id','name','album','artist','genres','release_date','popularity','acousticness','danceability','energy','valence','instrumentalness','liveness','loudness','speechiness','tempo','time_signature','duration_ms','mode','type']
df = pd.DataFrame(tracks, columns=track_columns)
df.to_csv('./data/rvb_data.csv', sep=',',index=False)
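
The export above assumes that the ./data folder already exists; otherwise the write fails. Here is a minimal sketch of a guard, using only the standard library. `ensure_parent_dir` is a hypothetical helper, and the demo runs in a temporary directory so nothing is written to the project.

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(file_path):
    """Create the parent directory of `file_path` if it does not exist yet."""
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = ensure_parent_dir(Path(tmp) / "data" / "rvb_data.csv")
    parent_exists = target.parent.is_dir()
print(parent_exists)
```

In the notebook this would become `df.to_csv(ensure_parent_dir('./data/rvb_data.csv'), sep=',', index=False)`.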

---
### 6. Final check <a name='6'></a>

In [12]:
#Quick check df
print('Total duration in hours: {:.0f}\n'.format(df.duration_ms.sum()/(1000*60*60)))
print(df.playlist.unique().tolist())
df.head()

Total duration in hours: 158

["Today's Top Hits", 'Your Favorite CoffeeHouse', 'RapCaviar', 'Viva Latino', 'Hot Country', 'New Music Friday', 'Peaceful Piano', 'Are & Be', 'Mint', 'Just Hits', 'Soft Pop Hits', 'All Out 2000s', 'Reverberation']


| | playlist | id | name | album | artist | genres | release_date | popularity | acousticness | danceability | ... | valence | instrumentalness | liveness | loudness | speechiness | tempo | time_signature | duration_ms | mode | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Today's Top Hits | 3USxtqRwSYz57Ewm6wWRMp | Heat Waves | Dreamland | Glass Animals | [gauze pop, indietronica, shiver pop] | 2020-08-07 | 90 | 0.44 | 0.761 | ... | 0.531 | 7e-06 | 0.0921 | -6.9 | 0.0944 | 80.87 | 4 | 238805 | 1 | audio_features |
| 1 | Today's Top Hits | 5HCyWlXZPP0y6Gqq8TgA20 | STAY (with Justin Bieber) | STAY (with Justin Bieber) | The Kid LAROI | [australian hip hop] | 2021-07-09 | 97 | 0.0383 | 0.591 | ... | 0.478 | 0.0 | 0.103 | -5.484 | 0.0483 | 169.928 | 4 | 141805 | 1 | audio_features |
| 2 | Today's Top Hits | 6zSpb8dQRaw0M1dK8PBwQz | Cold Heart - PNAU Remix | Cold Heart (PNAU Remix) | Elton John | [glam rock, mellow gold, piano rock] | 2021-08-13 | 96 | 0.034 | 0.796 | ... | 0.942 | 4.2e-05 | 0.0952 | -6.312 | 0.0317 | 116.032 | 4 | 202735 | 1 | audio_features |
| 3 | Today's Top Hits | 0gplL1WMoJ6iYaPgMCL0gX | Easy On Me | Easy On Me | Adele | [british soul, pop, pop soul, uk pop] | 2021-10-14 | 97 | 0.578 | 0.604 | ... | 0.13 | 0.0 | 0.133 | -7.519 | 0.0282 | 141.981 | 4 | 224694 | 1 | audio_features |
| 4 | Today's Top Hits | 4fouWK6XVHhzl78KzQ1UjL | abcdefu | abcdefu | GAYLE | [modern alternative pop] | 2021-08-13 | 100 | 0.299 | 0.695 | ... | 0.415 | 0.0 | 0.367 | -5.692 | 0.0493 | 121.932 | 4 | 168601 | 1 | audio_features |
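
One thing to keep in mind for the next notebook: the genres column holds a Python list per track, and a CSV file can only store it as text. When the file is read back, each cell is a string such as "['british soul', 'pop']" instead of a list. Here is a minimal sketch of how it could be turned back into a list with the standard library. `parse_genres` is a hypothetical helper.

```python
import ast


def parse_genres(cell):
    """Turn the CSV text of a genres cell back into a list of strings."""
    if not isinstance(cell, str) or not cell.strip():
        return []  # missing or empty cell
    return list(ast.literal_eval(cell))


# str() of a list is the text that ends up in the CSV cell.
as_written = str(["british soul", "pop", "pop soul", "uk pop"])
genres = parse_genres(as_written)
print(genres)
```

After reloading the file, this could be applied with `df['genres'] = df['genres'].apply(parse_genres)`.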


---
### Data collection finished.
### Next step: see [*Part 2 - Exploratory Data Analysis*](https://github.com/uzampogn/Spotify-x-Reverberation-The-essence-of-cool)