In [1]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import os
import pandas as pd

In [2]:
import time

We opted to use Spotify's API to get the music data. The keys used here are stored in our environment variables.

In [3]:
cid = os.getenv('SPOTIFY_CLIENT_ID')
secret = os.getenv('SPOTIFY_SECRET')
ccm = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=ccm)
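
Because `os.getenv` returns `None` when a variable is missing, a mistyped or unset key only surfaces later as an authentication error. A minimal sketch of an early check follows; the helper name `require_env` is our own and is not part of spotipy.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this helper, the cell above could read `cid = require_env('SPOTIFY_CLIENT_ID')` and `secret = require_env('SPOTIFY_SECRET')`, so a missing key fails at the point where it is read.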

Since we need playlists of music representative of each genre, we decided to use only Spotify's own hand-picked playlists, each containing 100 songs and each displayed with Spotify's logo in the top right.

The names of the playlists chosen are as follows:

* Rock: "80's Rock Anthems"
* Classical: "Reading Classical"
* Hip Hop: "I Love My 90's Hip Hop"
* Punk: "Classic Punk"
* Jazz: "Jazz Classics"

In [4]:
# Playlist IDs

rock_80s = '37i9dQZF1DX1spT6G94GFC'
read_class = '37i9dQZF1DWYkztttC1w38'
hip_hop_90s = '37i9dQZF1DX186v583rmzp'
classic_punk = '37i9dQZF1DX3LDIBRoaCDQ'
classic_jazz = '37i9dQZF1DXbITWG1ZJKYt'
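
The playlists chosen here hold 100 songs each, which fits in a single page of results. For a longer playlist the API returns its tracks in pages, so a sketch like the one below could be used to collect all of them. It assumes `sp` is a spotipy client and uses spotipy's `playlist_items` and `next` methods; the function name `playlist_items_all` is our own.

```python
def playlist_items_all(sp, playlist_id):
    """Collect every item of a playlist by following the paging links."""
    # First page: up to 100 items, plus a 'next' link if more pages exist.
    page = sp.playlist_items(playlist_id, limit=100)
    items = list(page['items'])
    while page.get('next'):
        # sp.next() fetches the page that the 'next' link points to.
        page = sp.next(page)
        items.extend(page['items'])
    return items
```

Each returned item has the same shape as the entries under `['tracks']['items']` in the playlist object, so the track itself is still found under the `'track'` key.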

As for which features we collected at this stage: from each playlist we selected the track fields that describe qualities of the music, such as duration and whether or not the track is explicit.

We also used the Spotify API's audio features object for each track to extract additional features, such as key, energy, and time signature.

Additionally, we took from each song (if available) the URL of a 30-second preview, which will be the subject of further analysis later on.

All of these were then put into a pandas dataframe.

In [5]:
def features(playlist_id):
    """Return a dataframe of track metadata and audio features for one playlist."""
    # The playlist object embeds its tracks under ['tracks']['items'].
    tracklist = sp.playlist(playlist_id)['tracks']['items']
    tracks_features = []
    for item in tracklist:
        t = item['track']
        track_id = t['id']
        duration_ms = t['duration_ms']
        explicit = t['explicit']
        popularity = t['popularity']
        preview = t['preview_url']  # None when no 30-second preview is available
        # audio_features returns a list; take the single entry for this track.
        af = sp.audio_features(track_id)[0]
        track_features = [track_id, duration_ms, explicit, popularity, af['danceability'],
                          af['energy'], af['key'], af['loudness'], af['mode'], af['speechiness'],
                          af['acousticness'], af['instrumentalness'], af['liveness'],
                          af['valence'], af['tempo'], af['time_signature'], preview]
        tracks_features.append(track_features)
    df = pd.DataFrame(data=tracks_features, columns=['ID', 'Duration', 'Explicit', 'Popularity',
                                                     'Danceability', 'Energy', 'Key', 'Loudness',
                                                     'Mode', 'Speechiness', 'Acousticness',
                                                     'Instrumentalness', 'Liveness', 'Valence',
                                                     'Tempo', 'Time Signature', 'Preview'])
    return df
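
The function above makes one `audio_features` request per track. As a possible alternative, the sketch below requests audio features in batches, since `audio_features` also accepts a list of track IDs (the Spotify endpoint takes up to 100 per request). It assumes `sp` is a spotipy client; the function name `audio_features_by_id` and the `batch_size` parameter are our own. An entry is kept as `None` when the API returns no features for a track, so the caller can decide how to handle it.

```python
def audio_features_by_id(sp, track_ids, batch_size=100):
    """Map each track ID to its audio-features dict (or None if unavailable)."""
    features_by_id = {}
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        # One request per batch instead of one request per track.
        for track_id, af in zip(batch, sp.audio_features(batch)):
            features_by_id[track_id] = af
    return features_by_id
```

For a 100-song playlist this reduces the audio-features traffic from 100 requests to one, which also lowers the chance of hitting the API's rate limit.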

Using this function, we created one dataframe for each of the five genres we're analyzing.

In [6]:
rock_df = features(rock_80s)
classical_df = features(read_class)
hiphop_df = features(hip_hop_90s)
punk_df = features(classic_punk)
jazz_df = features(classic_jazz)

In [7]:
rock_df.head()

Unnamed: 0,ID,Duration,Explicit,Popularity,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Time Signature,Preview
0,3rdxvEfBp86WNcRDLaFEk9,270160,False,12,0.39,0.898,7,-4.089,1,0.045,0.207,6.6e-05,0.217,0.204,90.703,4,
1,2SiXAy7TuUkycRVbbWDEpo,210173,False,79,0.532,0.767,7,-5.509,1,0.0574,0.00287,0.000513,0.39,0.755,127.361,4,https://p.scdn.co/mp3-preview/2a200f628c41fadd...
2,4MhTFsyqIJnjsOweVcU8ug,356400,False,17,0.449,0.901,6,-7.711,1,0.0526,0.141,0.0675,0.128,0.696,125.148,4,
3,7qQnBfwXrw2tZNFG4Uf57N,250626,False,12,0.52,0.887,0,-3.296,1,0.0349,0.0665,0.000179,0.186,0.793,122.528,4,
4,7N3PAbqfTjSEU1edb2tY8j,241599,False,78,0.572,0.835,0,-6.219,1,0.0317,0.171,0.000376,0.0702,0.796,129.994,4,https://p.scdn.co/mp3-preview/f6e554cadfb84a51...


With these dataframes created, we exported them to CSV files so they can be cleaned and analyzed.

In [8]:
rock_df.to_csv('data/rock_df.csv')
classical_df.to_csv('data/classical_df.csv')
hiphop_df.to_csv('data/hiphop_df.csv')
punk_df.to_csv('data/punk_df.csv')
jazz_df.to_csv('data/jazz_df.csv')
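
Note that `to_csv` writes the row index as an extra, unnamed first column by default, which appears as `Unnamed: 0` when the file is read back with `read_csv`. The sketch below illustrates the difference on a small made-up dataframe (the IDs and tempos are placeholders, not project data), using in-memory buffers instead of files.

```python
import io

import pandas as pd

# Toy data standing in for one of the genre dataframes.
df = pd.DataFrame({'ID': ['a', 'b'], 'Tempo': [90.703, 127.361]})

# Default behavior: the row index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the real columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_without_index.columns))
```

Either passing `index=False` when exporting or passing `index_col=0` to `read_csv` when loading avoids carrying the extra column into the cleaning step.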