# Accessing Spotify API Data

To explore my listening data further, I wanted the audio features of each track I had listened to in the past year. Retrieving them requires GET requests to the Spotify Web API. Using Spotipy, a Python library that wraps the Web API, I was able to obtain the audio features for my top 1500 songs from the past year.
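
For reference, here is a rough sketch of the kind of GET request Spotipy issues on my behalf when asking for audio features. It only builds the request with the standard library and never sends it. The helper name `build_audio_features_request` and the token string are placeholders of my own, not part of Spotipy; a real access token would come from the OAuth flow.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.spotify.com/v1"


def build_audio_features_request(track_ids, access_token):
    """Build (but do not send) a GET request for the audio features of some tracks."""
    # The endpoint takes a comma-separated list of track IDs in the `ids` parameter.
    query = urlencode({"ids": ",".join(track_ids)})
    return Request(
        f"{API_BASE}/audio-features?{query}",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


# Placeholder token for illustration only; this request is never sent.
request = build_audio_features_request(["0rQtoQXQfwpDW0c7Fw1NeM"], "HYPOTHETICAL_TOKEN")
print(request.get_method(), request.full_url)
```

Spotipy handles the token exchange and JSON decoding, which is why the rest of this notebook calls `sp.audio_features` instead of building requests by hand.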

In [2]:
import spotipy
from spotipy.oauth2 import SpotifyOAuth

import numpy as np
import pandas as pd
import seaborn as sns
import json
import time
import matplotlib.pyplot as plt
%matplotlib inline

plt.style.use('ggplot')

In [8]:
CLIENT_USERNAME = "willfurtado"
SPOTIPY_CLIENT_ID = "0c58b8f377294e1393b6ff20d1db34fc"
SPOTIPY_CLIENT_SECRET = "12fb3865a39343aba75ec4b118f6adf9"
SPOTIPY_REDIRECT_URI = "https://localhost:8888"

PODCAST_ARTISTS = ['VIEWS with David Dobrik and Jason Nash', 'The California Golden Bearcast',
                   'Whiskey Ginger w/ Andrew Santino', 'The Tiny Meat Gang Podcast',
                   'Stuff You Should Know', 'Patriots Unfiltered', 'Cal Rivals Excellent Podcast Experience',
                   'Curious with Josh Peck', 'Locked On Patriots - Daily Podcast On The New England Patriots',
                   'Skotcast with Jeff Wittek & Scotty Sire', 'Anything Goes with Emma Chamberlain',
                   'Call Her Daddy', 'Office Ladies', 'That Made All the Difference', 'Pardon My Take',
                   'My Favorite Theorem', 'The James Altucher Show', 'Zane and Heath: Unfiltered',
                   'With Authority', 'The Numberphile Podcast', 'Billionaires Getting Interviewed',
                   'Elon Musk Interviews', 'Cover 3 College Football Podcast']

WHITE_NOISE = ['Nature Sounds', 'Sounds Of Nature : Thunderstorm, Rain','Calmsound']

DIFFICULT_SONGS = {'I Know (feat. Mick Jenkins)': 'I Know Mick Jenkins',
                   'Take Me Home, Country Roads - Rerecorded': 'Take Me Home Country Roads',
                   'Chica Paranormal - Verdun Remix': 'Chica Paranormal'}

UNSEARCHABLE_SONGS = {'!!!!!!!':'0rQtoQXQfwpDW0c7Fw1NeM'}

In [9]:
scope = "user-library-read"

sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope,
    client_id=SPOTIPY_CLIENT_ID,
    client_secret=SPOTIPY_CLIENT_SECRET,
    username=CLIENT_USERNAME,
    redirect_uri=SPOTIPY_REDIRECT_URI))

In [11]:
with open('personal_data/StreamingHistory0.json') as file:
    data = json.load(file)
df0 = pd.DataFrame(data)

with open('personal_data/StreamingHistory1.json') as file:
    data1 = json.load(file)
df1 = pd.DataFrame(data1)

# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent.
df = pd.concat([df0, df1], ignore_index=True)

df['secPlayed'] = round(df['msPlayed'] / 1000, 1)
df = df.drop(columns=['msPlayed'])
STRTIME_FORMAT = '%Y-%m-%d %H:%M'
df['endTime'] = pd.to_datetime(df['endTime'], format=STRTIME_FORMAT)

music = df[~df['artistName'].isin(PODCAST_ARTISTS + WHITE_NOISE)].reset_index(drop=True)
podcasts = df[df['artistName'].isin(PODCAST_ARTISTS)].reset_index(drop=True)

In [12]:
def get_spotify_uri(song):
    """Return the Spotify track ID for a song title, or None if the search finds nothing."""
    # Tracks that cannot be found through search map directly to a known ID.
    if song in UNSEARCHABLE_SONGS:
        return UNSEARCHABLE_SONGS[song]
    # Use a simplified query for titles that search poorly as written.
    query = DIFFICULT_SONGS.get(song, song)
    search_results = sp.search(q=query, type='track', limit=1)
    try:
        return search_results['tracks']['items'][0]['id']
    except (KeyError, IndexError, TypeError):
        print('No results for {}'.format(song))
        return None
            
unique_songs = music.groupby('trackName').count().sort_values('secPlayed', ascending=False).index

column_labels = ['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 
                 'instrumentalness', 'liveness', 'valence', 'tempo', 'type', 'id', 'uri', 'track_href', 
                 'analysis_url', 'duration_ms', 'time_signature']

features_df = pd.DataFrame(columns=column_labels)
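
Searching by title alone can return a different artist's song with the same name. Since the streaming history also records `artistName`, one possible refinement is to use the `track:` and `artist:` field filters that the Spotify search endpoint supports. This is only a sketch: `build_search_query` is a hypothetical helper, and I have not checked whether it actually improves the matches for my library.

```python
def build_search_query(track_name, artist_name=None):
    """Build a Spotify search query string using the track/artist field filters."""
    query = "track:{}".format(track_name)
    if artist_name:
        # Narrow the search to a single artist when one is known.
        query += " artist:{}".format(artist_name)
    return query


example_query = build_search_query("Doxy", "Miles Davis")
print(example_query)
```

The resulting string would be passed as `q` to `sp.search(q=..., type='track', limit=1)` in place of the bare title.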

In [13]:
%%time

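for song in unique_songs:
    uri = get_spotify_uri(song)
    if uri is None:
        continue  # no search result, so there is no ID to look up
    features = sp.audio_features([uri])[0]
    if features is None:
        continue  # the API returned no audio features for this track
    dataframe = pd.DataFrame(data=features, index=[song])
    features_df = pd.concat([features_df, dataframe], sort=False)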

KeyboardInterrupt: 

In [49]:
features_df.to_csv('./features/features_750.csv', index=True)
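
The loop above makes one audio-features request per song, which is slow enough that I ended up interrupting it. Spotipy's `audio_features` accepts up to 100 track IDs per call, so a batched version should need far fewer requests. Below is a minimal sketch of that idea. `chunked`, `fetch_features_in_batches`, and the stand-in `fake_fetch` are names I made up for illustration; in the notebook, `fetch_batch` would be `sp.audio_features`.

```python
def chunked(items, size=100):
    """Yield successive slices of `items` holding at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_features_in_batches(track_ids, fetch_batch, batch_size=100):
    """Collect audio features keyed by track ID, calling `fetch_batch` once per batch.

    `fetch_batch` is any callable that takes a list of IDs and returns a list of
    feature dicts in the same order, with None for tracks it cannot resolve.
    """
    features = {}
    for batch in chunked(track_ids, batch_size):
        for track_id, result in zip(batch, fetch_batch(batch)):
            if result is not None:
                features[track_id] = result
    return features


# Stand-in for sp.audio_features so the sketch runs without network access.
calls = []

def fake_fetch(ids):
    calls.append(len(ids))
    return [None if track_id == "missing" else {"id": track_id} for track_id in ids]


ids = [str(n) for n in range(249)] + ["missing"]
batched = fetch_features_in_batches(ids, fake_fetch)
print(len(batched), calls)
```

With the real client, the resulting dictionary could be turned into a table with `pd.DataFrame.from_dict(batched, orient='index')`, keyed by track ID instead of by song title.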