# Accessing Feature Data via the Spotify API

To explore my listening data at a finer granularity, I wanted the audio features of each track that I had listened to in the past year. Getting them requires GET requests to the Spotify Web API. The Python library `spotipy` wraps that API, so I can make those requests from Python scripts.

The following notebook shows the general workflow for obtaining the features of a given track.

## Dependencies and Utility Functions

In [None]:
import json
import time
import spotipy
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


from spotipy.oauth2 import SpotifyOAuth
from credentials import SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET, SPOTIPY_REDIRECT_URI

In [None]:
def get_spotify_uri(song):
    """Returns the Spotify track ID for a given song title.

    Despite the function name, the value returned is the bare track ID
    taken from the search result's 'id' field, not a 'spotify:track:...' URI.

    Parameters:
        song (str): The human-readable name of a given Spotify track

    Returns:
        (str or None): The ID of the top search result, or None if nothing matched
    """
    # Songs that search cannot find map directly to a known ID
    if song in UNSEARCHABLE_SONGS:
        return UNSEARCHABLE_SONGS[song]
    # Some titles only match when searched with a simplified query
    query = DIFFICULT_SONGS.get(song, song)
    search_results = sp.search(q=query, type='track', limit=1)
    try:
        return search_results['tracks']['items'][0]['id']
    except (KeyError, IndexError, TypeError):
        print(f'No results for {song}')
        return None
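The lookup in `get_spotify_uri` depends on the nested shape of the search response (`tracks` → `items` → first entry → `id`). The sketch below isolates that extraction so it can be checked without calling the API. The helper name `first_track_id` and the sample dictionary are hypothetical; the sample is hand-written to mirror only the fields used here, not copied from a real response.

```python
def first_track_id(search_results):
    """Return the ID of the first track in a search response, or None.

    `search_results` is assumed to look like {'tracks': {'items': [{'id': ...}]}}.
    """
    try:
        return search_results['tracks']['items'][0]['id']
    except (KeyError, IndexError, TypeError):
        # Missing keys, an empty result list, or a None response
        return None


# Hand-written stand-in for a search response (illustrative only)
sample_response = {'tracks': {'items': [{'id': 'example-track-id', 'name': 'Some Song'}]}}
empty_response = {'tracks': {'items': []}}

found = first_track_id(sample_response)
missing = first_track_id(empty_response)
```

Returning `None` for a miss lets the caller decide whether to skip the song or retry with a different query.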

In [None]:
PODCAST_ARTISTS = ['VIEWS with David Dobrik and Jason Nash', 'The California Golden Bearcast',
                   'Whiskey Ginger w/ Andrew Santino', 'The Tiny Meat Gang Podcast',
                   'Stuff You Should Know', 'Patriots Unfiltered', 'Cal Rivals Excellent Podcast Experience',
                   'Curious with Josh Peck', 'Locked On Patriots - Daily Podcast On The New England Patriots',
                   'Skotcast with Jeff Wittek & Scotty Sire', 'Anything Goes with Emma Chamberlain',
                   'Call Her Daddy', 'Office Ladies', 'That Made All the Difference', 'Pardon My Take',
                   'My Favorite Theorem', 'The James Altucher Show', 'Zane and Heath: Unfiltered',
                   'With Authority', 'The Numberphile Podcast', 'Billionaires Getting Interviewed',
                   'Elon Musk Interviews', 'Cover 3 College Football Podcast']

WHITE_NOISE = ['Nature Sounds', 'Sounds Of Nature : Thunderstorm, Rain', 'Calmsound']

DIFFICULT_SONGS = {'I Know (feat. Mick Jenkins)': 'I Know Mick Jenkins',
                   'Take Me Home, Country Roads - Rerecorded': 'Take Me Home Country Roads',
                   'Chica Paranormal - Verdun Remix': 'Chica Paranormal'}

UNSEARCHABLE_SONGS = {'!!!!!!!': '0rQtoQXQfwpDW0c7Fw1NeM'}

SCOPE = "user-library-read"

## Loading Data

Instantiate the `spotipy.Spotify` client with the appropriate client ID, client secret, and redirect URI.

In [None]:
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope=SCOPE,
    client_id=SPOTIPY_CLIENT_ID,
    client_secret=SPOTIPY_CLIENT_SECRET,
    redirect_uri=SPOTIPY_REDIRECT_URI))

Read in our raw listening data (in JSON format). The raw data files can be viewed in the `data/personal` folder.

In [None]:
with open('../data/personal/summer20/StreamingHistory0.json') as file:
    data = json.load(file)
df0 = pd.DataFrame(data)

with open('../data/personal/summer20/StreamingHistory1.json') as file:
    data1 = json.load(file)
df1 = pd.DataFrame(data1)

# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df = pd.concat([df0, df1], ignore_index=True)

Next, we clean up the DataFrame with the following steps:
- Convert the `msPlayed` column (milliseconds) into a `secPlayed` column (seconds, rounded to one decimal place)
- Parse the `endTime` column into `pd.Timestamp` values
- Split the DataFrame into two, `music` and `podcasts`, each with its respective content; white-noise tracks are left out of both

In [None]:
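# Convert milliseconds to seconds, rounded to one decimal place
df['secPlayed'] = (df['msPlayed'] / 1000).round(1)
df = df.drop(columns=['msPlayed'])
# Parse the endTime strings into pd.Timestamp values
STRTIME_FORMAT = '%Y-%m-%d %H:%M'
df['endTime'] = pd.to_datetime(df['endTime'], format=STRTIME_FORMAT)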

music = df[~df['artistName'].isin(PODCAST_ARTISTS + WHITE_NOISE)].reset_index(drop=True)
podcasts = df[df['artistName'].isin(PODCAST_ARTISTS)].reset_index(drop=True)
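To sanity-check the cleaning steps, it can help to run them on a tiny table first. The sketch below does that with three made-up rows that use the same column names as the streaming history files; the artist and track values, and the shortened `podcast_artists` and `white_noise` lists, are placeholders for illustration only.

```python
import pandas as pd

# Made-up rows in the same shape as the streaming history (illustrative only)
sample = pd.DataFrame([
    {'endTime': '2020-06-01 14:05', 'artistName': 'Some Artist',
     'trackName': 'Some Song', 'msPlayed': 183456},
    {'endTime': '2020-06-01 15:30', 'artistName': 'Office Ladies',
     'trackName': 'Some Episode', 'msPlayed': 2400000},
    {'endTime': '2020-06-01 23:10', 'artistName': 'Calmsound',
     'trackName': 'Some Rain Track', 'msPlayed': 60000},
])
podcast_artists = ['Office Ladies']   # stand-in for PODCAST_ARTISTS
white_noise = ['Calmsound']           # stand-in for WHITE_NOISE

# Milliseconds to seconds, then parse the timestamps
sample['secPlayed'] = (sample['msPlayed'] / 1000).round(1)
sample = sample.drop(columns=['msPlayed'])
sample['endTime'] = pd.to_datetime(sample['endTime'], format='%Y-%m-%d %H:%M')

# Music is everything that is neither a podcast nor white noise
is_podcast = sample['artistName'].isin(podcast_artists)
is_noise = sample['artistName'].isin(white_noise)
sample_music = sample[~(is_podcast | is_noise)].reset_index(drop=True)
sample_podcasts = sample[is_podcast].reset_index(drop=True)
```

Each of the two resulting tables holds one row here, and the white-noise row appears in neither.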

## Building Features DataFrame

In [None]:
unique_songs = music.groupby('trackName').count().sort_values('secPlayed', ascending=False).index

column_labels = ['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 
                 'instrumentalness', 'liveness', 'valence', 'tempo', 'type', 'id', 'uri', 'track_href', 
                 'analysis_url', 'duration_ms', 'time_signature']

In [None]:
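# Collect one row per song, then build the DataFrame once at the end
# (DataFrame.append was removed in pandas 2.0)
rows = {}

for song in unique_songs:
    uri = get_spotify_uri(song)
    if uri is None:
        continue  # no search result, so there is nothing to look up
    features = sp.audio_features([uri])[0]
    if features is not None:  # the API can return None for a track
        rows[song] = features

features_df = pd.DataFrame.from_dict(rows, orient='index', columns=column_labels)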
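The loop above sends one audio-features request per song. Since `sp.audio_features` takes a list of tracks, the lookups could be grouped into batches instead. The sketch below assumes the endpoint accepts up to 100 track IDs per request; the helpers `chunked` and `fetch_features_in_batches` are hypothetical names, and the `fetch` argument stands in for `sp.audio_features` so the logic can be tried without network access.

```python
def chunked(items, size=100):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_features_in_batches(track_ids, fetch, size=100):
    """Call `fetch` once per batch and return the results in input order."""
    results = []
    for batch in chunked(track_ids, size):
        results.extend(fetch(batch))
    return results


# Stand-in for sp.audio_features: records each batch and echoes the IDs back
calls = []

def fake_fetch(batch):
    calls.append(len(batch))
    return [{'id': track_id} for track_id in batch]

ids = [f'id-{n}' for n in range(250)]
features = fetch_features_in_batches(ids, fake_fetch)
```

With 250 IDs this makes three calls (100, 100, and 50 IDs) rather than 250. In the notebook, the real call would pass `sp.audio_features` as `fetch`, after dropping any `None` IDs.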

Now that we have produced our `features_df`, let's save it locally.

In [None]:
features_df.to_csv('../data/tops/features.csv', index=True)
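Because the CSV is written with `index=True`, the song titles end up in the first column. A minimal sketch of reading such a file back, using a made-up two-row table and an in-memory buffer in place of the real file:

```python
import io

import pandas as pd

# Made-up stand-in for features_df (values are illustrative only)
toy = pd.DataFrame({'danceability': [0.5, 0.8], 'tempo': [120.0, 98.5]},
                   index=['Song A', 'Song B'])

# Write to an in-memory buffer instead of '../data/tops/features.csv'
buffer = io.StringIO()
toy.to_csv(buffer, index=True)
buffer.seek(0)

# index_col=0 restores the song titles as the index
reloaded = pd.read_csv(buffer, index_col=0)
```

Passing the file path to `pd.read_csv` with `index_col=0` works the same way for the saved file.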