# Taylor Swift Track Dataset - All Songs (6th June 2024)

In this project, we used the Spotify API to extract details about all songs by Taylor Swift (yes, even from TTPD!).


### Key Features:

- Comprehensive Collection: Includes all of Taylor Swift's albums released by June 6, 2024.
- Latest Album: "The Tortured Poets Department: The Anthology" with 31 tracks.
- Raw and Unfiltered: The dataset is presented in its original form without any modifications, ensuring the authenticity of the data.
- Generated with Spotipy: Data was extracted with the Spotipy library, so every value comes directly from the Spotify Web API.

### Step 1: Install packages

Install the packages below.

In [1]:
# !pip install requests
# !pip install Spotipy
# !pip install openpyxl
# !pip install pandas


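The decorator below retries a request when Spotify reports a rate limit (HTTP 429). After the i-th failed attempt it waits 2**i seconds (1, 2, 4, 8, then 16 seconds) before trying again, and it gives up after 5 attempts.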

In [2]:
import functools
import time

import spotipy

def retry_on_rate_limit(func):
    """Retry func with exponential backoff when Spotify responds with HTTP 429."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        retries = 5
        for i in range(retries):
            try:
                return func(*args, **kwargs)
            except spotipy.exceptions.SpotifyException as e:
                if e.http_status == 429:
                    print(f"Rate limit exceeded. Retrying in {2 ** i} seconds...")
                    time.sleep(2 ** i)
                else:
                    # Any other API error is not retried
                    raise
        raise Exception("Max retries exceeded")
    return wrapper
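
To see the delay schedule without calling Spotify, here is a minimal sketch of the same retry pattern. `RateLimited`, `retry_with_backoff` and `flaky_call` are hypothetical stand-ins (for the 429 error, the decorator and a real API call), and the sleep function is injected so the delays are recorded instead of waited out.

```python
import functools
import time


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 error."""


def retry_with_backoff(retries=5, sleep=time.sleep):
    """Retry a call, sleeping 2**attempt seconds after each RateLimited error."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(retries):
                try:
                    return func(*args, **kwargs)
                except RateLimited:
                    sleep(2 ** attempt)
            raise RuntimeError("Max retries exceeded")
        return wrapper
    return decorator


# Record the delays in a list instead of actually sleeping.
delays = []
calls = {"count": 0}


@retry_with_backoff(retries=5, sleep=delays.append)
def flaky_call():
    """Hypothetical API call that is rate limited on its first three attempts."""
    calls["count"] += 1
    if calls["count"] <= 3:
        raise RateLimited()
    return "ok"


result = flaky_call()
print(result, delays)  # ok [1, 2, 4]
```

The call succeeds on the fourth attempt, after waits of 1, 2 and 4 seconds.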

### Step 2: Spotify API for developers

Assuming you have a free or paid Spotify account, go to the [Developer Dashboard](https://developer.spotify.com/dashboard) and create an app. Open its settings and copy the client ID and client secret, as you will need them below.
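
If you would rather not paste the secret into the notebook, one option is to read both values from environment variables. This is a sketch, not part of the original notebook: `load_spotify_credentials` is a hypothetical helper, and the variable names are the ones Spotipy itself looks for when no credentials are passed explicitly.

```python
import os


def load_spotify_credentials(env=None):
    """Return (client_id, client_secret) from the environment, or raise if missing."""
    env = os.environ if env is None else env
    names = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]


# Demonstration with a plain dict standing in for os.environ.
demo_env = {
    "SPOTIPY_CLIENT_ID": "your_client_ID",
    "SPOTIPY_CLIENT_SECRET": "your_client_secret",
}
client_id, client_secret = load_spotify_credentials(demo_env)
```

Calling `load_spotify_credentials()` with no argument reads from the real environment.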


### Step 3: Getting all albums
Next, we search for the artist by name and extract the details of her albums. We store them in a list called albums_data and convert that list into a dataframe, df_albums.

In [3]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd

# Spotify API credentials
client_id = 'your_client_ID' 
client_secret = 'your_client_secret' 

# Authentication
client_credentials_manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

# Function to get artist's albums
@retry_on_rate_limit
def get_artist_albums(artist_name):
    results = sp.search(q='artist:' + artist_name, type='artist')
    artist = results['artists']['items'][0]
    artist_id = artist['id']
    
    albums = sp.artist_albums(artist_id, album_type='album')
    albums_list = albums['items']
    
    # Collecting album details
    albums_data = []
    for album in albums_list:
        album_info = {
            'album_name': album['name'],
            'album_id': album['id'],
            'album_release_date': album['release_date'],
            'album_total_tracks': album['total_tracks'],
            'album_type': album['album_type'],
            'artist_name': artist['name'],
            'artist_id': artist_id
        }
        albums_data.append(album_info)
    
    return albums_data

# Example usage
artist_name = 'Taylor Swift'
albums_data = get_artist_albums(artist_name)

# Convert to DataFrame for easier handling
df_albums = pd.DataFrame(albums_data)
df_albums


| | album_name | album_id | album_release_date | album_total_tracks | album_type | artist_name | artist_id |
|---|---|---|---|---|---|---|---|
| 0 | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 5H7ixXZfsNMGbIE5OBSpcb | 2024-04-19 | 31 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 |
| 1 | THE TORTURED POETS DEPARTMENT | 1Mo4aZ8pdj6L1jx8zSwJnt | 2024-04-18 | 16 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 |
| 2 | 1989 (Taylor's Version) [Deluxe] | 1o59UpKw81iHR0HPiSkJR0 | 2023-10-27 | 22 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 |
| 3 | 1989 (Taylor's Version) | 64LU4c1nfjz1t4VnGhagcg | 2023-10-26 | 21 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 |
| 4 | Speak Now (Taylor's Version) | 5AEDGbliTTfjOB8TSm1sxt | 2023-07-07 | 22 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 |
| 5 | Midnights (The Til Dawn Edition) | 1fnJ7k0bllNfL1kVdNVW1A | 2023-05-26 | 23 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 |
| 6 | Midnights (3am Edition) | 3lS1y25WAhcqJDATJK70Mq | 2022-10-22 | 20 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 |
| 7 | Midnights | 151w1FgRZfnKZA9FEcg9Z3 | 2022-10-21 | 13 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 |
| 8 | Red (Taylor's Version) | 6kZ42qRrzov54LcAk4onW9 | 2021-11-12 | 30 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 |
| 9 | Fearless (Taylor's Version) | 4hDok0OAJd57SGIT8xuWJH | 2021-04-09 | 26 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 |
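
The album request returns one page of results per call, so a long catalog may need several requests. The sketch below follows the `next` link of each page until none is left. `collect_all_items` and the fake pages are hypothetical and only imitate the shape of a Spotify paging object (`items` plus `next`).

```python
def collect_all_items(first_page, fetch_next):
    """Gather 'items' from a paged result, following 'next' until it is empty."""
    items = []
    page = first_page
    while page:
        items.extend(page["items"])
        # Only ask for another page when the current one links to it.
        page = fetch_next(page) if page.get("next") else None
    return items


# Hypothetical two-page response shaped like a Spotify paging object.
fake_pages = {
    "page-2": {"items": [{"name": "Album C"}], "next": None},
}
first_page = {
    "items": [{"name": "Album A"}, {"name": "Album B"}],
    "next": "page-2",
}

albums = collect_all_items(first_page, lambda page: fake_pages[page["next"]])
print([album["name"] for album in albums])  # ['Album A', 'Album B', 'Album C']
```

With a real client, the same helper could be called as `collect_all_items(sp.artist_albums(artist_id, album_type='album'), sp.next)`, since Spotipy's `next` method takes a paged result and returns the following page.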


### Step 4: Getting all songs

Since all the albums are now in the list albums_data, we can loop over it to get the tracks of each album. The function get_album_tracks(album_id) collects the track details of one album in the list tracks_data. A for loop calls it for every album and appends the results to the list all_tracks_data. Lastly, we convert that list into a dataframe, df_tracks.

In [4]:
# Function to get tracks for a given album
@retry_on_rate_limit
def get_album_tracks(album_id):
    tracks = sp.album_tracks(album_id)
    tracks_list = tracks['items']
    
    # Collecting track details
    tracks_data = []
    for track in tracks_list:
        track_info = {
            'track_name': track['name'],
            'track_id': track['id'],
            'duration_ms': track['duration_ms'],
            'explicit': track['explicit'],
            'track_number': track['track_number'],
            'album_id': album_id
        }
        tracks_data.append(track_info)
    
    return tracks_data

# Example usage for extracting tracks from all albums of the artist
all_tracks_data = []
for album in albums_data:
    album_id = album['album_id']
    tracks_data = get_album_tracks(album_id)
    all_tracks_data.extend(tracks_data)

# Convert to DataFrame for easier handling
df_tracks = pd.DataFrame(all_tracks_data)
df_tracks


| | track_name | track_id | duration_ms | explicit | track_number | album_id |
|---|---|---|---|---|---|---|
| 0 | Fortnight (feat. Post Malone) | 6dODwocEuGzHAavXqTbwHv | 228965 | False | 1 | 5H7ixXZfsNMGbIE5OBSpcb |
| 1 | The Tortured Poets Department | 4PdLaGZubp4lghChqp8erB | 293048 | True | 2 | 5H7ixXZfsNMGbIE5OBSpcb |
| 2 | My Boy Only Breaks His Favorite Toys | 7uGYWMwRy24dm7RUDDhUlD | 203801 | False | 3 | 5H7ixXZfsNMGbIE5OBSpcb |
| 3 | Down Bad | 1kbEbBdEgQdQeLXCJh28pJ | 261228 | True | 4 | 5H7ixXZfsNMGbIE5OBSpcb |
| 4 | So Long, London | 7wAkQFShJ27V8362MqevQr | 262974 | False | 5 | 5H7ixXZfsNMGbIE5OBSpcb |
| ... | ... | ... | ... | ... | ... | ... |
| 429 | Wildest Dreams | 59HjlYCeBsxdI0fcm3zglw | 220440 | False | 9 | 2QJmrSgbdM35R67eoGQo4j |
| 430 | How You Get The Girl | 4dYUOfmWna6DFccnz732n8 | 247533 | False | 10 | 2QJmrSgbdM35R67eoGQo4j |
| 431 | This Love | 3oKMl2tJv4fdidkXUYMI5x | 250093 | False | 11 | 2QJmrSgbdM35R67eoGQo4j |
| 432 | I Know Places | 3jBMHD19RZdAqG9iFQh7xc | 195706 | False | 12 | 2QJmrSgbdM35R67eoGQo4j |


### Step 5: Getting audio features for tracks

The last step of the data extraction is fetching the audio features of each track. These include metrics such as danceability, energy, loudness, speechiness and acousticness. Read more here: [Get Several Tracks' Audio Features](https://developer.spotify.com/documentation/web-api/reference/get-several-audio-features)

To get these details, we call the API's audio_features function for a track and store the result in a list called tracks_analysis. We repeat this for every track in all_tracks_data, collect the results in the list all_track_analyses, and convert that list into the dataframe df_track_analysis.

In [5]:
import pandas as pd

# Function to get the audio features of a track as a list of dicts
@retry_on_rate_limit
def get_track_audio_analysis(track_id):
    audio_features_list = sp.audio_features(track_id)
    
    # Extracting relevant information
    tracks_analysis = []
    for audio_features in audio_features_list:
        # An entry can be None when no audio features are available for a track
        if audio_features is None:
            continue
        track_fea = {
            'track_id': track_id,
            'danceability': audio_features['danceability'],
            'energy': audio_features['energy'],
            'loudness': audio_features['loudness'],
            'speechiness': audio_features['speechiness'],
            'acousticness': audio_features['acousticness'],
            'instrumentalness': audio_features['instrumentalness'],
            'liveness': audio_features['liveness'],
            'valence': audio_features['valence'],
        }
        tracks_analysis.append(track_fea)

    # Return the list of audio feature records
    return tracks_analysis

# Example usage
all_track_analyses = []
for track in all_tracks_data:
    track_id = track['track_id']
    track_analysis = get_track_audio_analysis(track_id)
    all_track_analyses.extend(track_analysis)
    
# Convert the list of track analyses to a DataFrame
df_track_analysis = pd.DataFrame(all_track_analyses)

# Display df
df_track_analysis


| | track_id | danceability | energy | loudness | speechiness | acousticness | instrumentalness | liveness | valence |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6dODwocEuGzHAavXqTbwHv | 0.504 | 0.386 | -10.976 | 0.0308 | 0.50200 | 0.000015 | 0.0961 | 0.2810 |
| 1 | 4PdLaGZubp4lghChqp8erB | 0.604 | 0.428 | -8.441 | 0.0255 | 0.04830 | 0.000000 | 0.1260 | 0.2920 |
| 2 | 7uGYWMwRy24dm7RUDDhUlD | 0.596 | 0.563 | -7.362 | 0.0269 | 0.13700 | 0.000000 | 0.3020 | 0.4810 |
| 3 | 1kbEbBdEgQdQeLXCJh28pJ | 0.541 | 0.366 | -10.412 | 0.0748 | 0.56000 | 0.000001 | 0.0946 | 0.1680 |
| 4 | 7wAkQFShJ27V8362MqevQr | 0.423 | 0.533 | -11.388 | 0.3220 | 0.73000 | 0.002640 | 0.0816 | 0.2480 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 429 | 59HjlYCeBsxdI0fcm3zglw | 0.554 | 0.666 | -7.414 | 0.0747 | 0.07020 | 0.005930 | 0.1060 | 0.4720 |
| 430 | 4dYUOfmWna6DFccnz732n8 | 0.764 | 0.660 | -6.136 | 0.0494 | 0.00461 | 0.004770 | 0.0915 | 0.5240 |
| 431 | 3oKMl2tJv4fdidkXUYMI5x | 0.475 | 0.459 | -8.768 | 0.0333 | 0.63500 | 0.000000 | 0.1010 | 0.0828 |
| 432 | 3jBMHD19RZdAqG9iFQh7xc | 0.596 | 0.763 | -4.990 | 0.0661 | 0.23100 | 0.000000 | 0.2000 | 0.5070 |
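
The endpoint linked in this step accepts several track IDs per request, so the per-track loop could be replaced by batched calls. A minimal sketch, assuming a batch size of 100 IDs (the documented limit for that endpoint, as far as we know) and a hypothetical `chunked` helper; the IDs are placeholders, not real Spotify IDs.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical IDs standing in for the 'track_id' values in all_tracks_data.
track_ids = [f"track_{n}" for n in range(433)]

batches = list(chunked(track_ids, 100))
print([len(batch) for batch in batches])  # [100, 100, 100, 100, 33]
```

Each batch could then be passed to `sp.audio_features(batch)`, which returns one entry per ID, so 433 tracks would need 5 requests instead of 433.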


### Step 6:  Merge all dataframes and export as CSV

Once extraction is complete, we have three dataframes: df_albums, df_tracks and df_track_analysis. We merge df_tracks with df_albums on album_id, then merge the result with df_track_analysis on track_id. The final dataframe is called new_merge, and we export it as a CSV file.

In [10]:
# Merge tracks with albums on 'album_id', then add audio features on 'track_id'
merged_df = pd.merge(df_tracks, df_albums, on='album_id')
new_merge = pd.merge(merged_df, df_track_analysis, on='track_id')

# Save the merged data frame to a CSV file
new_merge.to_csv('taylor_discography.csv', index=False, encoding='utf-8')

print("Data merged and saved to 'taylor_discography.csv'")

Data merged and saved to 'taylor_discography.csv'
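
If a key is duplicated in one of the dataframes, a merge silently multiplies rows. One way to guard against that is pandas' `validate` argument, sketched here on hypothetical miniature versions of the three dataframes (the IDs, names and values are invented for illustration).

```python
import pandas as pd

# Hypothetical miniature versions of df_albums, df_tracks and df_track_analysis.
albums = pd.DataFrame({
    "album_id": ["a1", "a2"],
    "album_name": ["Album One", "Album Two"],
})
tracks = pd.DataFrame({
    "track_id": ["t1", "t2", "t3"],
    "track_name": ["Song 1", "Song 2", "Song 3"],
    "album_id": ["a1", "a1", "a2"],
})
features = pd.DataFrame({
    "track_id": ["t1", "t2", "t3"],
    "energy": [0.4, 0.6, 0.8],
})

# validate= makes pandas raise MergeError if the key relationship is not as expected.
merged = tracks.merge(albums, on="album_id", validate="many_to_one")
merged = merged.merge(features, on="track_id", validate="one_to_one")

print(merged.shape)  # (3, 5)
```

Many tracks map to one album, hence `many_to_one` for the first merge, while each track should have exactly one row of audio features, hence `one_to_one` for the second.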


In [11]:
new_merge

| | track_name | track_id | duration_ms | explicit | track_number | album_id | album_name | album_release_date | album_total_tracks | album_type | artist_name | artist_id | danceability | energy | loudness | speechiness | acousticness | instrumentalness | liveness | valence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Fortnight (feat. Post Malone) | 6dODwocEuGzHAavXqTbwHv | 228965 | False | 1 | 5H7ixXZfsNMGbIE5OBSpcb | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 31 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 | 0.504 | 0.386 | -10.976 | 0.0308 | 0.50200 | 0.000015 | 0.0961 | 0.2810 |
| 1 | The Tortured Poets Department | 4PdLaGZubp4lghChqp8erB | 293048 | True | 2 | 5H7ixXZfsNMGbIE5OBSpcb | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 31 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 | 0.604 | 0.428 | -8.441 | 0.0255 | 0.04830 | 0.000000 | 0.1260 | 0.2920 |
| 2 | My Boy Only Breaks His Favorite Toys | 7uGYWMwRy24dm7RUDDhUlD | 203801 | False | 3 | 5H7ixXZfsNMGbIE5OBSpcb | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 31 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 | 0.596 | 0.563 | -7.362 | 0.0269 | 0.13700 | 0.000000 | 0.3020 | 0.4810 |
| 3 | Down Bad | 1kbEbBdEgQdQeLXCJh28pJ | 261228 | True | 4 | 5H7ixXZfsNMGbIE5OBSpcb | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 31 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 | 0.541 | 0.366 | -10.412 | 0.0748 | 0.56000 | 0.000001 | 0.0946 | 0.1680 |
| 4 | So Long, London | 7wAkQFShJ27V8362MqevQr | 262974 | False | 5 | 5H7ixXZfsNMGbIE5OBSpcb | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 31 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 | 0.423 | 0.533 | -11.388 | 0.3220 | 0.73000 | 0.002640 | 0.0816 | 0.2480 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 429 | Wildest Dreams | 59HjlYCeBsxdI0fcm3zglw | 220440 | False | 9 | 2QJmrSgbdM35R67eoGQo4j | 1989 | 2014-10-27 | 13 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 | 0.554 | 0.666 | -7.414 | 0.0747 | 0.07020 | 0.005930 | 0.1060 | 0.4720 |
| 430 | How You Get The Girl | 4dYUOfmWna6DFccnz732n8 | 247533 | False | 10 | 2QJmrSgbdM35R67eoGQo4j | 1989 | 2014-10-27 | 13 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 | 0.764 | 0.660 | -6.136 | 0.0494 | 0.00461 | 0.004770 | 0.0915 | 0.5240 |
| 431 | This Love | 3oKMl2tJv4fdidkXUYMI5x | 250093 | False | 11 | 2QJmrSgbdM35R67eoGQo4j | 1989 | 2014-10-27 | 13 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 | 0.475 | 0.459 | -8.768 | 0.0333 | 0.63500 | 0.000000 | 0.1010 | 0.0828 |
| 432 | I Know Places | 3jBMHD19RZdAqG9iFQh7xc | 195706 | False | 12 | 2QJmrSgbdM35R67eoGQo4j | 1989 | 2014-10-27 | 13 | album | Taylor Swift | 06HL4z0CvFAxyc27GXpf02 | 0.596 | 0.763 | -4.990 | 0.0661 | 0.23100 | 0.000000 | 0.2000 | 0.5070 |


Done! Here we have Miss Swift's discography, with track audio features! 

### Acknowledgements 
This dataset was created using the Spotipy library, a Python client for the Spotify Web API, which allows for easy access to Spotify's vast music catalog.