## Chapter 5) Methodology
Code for the analysis and ML workflow used to build a music recommender system that finds similarities between music items based on a user's listening history and liked/saved songs.


##### Steps (Brief Overview):

**Data Collection & Processing (EDA Part)**

1. load all available data and store it in Pandas DataFrames
2. connect to the Spotify API (app credentials from the developer console) to extract song features
3. create separate DFs for the songs/playlists collected from friends and for the MBTI playlists downloaded from Kaggle

**Feature Extraction & Selection**

4. clean the data and select the feature columns for the model

**Content-Based Filtering on Base Dataset**

5. apply the different ML models to the baseline dataset using content-based filtering and evaluate the initial results

**Incorporating MBTI Personality Types in the Recommendation Process**

6. add a feature column for the MBTI personality type and create MBTI-based DFs from the Kaggle datasets
7. apply user-item matrix factorization and evaluate the results

**Compare Results**

8. compare the results of the baseline model with those of the MBTI-augmented model
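Step 5 relies on content-based filtering. As a rough illustration of the idea (not the models evaluated in this chapter), the sketch below ranks candidate tracks by cosine similarity between each track's standardised audio features and the mean feature vector of a user's library. The feature list, the `URI` column name and the function name `recommend_content_based` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Assumed feature columns (Spotify audio-feature names as requested later in this notebook)
FEATURES = ("danceability", "energy", "valence", "acousticness", "instrumentalness")


def recommend_content_based(library: pd.DataFrame, catalog: pd.DataFrame, k: int = 5,
                            features=FEATURES, uri_col: str = "URI") -> pd.DataFrame:
    """Rank catalog tracks by cosine similarity to the mean feature vector of a user's library."""
    cols = list(features)
    # Standardise every feature with the catalog statistics so that no single scale dominates
    mean = catalog[cols].mean()
    std = catalog[cols].std(ddof=0).replace(0, 1.0)
    items = ((catalog[cols] - mean) / std).to_numpy()
    profile = ((library[cols] - mean) / std).mean().to_numpy()
    # Cosine similarity between the user profile and every catalog track (guarding zero norms)
    norms = np.linalg.norm(items, axis=1) * np.linalg.norm(profile)
    scores = items @ profile / np.where(norms == 0, 1.0, norms)
    # Drop tracks the user already has and keep the k best-scoring ones
    candidates = catalog.assign(score=scores)
    candidates = candidates[~candidates[uri_col].isin(library[uri_col])]
    return candidates.nlargest(k, "score")
```

`library` needs the same feature columns as `catalog`, for example after merging a friend's DF with the features fetched from the Spotify API.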

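For step 7, one simple form of user-item matrix factorization is a truncated SVD of a binary users x tracks matrix. The sketch below shows only that baseline variant; it is not necessarily the factorization used in the final model, and the names `factorize` and `top_k_unseen` are hypothetical. ALS or SGD-trained factor models are common alternatives.

```python
import numpy as np


def factorize(matrix: np.ndarray, rank: int):
    """Truncated SVD: user factors (n_users x rank) and item factors (n_items x rank)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    user_factors = u[:, :rank] * s[:rank]  # scale each latent dimension by its singular value
    item_factors = vt[:rank].T
    return user_factors, item_factors


def top_k_unseen(matrix: np.ndarray, user_factors: np.ndarray, item_factors: np.ndarray,
                 user: int, k: int = 3) -> np.ndarray:
    """Indices of the k highest predicted scores among items the user has not interacted with."""
    scores = user_factors[user] @ item_factors.T
    # Items already in the user's library are pushed to the end of the ranking
    scores = np.where(matrix[user] > 0, -np.inf, scores)
    return np.argsort(-scores)[:k]
```

The rank is a tuning parameter; `matrix` would be a users x tracks array with 1 for every saved track and 0 otherwise.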
#### Data Collection & Processing (EDA Part)

In [None]:
import pandas as pd
import seaborn as sns

# save the playlists obtained from friends and the Kaggle datasets as Pandas DFs ("YourLibrary" = liked/saved songs)


#### Test

In [None]:
# example that worked for 1 DF

import json
import pandas as pd

# JSON file path of user
json_file_path_wadthy = '/Users/khieuvon/Documents/10_Personal Stuff/01_Masterarbeit/Data for ML Model/Collected Spotify Data from Friends/All Extracted Library/00_YourLibrary_wadthy.json'

# Load the JSON file
with open(json_file_path_wadthy, 'r') as file:
    data = json.load(file)

# Extract song information
songs = data['tracks']

# Create a list to hold song data
song_list = []

# Iterate over each song and extract relevant details
for song in songs:
    song_info = {
        'Title': song.get('track', 'N/A'),
        'Artist': song.get('artist', 'N/A'),
        'Album': song.get('album', 'N/A'),
        'URI': song.get('uri', 'N/A')
    }
    song_list.append(song_info)

# Create a DataFrame from the song list
df = pd.DataFrame(song_list)

# Display the DataFrame
df.head(10)
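The loop above can also be expressed with pandas directly. This is a sketch, assuming the exported JSON has the `tracks` list with `track`, `artist`, `album` and `uri` keys as used above; `library_to_df` is a hypothetical helper name. One small difference: keys that are present but `null` also become `'N/A'` here, whereas `.get()` in the loop keeps them as `None`.

```python
import pandas as pd

# Keys in the exported library JSON -> column names used in this notebook
COLUMN_MAP = {"track": "Title", "artist": "Artist", "album": "Album", "uri": "URI"}


def library_to_df(data: dict) -> pd.DataFrame:
    """Build the Title/Artist/Album/URI DataFrame from a parsed 'YourLibrary' JSON dict."""
    tracks = pd.DataFrame(data.get("tracks", []))
    # reindex adds missing keys as empty columns; fillna mirrors the 'N/A' default of the loop
    tracks = tracks.reindex(columns=list(COLUMN_MAP)).fillna("N/A")
    return tracks.rename(columns=COLUMN_MAP)
```

Usage: `df = library_to_df(data)` right after `json.load(file)`.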

#### Data Collection & Processing (EDA Part) -> Extraction of all JSON File Data

In [4]:
import json
import pandas as pd
import os

# Local folder that holds the exported "YourLibrary" JSON files of all users
library_dir = '/Users/khieuvon/Documents/10_Personal Stuff/01_Masterarbeit/Data for ML Model/Collected Spotify Data from Friends/All Extracted Library'

# List of JSON file paths --> local file paths built from the folder and the file names
file_names = [
    '00_YourLibrary_wadthy.json',
    '01_YourLibrary_withy.json',
    '02_YourLibrary_yoojin.json',
    '03_YourLibrary_moni.json',
    '04_YourLibrary_nga.json',
    '05_YourLibrary_makra.json',
    '06_YourLibrary_soeren.json',
    '07_YourLibrary_simon.json',
    '09_YourLibrary_yeonju.json',
    '10_YourLibrary_han.json'
]
file_paths = [os.path.join(library_dir, name) for name in file_names]

# Dictionary to store DataFrames
dataframes = {}

# Loop through each file, load the data, and create a DataFrame
for file_path in file_paths:
    with open(file_path, 'r') as file:
        data = json.load(file)

    # Extract song information
    songs = data['tracks']

    # Create a list to hold song data
    song_list = []

    # Iterate over each song and extract relevant details
    for song in songs:
        song_info = {
            'Title': song.get('track', 'N/A'),
            'Artist': song.get('artist', 'N/A'),
            'Album': song.get('album', 'N/A'),
            'URI': song.get('uri', 'N/A')
        }
        song_list.append(song_info)

    # Create a DataFrame from the song list
    df = pd.DataFrame(song_list)

    # Use the file name (without extension) as the key for the DataFrame in the dictionary
    file_name = os.path.splitext(os.path.basename(file_path))[0]
    dataframes[file_name] = df

# Accessing and analyzing individual DataFrames
for name, df in dataframes.items():
    print(f"DataFrame for {name}:")
    print(df.head())  # Display the first few rows of the DataFrame

    # Example analysis: Print the number of songs in each DataFrame
    print(f"Number of songs in {name}: {len(df)}")

    # Optionally, save each DataFrame to a separate CSV file
    df.to_csv(f'{name}_songs.csv', index=False)

# Example of specific DataFrame access for further analysis
# Access a specific DataFrame by its key, e.g., "05_YourLibrary_makra" --> getting DFs of all collected Users
df_wadthy = dataframes.get("00_YourLibrary_wadthy") # 3157 entries
df_withy = dataframes.get("01_YourLibrary_withy") # 601
df_yoojin = dataframes.get("02_YourLibrary_yoojin") # 46
df_moni = dataframes.get("03_YourLibrary_moni") # 13
df_nga = dataframes.get("04_YourLibrary_nga") # 1152
df_makra = dataframes.get("05_YourLibrary_makra") # 12
df_soeren = dataframes.get("06_YourLibrary_soeren") # 1086
df_simon = dataframes.get("07_YourLibrary_simon") # 681
# van's data
df_yeonju = dataframes.get("09_YourLibrary_yeonju") # 260
df_han = dataframes.get("10_YourLibrary_han") # 106
# trang's data
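For the user-item matrix factorization in step 7, the per-user DataFrames in `dataframes` can be stacked into one binary users x tracks matrix. A minimal sketch, assuming a track counts as an interaction whenever it appears in a user's library; `build_user_item_matrix` is a hypothetical helper name.

```python
import pandas as pd


def build_user_item_matrix(libraries: dict) -> pd.DataFrame:
    """Binary users x tracks matrix (1 = track is in the user's library) from DFs with a 'URI' column."""
    long = pd.concat(
        [lib[["URI"]].assign(user=name) for name, lib in libraries.items()],
        ignore_index=True,
    )
    # crosstab counts occurrences; clip to 1 so duplicate entries in one library do not inflate the value
    return pd.crosstab(long["user"], long["URI"]).clip(upper=1)
```

Usage: `build_user_item_matrix(dataframes)` gives one row per file-name key (e.g. `00_YourLibrary_wadthy`) and one column per track URI.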


DataFrame for 00_YourLibrary_wadthy:
                     Title       Artist              Album  \
0  Smells Like Teen Spirit      Nirvana            Nirvana   
1               Sure Thing       Miguel  All I Want Is You   
2              Fancy Shoes  The Walters     Songs for Dads   
3                Tokyo Inn       HYUKOH                 23   
4             Konoha Peace         Kato       Naruto Vibes   

                                    URI  
0  spotify:track:4hy4fb5D1KL50b3sng9cjw  
1  spotify:track:0JXXNGljqupsJaZsgSbMZV  
2  spotify:track:1YVVAiBD5WhX2ZdHtlSOhz  
3  spotify:track:4myeBw35GUMw5FyDGZcOON  
4  spotify:track:0wIfYaveiZku0eL44UXtHk  
Number of songs in 00_YourLibrary_wadthy: 3157
DataFrame for 01_YourLibrary_withy:
                                       Title           Artist          Album  \
0  Ordinaryish People (feat. Blue Man Group)              AJR   OK ORCHESTRA   
1                                   Good Day       Jake Scott       Lavender   
2              

In [35]:
df_simon.head(20)

|   | Title | Artist | Album | URI |
|---|---|---|---|---|
| 0 | Growing Up (feat. Ed Sheeran) | Macklemore & Ryan Lewis | This Unruly Mess I've Made | spotify:track:44T13PWJ87jb3lFElhVIHx |
| 1 | Back for Good - Radio Mix | Take That | Nobody Else (Deluxe) | spotify:track:24fQpRwKFkC3Fe8QtvvrNw |
| 2 | The Greatest | Sia | This Is Acting (Deluxe Version) | spotify:track:6bLopGnirdrilrpdVB6Um1 |
| 3 | Vendetta | Chakuza | ersguterjunge Sampler Vol.2 - Vendetta - Rerel... | spotify:track:47jBE0e53JfnUMqoqWjT2d |
| 4 | See Her Out (Thats Just Life) | Francis and the Lights | Farewell, Starlite! | spotify:track:5zheSFviZNgeZLvZCOxQnE |
| 5 | Legions From The East | Kiani & His Legion | Legions From The East EP | spotify:track:0WDKPuQjyNLJq3Inhy0oVh |
| 6 | Private Practice | Nick Monaco | Mating Call | spotify:track:6EhMe5eUJcEOPkiN308Wr2 |
| 7 | Die schönsten Tage | SDP | Ein Gutes Schlechtes Vorbild | spotify:track:2ERn9fVsix3AyuVv7UN6RC |
| 8 | Dancing In The Moonlight | L'aupaire | Reframing | spotify:track:6TlgF30oyNUQqqva3u6CR5 |
| 9 | A New Error | Moderat | Moderat | spotify:track:6OGRM4MAOlyOdhHuX0OJ6P |


# Sampling only 50 rows from each data set

In [10]:
# getting a random sample of 50 entries from each friend's DF to reduce the chance of hitting the Spotify API rate limit

df_wadthy_sampled = df_wadthy.sample(n=50, random_state=42) # random_state=42 makes the sample reproducible
df_withy_sampled = df_withy.sample(n=50, random_state=42)
# df_yoojin (46 entries) and df_moni (13) already have fewer than 50 entries and are not sampled;
# df.sample(n=50) would raise a ValueError for them (n=min(50, len(df_yoojin)) would avoid that)
df_nga_sampled = df_nga.sample(n=50, random_state=42)
# df_makra (12 entries) is also smaller than the sample size
df_soeren_sampled = df_soeren.sample(n=50, random_state=42)
df_simon_sampled = df_simon.sample(n=50, random_state=42)
df_yeonju_sampled = df_yeonju.sample(n=50, random_state=42)
df_han_sampled = df_han.sample(n=50, random_state=42)
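`df.sample(n=50)` raises a `ValueError` when a DataFrame has fewer than 50 rows, which is why the smaller libraries are skipped above. A small hypothetical helper that caps `n` at the DataFrame length lets every library go through the same call:

```python
import pandas as pd


def sample_up_to(df: pd.DataFrame, n: int = 50, random_state: int = 42) -> pd.DataFrame:
    """Reproducible random sample of at most n rows; smaller DataFrames are returned whole (shuffled)."""
    return df.sample(n=min(n, len(df)), random_state=random_state)
```

Usage: `sampled = {name: sample_up_to(df) for name, df in dataframes.items()}`.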

In [6]:
df_wadthy_sampled.shape

(50, 4)

In [32]:
df_wadthy_sampled.to_csv('wadthy_sampled_songs.csv', index=False)

In [11]:
df_han_uri = df_han[['URI']]

In [12]:
df_han_uri.head(10)
# df_withy_uri.shape

|   | URI |
|---|---|
| 0 | spotify:track:0WNjYlfwunIeIdq3HMPkml |
| 1 | spotify:track:72FIKI93KFHUsDMDmzn3EN |
| 2 | spotify:track:0h2vlb6QhN9kAve9dWHImr |
| 3 | spotify:track:0HFkJfrrQLKeU5ciGPFtZk |
| 4 | spotify:track:5ud7CJJQP0TAMFFtqiVZuG |
| 5 | spotify:track:66eVkeUvwJmDGZoT5Hd895 |
| 6 | spotify:track:3zK4TEs7NDdZ7EE4LyZ4iS |
| 7 | spotify:track:2eiG0ePCKT8XWDZ4dip1pT |
| 8 | spotify:track:3rUGC1vUpkDG9CZFHMur1t |
| 9 | spotify:track:29MOOOmMRPeWyPwKQFoF5t |


## Test

In [None]:
# generate an access token from the Spotify client_id and client_secret (client credentials flow)

import os
import requests
import base64

# Read the app credentials from environment variables instead of hard-coding secrets in the notebook
CLIENT_ID = os.environ['SPOTIPY_CLIENT_ID']
CLIENT_SECRET = os.environ['SPOTIPY_CLIENT_SECRET']

# Encode client ID and secret
auth_header = base64.b64encode(f"{CLIENT_ID}:{CLIENT_SECRET}".encode()).decode()

# Set up the request headers and body
headers = {
    'Authorization': f'Basic {auth_header}',
    'Content-Type': 'application/x-www-form-urlencoded'
}
data = {'grant_type': 'client_credentials'}

# Make the POST request to the token endpoint
response = requests.post('https://accounts.spotify.com/api/token', headers=headers, data=data)

# Extract the access token from the response (valid for response.json()['expires_in'] seconds)
if response.status_code == 200:
    access_token = response.json()['access_token']
    print("Access token obtained.")  # avoid printing the token itself
else:
    print(f"Error: {response.status_code}, {response.text}")
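Client-credentials tokens are short-lived (the token response's `expires_in` field, typically 3600 seconds), so a token pasted into a later cell eventually stops working. A sketch of a small cache that refreshes the token shortly before expiry; `TokenCache` is a hypothetical name, and `fetch` is assumed to be a function that re-sends the POST request above and returns `(access_token, expires_in)`.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches a bearer token and fetches a new one shortly before the old one expires."""

    def __init__(self, fetch: Callable[[], Tuple[str, int]], margin: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # returns (access_token, expires_in_seconds)
        self._margin = margin    # refresh this many seconds before the expiry time
        self._clock = clock      # injectable so the logic can be tested without waiting
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Fetch when there is no token yet or it expires within `margin` seconds
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = self._clock() + expires_in
        return self._token
```

Calling `cache.get()` before each request (instead of reusing a fixed `access_token` string) keeps long batch runs from failing with expired-token errors.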

In [None]:
# List of track URIs from the sampled DF of user wadthy
uris = df_wadthy_sampled['URI'].tolist()

In [None]:
def chunk_list(lst, chunk_size=50):
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]

uri_chunks = chunk_list(uris)
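Before chunking, it can help to drop placeholder values (the loader above writes `'N/A'` when a track has no `uri`) and duplicates, so that no request slot is wasted. A sketch with hypothetical helper names; `chunked` is a lazy variant of `chunk_list`.

```python
def uris_to_ids(uris, kind="track"):
    """IDs from 'spotify:<kind>:<id>' URIs; malformed entries and duplicates are skipped, order is kept."""
    prefix = f"spotify:{kind}:"
    seen = set()
    ids = []
    for uri in uris:
        if isinstance(uri, str) and uri.startswith(prefix):
            track_id = uri[len(prefix):]
            if track_id and track_id not in seen:
                seen.add(track_id)
                ids.append(track_id)
    return ids


def chunked(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```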

In [None]:
import requests
import time

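def get_tracks_info(uri_chunk, access_token):
    """Fetch track objects via Spotify's 'Get Several Tracks' endpoint (at most 50 IDs per request)."""
    ids = [uri.split(':')[-1] for uri in uri_chunk]  # Extract track IDs from URIs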
    url = 'https://api.spotify.com/v1/tracks'
    headers = {
        'Authorization': f'Bearer {access_token}'
    }
    params = {
        'ids': ','.join(ids)
    }

    response = requests.get(url, headers=headers, params=params)

    if response.status_code == 200:
        return response.json()['tracks']
    else:
        print(f"Error: {response.status_code}")
        return None
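The function above gives up on any non-200 status. Spotify answers rate-limited requests with HTTP 429 and a `Retry-After` header (in seconds), so a wrapper can wait and retry instead. A sketch under that assumption; `get_with_backoff` is a hypothetical name, and `get`/`sleep` are injectable so the logic can be tested without network access.

```python
import time


def get_with_backoff(url, headers=None, params=None, max_retries=5, default_wait=10,
                     get=None, sleep=time.sleep):
    """GET that waits for Retry-After on HTTP 429 and retries up to max_retries times."""
    if get is None:
        import requests  # deferred so the retry logic can be exercised with a stub
        get = requests.get
    response = get(url, headers=headers, params=params)
    for _ in range(max_retries):
        if response.status_code != 429:
            break
        # Retry-After is given in seconds; fall back to default_wait if the header is missing
        sleep(int(response.headers.get("Retry-After", default_wait)))
        response = get(url, headers=headers, params=params)
    return response
```

Inside `get_tracks_info`, `requests.get(url, headers=headers, params=params)` could be swapped for `get_with_backoff(url, headers=headers, params=params)`; the last response is returned even if it is still a 429, so the existing status check stays valid.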

In [None]:
all_tracks_info = []
# access_token comes from the token cell above (tokens expire after one hour; re-run that cell if requests return 401)

for chunk in uri_chunks:
    tracks_info = get_tracks_info(chunk, access_token)
    if tracks_info:
        all_tracks_info.extend(tracks_info)
    time.sleep(1)  # Add a 1-second delay between requests to avoid rate limiting

In [None]:
tracks_df = pd.DataFrame(all_tracks_info)

In [None]:
tracks_df.head(20)

## Test 2 (latest)

In [14]:
## another test: fetch track details and audio features with spotipy

import os
import time

import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Load your DataFrame; reset_index gives 0..n-1 labels so the rows line up with the fetched data in pd.concat below
df = df_han_uri.reset_index(drop=True)

# Spotify API credentials, read from environment variables instead of hard-coding them
client_id = os.environ['SPOTIPY_CLIENT_ID']
client_secret = os.environ['SPOTIPY_CLIENT_SECRET']

# Authenticate with Spotify
credentials = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager=credentials)

DETAIL_KEYS = ['Track Name', 'Artist', 'Album', 'Popularity']
FEATURE_KEYS = ['Danceability', 'Energy', 'Key', 'Loudness', 'Speechiness', 'Acousticness',
                'Instrumentalness', 'Liveness', 'Valence', 'Tempo', 'Duration_ms']

def empty_features(n):
    # One independent dict per URI ([dict] * n would repeat the same dict object n times)
    return [{key: None for key in FEATURE_KEYS} for _ in range(n)]

# Function to fetch track details (one request per track)
def get_track_details(uri):
    try:
        track_info = sp.track(uri)
        return {
            'Track Name': track_info['name'],
            'Artist': track_info['artists'][0]['name'],
            'Album': track_info['album']['name'],
            'Popularity': track_info['popularity']
        }
    except Exception as e:
        print(f"Error fetching data for URI {uri}: {e}")
        return {key: None for key in DETAIL_KEYS}

# Function to fetch audio features for a batch of URIs, retrying on HTTP 429 (rate limit)
def get_audio_features(uris, max_retries=5):
    for _ in range(max_retries):
        try:
            audio_features = sp.audio_features(uris)
            # The API uses lowercase keys (e.g. 'duration_ms'); entries are None for tracks without features
            return [
                {key: features[key.lower()] for key in FEATURE_KEYS} if features
                else {key: None for key in FEATURE_KEYS}
                for features in audio_features
            ]
        except spotipy.exceptions.SpotifyException as e:
            if e.http_status != 429:
                print(f"Error fetching audio features: {e}")
                return empty_features(len(uris))
            # Defensive fallback in case no headers were attached to the exception
            retry_after = int((e.headers or {}).get('Retry-After', 10))  # Default to 10 seconds if not specified
            print(f"Rate limit hit. Retrying after {retry_after} seconds...")
            time.sleep(retry_after)
        except Exception as e:
            print(f"Unexpected error: {e}")
            return empty_features(len(uris))
    print("Max retries reached. Skipping batch.")
    return empty_features(len(uris))

# Fetch additional data for each track
additional_data = []
batch_size = 100  # Spotify's maximum batch size for audio features (100)
uris = df['URI'].tolist()

for i in range(0, len(uris), batch_size):
    batch_uris = uris[i:i + batch_size]

    # Track details are fetched one by one, audio features for the whole batch
    batch_details = [get_track_details(uri) for uri in batch_uris]
    features_list = get_audio_features(batch_uris)

    # Merge each track's details with its audio features (same order as batch_uris)
    for details, features in zip(batch_details, features_list):
        details.update(features)
        additional_data.append(details)

    # Respect rate limits by sleeping between batches
    time.sleep(1)  # Adjust sleep time as needed

# Convert additional data to a DataFrame and concatenate it column-wise with the original DataFrame
additional_df = pd.DataFrame(additional_data)
df = pd.concat([df, additional_df], axis=1)

# Save the extended DataFrame
df.to_csv('extended_songs.csv', index=False)
print(df.head())

Max Retries reached


Rate limit hit. Retrying after 10 seconds...


KeyboardInterrupt: 

In [8]:
df.head(20)


|   | URI | Track Name | Artist | Album | Popularity | Danceability | Energy | Key | Loudness | Speechiness | Acousticness | Instrumentalness | Liveness | Valence | Tempo | Duration_ms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3052 | spotify:track:7mGk4gOciSMNc4W1S5W5dy |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 2971 | spotify:track:6t6L24yPjZ1AWsVP3JNLak |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 2960 | spotify:track:0diy2V4KwFtRtt6TuqGTM7 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 2334 | spotify:track:0pYacDCZuRhcrwGUA5nTBe |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 139 | spotify:track:6u7jPi22kF8CTQ3rb9DHE7 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 3074 | spotify:track:1s9i7W8zx7Nxx78MUIsvjV |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 2808 | spotify:track:1HYzRuWjmS9LXCkdVHi25K |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 2383 | spotify:track:2KG2oscRsHPPV0zTQoqgiu |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 1411 | spotify:track:0vlKKLk8nyZdXD9W075QDa |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 1421 | spotify:track:7rDQ39HijkAZa4sswxQFcG |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
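Besides rate limiting, index alignment is one plausible reason for the empty feature columns above: the row labels (3052, 2971, ...) come from a sampled DataFrame, while the fetched data gets a fresh 0..n-1 index, and `pd.concat(..., axis=1)` matches rows by label, not by position. A minimal illustration with made-up values:

```python
import pandas as pd

# A sampled frame keeps its original row labels, a frame built from a list of dicts starts at 0
sampled = pd.DataFrame({"URI": ["spotify:track:a", "spotify:track:b"]}, index=[3052, 2971])
features = pd.DataFrame([{"Energy": 0.5}, {"Energy": 0.7}])

# concat(axis=1) aligns on labels: no label is shared, so the result has 4 rows full of NaN gaps
misaligned = pd.concat([sampled, features], axis=1)

# Dropping the old labels first makes the rows line up by position
aligned = pd.concat([sampled.reset_index(drop=True), features], axis=1)
```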


In [None]:
df.to_csv('extended_songs.csv', index=False)

In [43]:
import os
import time

import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from tenacity import retry, wait_exponential, stop_after_attempt

# Initialize Spotipy (credentials read from environment variables instead of hard-coding them)
client_credentials_manager = SpotifyClientCredentials(
    client_id=os.environ['SPOTIPY_CLIENT_ID'],
    client_secret=os.environ['SPOTIPY_CLIENT_SECRET'],
)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

# Retry settings: exponential backoff (4 to 10 s between attempts), give up after 10 attempts
@retry(wait=wait_exponential(multiplier=1, min=4, max=10), stop=stop_after_attempt(10))
def get_audio_features_batch(uris):
    return sp.audio_features(uris)

# Use the sampled DataFrame created above
df = df_wadthy_sampled

# List to hold audio features
audio_features_list = []

# Fetch data in batches
batch_size = 50  # Batch size for audio features requests
for i in range(0, len(df), batch_size):
    # Positional slice as a plain list; the sampled DF keeps its original (non 0..n-1) index labels
    batch_uris = df['URI'].iloc[i:i + batch_size].tolist()
    try:
        # Fetch audio features for the batch
        audio_features_batch = get_audio_features_batch(batch_uris)

        # Check if audio features were fetched successfully
        if audio_features_batch:
            audio_features_list.extend(audio_features_batch)
        else:
            # If fetching audio features fails, append None for each URI in the batch
            audio_features_list.extend([None] * len(batch_uris))

        print(f"Fetched audio features for batch {i // batch_size + 1}")

    except Exception as e:
        print(f"Failed to fetch data for batch starting with {batch_uris[0]}: {e}")
        audio_features_list.extend([None] * len(batch_uris))

    # Sleep to avoid hitting rate limits
    time.sleep(1)  # Adjust sleep time as needed

# Add fetched audio features to the dataframe
df['audio_features'] = audio_features_list

# Save to CSV
# df.to_csv('wadthy_sampled_songs_with_audio_features.csv', index=False)

# Print completion message
print("Audio features fetched and added to DataFrame successfully.")


Max Retries reached
Max Retries reached


KeyboardInterrupt: 

In [39]:
df.head(20)

|   | Title | Artist | Album | URI | audio_features |
|---|---|---|---|---|---|
| 0 | Too Much | DOLLA $LICE | Too Much | spotify:track:3rAsEf7NeRAL4Dn3Z4yNtb | {'danceability': 0.918, 'energy': 0.599, 'key'... |
| 1 | Let Me Love You | DOLLA $LICE | Let Me Love You | spotify:track:7DoCpSWH2qK5eblDQA0YKx | {'danceability': 0.738, 'energy': 0.768, 'key'... |
| 2 | Vault | Ty Luminosity | Vault | spotify:track:1bESEVLHPx4g8UzWcgdzJf | {'danceability': 0.691, 'energy': 0.322, 'key'... |
| 3 | YUH | Ty Luminosity | YUH | spotify:track:6RVklIYiEN5IGTVBDa7OzZ | {'danceability': 0.848, 'energy': 0.39, 'key':... |
| 4 | Candles | Zak Downtown | Candles | spotify:track:6a1mSSxp5E3g2HRKquPmAg | {'danceability': 0.701, 'energy': 0.701, 'key'... |
| 5 | Pact | TK MAC | Pact | spotify:track:60RHuFk5rqOejhu7yaRVnX | {'danceability': 0.763, 'energy': 0.561, 'key'... |
| 6 | Kapitel 2 - Wie man Freunde gewinnt - Die Kuns... | Dale Carnegie | Wie man Freunde gewinnt | spotify:track:5sPN9IlDkFTynvS2jwq2M6 | {'danceability': 0.729, 'energy': 0.445, 'key'... |
| 7 | Light Speed | MarlinBeats | Light Speed | spotify:track:7pQ06A1Mzx8rT2w4FO7tm5 | {'danceability': 0.662, 'energy': 0.376, 'key'... |
| 8 | Clarity | Kurt Hugo Schneider | Kurt & Company Vol 4 | spotify:track:2xvR2cUI5krvEqYcdQYoc4 | {'danceability': 0.558, 'energy': 0.46, 'key':... |
| 9 | Losin Control - DOLLA $LICE Remix | Zak Downtown | Losin Control | spotify:track:308xlrid086raOlnE908hI | {'danceability': 0.751, 'energy': 0.889, 'key'... |
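The `audio_features` column above holds one dict per track (or `None` when the lookup failed). To use the values as model features, the dicts can be expanded into separate columns. A sketch; `expand_audio_features` is a hypothetical helper.

```python
import pandas as pd


def expand_audio_features(df: pd.DataFrame, column: str = "audio_features") -> pd.DataFrame:
    """Replace a column of audio-feature dicts (None = failed lookup) with one column per feature."""
    records = [f if isinstance(f, dict) else {} for f in df[column]]
    # Reuse the original index so the join lines up row by row, even for sampled DataFrames
    expanded = pd.DataFrame(records, index=df.index)
    return df.drop(columns=[column]).join(expanded)
```

Rows whose lookup failed end up with `NaN` in every feature column and can be dropped or re-fetched later.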


#### Spotify API Call & Song Feature Extraction
1. connect to the Spotify API and extract song features for the songs in the friends' pandas DFs (collected data sets)
2. load the data sets from Kaggle (MBTI playlists)
3. create an overview and relationship visualizations of the data extracted from the Kaggle data sets
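For step 3, the Kaggle files store playlist-level aggregates such as `danceability_mean` and `energy_mean` (see the `df_ENFJ` preview in the MBTI section). A possible starting point for the relationship overview is a correlation matrix of those mean columns, drawn as a seaborn heatmap. A sketch; the function names are hypothetical, and seaborn is imported only inside the plotting function.

```python
import pandas as pd


def feature_correlations(df: pd.DataFrame, suffix: str = "_mean") -> pd.DataFrame:
    """Pearson correlations between the playlist-level feature columns ending in `suffix`."""
    cols = [c for c in df.columns if c.endswith(suffix)]
    return df[cols].corr()


def plot_correlations(corr: pd.DataFrame):
    import seaborn as sns  # only needed for the plot itself
    return sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
```

Usage: `plot_correlations(feature_correlations(df_ENFJ))`.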

In [None]:
df.head(20)

In [None]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

In [23]:
import os

# Spotify credentials (read from environment variables instead of hard-coding them)
client_id = os.environ['SPOTIPY_CLIENT_ID']
client_secret = os.environ['SPOTIPY_CLIENT_SECRET']

# Authenticate with Spotify
client_credentials_manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

# List of track IDs (replace these with the actual Spotify track IDs you're interested in)
track_ids = ['1LeItUMezKA1HdCHxYICed']

# Initialize empty list to hold track details and audio features
tracks_data = []

# Retrieve track details and audio features for each track
for track_id in track_ids:
    track_details = sp.track(track_id)
    # audio_features returns a list; its entry is None if Spotify has no features for the track
    audio_features = sp.audio_features(track_id)[0] or {}

    # Build the record inside the loop so that every track is kept, not only the last one
    track_info = {
        'title': track_details.get('name'),
        'artist_name': track_details.get('artists', [{}])[0].get('name'),
        'release_date': track_details.get('album', {}).get('release_date'),
        'genre': None,  # Spotify API does not provide genre at the track level
        'duration_ms': track_details.get('duration_ms'),
        'danceability': audio_features.get('danceability'),
        'energy': audio_features.get('energy'),
        'key': audio_features.get('key'),
        'loudness': audio_features.get('loudness'),
        'mode': audio_features.get('mode'),
        'speechiness': audio_features.get('speechiness'),
        'acousticness': audio_features.get('acousticness'),
        'instrumentalness': audio_features.get('instrumentalness'),
        'liveness': audio_features.get('liveness'),
        'valence': audio_features.get('valence'),
        'tempo': audio_features.get('tempo'),
        'time_signature': audio_features.get('time_signature'),
        'popularity': track_details.get('popularity'),
        'explicit': track_details.get('explicit'),
        'preview_url': track_details.get('preview_url')
    }
    tracks_data.append(track_info)

# Convert list of dicts to pandas DataFrame
df_tracks = pd.DataFrame(tracks_data)

# Display the DataFrame
print(df_tracks)
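Because `sp.audio_features` can return `None` for individual tracks, it helps to keep the flattening of one track into a row separate from the API calls, so that part can be tested offline. A sketch; `track_record` and `AUDIO_KEYS` are hypothetical names, and the keys follow the Spotify track and audio-features objects used in the cell above.

```python
AUDIO_KEYS = ["danceability", "energy", "key", "loudness", "mode", "speechiness",
              "acousticness", "instrumentalness", "liveness", "valence", "tempo", "time_signature"]


def track_record(track: dict, features) -> dict:
    """Flatten a Spotify track object plus its (possibly missing) audio-features object into one row."""
    features = features or {}  # None when Spotify has no audio features for the track
    artists = track.get("artists") or [{}]
    record = {
        "title": track.get("name"),
        "artist_name": artists[0].get("name"),
        "release_date": (track.get("album") or {}).get("release_date"),
        "duration_ms": track.get("duration_ms"),
        "popularity": track.get("popularity"),
        "explicit": track.get("explicit"),
    }
    # Missing features become None, so every row has the same columns
    record.update({key: features.get(key) for key in AUDIO_KEYS})
    return record
```

Usage inside the loop: `tracks_data.append(track_record(sp.track(track_id), sp.audio_features(track_id)[0]))`.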


Max Retries reached


SpotifyException: http status: 429, code:-1 - /v1/audio-features/?ids=1LeItUMezKA1HdCHxYICed:
 Max Retries, reason: too many 429 error responses

In [None]:
df_tracks.head(9)

In [None]:
# old (uses track_details and audio_features from the cell above)
track_info = {
    'title': track_details['name'],
    'artist_name': track_details['artists'][0]['name'],
    'release_date': track_details['album']['release_date'],
    'genre': '',  # Spotify API does not provide genre at the track level, usually available at the artist level
    'duration_ms': track_details['duration_ms'],
    'danceability': audio_features['danceability'],
    'energy': audio_features['energy'],
    'key': audio_features['key'],
    'loudness': audio_features['loudness'],
    'mode': audio_features['mode'],
    'speechiness': audio_features['speechiness'],
    'acousticness': audio_features['acousticness'],
    'instrumentalness': audio_features['instrumentalness'],
    'liveness': audio_features['liveness'],
    'valence': audio_features['valence'],
    'tempo': audio_features['tempo'],
    'time_signature': audio_features['time_signature']
}

In [None]:
df_tracks.head()

### Retrieving the Data from Spotify as JSON 

# Saving MBTI Playlists as Pandas Data Frames

In [24]:
# importing MBTI playlists as DFs (one CSV per personality type from the Kaggle archive)
mbti_dir = '/Users/khieuvon/Documents/10_Personal Stuff/01_Masterarbeit/Data for ML Model/Spotify MBTI Playlists/archive'

# Sentinels
df_ISTJ = pd.read_csv(f'{mbti_dir}/ISTJ_df.csv')
df_ISFJ = pd.read_csv(f'{mbti_dir}/ISFJ_df.csv')
df_ESTJ = pd.read_csv(f'{mbti_dir}/ESTJ_df.csv')
df_ESFJ = pd.read_csv(f'{mbti_dir}/ESFJ_df.csv')

# Explorers
df_ISTP = pd.read_csv(f'{mbti_dir}/ISTP_df.csv')
df_ISFP = pd.read_csv(f'{mbti_dir}/ISFP_df.csv')
df_ESTP = pd.read_csv(f'{mbti_dir}/ESTP_df.csv')
df_ESFP = pd.read_csv(f'{mbti_dir}/ESFP_df.csv')

# Analysts
df_INTJ = pd.read_csv(f'{mbti_dir}/INTJ_df.csv')
df_INTP = pd.read_csv(f'{mbti_dir}/INTP_df.csv')
df_ENTJ = pd.read_csv(f'{mbti_dir}/ENTJ_df.csv')
df_ENTP = pd.read_csv(f'{mbti_dir}/ENTP_df.csv')

# Diplomats
df_INFJ = pd.read_csv(f'{mbti_dir}/INFJ_df.csv')
df_INFP = pd.read_csv(f'{mbti_dir}/INFP_df.csv')
df_ENFJ = pd.read_csv(f'{mbti_dir}/ENFJ_df.csv')
df_ENFP = pd.read_csv(f'{mbti_dir}/ENFP_df.csv')

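The sixteen DataFrames follow the four temperament groups marked in the comments above. A sketch that stacks them into one DataFrame and adds the type and the group as columns, which makes per-temperament comparisons a simple `groupby`; the names are hypothetical.

```python
import pandas as pd

# Temperament groups as labelled in the cell above
TEMPERAMENTS = {
    "Sentinels": ["ISTJ", "ISFJ", "ESTJ", "ESFJ"],
    "Explorers": ["ISTP", "ISFP", "ESTP", "ESFP"],
    "Analysts": ["INTJ", "INTP", "ENTJ", "ENTP"],
    "Diplomats": ["INFJ", "INFP", "ENFJ", "ENFP"],
}
TYPE_TO_GROUP = {t: group for group, types in TEMPERAMENTS.items() for t in types}


def combine_mbti_frames(frames: dict) -> pd.DataFrame:
    """Stack per-type playlist DataFrames, tagging each row with its MBTI type and temperament."""
    combined = pd.concat(
        [frame.assign(mbti_type=mbti) for mbti, frame in frames.items()],
        ignore_index=True,
    )
    combined["temperament"] = combined["mbti_type"].map(TYPE_TO_GROUP)
    return combined
```

Usage in the notebook: `mbti_df = combine_mbti_frames({t: globals()[f"df_{t}"] for t in TYPE_TO_GROUP})`.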

In [25]:
df_ENFJ.head(9)

|   | mbti | function_pair | playlist_name | playlist_id | track_count | danceability_mean | danceability_stdev | energy_mean | energy_stdev | loudness_mean | ... | G#/Abminor_count | G#/AbMajor_count | Aminor_count | AMajor_count | A#/BbMajor_count | BMajor_count | Fminor_count | F#/GbMajor_count | A#/Bbminor_count | Bminor_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ENFJ | NF | Enfj 😇 | 4AE4DBt4YjJJ8v4Hk9myWl | 41 | 0.580512 | 0.124325 | 0.595732 | 0.222704 | -7.736634 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | ENFJ | NF | enfj // the protagonist | 6th4JW4Dky7UjeEhHuQeQd | 38 | 0.550947 | 0.156753 | 0.679018 | 0.193395 | -7.649368 | ... | 0.0 | 1.0 | 1.0 | 6.0 | 3.0 | 1.0 | 2.0 | 1.0 | 0.0 | 0.0 |
| 2 | ENFJ | NF | enfj songs according to personality database | 4g2AYw35pR37u2QpwpSryq | 50 | 0.56468 | 0.116814 | 0.60764 | 0.204799 | -7.75768 | ... | 0.0 | 1.0 | 2.0 | 6.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | 3.0 |
| 3 | ENFJ | NF | ENFJ vibes‼️ | 1eVgLeDoHD123LB6VldjGY | 50 | 0.5352 | 0.122586 | 0.59896 | 0.177415 | -7.00594 | ... | 1.0 | 4.0 | 6.0 | 2.0 | 4.0 | 2.0 | 0.0 | 2.0 | 2.0 | 3.0 |
| 4 | ENFJ | NF | [ENFJ] | 49BWXniDb2gtfI1e4Z3ILr | 50 | 0.5267 | 0.123802 | 0.80894 | 0.103657 | -5.1835 | ... | 0.0 | 6.0 | 0.0 | 2.0 | 5.0 | 1.0 | 0.0 | 0.0 | 1.0 | 6.0 |
| 5 | ENFJ | NF | ENFJ anthems 🦋 | 6IDELtu3l7994CPmsyhlUN | 50 | 0.5864 | 0.164821 | 0.664 | 0.178841 | -6.81156 | ... | 2.0 | 2.0 | 2.0 | 9.0 | 1.0 | 1.0 | 2.0 | 0.0 | 1.0 | 1.0 |
| 6 | ENFJ | NF | ENFJ main character moment | 3U8PZDhXaOVdeqXp9xfjIs | 50 | 0.64346 | 0.145858 | 0.5783 | 0.161094 | -6.9361 | ... | 1.0 | 0.0 | 0.0 | 4.0 | 2.0 | 1.0 | 3.0 | 2.0 | 3.0 | 3.0 |
| 7 | ENFJ | NF | ENFJ (PDB) | 7yXX8iWVhHPeZ6CPXkdvRg | 50 | 0.59738 | 0.110021 | 0.6571 | 0.152134 | -6.05944 | ... | 0.0 | 6.0 | 3.0 | 2.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 3.0 |
| 8 | ENFJ | NF | enfj | 237uKEMCwMaeqdHTaVdNAH | 22 | 0.572864 | 0.117522 | 0.583364 | 0.142982 | -7.249773 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 1.0 |
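Each row is one playlist with per-playlist means and standard deviations. To get one feature profile per MBTI type, the playlist means can be averaged with `track_count` as weights, which corresponds to pooling all tracks of that type's playlists (a track that appears in several playlists is counted each time). A sketch; `mbti_profiles` is a hypothetical helper and uses only two feature columns by default.

```python
import numpy as np
import pandas as pd


def mbti_profiles(playlists: pd.DataFrame, features=("danceability_mean", "energy_mean")) -> pd.DataFrame:
    """One row per MBTI type: playlist means averaged with track_count as weights."""
    cols = list(features)

    def weighted(group: pd.DataFrame) -> pd.Series:
        return pd.Series(np.average(group[cols], axis=0, weights=group["track_count"]), index=cols)

    return playlists.groupby("mbti")[cols + ["track_count"]].apply(weighted)
```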


# Spotify Dataset 

In [None]:
combined_df.head(20)