# Predicting Song Genres Using Spotify Data

## Description

This project builds a machine learning model that predicts a song's genre from the audio metrics Spotify provides. The model classifies a track using features such as danceability, energy, tempo, and other characteristics. The project also uses the Spotify API to retrieve these metrics for any new track, so the trained model can predict the genre of songs it has not seen before.

### Workflow

1. Collect Data:

    Build a dataset of tracks and their audio features using the Spotify Web API.

2. Preprocess Data:

    Clean and preprocess the dataset for model training.

3. Train Models:

    Train models using the audio metrics as features and the genre as the target.

4. Evaluate Model Performance:

    Check the effectiveness of each model using cross-validation and metrics such as accuracy and F1-score, and analyze its predictions.

5. Integrate Spotify API:

    Retrieve the audio features of new tracks through the Spotify API.

6. Make Predictions on New Songs:

    Use the trained machine learning model to predict the genre of any new song based on its Spotify audio features.
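The notebook below does not show step 6. A minimal sketch of it, assuming the trained model, the fitted `StandardScaler`, and the `LabelEncoder` are all kept after training (`preprocess_data` below does not return its scaler, so it would need to be returned as well). `predict_genre` and `FEATURE_COLUMNS` are hypothetical names, and the three-column feature order and toy data are purely illustrative; in practice the column order must match the training columns exactly.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical feature order; it must match the training columns exactly.
FEATURE_COLUMNS = ["danceability", "energy", "tempo"]


def predict_genre(features, model, scaler, label_encoder, columns=FEATURE_COLUMNS):
    """Predict a genre name from one audio-features dict (e.g. from sp.audio_features)."""
    # Arrange the values as a single row in training-column order.
    row = np.array([[features[col] for col in columns]], dtype=float)
    # Reuse the scaler fitted on the training data; never refit it on new songs.
    row_scaled = scaler.transform(row)
    encoded = model.predict(row_scaled)
    # Map the integer class back to its genre name.
    return label_encoder.inverse_transform(encoded)[0]


# Toy training data standing in for the real dataset (assumption).
X_toy = np.array([[0.90, 0.80, 125.0], [0.85, 0.90, 128.0],
                  [0.20, 0.10, 70.0], [0.25, 0.15, 65.0]])
encoder = LabelEncoder()
y_toy = encoder.fit_transform(["house", "house", "ambient", "ambient"])
scaler = StandardScaler().fit(X_toy)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(scaler.transform(X_toy), y_toy)

new_song = {"danceability": 0.88, "energy": 0.85, "tempo": 126.0}
print(predict_genre(new_song, model, scaler, encoder))
```

With the real pipeline, `features` would be the dict returned for the new track's ID, and the same three objects used at training time would be passed in unchanged.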

## Import Libraries

In [24]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.decomposition import PCA

from sklearn.metrics import classification_report

from sklearn.model_selection import GridSearchCV

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from spotipy.exceptions import SpotifyException

In [None]:
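import warnings

# Silence warnings such as scikit-learn's UndefinedMetricWarning, raised when
# a genre receives no predicted samples and its precision is undefined.
# Replacing warnings.warn with a no-op also silences code that resets the filters.
def warn(*args, **kwargs):
    pass

warnings.warn = warn
warnings.filterwarnings('ignore')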

## Spotify API Setup

In [36]:

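import os

# Read credentials from environment variables rather than hard-coding secrets in the notebook
client_id = os.environ["SPOTIPY_CLIENT_ID"]
client_secret = os.environ["SPOTIPY_CLIENT_SECRET"]

# Authenticate with the Spotify API using the Client Credentials flow
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=client_id, client_secret=client_secret))

# Quick test: search for a single track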
result = sp.search(q='breath away', type='track', limit=1)
print(result)


Max Retries reached


SpotifyException: http status: 429, code:-1 - /v1/audio-features/?ids=1oic0Wedm3XeHxwaxmwO91:
 Max Retries, reason: too many 429 error responses
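The request above fails with HTTP 429 (Too Many Requests): Spotify is rate-limiting the app, and spotipy's built-in retries were exhausted ("Max Retries reached"). A sketch of a wrapper that waits longer between attempts, honoring the `Retry-After` header when one is available. `call_with_backoff` is a hypothetical helper; it only assumes the raised exception exposes an `http_status` attribute (as spotipy's `SpotifyException` does) and, optionally, a `headers` mapping.

```python
import time


def call_with_backoff(func, *args, max_retries=5, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying on HTTP 429 with exponential backoff.

    Assumes the raised exception exposes `http_status` (spotipy's SpotifyException
    does) and, optionally, a `headers` mapping that may carry Retry-After.
    """
    for attempt in range(max_retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Re-raise anything that is not a rate-limit error, or the final failure.
            if getattr(exc, "http_status", None) != 429 or attempt == max_retries:
                raise
            headers = getattr(exc, "headers", None) or {}
            retry_after = headers.get("Retry-After")
            # Prefer the server's hint; otherwise wait base_delay, 2x, 4x, ...
            delay = float(retry_after) if retry_after is not None else base_delay * 2 ** attempt
            sleep(delay)
```

For example, `result = call_with_backoff(sp.search, q='breath away', type='track', limit=1)`. The `sleep` parameter exists only so the waiting can be replaced in tests.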

### Retrieve Audio Features

In [35]:
def get_audio_features(track_id):
    """Return the audio-features dict for one track (None if Spotify has no features for it)."""
    features = sp.audio_features([track_id])
    return features[0]

track_id = result['tracks']['items'][0]['id']
audio_features = get_audio_features(track_id)
print(audio_features)


Max Retries reached


SpotifyException: http status: 429, code:-1 - /v1/audio-features/?ids=1oic0Wedm3XeHxwaxmwO91:
 Max Retries, reason: too many 429 error responses
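Because every rerun of the notebook requests the same tracks again, a local cache can reduce the number of API calls that count against the rate limit. A minimal sketch, assuming a JSON file is an acceptable cache; `cached_audio_features` and the default file name are hypothetical, and `fetch` would be `sp.audio_features` in the notebook. Spotify's audio-features endpoint accepts at most 100 track IDs per request, so uncached IDs are fetched in batches of that size.

```python
import json
from pathlib import Path


def cached_audio_features(track_ids, fetch, cache_path="audio_features_cache.json"):
    """Return audio features for track_ids, calling `fetch` only for uncached IDs.

    `fetch` takes a list of IDs and returns feature dicts (or None) in the same
    order, e.g. sp.audio_features. Results are stored in a JSON file keyed by ID.
    """
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else {}
    # Deduplicate while preserving order, skipping IDs already cached.
    missing = list(dict.fromkeys(tid for tid in track_ids if tid not in cache))
    # Request uncached IDs in batches of at most 100.
    for start in range(0, len(missing), 100):
        batch = missing[start:start + 100]
        for tid, features in zip(batch, fetch(batch)):
            cache[tid] = features  # None is cached too, so it is not re-requested
    path.write_text(json.dumps(cache))
    # Drop tracks for which Spotify returned no features.
    return [cache[tid] for tid in track_ids if cache[tid]]
```

Usage in the notebook might look like `features = cached_audio_features(track_ids, sp.audio_features)`. Deleting the cache file forces a fresh download.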

## Building a Dataset

In [28]:
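def search_songs_by_genre(genre, limit=10):
    """Search for tracks matching a genre and return their audio features, labeled with that genre."""
    results = sp.search(q=f'genre:{genre}', type='track', limit=limit)
    track_ids = [track['id'] for track in results['tracks']['items']]
    if not track_ids:
        return []

    # One batched request (up to 100 IDs) instead of one request per track,
    # which lowers the risk of HTTP 429 rate-limit errors.
    songs_data = []
    for audio_features in sp.audio_features(track_ids):
        if audio_features:  # Spotify returns None for tracks without features
            audio_features['genre'] = genre
            songs_data.append(audio_features)

    return songs_data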

# List of 20 genres
genres = [
    'pop', 'rock', 'jazz', 'classical', 'hip-hop', 'metal', 'reggae', 'blues',
    'country', 'edm', 'latin', 'soul', 'punk', 'folk', 'funk', 'indie', 'disco',
    'r&b', 'gospel', 'alternative'
]

all_songs_data = []

for genre in genres:
    print(f"Collecting songs for genre: {genre}")
    genre_songs = search_songs_by_genre(genre, limit=25)  
    all_songs_data.extend(genre_songs)

df = pd.DataFrame(all_songs_data)

print(df.head())  

Collecting songs for genre: pop


Max Retries reached


SpotifyException: http status: 429, code:-1 - /v1/audio-features/?ids=0WbMK4wrZ1wFSty9F7FCgu:
 Max Retries, reason: too many 429 error responses
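Because each genre is searched separately, the same track can be collected under more than one genre, giving identical feature rows with conflicting labels. One simple policy, sketched below with a hypothetical `drop_ambiguous_tracks` helper, is to discard such tracks before preprocessing and then check how many songs each genre kept. Whether to drop or keep them is a modeling choice the notebook does not specify.

```python
import pandas as pd


def drop_ambiguous_tracks(df, id_col="id", label_col="genre"):
    """Remove tracks collected under more than one genre label.

    Exact duplicates (same track, same genre) are reduced to a single row.
    """
    labels_per_track = df.groupby(id_col)[label_col].nunique()
    ambiguous_ids = labels_per_track[labels_per_track > 1].index
    cleaned = df[~df[id_col].isin(ambiguous_ids)]
    return cleaned.drop_duplicates(subset=[id_col, label_col]).reset_index(drop=True)


# In the notebook (not run here):
# df = drop_ambiguous_tracks(df)
# print(df['genre'].value_counts())  # songs per genre after cleaning
```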

## Data Preprocessing

In [None]:
def preprocess_data(df):
    df = df.drop(['id', 'uri', 'track_href', 'analysis_url', 'type'], axis=1)
    
    df = df.dropna()
    
    # Label encode the genre column
    label_encoder = LabelEncoder()
    df['genre'] = label_encoder.fit_transform(df['genre'])
    
    
    X = df.drop(['genre'], axis=1)
    y = df['genre']
    
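    # Standardize feature values (zero mean, unit variance).
    # Note: fitting the scaler on all rows before the train/test split lets
    # test-set statistics leak into training.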
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    
    return X_scaled, y, label_encoder

X, y, label_encoder = preprocess_data(df)
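`preprocess_data` fits the `StandardScaler` on every row before `train_test_split`, so the test split influences the scaling statistics, and the fitted scaler is not returned for use on new songs. A sketch of one common alternative, assuming scikit-learn's `Pipeline`: split the unscaled features first, then let the pipeline fit the scaler on the training split only. The data is synthetic (`make_classification` with 13 features, matching the 13 numeric audio-feature columns that remain after `preprocess_data` drops the metadata), and the variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in (assumption) for the unscaled audio features and the
# label-encoded genres; in the notebook these would come from `df`.
X_raw, y_enc = make_classification(
    n_samples=300, n_features=13, n_informative=6, n_classes=3, random_state=42
)

# Split first; `stratify` keeps genre proportions similar in both splits.
X_train_raw, X_test_raw, y_train_enc, y_test_enc = train_test_split(
    X_raw, y_enc, test_size=0.3, random_state=42, stratify=y_enc
)

# The pipeline fits StandardScaler on the training split only and reuses
# those statistics for the test split (and, later, for new songs).
svm_pipeline = make_pipeline(StandardScaler(), SVC(kernel="linear"))
svm_pipeline.fit(X_train_raw, y_train_enc)
test_accuracy = svm_pipeline.score(X_test_raw, y_test_enc)
print(f"Test accuracy: {test_accuracy:.2f}")
```

The same pipeline object can be passed to `GridSearchCV` or `cross_val_score`, which then refit the scaler inside each fold rather than on the full dataset.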


## Train Machine Learning Models


In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Support Vector Machine (SVM)": SVC(kernel='linear'),  # You can also try 'rbf' kernel
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=42)
}

model_performance = {}

for model_name, model in models.items():
    print(f"Training {model_name}...")
    model.fit(X_train, y_train)  
    y_pred = model.predict(X_test)  
    
    # Generate classification report
    print(f"Classification Report for {model_name}:")
    report = classification_report(y_test, y_pred, target_names=label_encoder.classes_)
    print(report)
    
    model_performance[model_name] = report


Training Random Forest...
Classification Report for Random Forest:
                   precision    recall  f1-score   support

          ambient       0.56      0.83      0.67        12
        chillwave       0.22      0.33      0.26        15
       deep house       0.64      0.64      0.64        14
        downtempo       0.20      0.10      0.13        20
    drum and bass       0.53      0.80      0.64        10
          dubstep       0.56      0.53      0.55        17
          electro       0.13      0.10      0.11        20
       electronic       0.00      0.00      0.00        19
      future bass       0.50      0.12      0.19        17
       glitch hop       0.33      0.12      0.18        16
         hardcore       0.31      0.31      0.31        16
            house       0.11      0.13      0.12        15
              idm       0.56      0.45      0.50        11
progressive house       0.40      0.36      0.38        11
        synthwave       0.44      0.61      0.5
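The workflow calls for cross-validation and F1-score, and `GridSearchCV` is imported but not used. A minimal sketch of tuning the random forest with stratified 5-fold cross-validation and macro-averaged F1 (`scoring="f1_macro"`), which weights every genre equally regardless of its support. The grid values and synthetic data are assumptions for illustration; in the notebook the search would be fit on `X_train`/`y_train` only, keeping `X_test` for the final report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the training split (assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=13, n_informative=6, n_classes=3, random_state=0
)

# Illustrative grid; these values are not tuned for Spotify data.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="f1_macro",  # unweighted mean of per-class F1 scores
    cv=cv,
)
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```

By default `search.best_estimator_` is refit on all the data passed to `fit`, so it can be evaluated directly with `classification_report` on the held-out test split.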

In [None]:
# Random Forest
model_rf = RandomForestClassifier(n_estimators=100, random_state=42)
model_rf.fit(X_train, y_train)
y_pred_rf = model_rf.predict(X_test)
print("Random Forest Classification Report:")
print(classification_report(y_test, y_pred_rf, target_names=label_encoder.classes_))

# Gradient Boosting
model_gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
model_gb.fit(X_train, y_train)
y_pred_gb = model_gb.predict(X_test)
print("Gradient Boosting Classification Report:")
print(classification_report(y_test, y_pred_gb, target_names=label_encoder.classes_))


Random Forest Classification Report:
                   precision    recall  f1-score   support

          ambient       0.56      0.83      0.67        12
        chillwave       0.22      0.33      0.26        15
       deep house       0.64      0.64      0.64        14
        downtempo       0.20      0.10      0.13        20
    drum and bass       0.53      0.80      0.64        10
          dubstep       0.56      0.53      0.55        17
          electro       0.13      0.10      0.11        20
       electronic       0.00      0.00      0.00        19
      future bass       0.50      0.12      0.19        17
       glitch hop       0.33      0.12      0.18        16
         hardcore       0.31      0.31      0.31        16
            house       0.11      0.13      0.12        15
              idm       0.56      0.45      0.50        11
progressive house       0.40      0.36      0.38        11
        synthwave       0.44      0.61      0.51        18
       tech house 
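The per-genre reports show low scores for closely related labels such as `electronic`, `electro`, and `house`. To see which genres are mistaken for which, a small helper can count the most frequent (true, predicted) error pairs. `most_confused_pairs` is a hypothetical name; `sklearn.metrics.confusion_matrix` gives the full matrix if needed.

```python
from collections import Counter


def most_confused_pairs(y_true, y_pred, top_n=5):
    """Return the most frequent (true, predicted) pairs where the prediction is wrong."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(top_n)


# In the notebook (not run here):
# true_names = label_encoder.inverse_transform(y_test)
# pred_names = label_encoder.inverse_transform(y_pred_rf)
# print(most_confused_pairs(true_names, pred_names))
```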

In [None]:
pca = PCA(n_components=10)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

model_svm_pca = SVC(kernel='linear')
model_svm_pca.fit(X_train_pca, y_train)
y_pred_svm_pca = model_svm_pca.predict(X_test_pca)

print("SVM with PCA Classification Report:")
print(classification_report(y_test, y_pred_svm_pca, target_names=label_encoder.classes_))


SVM with PCA Classification Report:
                   precision    recall  f1-score   support

          ambient       0.40      0.67      0.50        12
        chillwave       0.00      0.00      0.00        15
       deep house       0.36      0.29      0.32        14
        downtempo       0.22      0.10      0.14        20
    drum and bass       0.39      0.70      0.50        10
          dubstep       0.20      0.24      0.22        17
          electro       0.15      0.20      0.17        20
       electronic       0.00      0.00      0.00        19
      future bass       0.00      0.00      0.00        17
       glitch hop       0.00      0.00      0.00        16
         hardcore       0.25      0.25      0.25        16
            house       0.12      0.13      0.12        15
              idm       0.23      0.45      0.30        11
progressive house       0.14      0.18      0.16        11
        synthwave       0.41      0.50      0.45        18
       tech house  
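The PCA cell fixes `n_components=10` out of the 13 numeric audio-feature columns. Passing a float between 0 and 1 instead makes scikit-learn keep the fewest components whose cumulative explained variance exceeds that fraction, which ties the choice to the data. A sketch on synthetic data (an assumption), printing how many components are kept and how much variance they explain; as in the notebook, PCA should be fit on the training split only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in (assumption): 13 columns, 4 of which nearly duplicate others.
base = rng.normal(size=(200, 9))
near_copies = base[:, :4] + 0.01 * rng.normal(size=(200, 4))
X_demo = StandardScaler().fit_transform(np.hstack([base, near_copies]))

# A float in (0, 1) keeps the fewest components whose cumulative
# explained variance exceeds that fraction.
pca = PCA(n_components=0.95, svd_solver="full").fit(X_demo)
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(f"{pca.n_components_} components explain {cumulative[-1]:.1%} of the variance")
```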

In [None]:
# Random Forest with class weights
model_rf_weighted = RandomForestClassifier(n_estimators=100, random_state=42, class_weight='balanced')
model_rf_weighted.fit(X_train, y_train)
y_pred_rf_weighted = model_rf_weighted.predict(X_test)

print("Random Forest with Class Weights Classification Report:")
print(classification_report(y_test, y_pred_rf_weighted, target_names=label_encoder.classes_))


Random Forest with Class Weights Classification Report:
                   precision    recall  f1-score   support

          ambient       0.55      0.92      0.69        12
        chillwave       0.27      0.40      0.32        15
       deep house       0.69      0.64      0.67        14
        downtempo       0.18      0.10      0.13        20
    drum and bass       0.50      0.70      0.58        10
          dubstep       0.44      0.41      0.42        17
          electro       0.14      0.10      0.12        20
       electronic       0.17      0.05      0.08        19
      future bass       0.50      0.12      0.19        17
       glitch hop       0.38      0.19      0.25        16
         hardcore       0.33      0.25      0.29        16
            house       0.12      0.13      0.12        15
              idm       0.58      0.64      0.61        11
progressive house       0.22      0.18      0.20        11
        synthwave       0.36      0.44      0.40        18
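`class_weight='balanced'` weights each class by `n_samples / (n_classes * count_of_class)`, so rarer genres count more during training. A short pure-Python reproduction of that formula (the helper name is hypothetical) makes the effect concrete:

```python
from collections import Counter


def balanced_class_weights(y):
    """Reproduce class_weight='balanced': n_samples / (n_classes * count_of_class)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


print(balanced_class_weights(["house", "house", "house", "idm"]))
# → {'house': 0.6666666666666666, 'idm': 2.0}
```

The test supports in the reports range from 10 to 20 songs per genre, which suggests only mild imbalance; that may be why weighting moves individual genre scores both up and down rather than improving them across the board.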

## Conclusions