__<span style="font-size: 40px;">Capstone Project</span>__ 

__<span style="font-size: 40px;">Auraly</span>__ 


__<font size="6">1. Business Understanding</font>__   

***<span style="font-size: 24px;">1.1 Overview</span>***

Music consumption today is highly personalized, with streaming platforms offering tailored recommendations based on listening habits. However, when it comes to emotional resonance, listeners still spend significant time manually curating playlists that match how they feel in the moment.
The traditional approach of browsing by genre or artist fails to capture the subtle emotional layers that make a song resonate. This project aims to make discovering music more personalized and enjoyable for casual listeners, DJs, and streaming platform users.


***<span style="font-size: 24px;">1.2 Stakeholders</span>*** 

1. *Music Listeners* – benefit from effortless mood-based playlist creation and more emotionally resonant music discovery.
2. *Streaming Platforms* – gain deeper user engagement and personalization features that differentiate their service.
3. *DJs & Curators* – save time curating emotionally aligned sets for events or audiences.


***<span style="font-size: 24px;">1.3 Problem Statement</span>*** 

Even though music apps offer personalized recommendations, they still don’t understand how a listener feels. People often spend too much time searching for songs that match their mood because most platforms sort music by genre or artist, not emotion. This makes it hard to find the right songs quickly and limits how personal and meaningful the listening experience can be.


***<span style="font-size: 24px;">1.4 Business Objective</span>*** 

To establish Auraly as an intelligent mood-based music classification system that enhances emotional connection and personalization in music streaming. By automating playlist creation through acoustic mood detection, Auraly aims to improve user engagement, simplify music curation, and unlock deeper, mood-driven discovery experiences across platforms.


***<span style="font-size: 24px;">1.5 Project Objectives</span>*** 

**Main Objective** 

To develop an intelligent music classification system that automatically identifies the emotional mood of songs using acoustic features, enabling more intuitive and personalized music experiences.

**Specific Objectives** 

1. *Enable automated mood-based playlist generation* - Reduce manual curation time by dynamically grouping songs based on emotional tone.
2. *Support personalized music discovery* - Recommend songs that align with a listener’s current mood or emotional preferences.
3. *Enhance user engagement across music platforms* - Improve retention and satisfaction by offering emotionally resonant listening experiences tailored to individual users.
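
As a rough illustration of the first objective, the grouping step could look like the sketch below. It assumes each track already carries an integer mood label, as in the `labels` column previewed in Section 2.2; the function name `build_playlists` is hypothetical, and in Auraly the labels would come from the classifier rather than being hard-coded.

```python
from collections import defaultdict


def build_playlists(tracks):
    """Group track URIs by their integer mood label (hypothetical helper)."""
    playlists = defaultdict(list)
    for uri, label in tracks:
        playlists[label].append(uri)
    return dict(playlists)


# (uri, label) pairs copied from the dataset preview in Section 2.2.
tracks = [
    ("spotify:track:3v6sBj3swihU8pXQQHhDZo", 2),
    ("spotify:track:7KCWmFdw0TzoJbKtqRRzJO", 1),
    ("spotify:track:2CY92qejUrhyPUASawNVRr", 1),
    ("spotify:track:11BPfwVbB7vok7KfjBeW4k", 0),
]

playlists = build_playlists(tracks)
for label, uris in sorted(playlists.items()):
    print(f"mood label {label}: {len(uris)} track(s)")
```

Keeping the labels as plain integers avoids guessing at mood names that this section does not define.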


***<span style="font-size: 24px;">1.6 Research Questions</span>*** 

1. How accurately can acoustic features be used to classify the emotional mood of a song?
2. Does personalized mood-based music discovery lead to higher user engagement on streaming platforms?
3. How can mood-based classification improve the way users discover and organize music?


***<span style="font-size: 24px;">1.7 Success Criteria</span>***

1. *Accurate Mood Classification* - The system achieves a high accuracy rate in classifying songs into predefined emotional categories based on acoustic features.
2. *Improved User Experience* - Users will report reduced time and effort in creating mood-based playlists and express higher satisfaction with music recommendations through surveys or usability testing.
3. *Increased Engagement Metrics* - Streaming platforms or test environments will show measurable improvements in user engagement, e.g. longer listening sessions, more playlist saves, or higher interaction rates, when Auraly is integrated.
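
One way to make the first criterion measurable is to compare accuracy against a majority-class baseline, since a "high" accuracy means little if one label dominates. The sketch below is illustrative only: `y_true` reuses the ten labels from the dataset preview in Section 2.2, while `y_pred` is a made-up set of predictions, not a result from this project. The helper names `accuracy` and `majority_baseline` are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most common label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Labels of the ten preview rows in Section 2.2.
y_true = [2, 1, 1, 0, 1, 1, 1, 1, 0, 0]
# Hypothetical model output, invented purely for illustration.
y_pred = [2, 1, 0, 0, 1, 1, 1, 1, 0, 1]

print("model accuracy:   ", accuracy(y_true, y_pred))
print("majority baseline:", majority_baseline(y_true))
```

A classifier only adds value if its accuracy on held-out data clearly exceeds the baseline figure.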

---

__<font size="6">2. Data Understanding</font>__ 

---

***<span style="font-size: 24px;">2.1 Importing Relevant Libraries</span>***

In [11]:
# importing the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import zipfile
import re
from collections import Counter
import warnings
import contractions
from sklearn.model_selection import train_test_split, GridSearchCV
from imblearn.over_sampling import SMOTE
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    accuracy_score,
    classification_report,
    confusion_matrix,
)
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
warnings.filterwarnings('ignore')

***<span style="font-size: 24px;">2.2 Loading the Data</span>***

In [13]:
# Extract and load the CSV
zip_path = "278k_labelled_uri.csv.zip"

# Extract contents
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall("unzipped_folder")

# Load the data
music_data = pd.read_csv('unzipped_folder/278k_labelled_uri.csv')

music_data.head(10)

| Unnamed: 0.2 | Unnamed: 0.1 | Unnamed: 0 | duration (ms) | danceability | energy | loudness | speechiness | acousticness | instrumentalness | liveness | valence | tempo | spec_rate | labels | uri |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 195000.0 | 0.611 | 0.614 | -8.815 | 0.0672 | 0.0169 | 0.000794 | 0.753 | 0.52 | 128.05 | 3.446154e-07 | 2 | spotify:track:3v6sBj3swihU8pXQQHhDZo |
| 1 | 1 | 1 | 194641.0 | 0.638 | 0.781 | -6.848 | 0.0285 | 0.0118 | 0.00953 | 0.349 | 0.25 | 122.985 | 1.464234e-07 | 1 | spotify:track:7KCWmFdw0TzoJbKtqRRzJO |
| 2 | 2 | 2 | 217573.0 | 0.56 | 0.81 | -8.029 | 0.0872 | 0.0071 | 8e-06 | 0.241 | 0.247 | 170.044 | 4.00785e-07 | 1 | spotify:track:2CY92qejUrhyPUASawNVRr |
| 3 | 3 | 3 | 443478.0 | 0.525 | 0.699 | -4.571 | 0.0353 | 0.0178 | 8.8e-05 | 0.0888 | 0.199 | 92.011 | 7.959809e-08 | 0 | spotify:track:11BPfwVbB7vok7KfjBeW4k |
| 4 | 4 | 4 | 225862.0 | 0.367 | 0.771 | -5.863 | 0.106 | 0.365 | 1e-06 | 0.0965 | 0.163 | 115.917 | 4.693131e-07 | 1 | spotify:track:3yUJKPsjvThlcQWTS9ttYx |
| 5 | 5 | 5 | 166920.0 | 0.572 | 0.837 | -7.876 | 0.0367 | 0.0197 | 0.0 | 0.163 | 0.627 | 100.343 | 2.198658e-07 | 1 | spotify:track:41MOCUNOgWtaYBFUsGnpZ5 |
| 6 | 6 | 6 | 193133.0 | 0.725 | 0.687 | -6.465 | 0.0596 | 0.694 | 0.000369 | 0.231 | 0.77 | 96.005 | 3.085956e-07 | 1 | spotify:track:5JP1cMCDxX4k2gwfSgt8Lf |
| 7 | 7 | 7 | 253000.0 | 0.675 | 0.547 | -4.999 | 0.0481 | 0.114 | 8e-05 | 0.0678 | 0.365 | 75.003 | 1.901186e-07 | 1 | spotify:track:73xsMXuRNB3yqLeNc7NXBq |
| 8 | 8 | 8 | 216187.0 | 0.516 | 0.692 | -4.842 | 0.0279 | 0.0875 | 0.0093 | 0.09 | 0.181 | 83.571 | 1.290549e-07 | 0 | spotify:track:6TwrBbgTaB5gpl06YQoRKy |
| 9 | 9 | 9 | 232333.0 | 0.548 | 0.509 | -7.937 | 0.0288 | 0.261 | 0.702 | 0.079 | 0.484 | 78.974 | 1.2396e-07 | 0 | spotify:track:5SDEirHg6Y8fCYuKMnAaC5 |
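
The preview shows three `Unnamed` columns that simply repeat the row number, most likely indexes written out by earlier saves of the CSV. A minimal sketch of how they might be dropped is shown below. It uses a two-row stand-in frame named `sample`, built from values in the preview and trimmed to a few columns, rather than the full `music_data`. Renaming `duration (ms)` to `duration_ms` is an optional convenience assumed here, not a step taken from the notebook.

```python
import pandas as pd

# Two rows copied from the preview, trimmed to a few columns for brevity.
sample = pd.DataFrame(
    {
        "Unnamed: 0.1": [0, 1],
        "Unnamed: 0": [0, 1],
        "duration (ms)": [195000.0, 194641.0],
        "valence": [0.52, 0.25],
        "labels": [2, 1],
        "uri": [
            "spotify:track:3v6sBj3swihU8pXQQHhDZo",
            "spotify:track:7KCWmFdw0TzoJbKtqRRzJO",
        ],
    }
)

# Collect the leftover index columns by name prefix and drop them.
index_like = [col for col in sample.columns if col.startswith("Unnamed")]
cleaned = sample.drop(columns=index_like)

# Optional: give the duration column a name without spaces or parentheses.
cleaned = cleaned.rename(columns={"duration (ms)": "duration_ms"})

print(cleaned.columns.tolist())
```

The same two steps should apply unchanged to `music_data`, since they select columns by name rather than by position.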


***<span style="font-size: 24px;">2.3 Initial Exploration And EDA</span>***

*<span style="font-size: 22px;">2.3.2 Dataset summary</span>*

***<span style="font-size: 24px;">2.4 Data Cleaning</span>***

***<span style="font-size: 24px;">Saving cleaned data</span>***