Danceability: It measures how suitable a track is for dancing based on factors like rhythm, tempo, and beat strength. A higher value indicates a more danceable track.

Energy: It represents the intensity and activity level of a track. Higher values indicate a more energetic and loud track.

Key: It indicates the key of the track, which represents the overall pitch or tonality. It is represented by integer values from 0 to 11, where each number corresponds to a specific key.

Loudness: It measures the overall loudness of a track in decibels (dB). Negative values indicate quieter tracks, and positive values indicate louder tracks.

Mode: It indicates the modality of a track, which can be major (1) or minor (0). Mode represents the overall emotional character of a track.

Speechiness: It measures the presence of spoken words in a track. Values close to 1.0 indicate tracks that are primarily spoken word, while values closer to 0.0 indicate non-spoken or instrumental tracks.

Acousticness: It represents the acoustic quality of a track. Higher values indicate a higher likelihood of the track being acoustic (without electronic amplification).

Instrumentalness: It measures the amount of instrumental content in a track. Values closer to 1.0 indicate tracks that are purely instrumental, while values closer to 0.0 indicate tracks with vocals.

Liveness: It measures the presence of a live audience in a track. Higher values represent a higher likelihood of the track being performed live.

Valence: It represents the musical positiveness conveyed by a track. Higher values indicate tracks with a more positive or happy mood, while lower values indicate tracks with a more negative or sad mood.

Tempo: It indicates the overall tempo or speed of a track in beats per minute (BPM).
Time Signature: It represents the number of beats in each bar or measure of a track. It is represented by an integer value, indicating the number of beats per bar.

In [1]:
# Importing Libraries
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
pd.set_option('display.max_columns', None)
pd.options.display.max_colwidth = 150

In [2]:
# importing data
tracks_all = pd.read_csv('spotify_my_downloads.csv')
display(tracks_all.head())
display(tracks_all.info())
tracks_all.columns

Unnamed: 0,Track URI,Track Name,Track Duration (ms),Artist Name,Artist Popularity,Artist Genres,Album Name,Album Release Date,Track Popularity,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Time Signature
0,spotify:track:1BncfTJAWxrsxyT9culBrj,Experience,315426,Ludovico Einaudi,77,"bow pop, compositional ambient, neo-classical",In A Time Lapse,1/1/13,80,0.447,0.449,2,-10.634,1,0.0376,0.934,0.961,0.0697,0.036,92.468,4
1,spotify:track:7lvDsmTRXFE3dK4OjvRiWB,Pasoori,224146,Shae Gill,61,,Pasoori,7/2/22,80,0.714,0.596,11,-6.206,0,0.043,0.0657,0.0,0.0625,0.669,91.991,4
2,spotify:track:0AEhTnH1nR8zJ2d3iwQyM3,Bhalobasha Baki,311250,Popeye Bangladesh,33,bangladeshi indie,Bhalobasha Baki,23/1/15,45,0.662,0.225,6,-10.849,1,0.0317,0.711,0.0,0.289,0.377,145.907,4
3,spotify:track:0I1eFRytp4XRhLCjT6tZm7,I Can't Handle Change,198213,Roar,69,"pov: indie, weirdcore",I Can't Handle Change,14/3/10,82,0.247,0.438,5,-8.478,1,0.0358,0.0447,0.000867,0.111,0.39,185.711,3
4,spotify:track:3fVnlF4pGqWI9flVENcT28,Wildest Dreams,220440,Taylor Swift,100,pop,1989,1/1/14,83,0.553,0.664,8,-7.417,1,0.0741,0.0709,0.0056,0.106,0.467,140.06,4


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 296 entries, 0 to 295
Data columns (total 21 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Track URI            296 non-null    object 
 1   Track Name           296 non-null    object 
 2   Track Duration (ms)  296 non-null    int64  
 3   Artist Name          296 non-null    object 
 4   Artist Popularity    296 non-null    int64  
 5   Artist Genres        284 non-null    object 
 6   Album Name           296 non-null    object 
 7   Album Release Date   296 non-null    object 
 8   Track Popularity     296 non-null    int64  
 9   Danceability         296 non-null    float64
 10  Energy               296 non-null    float64
 11  Key                  296 non-null    int64  
 12  Loudness             296 non-null    float64
 13  Mode                 296 non-null    int64  
 14  Speechiness          296 non-null    float64
 15  Acousticness         296 non-null    flo

None

Index(['Track URI', 'Track Name', 'Track Duration (ms)', 'Artist Name',
       'Artist Popularity', 'Artist Genres', 'Album Name',
       'Album Release Date', 'Track Popularity', 'Danceability', 'Energy',
       'Key', 'Loudness', 'Mode', 'Speechiness', 'Acousticness',
       'Instrumentalness', 'Liveness', 'Valence', 'Tempo', 'Time Signature'],
      dtype='object')

In [3]:
# taking columns that we're going to use
tracks = tracks_all[['Track Name', 'Track Duration (ms)', 'Artist Name',
       'Artist Popularity', 'Album Release Date', 'Track Popularity', 
       'Danceability', 'Energy', 'Key', 'Loudness', 'Mode', 'Speechiness',
       'Acousticness', 'Instrumentalness', 'Liveness', 'Valence', 'Tempo', 
       'Time Signature']]
tracks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 296 entries, 0 to 295
Data columns (total 18 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Track Name           296 non-null    object 
 1   Track Duration (ms)  296 non-null    int64  
 2   Artist Name          296 non-null    object 
 3   Artist Popularity    296 non-null    int64  
 4   Album Release Date   296 non-null    object 
 5   Track Popularity     296 non-null    int64  
 6   Danceability         296 non-null    float64
 7   Energy               296 non-null    float64
 8   Key                  296 non-null    int64  
 9   Loudness             296 non-null    float64
 10  Mode                 296 non-null    int64  
 11  Speechiness          296 non-null    float64
 12  Acousticness         296 non-null    float64
 13  Instrumentalness     296 non-null    float64
 14  Liveness             296 non-null    float64
 15  Valence              296 non-null    flo

In [4]:
# Release Date and Duration columns fixed
tracks['Album Release Date'] = pd.to_datetime(tracks['Album Release Date'])
tracks = tracks[tracks['Album Release Date'].dt.year<=2023]
tracks['Track Duration (s)'] = round(tracks['Track Duration (ms)']/1000,0)
tracks = tracks.drop('Track Duration (ms)',axis=1).copy()

In [5]:
# Inspecting Distributions
tracks.describe()

Unnamed: 0,Artist Popularity,Album Release Date,Track Popularity,Danceability,Energy,Key,Loudness,Mode,Speechiness,Acousticness,Instrumentalness,Liveness,Valence,Tempo,Time Signature,Track Duration (s)
count,292.0,292,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0
mean,70.452055,2013-04-11 23:10:41.095890432,67.253425,0.58449,0.605188,5.517123,-7.488346,0.667808,0.068185,0.29466,0.08658,0.165762,0.442318,119.156705,3.958904,239.517123
min,0.0,1973-05-01 00:00:00,0.0,0.15,0.0167,0.0,-29.745,0.0,0.0227,3.5e-05,0.0,0.0426,0.036,65.043,3.0,84.0
25%,63.0,2011-11-06 12:00:00,59.0,0.4825,0.4805,2.0,-8.90725,0.0,0.032875,0.045825,0.0,0.09555,0.232,94.99525,4.0,201.0
50%,75.5,2016-02-19 00:00:00,76.5,0.5955,0.6295,6.0,-6.666,1.0,0.04215,0.1885,1.6e-05,0.119,0.4405,117.126,4.0,231.0
75%,84.0,2019-07-13 18:00:00,83.0,0.70425,0.758,8.0,-5.338,1.0,0.0738,0.51575,0.0061,0.1745,0.60275,136.93,4.0,271.0
max,100.0,2023-01-26 00:00:00,94.0,0.936,0.996,11.0,-1.815,1.0,0.449,0.979,0.961,0.832,0.964,207.97,5.0,546.0
std,18.980396,,23.801675,0.15805,0.202676,3.447652,3.642327,0.471808,0.064932,0.289877,0.223458,0.12116,0.234856,29.617583,0.258909,65.847238


In [6]:
tracks.to_csv('my_downloads_cleaned.csv', header='header', index=False)