# Spotify Data Analysis Project
The goal of this project is to analyze the personal listening data that can be obtained from Spotify after using the platform for a long time.
Personal data analysis has become a major selling point for Spotify, which releases a "Spotify Wrapped" summary for each user every year.
Being able to analyze the data without waiting a year for it can be insightful and fun!

### Imports

In [2]:
#!/usr/bin/python3

import pandas as pd
from dotenv import load_dotenv
from os import getenv

In [122]:
load_dotenv("./spotify.env")

# read the paths of all streaming history files from the environment
strm0 = getenv("streaming0")
print(strm0)
strm1 = getenv("streaming1")
print(strm1)
strm2 = getenv("streaming2")
print(strm2)

./Spotify Account Data/StreamingHistory_music_0.json
./Spotify Account Data/StreamingHistory_music_1.json
./Spotify Account Data/StreamingHistory_music_2.json


In [123]:
stream_db0 = pd.read_json(strm0)
stream_db1 = pd.read_json(strm1)
stream_db2 = pd.read_json(strm2)

# concatenate all three files into a single DataFrame with a fresh index
streaming_history = pd.concat([stream_db0, stream_db1, stream_db2], ignore_index=True)


print(streaming_history.head(10))
print(streaming_history.size)  # total number of cells (rows x columns), not the row count

            endTime          artistName  \
0  2023-03-13 12:52          Madvillain   
1  2023-03-15 11:24         Jean Dawson   
2  2023-03-19 23:17     Superstar Pride   
3  2023-03-20 09:29          Dreamville   
4  2023-03-20 09:33      Kendrick Lamar   
5  2023-03-20 09:38               Drake   
6  2023-03-20 09:42          Kanye West   
7  2023-03-20 09:43                Joji   
8  2023-03-20 09:47         Joey Bada$$   
9  2023-03-20 09:50  Tyler, The Creator   

                                           trackName  msPlayed  
0                                       Meat Grinder     73231  
1                                         BAD FRUIT*      4080  
2                                  PAINTING PICTURES     43073  
3  Sacrifices (with EARTHGANG & J. Cole feat. Smi...    382306  
4                                             PRIDE.    275253  
5                                           Too Much    261866  
6                                            Bound 2    229146  
7     
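
A Spotify export can contain any number of `StreamingHistory_music_N.json` files, so the three hard-coded paths above could also be discovered automatically. The following is a minimal sketch, assuming all files sit in one folder and follow that naming pattern. `load_streaming_history` and its `pattern` parameter are hypothetical names introduced for illustration, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_streaming_history(folder, pattern="StreamingHistory_music_*.json"):
    """Read every matching JSON file in `folder` into one DataFrame.

    Hypothetical helper: the default pattern is an assumption based on the
    file names printed earlier in this notebook.
    """
    files = sorted(Path(folder).glob(pattern))
    if not files:
        # pd.concat fails on an empty list, so report the problem explicitly
        raise FileNotFoundError(f"No files matching {pattern!r} in {folder!r}")

    frames = [pd.read_json(path) for path in files]
    # ignore_index=True renumbers the rows 0..n-1 across all files
    return pd.concat(frames, ignore_index=True)
```

It could then be called as `load_streaming_history("./Spotify Account Data")`. The files are sorted by name, so with ten or more files the row order may differ from the numeric file order.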

### Preparing the data

In [124]:
print(streaming_history.sort_values(by="endTime",ascending=False).iloc[0:1])
print(streaming_history.sort_values(by="endTime",ascending=True).iloc[0:1])

                endTime artistName trackName  msPlayed
26559  2024-03-20 18:03       MAVI     Sense       980
            endTime  artistName     trackName  msPlayed
0  2023-03-13 12:52  Madvillain  Meat Grinder     73231


The analysis covers listening activity between 2023-03-13 12:52 and 2024-03-20 18:03.
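
The same date range can be computed without sorting the whole table. This is a small sketch on a hand-made sample, assuming `endTime` always uses the `YYYY-MM-DD HH:MM` layout seen in the output above; the `sample` frame and the variable names are illustrative only.

```python
import pandas as pd

# Hand-made sample in the same shape as the export (illustration only).
sample = pd.DataFrame(
    {
        "endTime": ["2023-03-20 09:33", "2023-03-13 12:52", "2024-03-20 18:03"],
        "trackName": ["PRIDE.", "Meat Grinder", "Sense"],
    }
)

# Parse the strings into timestamps so they can be compared and subtracted.
end_times = pd.to_datetime(sample["endTime"], format="%Y-%m-%d %H:%M")

first_play = end_times.min()
last_play = end_times.max()
span_days = (last_play - first_play).days  # whole days between first and last play

print(f"Data covers {first_play} to {last_play} ({span_days} days)")
```

Parsing to timestamps also makes it possible to compute the length of the period, which the string column alone does not allow.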

In [125]:
streaming_history.drop(columns="endTime",inplace=True) #removing the endTime column

In [126]:
streaming_history["msPlayed"] = streaming_history["msPlayed"]/1000 # convert milliseconds to seconds
streaming_history.rename(columns={"msPlayed":"TimePlayedS"},inplace=True) # rename the column to match the new unit
streaming_history.head(5)

| | artistName | trackName | TimePlayedS |
|---|---|---|---|
| 0 | Madvillain | Meat Grinder | 73.231 |
| 1 | Jean Dawson | BAD FRUIT* | 4.08 |
| 2 | Superstar Pride | PAINTING PICTURES | 43.073 |
| 3 | Dreamville | Sacrifices (with EARTHGANG & J. Cole feat. Smi... | 382.306 |
| 4 | Kendrick Lamar | PRIDE. | 275.253 |


Next, create two new columns for the time played: one in minutes and one in hours.

In [127]:
streaming_history["TimePlayedM"]=streaming_history["TimePlayedS"]/60
streaming_history["TimePlayedH"]=streaming_history["TimePlayedM"]/60

In [133]:
streaming_history.drop(streaming_history[streaming_history["TimePlayedM"]<=1].index, inplace=True)
streaming_history.drop(columns="TimePlayedS",inplace=True)
streaming_history.dropna(inplace=True)

Only plays that lasted more than one minute are kept. The `TimePlayedS` column and any rows with missing values are dropped.
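
The conversion and filtering steps can also be written without in-place mutation, which makes the strict "more than one minute" threshold easier to see. This is a sketch on a hand-made sample, not the notebook's own code: the `plays` frame is illustrative, and the row named "Exactly One Minute" is a hypothetical entry added to show the boundary case.

```python
import pandas as pd

# Hand-made sample; the last row is hypothetical and sits exactly on the threshold.
plays = pd.DataFrame(
    {
        "trackName": ["Meat Grinder", "BAD FRUIT*", "PRIDE.", "Exactly One Minute"],
        "msPlayed": [73231, 4080, 275253, 60000],
    }
)

# Derive minutes and hours directly from milliseconds.
plays = plays.assign(
    TimePlayedM=plays["msPlayed"] / 60_000,
    TimePlayedH=plays["msPlayed"] / 3_600_000,
)

# Keep rows strictly longer than one minute, then drop the raw column and any NaNs.
kept = plays.loc[plays["TimePlayedM"] > 1].drop(columns="msPlayed").dropna()

print(kept)
```

A boolean mask with `>` keeps the same rows as dropping the index of rows where the value is `<= 1`, so a play of exactly 60 seconds is excluded in both versions.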

### Analyzing the data

In [134]:
# keep the first artist name in each group and sum the listening time
dictt_agg = {"artistName":"first","TimePlayedM":"sum","TimePlayedH":"sum"}

In [135]:
streaming_history.groupby(by="trackName").agg(dictt_agg).sort_values(by='TimePlayedM',ascending=False)

| trackName | artistName | TimePlayedM | TimePlayedH |
|---|---|---|---|
| MY EYES | Travis Scott | 342.788933 | 5.713149 |
| Fallin' | Joey Bada$$ | 250.568300 | 4.176138 |
| Be Your Girl (Kaytranada Edition) | Teedra Moses | 205.820033 | 3.430334 |
| Hummingbird (Metro Boomin & James Blake) | Metro Boomin | 162.775467 | 2.712924 |
| PRIDE. | Kendrick Lamar | 147.098067 | 2.451634 |
| ... | ... | ... | ... |
| The Middle of the World | Nicholas Britell | 1.006467 | 0.016774 |
| PLACE ON FIRE | Jasiah | 1.004450 | 0.016741 |
| Currents | Drake | 1.003333 | 0.016722 |
| Fall Back | James Blake | 1.000850 | 0.016681 |
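
Grouping by `trackName` alone merges different songs that happen to share a title. A possible alternative, sketched below on a hand-made sample, is to group by artist and track together. The `plays` frame and the "Hypothetical Artist" row are made up for illustration and do not come from the export.

```python
import pandas as pd

# Hand-made sample: two different artists with a track of the same name.
plays = pd.DataFrame(
    {
        "artistName": ["Drake", "Drake", "James Blake", "Hypothetical Artist"],
        "trackName": ["Currents", "Currents", "Fall Back", "Currents"],
        "TimePlayedM": [2.0, 3.0, 1.5, 4.0],
    }
)

per_track = (
    plays.groupby(["artistName", "trackName"], as_index=False)["TimePlayedM"]
    .sum()
    # derive hours from the summed minutes instead of summing a second column
    .assign(TimePlayedH=lambda df: df["TimePlayedM"] / 60)
    .sort_values("TimePlayedM", ascending=False, ignore_index=True)
)

print(per_track)
```

With this grouping the two "Currents" entries stay separate, and no aggregation function is needed for `artistName` because it is part of the group key.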


In [138]:
streaming_history[streaming_history["artistName"].str.contains("SZA")].groupby(by="trackName").agg(dictt_agg).sort_values(by="TimePlayedM",ascending=False)


| trackName | artistName | TimePlayedM | TimePlayedH |
|---|---|---|---|
| Love Galore (feat. Travis Scott) | SZA | 73.069783 | 1.21783 |
| Supermodel | SZA | 46.176917 | 0.769615 |
| Doves In The Wind (feat. Kendrick Lamar) | SZA | 42.0872 | 0.701453 |
| Snooze | SZA | 37.577033 | 0.626284 |
| Good Days | SZA | 37.178933 | 0.619649 |
| Ghost in the Machine (feat. Phoebe Bridgers) | SZA | 26.926083 | 0.448768 |
| The Weekend | SZA | 26.849167 | 0.447486 |
| Saturn | SZA | 21.111033 | 0.351851 |
| Kill Bill | SZA | 19.845433 | 0.330757 |
| Prom | SZA | 16.401683 | 0.273361 |
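
The per-artist query above can be wrapped in a small reusable function. The sketch below is one way to do it, under two assumptions: the artist name should match exactly (`str.contains` matches substrings and treats its pattern as a regular expression by default), and only the summed minutes are needed. `top_tracks_for_artist` is a hypothetical helper name, and the sample values are made up.

```python
import pandas as pd


def top_tracks_for_artist(history, artist, n=10):
    """Return the n most-played tracks (by minutes) for one exact artist name.

    Hypothetical helper for illustration; expects the columns
    artistName, trackName and TimePlayedM.
    """
    mask = history["artistName"] == artist  # exact match, no regex involved
    return history.loc[mask].groupby("trackName")["TimePlayedM"].sum().nlargest(n)


# Hand-made sample with made-up listening times.
sample = pd.DataFrame(
    {
        "artistName": ["SZA", "SZA", "SZA", "Kendrick Lamar"],
        "trackName": ["Snooze", "Good Days", "Snooze", "PRIDE."],
        "TimePlayedM": [3.0, 4.5, 2.5, 4.0],
    }
)

print(top_tracks_for_artist(sample, "SZA"))
```

An exact comparison will not pick up entries where the artist field contains additional text, so `str.contains` remains the better choice when partial matches are wanted.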
