# Import data and inspect!

## Load packages

We'll use packages to help us do different things. Packages are built on functions. Functions take an input, do something with it, and give you an output. Packages save us time because someone else has already worked out how to clean up the data and hand it to us in a usable form.
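
To make the idea of a function concrete, here is a minimal sketch. The name `ms_to_minutes` is made up for illustration and is not part of any package used in this notebook. It takes a duration in milliseconds as input, does a small calculation, and gives back the duration in minutes as output.

```python
def ms_to_minutes(duration_ms):
    """Convert a duration in milliseconds to minutes."""
    return duration_ms / 1000 / 60


# Input: 180000 milliseconds. Output: 3.0 minutes.
print(ms_to_minutes(180000))
```

A package is essentially a collection of functions like this one that someone else has already written and tested.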

Some packages help us manipulate the data. Other packages help us make charts. For Spotify data, there's already a package called `spotipy` that helps us interact with developer data within Spotify. But another package, called `tidyspotify`, simplifies that even more!

In [1]:
import altair as alt
import dotenv
import os
import pandas as pd
import spotipy
import tidyspotify

from spotipy.oauth2 import SpotifyClientCredentials

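# Load variables from a .env file (if present) into the environment
dotenv.load_dotenv()

# Look up the Spotify credentials; a missing variable raises a KeyError
config = {
    'client_id': os.environ["SPOTIPY_CLIENT_ID"],
    'client_secret': os.environ["SPOTIPY_CLIENT_SECRET"],
}

# Create a Spotify client that authenticates with those credentials
sp = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials(**config))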

#tidyspotify.save_credentials()
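
The cell above looks up two environment variables, and `os.environ[...]` raises a `KeyError` if either one is not set. As a rough sketch, the check below reports which credentials are missing before anything tries to connect. The helper name `missing_env_vars` is hypothetical and only used here for illustration.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


required = ["SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET"]
missing = missing_env_vars(required)
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("All credentials found.")
```

If something is reported as missing, the likely fix is to add it to the `.env` file that `dotenv.load_dotenv()` reads, assuming that is where the credentials are kept.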

## Dataframes

Behind a chart, there's a *spreadsheet* with all the data we need stored as columns and rows. Usually, we open spreadsheets in Excel or Google Sheets. Within this Jupyter Notebook, we will read in our *spreadsheet* and call it a *dataframe*.


We can import a spreadsheet to be a dataframe. <br>
We can also export our dataframe to be a spreadsheet.
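
Here is a small sketch of that round trip, using a made-up table instead of real Spotify data. The column names and values are invented for illustration, and an in-memory text buffer stands in for a CSV file on disk.

```python
import io

import pandas as pd

# A tiny, made-up table: each key is a column, each list holds that column's rows.
songs = pd.DataFrame({
    "track_name": ["Song A", "Song B", "Song C"],
    "tempo": [95.3, 121.9, 84.2],
})

# Export the dataframe to CSV text (the buffer stands in for a file).
buffer = io.StringIO()
songs.to_csv(buffer, index=False)

# Import the CSV text back into a new dataframe.
buffer.seek(0)
songs_again = pd.read_csv(buffer)

print(songs_again)
```

With a real file, you would pass a path such as `"my_file.csv"` to `to_csv` and `read_csv` instead of the buffer.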

In [2]:
artist1 = tidyspotify.get_artist_audio_features('greatest showman')
artist2 = tidyspotify.get_artist_audio_features('beyonce')

In [3]:
artist1.head()

Unnamed: 0,album_type,album_id,album_name,release_date,artist_name,artist_uri,artist_genres,album_popularity,duration_ms,explicit,...,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,track_href,analysis_url,time_signature
0,single,7luUoT7Y82DkMvGzZib8oo,"This Is Me [Alan Walker Relift (From ""The Grea...",2018-03-23,Keala Settle,spotify:artist:7HV2RI2qNug4EcQqLbCAKS,movie tunes,46,215758,False,...,1,0.0756,0.129,0.0,0.304,0.294,95.331,https://api.spotify.com/v1/tracks/1QBoJK6GLAYM...,https://api.spotify.com/v1/audio-analysis/1QBo...,4
1,single,1bbOecKvywdyRfSBxKbbCO,This Is Me [Dave Audé Remix (From The Greatest...,2018-02-09,Keala Settle,spotify:artist:7HV2RI2qNug4EcQqLbCAKS,movie tunes,46,216204,False,...,0,0.134,0.00708,1.7e-05,0.12,0.518,121.939,https://api.spotify.com/v1/tracks/1NNhZKNrg8WV...,https://api.spotify.com/v1/audio-analysis/1NNh...,4
2,single,4wANZhDo96C4733BPN3cGZ,Asi Soy,2017-12-29,Maite Perroni,spotify:artist:6ModsWtBph2rE6zCTVxvZt,movie tunes,38,235599,False,...,1,0.046,0.0324,0.0,0.0878,0.182,95.943,https://api.spotify.com/v1/tracks/0MaP6uyxXBAj...,https://api.spotify.com/v1/audio-analysis/0MaP...,4
3,single,7ngTKzJ53B9SPirsp2Ddm4,"A Million Dreams (Reprise) [From ""The Greatest...",2017-12-05,The Greatest Showman Ensemble,spotify:artist:63nv0hWWDob56Rk8GlNpN8,movie tunes,35,60455,False,...,1,0.0402,0.453,0.127,0.296,0.0585,84.167,https://api.spotify.com/v1/tracks/6yqbH2bs1Um8...,https://api.spotify.com/v1/audio-analysis/6yqb...,3
4,single,1vDzOqTO7KyVvzexhB7GBa,"Never Enough (Reprise) [From ""The Greatest Sho...",2017-12-05,The Greatest Showman Ensemble,spotify:artist:63nv0hWWDob56Rk8GlNpN8,movie tunes,33,80286,False,...,1,0.0372,0.876,0.231,0.0804,0.0385,70.082,https://api.spotify.com/v1/tracks/5R0JBZv31s8a...,https://api.spotify.com/v1/audio-analysis/5R0J...,4


In [4]:
artist2.head()

Unnamed: 0,album_type,album_id,album_name,release_date,artist_name,artist_uri,artist_genres,album_popularity,duration_ms,explicit,...,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,track_href,analysis_url,time_signature
0,album,7kUuNU2LRmr9XbwLHXU9UZ,The Lion King: The Gift [Deluxe Edition],2020-07-31,Beyoncé,spotify:artist:6vWDO969PvNqNYHIOW5v0m,dance pop-!!-pop-!!-post-teen pop-!!-r&b,70,226360,False,...,0,0.0361,0.633,0.0,0.13,0.181,129.624,https://api.spotify.com/v1/tracks/0gdOcxvPU37A...,https://api.spotify.com/v1/audio-analysis/0gdO...,4
1,album,7kUuNU2LRmr9XbwLHXU9UZ,The Lion King: The Gift [Deluxe Edition],2020-07-31,Beyoncé,spotify:artist:6vWDO969PvNqNYHIOW5v0m,dance pop-!!-pop-!!-post-teen pop-!!-r&b,70,162438,False,...,0,0.174,0.197,0.0274,0.0768,0.623,119.063,https://api.spotify.com/v1/tracks/0Ae8EusoCw7l...,https://api.spotify.com/v1/audio-analysis/0Ae8...,4
2,album,7kUuNU2LRmr9XbwLHXU9UZ,The Lion King: The Gift [Deluxe Edition],2020-07-31,Beyoncé,spotify:artist:6vWDO969PvNqNYHIOW5v0m,dance pop-!!-pop-!!-post-teen pop-!!-r&b,70,156442,False,...,0,0.101,0.0194,0.0718,0.355,0.563,120.005,https://api.spotify.com/v1/tracks/6VsNsaxmzrKZ...,https://api.spotify.com/v1/audio-analysis/6VsN...,4
3,album,7kUuNU2LRmr9XbwLHXU9UZ,The Lion King: The Gift [Deluxe Edition],2020-07-31,Beyoncé,spotify:artist:6vWDO969PvNqNYHIOW5v0m,dance pop-!!-pop-!!-post-teen pop-!!-r&b,70,190434,False,...,1,0.0601,0.0488,0.0119,0.0752,0.867,92.001,https://api.spotify.com/v1/tracks/3W7fs58MeWMZ...,https://api.spotify.com/v1/audio-analysis/3W7f...,4
4,album,7kUuNU2LRmr9XbwLHXU9UZ,The Lion King: The Gift [Deluxe Edition],2020-07-31,Beyoncé,spotify:artist:6vWDO969PvNqNYHIOW5v0m,dance pop-!!-pop-!!-post-teen pop-!!-r&b,70,107462,False,...,0,0.182,0.731,0.0,0.339,0.348,87.047,https://api.spotify.com/v1/tracks/7wD8R6QWYJbp...,https://api.spotify.com/v1/audio-analysis/7wD8...,4


In [5]:
# Save to csv
artist1.to_csv("../data/my_audio_features.csv", index=False)

# Open csv
df = pd.read_csv("../data/my_audio_features.csv")

## Inspect the data

* What columns are stored? 
* What do those columns mean?
* How many rows?
* Can I just see a little bit of the data?
* What are the *data types* of the columns (are the values strings, numbers, dates, etc.)?

In [6]:
# What columns are stored?
# Print column names
df.columns

Index(['album_type', 'album_id', 'album_name', 'release_date', 'artist_name',
       'artist_uri', 'artist_genres', 'album_popularity', 'duration_ms',
       'explicit', 'href', 'track_id', 'track_name', 'preview_url',
       'track_number', 'uri', 'track_popularity', 'danceability', 'energy',
       'key', 'loudness', 'mode', 'speechiness', 'acousticness',
       'instrumentalness', 'liveness', 'valence', 'tempo', 'track_href',
       'analysis_url', 'time_signature'],
      dtype='object')

In [7]:
# What do the columns mean?
# Maybe we need a description of the data, usually called a data dictionary
# Sometimes there will be documentation from the dataset owner 

References:
* Popularity: scored 0-100, with 100 being the most popular ([Spotify community thread](https://community.spotify.com/t5/Content-Questions/Artist-popularity/td-p/4415259))
* Spotify audio features [data dictionary](https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-01-21/readme.md)
* [Really advanced and cool stuff done with Spotify data](https://www.kaylinpavlik.com/classifying-songs-genres/)

In [8]:
# How many rows?
# Print the length of the dataframe
len(df)

264
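
`len()` only tells us the number of rows. As a small aside, a dataframe's `shape` attribute gives rows and columns together. The sketch below uses a made-up dataframe named `toy` so it can run on its own.

```python
import pandas as pd

# Made-up rows; the real notebook gets its data from Spotify.
toy = pd.DataFrame({
    "track_name": ["Song A", "Song B", "Song C"],
    "explicit": [False, True, False],
    "tempo": [95.3, 121.9, 84.2],
})

# shape is a pair: (number of rows, number of columns)
n_rows, n_cols = toy.shape
print(len(toy))
print(n_rows, "rows and", n_cols, "columns")
```

On the notebook's own data, `df.shape` would work the same way.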

In [9]:
# Can I see a little bit of the data?
# You can inspect the head (first few rows) or tail (last few rows) of the dataset
# By default, you get 5 rows, but you can always ask for more or fewer
df.head()

Unnamed: 0,album_type,album_id,album_name,release_date,artist_name,artist_uri,artist_genres,album_popularity,duration_ms,explicit,...,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,track_href,analysis_url,time_signature
0,single,7luUoT7Y82DkMvGzZib8oo,"This Is Me [Alan Walker Relift (From ""The Grea...",2018-03-23,Keala Settle,spotify:artist:7HV2RI2qNug4EcQqLbCAKS,movie tunes,46,215758,False,...,1,0.0756,0.129,0.0,0.304,0.294,95.331,https://api.spotify.com/v1/tracks/1QBoJK6GLAYM...,https://api.spotify.com/v1/audio-analysis/1QBo...,4
1,single,1bbOecKvywdyRfSBxKbbCO,This Is Me [Dave Audé Remix (From The Greatest...,2018-02-09,Keala Settle,spotify:artist:7HV2RI2qNug4EcQqLbCAKS,movie tunes,46,216204,False,...,0,0.134,0.00708,1.7e-05,0.12,0.518,121.939,https://api.spotify.com/v1/tracks/1NNhZKNrg8WV...,https://api.spotify.com/v1/audio-analysis/1NNh...,4
2,single,4wANZhDo96C4733BPN3cGZ,Asi Soy,2017-12-29,Maite Perroni,spotify:artist:6ModsWtBph2rE6zCTVxvZt,movie tunes,38,235599,False,...,1,0.046,0.0324,0.0,0.0878,0.182,95.943,https://api.spotify.com/v1/tracks/0MaP6uyxXBAj...,https://api.spotify.com/v1/audio-analysis/0MaP...,4
3,single,7ngTKzJ53B9SPirsp2Ddm4,"A Million Dreams (Reprise) [From ""The Greatest...",2017-12-05,The Greatest Showman Ensemble,spotify:artist:63nv0hWWDob56Rk8GlNpN8,movie tunes,35,60455,False,...,1,0.0402,0.453,0.127,0.296,0.0585,84.167,https://api.spotify.com/v1/tracks/6yqbH2bs1Um8...,https://api.spotify.com/v1/audio-analysis/6yqb...,3
4,single,1vDzOqTO7KyVvzexhB7GBa,"Never Enough (Reprise) [From ""The Greatest Sho...",2017-12-05,The Greatest Showman Ensemble,spotify:artist:63nv0hWWDob56Rk8GlNpN8,movie tunes,33,80286,False,...,1,0.0372,0.876,0.231,0.0804,0.0385,70.082,https://api.spotify.com/v1/tracks/5R0JBZv31s8a...,https://api.spotify.com/v1/audio-analysis/5R0J...,4


In [10]:
df.tail()

Unnamed: 0,album_type,album_id,album_name,release_date,artist_name,artist_uri,artist_genres,album_popularity,duration_ms,explicit,...,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,track_href,analysis_url,time_signature
259,compilation,3jv5xtSwOOZNCKqDD2IdQ4,The Greatest Showman (Original Motion Picture ...,2017-12-08,Various Artists,spotify:artist:0LyfQWJT6nXafLPZqxe9Of,movie tunes,11,217440,False,...,1,0.0386,0.0716,0.0,0.122,0.284,125.046,https://api.spotify.com/v1/tracks/5UeZmjxAeq51...,https://api.spotify.com/v1/audio-analysis/5UeZ...,4
260,compilation,3jv5xtSwOOZNCKqDD2IdQ4,The Greatest Showman (Original Motion Picture ...,2017-12-08,Various Artists,spotify:artist:0LyfQWJT6nXafLPZqxe9Of,movie tunes,11,234360,False,...,1,0.0382,0.3,9e-06,0.0919,0.191,161.564,https://api.spotify.com/v1/tracks/2JhosnlBozUm...,https://api.spotify.com/v1/audio-analysis/2Jho...,3
261,compilation,3jv5xtSwOOZNCKqDD2IdQ4,The Greatest Showman (Original Motion Picture ...,2017-12-08,Various Artists,spotify:artist:0LyfQWJT6nXafLPZqxe9Of,movie tunes,11,80293,False,...,1,0.03,0.889,3e-06,0.0832,0.133,82.486,https://api.spotify.com/v1/tracks/51XfShcdVF2l...,https://api.spotify.com/v1/audio-analysis/51Xf...,4
262,compilation,3jv5xtSwOOZNCKqDD2IdQ4,The Greatest Showman (Original Motion Picture ...,2017-12-08,Various Artists,spotify:artist:0LyfQWJT6nXafLPZqxe9Of,movie tunes,11,349560,False,...,1,0.04,0.00483,0.00895,0.585,0.0834,108.027,https://api.spotify.com/v1/tracks/2MUwM6lOwmzM...,https://api.spotify.com/v1/audio-analysis/2MUw...,4
263,single,09w7mJRYa15Uk7rtM26QfZ,The Greatest Show,2017-10-27,Various Artists,spotify:artist:0LyfQWJT6nXafLPZqxe9Of,movie tunes,53,302146,False,...,0,0.105,0.000239,0.0545,0.0725,0.4,157.92,https://api.spotify.com/v1/tracks/43ay9lQZ5rfN...,https://api.spotify.com/v1/audio-analysis/43ay...,4


In [11]:
df.head(2)

Unnamed: 0,album_type,album_id,album_name,release_date,artist_name,artist_uri,artist_genres,album_popularity,duration_ms,explicit,...,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,track_href,analysis_url,time_signature
0,single,7luUoT7Y82DkMvGzZib8oo,"This Is Me [Alan Walker Relift (From ""The Grea...",2018-03-23,Keala Settle,spotify:artist:7HV2RI2qNug4EcQqLbCAKS,movie tunes,46,215758,False,...,1,0.0756,0.129,0.0,0.304,0.294,95.331,https://api.spotify.com/v1/tracks/1QBoJK6GLAYM...,https://api.spotify.com/v1/audio-analysis/1QBo...,4
1,single,1bbOecKvywdyRfSBxKbbCO,This Is Me [Dave Audé Remix (From The Greatest...,2018-02-09,Keala Settle,spotify:artist:7HV2RI2qNug4EcQqLbCAKS,movie tunes,46,216204,False,...,0,0.134,0.00708,1.7e-05,0.12,0.518,121.939,https://api.spotify.com/v1/tracks/1NNhZKNrg8WV...,https://api.spotify.com/v1/audio-analysis/1NNh...,4


In [12]:
df.tail(3)

Unnamed: 0,album_type,album_id,album_name,release_date,artist_name,artist_uri,artist_genres,album_popularity,duration_ms,explicit,...,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,track_href,analysis_url,time_signature
261,compilation,3jv5xtSwOOZNCKqDD2IdQ4,The Greatest Showman (Original Motion Picture ...,2017-12-08,Various Artists,spotify:artist:0LyfQWJT6nXafLPZqxe9Of,movie tunes,11,80293,False,...,1,0.03,0.889,3e-06,0.0832,0.133,82.486,https://api.spotify.com/v1/tracks/51XfShcdVF2l...,https://api.spotify.com/v1/audio-analysis/51Xf...,4
262,compilation,3jv5xtSwOOZNCKqDD2IdQ4,The Greatest Showman (Original Motion Picture ...,2017-12-08,Various Artists,spotify:artist:0LyfQWJT6nXafLPZqxe9Of,movie tunes,11,349560,False,...,1,0.04,0.00483,0.00895,0.585,0.0834,108.027,https://api.spotify.com/v1/tracks/2MUwM6lOwmzM...,https://api.spotify.com/v1/audio-analysis/2MUw...,4
263,single,09w7mJRYa15Uk7rtM26QfZ,The Greatest Show,2017-10-27,Various Artists,spotify:artist:0LyfQWJT6nXafLPZqxe9Of,movie tunes,53,302146,False,...,0,0.105,0.000239,0.0545,0.0725,0.4,157.92,https://api.spotify.com/v1/tracks/43ay9lQZ5rfN...,https://api.spotify.com/v1/audio-analysis/43ay...,4


In [13]:
# What are the data types?
# Object = strings, text
# Float/Int = numeric
# Bool = boolean, True/False 
# Datetime = dates/times
df.dtypes

album_type           object
album_id             object
album_name           object
release_date         object
artist_name          object
artist_uri           object
artist_genres        object
album_popularity      int64
duration_ms           int64
explicit               bool
href                 object
track_id             object
track_name           object
preview_url          object
track_number          int64
uri                  object
track_popularity      int64
danceability        float64
energy              float64
key                   int64
loudness            float64
mode                  int64
speechiness         float64
acousticness        float64
instrumentalness    float64
liveness            float64
valence             float64
tempo               float64
track_href           object
analysis_url         object
time_signature        int64
dtype: object
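
Notice that `release_date` shows up as `object`, which means the dates were read in as plain text. One possible way to turn them into real dates is `pd.to_datetime`. The sketch below uses a made-up dataframe with dates in the same `YYYY-MM-DD` format, and it assumes every value is a full date.

```python
import pandas as pd

# Made-up dates in the same YYYY-MM-DD text format as release_date.
toy = pd.DataFrame({"release_date": ["2018-03-23", "2017-12-08", "2017-10-27"]})
print(toy["release_date"].dtype)    # a text dtype before conversion

# Parse the text into dates; errors="coerce" turns unparseable values into NaT.
toy["release_date"] = pd.to_datetime(toy["release_date"], errors="coerce")
print(toy["release_date"].dtype)    # a datetime dtype after conversion

# Once the column holds dates, parts like the year are easy to pull out.
print(toy["release_date"].dt.year.tolist())
```

Alternatively, `pd.read_csv` accepts a `parse_dates` argument that can do the conversion while the file is being read.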

## Make a bar chart

In [14]:
alt.themes.enable("fivethirtyeight")

# Instead of fivethirtyeight, try latimes, urbaninstitute, vox, quartz, ggplot2

(alt.Chart(df)
 .mark_bar()
 .encode(
     x="artist_name",
     y="track_popularity"
 )
)

In [15]:
(alt.Chart(df[df.album_name=="The Greatest Showman (Original Motion Picture Soundtrack)"])
 .mark_bar()
 .encode(
     x=alt.X("track_name:N", title="Track Name"),
     y=alt.Y("mean(speechiness):Q", title="Average Speechiness")
 ).properties(title="Speechiness of Greatest Showman Tracks")
)
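
The chart above does two things before drawing: it keeps only the rows for one album, and it averages `speechiness` for each track. As a rough pandas equivalent, the sketch below performs the same two steps on a made-up dataframe. The album and track names are invented for illustration.

```python
import pandas as pd

# Made-up rows: two albums, with "Song A" appearing twice on the first one.
toy = pd.DataFrame({
    "album_name": ["Album X", "Album X", "Album X", "Album Y"],
    "track_name": ["Song A", "Song A", "Song B", "Song C"],
    "speechiness": [0.04, 0.06, 0.10, 0.30],
})

# Step 1: keep only the rows where the condition is True.
one_album = toy[toy.album_name == "Album X"]

# Step 2: average speechiness per track, like mean(speechiness) in the chart.
avg = one_album.groupby("track_name", as_index=False)["speechiness"].mean()
print(avg)
```

Doing the aggregation in pandas first can make it easier to check the numbers behind each bar.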

In [16]:
(alt.Chart(df[df.album_name=="The Greatest Showman (Original Motion Picture Soundtrack)"])
 .mark_bar()
 .encode(
     x=alt.X("track_name:N", title="Track Name"),
     y=alt.Y("sum(valence):Q", title="Valence"),
     color="key:N"
 ).properties(title="Valence of Greatest Showman Tracks")
)

## Make a scatterplot

In [17]:
alt.themes.enable("quartz")

(alt.Chart(df)
 .mark_point()
 .encode(
     x=alt.X("valence:Q"),
     y=alt.Y("tempo:Q"),
     color="album_name"
 )
)

In [18]:
(alt.Chart(df)
 .mark_circle()
 .encode(
     x=alt.X("valence:Q"),
     y=alt.Y("speechiness:Q"),
     color="album_name"
 )
)

In [19]:
(alt.Chart(df)
 .mark_point()
 .encode(
     x=alt.X("valence:Q"),
     y=alt.Y("danceability:Q"),
     color=alt.Color("album_name", title = "Album")
 ).properties(title = "Valence vs Danceability", width = 400, height = 300)
)