```python
from datascience import *
import numpy as np
%matplotlib inline
```

# Free Music Archive: A Dataset for Music Analysis

This dataset was introduced by Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, and Xavier Bresson at the International Society for Music Information Retrieval Conference in 2017. It has been
cleaned for your convenience: all missing values have been removed, and low-quality observations and variables have been filtered
out. A brief summary of the dataset, originally given at the conference, is provided below. **You
may not copy any public analyses of this dataset. Doing so will result in an automatic F.**

## Summary
"We introduce the Free Music Archive (FMA), an open and easily accessible dataset suitable for evaluating several tasks in MIR, a field concerned with browsing, searching, and organizing large music collections. The community's growing interest in feature and end-to-end learning is however restrained by the limited availability of large audio datasets. The FMA aims to overcome this hurdle by providing 917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks from 16,341 artists and 14,854 albums, arranged in a hierarchical taxonomy of 161 genres. It provides full-length and high-quality audio, pre-computed features, together with track- and user-level metadata, tags, and free-form text such as biographies."

## Data Description

This dataset consists of three tables stored in the `data` folder:
1. `tracks` provides information on individual tracks.
2. `genres` contains information on all of the genres.
3. `features` contains information on the Spotify audio features of each track.

A description of each table's variables is provided below:

`tracks`:
* `track_id`: a unique ID for each track
* `track_title`: title of each track
* `artist_name`: name of the artist
* `album_title`: title of the album that the track comes from
* `track_duration`: the length of the song in seconds
* `track_genre`: the genre(s) that the track fall(s) into
* `album_date_released`: a string indicating the album release date
* `album_type`: the kind of release the track comes from, e.g. a studio album (`Album`), a live recording, a radio program, or a set of single tracks (`Single Tracks`)
* `album_tracks`: number of tracks on the album
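Because `album_date_released` is stored as a string, questions about release years require parsing it first. The sketch below is a minimal example that assumes every value uses the `YYYY-MM-DD` form seen in the `tracks` preview further down. `release_year` is a hypothetical helper and is not part of the dataset or the `datascience` library.

```python
from datetime import date

def release_year(date_string):
    """Return the year of an ISO-style 'YYYY-MM-DD' date string.

    Assumes the album_date_released format shown in the preview rows;
    raises ValueError for strings that do not match it.
    """
    return date.fromisoformat(date_string).year

# A few album_date_released values copied from the tracks preview
sample_dates = ["2009-01-06", "2007-09-01", "1982-04-06"]
years = [release_year(d) for d in sample_dates]
print(years)  # [2009, 2007, 1982]
```

On the loaded table, `tracks.apply(release_year, "album_date_released")` should return an array of years. That array can then be attached to the table with `tracks.with_column(...)`.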

`genres`:
* `genre_id`: a unique ID for each genre
* `title`: the name of the genre
* `#tracks`: the number of tracks that fall into this genre
* `parent`: the `genre_id` of the genre that this subgenre falls under (`0` for top-level genres)
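The `parent` values in the preview (for example, `38` and `20`) are genre IDs, so a subgenre's parent can be found by looking up its `genre_id`. Below is a minimal plain-Python sketch that uses a few rows copied from the `genres` preview. `parent_title` is a hypothetical helper. Because parent 38 does not appear in these rows, the lookup returns `None` for it.

```python
# (genre_id, title, parent) rows copied from the genres preview
genre_rows = [
    (1, "Avant-Garde", 38),
    (2, "International", 0),
    (3, "Blues", 0),
    (6, "Novelty", 38),
    (7, "Comedy", 20),
    (10, "Pop", 0),
]

# Map each genre_id to its title so parents can be looked up
title_by_id = {genre_id: title for genre_id, title, _ in genre_rows}

# A parent of 0 marks a top-level genre
top_level = [title for _, title, parent in genre_rows if parent == 0]

def parent_title(parent_id):
    """Return 'top-level' for 0, the parent's title if it is among the
    loaded rows, or None otherwise."""
    if parent_id == 0:
        return "top-level"
    return title_by_id.get(parent_id)

print(top_level)  # ['International', 'Blues', 'Pop']
print(parent_title(0), parent_title(38))  # top-level None
```

On the full `datascience` table, `genres.where("parent", 0)` should keep only the top-level genres.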

`features` (descriptions from the [Spotify API page](https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/)):
* `track_id`: a unique ID for each track
* `acousticness`: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
* `danceability`: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
* `energy`: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale.
* `instrumentalness`: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. 
* `liveness`: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live.
* `speechiness`: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. 
* `tempo`: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration. 
* `valence`: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
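The `tempo` description ties tempo to the average beat duration. Because a minute has 60 seconds, the average time between beats in seconds is 60 divided by the tempo in BPM. The following minimal sketch applies that conversion to two tempos from the `features` preview. `beat_duration_seconds` is a hypothetical helper.

```python
def beat_duration_seconds(tempo_bpm):
    """Average time between beats, in seconds, for a tempo in BPM."""
    return 60.0 / tempo_bpm

# Two tempo values copied from the features preview
for tempo in [120.79, 60.382]:
    print(f"{tempo} BPM -> {beat_duration_seconds(tempo):.3f} s")
# 120.79 BPM -> 0.497 s
# 60.382 BPM -> 0.994 s
```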

## Inspiration

A variety of exploratory analyses, hypothesis tests, and prediction problems can be tackled with this data. Here are a few ideas to get you started:


1. Which genre has the longest songs?
2. Is there a relationship between danceability and energy? What about danceability and valence?
3. Can you classify which genre (of [pick 2 once we see data]) a track belongs to based on its features?
4. Do (pick 2 genres) have the same average energy?
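Two standard tools fit the danceability/energy question and the average-energy question above: Pearson's correlation coefficient r and a permutation test for a difference in means. The following is a minimal NumPy sketch of both. `standard_units`, `correlation`, and `permutation_test_means` are hypothetical helper names. The absolute difference in means is one common choice of test statistic, not the only one. The inputs are the ten preview rows shown further down and are used only to exercise the code.

```python
import numpy as np

def standard_units(values):
    """Convert values to standard units: mean 0, SD 1 (population SD)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

def correlation(x, y):
    """Pearson's r, computed as the mean product of standard units."""
    return np.mean(standard_units(x) * standard_units(y))

def permutation_test_means(a, b, repetitions=10_000, seed=0):
    """Two-sided permutation test for a difference between two means.

    Test statistic: |mean(a) - mean(b)|. Returns the observed statistic
    and the fraction of random relabelings at least as extreme.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled = np.concatenate([a, b])
    observed = abs(a.mean() - b.mean())
    at_least_as_extreme = 0
    for _ in range(repetitions):
        # Relabel groups at random by permuting the pooled values
        shuffled = rng.permutation(pooled)
        stat = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        if stat >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / repetitions

# Danceability and energy of the ten tracks in the features preview
danceability = [0.438672, 0.142249, 0.461855, 0.552026, 0.390263,
                0.316413, 0.553091, 0.624953, 0.463938, 0.61286]
energy = [0.487752, 0.912122, 0.543751, 0.251328, 0.0210674,
          0.0596131, 0.303937, 0.176884, 0.363612, 0.15611]
r = correlation(danceability, energy)

# Energy of the preview's Rock tracks vs. its Jazz (145) and Folk (201) tracks
rock_energy = [energy[i] for i in (1, 3, 4, 5, 6, 7, 8, 9)]
other_energy = [energy[0], energy[2]]
observed, p_value = permutation_test_means(rock_energy, other_energy)
print(r, observed, p_value)
```

Ten tracks are far too few to support any conclusion. For a real analysis, the same functions would be applied to the full columns of the joined tables.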

Don't forget to review the Final Project Guidelines *(will link when live)* for a complete list of requirements.

## Exploration

The tables are loaded in the code cells below. Take some time to explore them!

```python
# Load the genres table
genres = Table.read_table("data/genres_final.csv")
genres
```

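| genre_id | title | #tracks | parent |
|---|---|---|---|
| 1 | Avant-Garde | 8693 | 38 |
| 2 | International | 5271 | 0 |
| 3 | Blues | 1752 | 0 |
| 4 | Jazz | 4126 | 0 |
| 5 | Classical | 4106 | 0 |
| 6 | Novelty | 914 | 38 |
| 7 | Comedy | 217 | 20 |
| 8 | Old-Time / Historic | 868 | 0 |
| 9 | Country | 1987 | 0 |
| 10 | Pop | 13845 | 0 |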


```python
# Load the features table
features = Table.read_table("data/features_final.csv")
features
```

| track_id | acousticness | danceability | energy | instrumentalness | liveness | speechiness | tempo | valence |
|---|---|---|---|---|---|---|---|---|
| 145 | 0.235506 | 0.438672 | 0.487752 | 0.716122 | 0.0703593 | 0.0472978 | 120.79 | 0.650452 |
| 155 | 0.981657 | 0.142249 | 0.912122 | 0.967294 | 0.36351 | 0.087527 | 91.912 | 0.0343253 |
| 201 | 0.991813 | 0.461855 | 0.543751 | 0.964922 | 0.137006 | 0.0256877 | 93.945 | 0.758632 |
| 307 | 0.77377 | 0.552026 | 0.251328 | 0.568976 | 0.110743 | 0.0506326 | 117.247 | 0.356984 |
| 309 | 0.335481 | 0.390263 | 0.0210674 | 0.937508 | 0.0890457 | 0.0414906 | 60.382 | 0.0399321 |
| 319 | 0.890498 | 0.316413 | 0.0596131 | 0.913303 | 0.108808 | 0.0387785 | 133.934 | 0.122417 |
| 327 | 0.928171 | 0.553091 | 0.303937 | 0.95423 | 0.110752 | 0.107401 | 110.039 | 0.616368 |
| 328 | 0.297541 | 0.624953 | 0.176884 | 0.815871 | 0.0928226 | 0.0551486 | 114.858 | 0.503635 |
| 350 | 0.989664 | 0.463938 | 0.363612 | 0.919011 | 0.11874 | 0.0376218 | 99.384 | 0.591003 |
| 364 | 0.98638 | 0.61286 | 0.15611 | 0.104549 | 0.107289 | 0.198543 | 109.256 | 0.483544 |


```python
# Load the tracks table
tracks = Table.read_table("data/tracks_final.csv")
tracks
```

| track_id | track_title | artist_name | album_title | track_duration | track_genre | album_date_released | album_type | album_tracks |
|---|---|---|---|---|---|---|---|---|
| 145 | Amoebiasis | Amoebic Ensemble | Amoebiasis | 326 | Jazz | 2009-01-06 | Album | 0 |
| 155 | Maps of the Stars Homes | Arc and Sender | unreleased demo | 756 | Rock | 2009-01-06 | Single Tracks | 1 |
| 201 | Big City | Ed Askew | What I Know | 210 | Folk | 2009-01-07 | Album | 10 |
| 307 | Out on the farm | Blah Blah Blah | Green Collection | 205 | Rock | 2007-09-01 | Album | 0 |
| 309 | Where are all the people | Blah Blah Blah | Green Collection | 229 | Rock | 2007-09-01 | Album | 0 |
| 319 | Complete Shakespeare | Blah Blah Blah | Green Collection | 156 | Rock | 2007-09-01 | Album | 0 |
| 327 | Hands Beckoning | Blah Blah Blah | Stripey Collection | 259 | Rock | 1982-04-06 | Album | 0 |
| 328 | Central Park | Blah Blah Blah | Stripey Collection | 236 | Rock | 1982-04-06 | Album | 0 |
| 350 | Gotta Go | Blah Blah Blah | 30th Anniversary Blah Blah Blah | 101 | Rock | 2009-01-01 | Album | 21 |
| 364 | Sunspot activity | Blah Blah Blah | 30th Anniversary Blah Blah Blah | 152 | Rock | 2009-01-01 | Album | 21 |
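Most of the questions in the Inspiration section need columns from both `tracks` and `features`, and the two tables share the `track_id` column. With `datascience`, the join should be `tracks.join("track_id", features)`. Per-genre averages can then come from `.select("track_genre", "track_duration").group("track_genre", np.mean)`. To show the idea without the library, the following sketch performs the same inner join and per-genre average duration in plain Python on the ten preview rows above. It is only an illustration, and ten rows say nothing about the full dataset.

```python
from collections import defaultdict

# (track_id, track_genre, track_duration) copied from the tracks preview
track_rows = [
    (145, "Jazz", 326), (155, "Rock", 756), (201, "Folk", 210),
    (307, "Rock", 205), (309, "Rock", 229), (319, "Rock", 156),
    (327, "Rock", 259), (328, "Rock", 236), (350, "Rock", 101),
    (364, "Rock", 152),
]

# track_id -> energy, copied from the features preview
energy_by_id = {
    145: 0.487752, 155: 0.912122, 201: 0.543751, 307: 0.251328,
    309: 0.0210674, 319: 0.0596131, 327: 0.303937, 328: 0.176884,
    350: 0.363612, 364: 0.15611,
}

# Inner join on track_id: keep only tracks that also have audio features
joined = [
    (track_id, genre, duration, energy_by_id[track_id])
    for track_id, genre, duration in track_rows
    if track_id in energy_by_id
]

# Average track_duration (seconds) per genre
durations = defaultdict(list)
for _, genre, duration, _ in joined:
    durations[genre].append(duration)
mean_duration = {genre: sum(d) / len(d) for genre, d in durations.items()}
print(mean_duration)  # {'Jazz': 326.0, 'Rock': 261.75, 'Folk': 210.0}
```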
