In [2]:
from datascience import *
import numpy as np
%matplotlib inline

ModuleNotFoundError: No module named 'datascience'

# Free Music Archive: A Dataset For Music Analysis

This dataset was introduced by Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, and Xavier Bresson at the International Society for Music Information Retrieval Conference in 2017. It has been
cleaned for your convenience: all missing values have been removed, and low-quality observations and variables have been filtered
out. A brief summary of the dataset, originally given at the conference, is provided below. **You
may not copy any public analyses of this dataset. Doing so will result in an automatic F.**

## Summary
We introduce the Free Music Archive (FMA), an open and easily accessible dataset suitable for evaluating several tasks in MIR, a field concerned with browsing, searching, and organizing large music collections. The community's growing interest in feature and end-to-end learning is however restrained by the limited availability of large audio datasets. The FMA aims to overcome this hurdle by providing 917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks from 16,341 artists and 14,854 albums, arranged in a hierarchical taxonomy of 161 genres. It provides full-length and high-quality audio, pre-computed features, together with track- and user-level metadata, tags, and free-form text such as biographies. We here describe the dataset and how it was created, propose a train/validation/test split and three subsets, discuss some suitable MIR tasks, and evaluate some baselines for genre recognition.

## Data Description

This dataset consists of three tables stored in the `data` folder:
1. `tracks` provides information on individual tracks.
2. `genres` contains information on all of the genres.
3. `features` contains information on the Spotify audio features of each track.

A description of each table's variables is provided below:

`tracks`:
* `track_id`: a unique ID for each track
* `title`: title of each track
* `duration`: the length of the song in seconds
* `genres`: the genre(s) that the track fall(s) into

`genres`:
* `genre_id`: a unique ID for each genre
* `title`: the name of the genre
* `#tracks`: the number of tracks that fall into this genre
* `parent`: the genre that this subgenre falls under (will be 0 if not a subgenre)

`features` (descriptions from the [Spotify API page](https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/)):
* `track_id`: a unique ID for each track
* `acousticness`: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
* `danceability`: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
* `energy`: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale.
* `instrumentalness`: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content.
* `liveness`: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live.
* `speechiness`: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value.
* `tempo`: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
* `valence`: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

## Inspiration

A variety of exploratory analyses, hypothesis tests, and prediction problems can be tackled with this data. Here are a few ideas to get
you started:

*we'll come back and fix these*

1. Which genre has the longest songs?
2. Is there a relationship between danceability and energy? What about danceability and valence?
3. Can you classify which genre (of [pick 2 once we see data]) a track belongs to based on its features?
4. Do (pick 2 genres) have the same average energy?
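
As a starting point for the danceability questions, here is a minimal sketch of a correlation coefficient computed from standard units. It uses only the eight rows visible in the `features` preview in the Exploration section, so the numbers it produces say nothing about the full dataset. The helper names `standard_units` and `correlation` are illustrative choices, not functions supplied with the project.

```python
from math import sqrt

# (danceability, energy, valence) for the eight tracks shown in the
# `features` preview; a sample this small is for illustration only.
rows = [
    (0.6758939853, 0.6344762684, 0.5766609880),
    (0.5286430621, 0.8174611317, 0.2692402421),
    (0.7455658702, 0.7014699916, 0.6216612236),
    (0.6581786543, 0.9245251615, 0.9635898919),
    (0.5132380502, 0.5604099311, 0.8940722715),
    (0.2609111726, 0.6070668636, 0.1602670903),
    (0.7340790229, 0.2656847734, 0.6099912728),
    (0.4359329980, 0.0756321472, 0.1639499337),
]
dance = [r[0] for r in rows]
energy = [r[1] for r in rows]
valence = [r[2] for r in rows]


def standard_units(values):
    """Convert values to standard units: (value - mean) / SD."""
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def correlation(xs, ys):
    """Correlation coefficient: the mean of the products in standard units."""
    products = [a * b for a, b in zip(standard_units(xs), standard_units(ys))]
    return sum(products) / len(products)


r_energy = correlation(dance, energy)
r_valence = correlation(dance, valence)
print("danceability vs energy: ", round(r_energy, 3))
print("danceability vs valence:", round(r_valence, 3))
```

With the real tables loaded, the same two functions could be applied to whole columns instead of the hand-copied lists.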

Don't forget to review the Final Project Guidelines *(will link when live)* for a complete list of requirements.

## Exploration

The tables are loaded in the code cells below. Take some time to explore them!

In [14]:
# Load genres, dropping the `top_level` column
genres = Table.read_table("genres.csv").drop('top_level')
genres

genre_id,#tracks,parent,title
1,8693,38,Avant-Garde
2,5271,0,International
3,1752,0,Blues
4,4126,0,Jazz
5,4106,0,Classical
6,914,38,Novelty
7,217,20,Comedy
8,868,0,Old-Time / Historic
9,1987,0,Country
10,13845,0,Pop
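
The `parent` column separates top-level genres (parent `0`) from subgenres. The sketch below shows one way to make that split in plain Python, using only the ten rows from the preview above. The function name `split_by_level` is a hypothetical helper, and parent IDs such as 38 and 20 cannot be turned into titles here because those genres do not appear in the preview.

```python
# Rows copied from the `genres` preview: (genre_id, #tracks, parent, title).
genre_rows = [
    (1, 8693, 38, "Avant-Garde"),
    (2, 5271, 0, "International"),
    (3, 1752, 0, "Blues"),
    (4, 4126, 0, "Jazz"),
    (5, 4106, 0, "Classical"),
    (6, 914, 38, "Novelty"),
    (7, 217, 20, "Comedy"),
    (8, 868, 0, "Old-Time / Historic"),
    (9, 1987, 0, "Country"),
    (10, 13845, 0, "Pop"),
]


def split_by_level(rows):
    """Return (top-level titles, {subgenre title: parent genre_id})."""
    top_level = [title for _, _, parent, title in rows if parent == 0]
    subgenres = {title: parent for _, _, parent, title in rows if parent != 0}
    return top_level, subgenres


top_level, subgenres = split_by_level(genre_rows)
print("Top-level genres:", top_level)
print("Subgenres and their parent IDs:", subgenres)
```

On the full table, a subgenre's parent title could be looked up by matching its `parent` value against `genre_id`.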


In [13]:
# Load features, keep the first 17 columns, then drop the first row
features = Table.read_table("features.csv")
features = features.select(np.arange(17)).remove(0)
features

Unnamed: 0,echonest,echonest.1,echonest.2,echonest.3,echonest.4,echonest.5,echonest.6,echonest.7,echonest.8,echonest.9,echonest.10,echonest.11,echonest.12,echonest.13,echonest.14,echonest.15
,acousticness,danceability,energy,instrumentalness,liveness,speechiness,tempo,valence,album_date,album_name,artist_latitude,artist_location,artist_longitude,artist_name,release,artist_discovery_rank
track_id,,,,,,,,,,,,,,,,
2,0.4166752327,0.6758939853,0.6344762684,0.0106280683,0.1776465712,0.1593100648,165.9220000000,0.5766609880,,,32.6783000000,"Georgia, US",-83.2230000000,AWOL,AWOL - A Way Of Life,
3,0.3744077685,0.5286430621,0.8174611317,0.0018511032,0.1058799438,0.4618181276,126.9570000000,0.2692402421,,,32.6783000000,"Georgia, US",-83.2230000000,AWOL,AWOL - A Way Of Life,
5,0.0435668989,0.7455658702,0.7014699916,0.0006967990,0.3731433124,0.1245953419,100.2600000000,0.6216612236,,,32.6783000000,"Georgia, US",-83.2230000000,AWOL,AWOL - A Way Of Life,
10,0.9516699648,0.6581786543,0.9245251615,0.9654270154,0.1154738842,0.0329852191,111.5620000000,0.9635898919,2008-03-11,Constant Hitmaker,39.9523000000,"Philadelphia, PA, US",-75.1624000000,Kurt Vile,Constant Hitmaker,2635.0000000000
134,0.4522173071,0.5132380502,0.5604099311,0.0194426943,0.0965666940,0.5255193792,114.2900000000,0.8940722715,,,32.6783000000,"Georgia, US",-83.2230000000,AWOL,AWOL - A Way Of Life,
139,0.1065495253,0.2609111726,0.6070668636,0.8350869898,0.2236762711,0.0305692764,196.9610000000,0.1602670903,,,41.8239000000,"Providence, RI, US",-71.4120000000,Alec K. Redfearn and the Eyesores,Every Man For Himself,149495.0000000000
140,0.3763124975,0.7340790229,0.2656847734,0.6695811237,0.0859951222,0.0390682262,107.9520000000,0.6099912728,,,41.8239000000,"Providence, RI, US",-71.4120000000,Alec K. Redfearn and the Eyesores,The Blind Spot,149495.0000000000
141,0.9636568796,0.4359329980,0.0756321472,0.3454934909,0.1056858694,0.0266578493,33.4770000000,0.1639499337,,,41.8239000000,"Providence, RI, US",-71.4120000000,Alec K. Redfearn and the Eyesores,Every Man For Himself,
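
In the preview above, the column labels are placeholders (`echonest`, `echonest.1`, ...), while the descriptive names (`acousticness`, `danceability`, ...) sit in the first row and `track_id` sits in the second. The sketch below shows one possible way to recover the descriptive names with the standard `csv` module. It assumes the text looks exactly like the preview, is trimmed to the first four columns for brevity, and handles numeric columns only; `read_with_real_header` is a hypothetical helper, not part of the `datascience` library.

```python
import csv
import io

# First lines of the preview, trimmed to four columns (assumed layout).
preview = """\
Unnamed: 0,echonest,echonest.1,echonest.2
,acousticness,danceability,energy
track_id,,,
2,0.4166752327,0.6758939853,0.6344762684
3,0.3744077685,0.5286430621,0.8174611317
"""


def read_with_real_header(text):
    """Build one dict per track, keyed by the descriptive column names."""
    lines = list(csv.reader(io.StringIO(text)))
    _placeholders, names, index_row, *data = lines
    # The ID label comes from the third line, the rest from the second.
    columns = [index_row[0]] + names[1:]
    records = []
    for row in data:
        record = {columns[0]: int(row[0])}
        for name, value in zip(columns[1:], row[1:]):
            record[name] = float(value)  # numeric columns only
        records.append(record)
    return records


records = read_with_real_header(preview)
for record in records:
    print(record)
```

Text columns such as `artist_name` would need to stay as strings, so the `float` conversion would have to be limited to the audio-feature columns.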


In [1]:
# Load tracks (run the import cell at the top first so that `Table` is defined)
tracks = Table.read_table("tracks.csv")
tracks

NameError: name 'Table' is not defined