In [2]:
from datascience import *
import numpy as np
%matplotlib inline

ModuleNotFoundError: No module named 'datascience'

# Free Music Archive : A Dataset For Music Analysis

This dataset was introduced by Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, and Xavier Bresson at the International Society for Music Information Retrieval Conference in 2017. It has been
cleaned for your convenience: all missing values have been removed, and low-quality observations and variables have been filtered
out. A brief summary of the dataset, originally given at the conference, is provided below. **You
may not copy any public analyses of this dataset. Doing so will result in an automatic F.**

## Summary
We introduce the Free Music Archive (FMA), an open and easily accessible dataset suitable for evaluating several tasks in MIR, a field concerned with browsing, searching, and organizing large music collections. The community's growing interest in feature and end-to-end learning is however restrained by the limited availability of large audio datasets. The FMA aims to overcome this hurdle by providing 917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks from 16,341 artists and 14,854 albums, arranged in a hierarchical taxonomy of 161 genres. It provides full-length and high-quality audio, pre-computed features, together with track- and user-level metadata, tags, and free-form text such as biographies.

## Data Description

This dataset consists of three tables stored in the `data` folder:
1. `tracks` provides information on individual tracks.
2. `genres` contains information on all of the genres.
3. `features` contains information on the Spotify audio features of each track.

A description of each table's variables is provided below:

`tracks`:
* `track_id`: a unique ID for each track
* `track_title`: title of each track
* `artist_name`: name of the artist
* `album_title`: title of the album that the track comes from
* `track_duration`: the length of the song in seconds
* `track_genre`: the genre(s) that the track fall(s) into
* `album_date_released`: a string indicating the album release date
* `album_type`: specifies whether the album is studio-recorded, live, or from a radio program
* `album_tracks`: number of tracks on the album

`genres`:
* `genre_id`: a unique ID for each genre
* `title`: the name of the genre
* `#tracks`: the number of tracks that fall into this genre
* `parent`: the genre that this subgenre falls under (will be 0 if not a subgenre)
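
As a rough illustration of how the `parent` column encodes the hierarchy, the sketch below separates top-level genres (where `parent` is 0) from subgenres. It uses plain Python rather than the `datascience` Table API so that it runs on its own; the rows are copied from the `genres` preview in this notebook, and the helper name `split_by_level` is hypothetical.

```python
# A few rows copied from the `genres` preview: (genre_id, title, #tracks, parent)
GENRE_ROWS = [
    (1, "Avant-Garde", 8693, 38),
    (2, "International", 5271, 0),
    (3, "Blues", 1752, 0),
    (6, "Novelty", 914, 38),
    (7, "Comedy", 217, 20),
    (10, "Pop", 13845, 0),
]


def split_by_level(rows):
    """Separate top-level genres (parent == 0) from subgenres (hypothetical helper)."""
    top_level = [title for _, title, _, parent in rows if parent == 0]
    # Map each subgenre title to the genre_id of its parent.
    subgenres = {title: parent for _, title, _, parent in rows if parent != 0}
    return top_level, subgenres


top_level, subgenres = split_by_level(GENRE_ROWS)
print("Top-level genres:", top_level)
print("Subgenre -> parent genre_id:", subgenres)
```

Note that a parent ID such as 38 refers to another row of `genres`, which may not appear in a short preview of the table.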

`features` (descriptions from the [Spotify API page](https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/)):
* `track_id`: a unique ID for each track
* `acousticness`: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
* `danceability`: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
* `energy`: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale.
* `instrumentalness`: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. 
* `liveness`: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live.
* `speechiness`: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. 
* `tempo`: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration. 
* `valence`: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
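
Because `tracks` and `features` share the `track_id` column, the two tables can be combined so that each track's metadata sits next to its audio features. The following is a minimal sketch of that idea in plain Python, using a handful of rows and columns copied from the table previews in this notebook; `inner_join` is a hypothetical helper, not part of any library. With the `datascience` library, `Table.join` should serve the same purpose.

```python
# Rows copied from the table previews in this notebook; only a few columns are kept.
TRACKS = [
    {"track_id": 2, "track_title": "Food", "track_genre": "Hip-Hop"},
    {"track_id": 137, "track_title": "Side A", "track_genre": "Experimental"},
    {"track_id": 139, "track_title": "CandyAss", "track_genre": "Folk"},
]
FEATURES = [
    {"track_id": 2, "danceability": 0.675894, "energy": 0.634476},
    {"track_id": 10, "danceability": 0.658179, "energy": 0.924525},
    {"track_id": 139, "danceability": 0.260911, "energy": 0.607067},
]


def inner_join(left, right, key):
    """Keep only rows whose key appears in both tables (hypothetical helper)."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**left_row, **right_by_key[left_row[key]]}
        for left_row in left
        if left_row[key] in right_by_key
    ]


joined = inner_join(TRACKS, FEATURES, "track_id")
for row in joined:
    print(row)
```

Tracks without a matching row in the other table (here, track 137 and track 10) are dropped, so the joined table can be shorter than either input.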

## Inspiration

A variety of exploratory analyses, hypothesis tests, and prediction problems can be tackled with this data. Here are a few ideas to get
you started:


1. Which genre has the longest songs?
2. Is there a relationship between danceability and energy? What about danceability and valence?
3. Can you classify which genre (of [pick 2 once we see data]) a track belongs to based on its features?
4. Do (pick 2 genres) have the same average energy?

Don't forget to review the Final Project Guidelines *(will link when live)* for a complete list of requirements.
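
As a starting point for the questions about song length and the danceability/energy relationship, here is a minimal sketch in plain Python. The values are copied from the `tracks` and `features` previews in this notebook, and the helpers `mean_by_group` and `pearson_r` are hypothetical names written for this illustration. Ten rows are far too few to draw conclusions, so treat this only as an outline of the computation.

```python
import math
from collections import defaultdict

# (track_genre, track_duration in seconds), copied from the `tracks` preview
DURATIONS = [
    ("Hip-Hop", 168), ("Hip-Hop", 237), ("Hip-Hop", 206), ("Hip-Hop", 207),
    ("Experimental", 1233), ("Experimental", 1231),
    ("Folk", 296), ("Folk", 253), ("Folk", 182), ("Folk", 470),
]

# (danceability, energy), copied from the `features` preview
FEATURE_PAIRS = [
    (0.675894, 0.634476), (0.528643, 0.817461), (0.745566, 0.70147),
    (0.658179, 0.924525), (0.513238, 0.56041), (0.260911, 0.607067),
    (0.734079, 0.265685), (0.435933, 0.0756321), (0.379065, 0.823856),
    (0.443643, 0.641997),
]


def mean_by_group(pairs):
    """Average the values that share a label (hypothetical helper)."""
    groups = defaultdict(list)
    for label, value in pairs:
        groups[label].append(value)
    return {label: sum(values) / len(values) for label, values in groups.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


means = mean_by_group(DURATIONS)
longest = max(means, key=means.get)
r = pearson_r([d for d, _ in FEATURE_PAIRS], [e for _, e in FEATURE_PAIRS])

print("Mean duration by genre (s):", means)
print("Genre with the longest mean duration in this sample:", longest)
print("Correlation between danceability and energy:", round(r, 3))
```

On the full tables, the same grouping and correlation steps would apply, but the results could differ substantially from this small sample.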

## Exploration

The tables are loaded in the code cells below. Take some time to explore them!

In [4]:
# Load the genres table
genres = Table.read_table("genres_final.csv")
genres

genre_id,title,#tracks,parent
1,Avant-Garde,8693,38
2,International,5271,0
3,Blues,1752,0
4,Jazz,4126,0
5,Classical,4106,0
6,Novelty,914,38
7,Comedy,217,20
8,Old-Time / Historic,868,0
9,Country,1987,0
10,Pop,13845,0


In [3]:
# Load the features table
features = Table.read_table("features_final.csv")
features

Unnamed: 0,track_id,acousticness,danceability,energy,instrumentalness,liveness,speechiness,tempo,valence
3,2,0.416675,0.675894,0.634476,0.0106281,0.177647,0.15931,165.922,0.576661
4,3,0.374408,0.528643,0.817461,0.0018511,0.10588,0.461818,126.957,0.26924
5,5,0.0435669,0.745566,0.70147,0.000696799,0.373143,0.124595,100.26,0.621661
6,10,0.95167,0.658179,0.924525,0.965427,0.115474,0.0329852,111.562,0.96359
7,134,0.452217,0.513238,0.56041,0.0194427,0.0965667,0.525519,114.29,0.894072
8,139,0.10655,0.260911,0.607067,0.835087,0.223676,0.0305693,196.961,0.160267
9,140,0.376312,0.734079,0.265685,0.669581,0.0859951,0.0390682,107.952,0.609991
10,141,0.963657,0.435933,0.0756321,0.345493,0.105686,0.0266578,33.477,0.16395
11,142,0.662881,0.379065,0.823856,0.910266,0.0887053,0.0790904,147.781,0.0928676
12,144,0.909011,0.443643,0.641997,0.924092,0.267669,0.0896589,128.537,0.788251


In [2]:
# Load the tracks table
tracks = Table.read_table("tracks_final.csv")
tracks

Unnamed: 0,track_id,track_title,artist_name,album_title,track_duration,track_genre,track_bit_rate,track_listens,track_comments,artist_location,album_date_released,album_type,album_tracks
2,2,Food,AWOL,AWOL - A Way Of Life,168,Hip-Hop,256000,1293,0,New Jersey,2009-01-05,Album,7
3,3,Electric Ave,AWOL,AWOL - A Way Of Life,237,Hip-Hop,256000,514,0,New Jersey,2009-01-05,Album,7
4,5,This World,AWOL,AWOL - A Way Of Life,206,Hip-Hop,256000,1151,0,New Jersey,2009-01-05,Album,7
11,134,Street Music,AWOL,AWOL - A Way Of Life,207,Hip-Hop,256000,943,0,New Jersey,2009-01-05,Album,7
14,137,Side A,Airway,Live at LACE,1233,Experimental,256000,1278,0,"Los Angeles, CA",2006-12-01,Live Performance,2
15,138,Side B,Airway,Live at LACE,1231,Experimental,256000,489,0,"Los Angeles, CA",2006-12-01,Live Performance,2
16,139,CandyAss,Alec K. Redfearn & the Eyesores,Every Man For Himself,296,Folk,128000,582,0,"Providence, RI",2009-01-16,Album,2
17,140,Queen Of The Wires,Alec K. Redfearn & the Eyesores,The Blind Spot,253,Folk,128000,1299,0,"Providence, RI",2007-05-22,Album,1
18,141,Ohio,Alec K. Redfearn & the Eyesores,Every Man For Himself,182,Folk,128000,725,0,"Providence, RI",2009-01-16,Album,2
19,142,Punjabi Watery Grave,Alec K. Redfearn & the Eyesores,The Quiet Room,470,Folk,128000,848,0,"Providence, RI",2005-01-25,Album,1


In [1]:
# Load the tracks table
tracks = Table.read_table("tracks.csv")
tracks

NameError: name 'Table' is not defined