Code for the Million Song Dataset, which contains metadata and audio analysis for a million tracks and is a collaboration between The Echo Nest and LabROSA. See the website for details.


January 2011

  • The dataset contains the analysis and metadata for a million songs. The goal is to provide a large dataset for researchers to report results on, hence encouraging algorithms that scale to commercial sizes.

  • Most of the information is provided by The Echo Nest. The dataset is the result of a collaboration between The Echo Nest and LabROSA at Columbia University. This project is funded in part by the NSF.

  • Most of the data is licensed under the same terms as The Echo Nest's API.

    For the SecondHandSongs dataset (cover songs), see the webpage:

    For the musiXmatch dataset (lyrics), see the webpage:

    The code is released under the GNU General Public License. See LICENSE for details.

  • Most details and instructions on how to get the dataset can be found on the project's website:

If you have any questions or comments: !forum/millionsongdataset