
MSiA423 Project Song Analytics

  • Owner: Yi (Joyce) Feng
  • QA: Yucheng Zhu

Project Charter

This branch is for RDS deployment. Please check out the "local" branch for the code that runs locally.

Vision: Over its long history, the music industry has changed substantially, both in melody and in lyrics. This project aims to predict a song's danceability from its lyrics.

Mission: Given information about a song, such as its lyrics and genre, the web app will predict its danceability on a scale from 0 to 1.

Success criteria:

  • ML criteria: The dataset will be split into a training set and a testing set. MSE will be used as the machine learning metric to evaluate candidate models and choose the optimal one (a sketch follows this list).
  • Business criteria: By predicting a song's danceability from its lyrics, the app helps users get a feel for a song before listening to it. User satisfaction ratings will be used to measure how well the app meets this goal.
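
As an illustration of the ML criterion, here is a minimal sketch of computing MSE on a held-out test set with scikit-learn. The data below is synthetic; the project's actual features and model are produced later in the pipeline.

# Illustrative only: evaluate a model with MSE on a train/test split.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for lyric-derived features (X) and danceability (y).
rng = np.random.default_rng(423)
X = rng.random((500, 10))
y = rng.random(500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=423)

model = LinearRegression().fit(X_train, y_train)
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"Test MSE: {mse:.4f}")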

Planning

Develop Themes:

  • With information about a song, the app will predict its popularity, which could help users increase a song's popularity and ultimately increase songwriters' income.
  • By predicting songs' popularity, the app can help users choose between songs.

Epic 1: Data Preparation There are several datasets available online about songs and their popularity as measured by Billboard rankings. The main data source is the billboard package in R, which contains information on songs from 1960 to 2016. https://cran.r-project.org/web/packages/billboard/billboard.pdf

  • Backlog
    • Story 1: Merge databases (4 points)

      • Search for online datasets
      • Merge several datasets to include more features of songs that will be needed for modeling
    • Story 2: EDA (2 points)

      • Explore the potential variables that could be used to better predict the songs' popularities
      • Perform necessary variable transformations

Epic 2: Modeling Build predictive models such as linear regression and neural networks, and choose the optimal model using the ML metrics.

  • Backlog
    • Story 1: Build initial models (4 points)

      • Build several predicting models with features from the first epic.
      • Use common ML metrics and test dataset to choose the final model.
    • Story 3: Model review with QA partner (4 points)

      • Review the model with the QA partner based on the machine learning criteria as well as user satisfaction
      • Move both datasets and model to AWS server

Epic 3: Web App Building Build the web app to enable users to access the model through a web interface.

  • Backlog

    • Story 1: UI design of the web app (8 points)
      • Depending on the functionality of the model, the web app should have a reasonable user input and output design
  • Icebox

    • Story 2: Beautify (8 points)
      • Beautify the web app if time permits

Plan for Two Weeks

  • Finish Epic 1 and start the initial modeling.

Repo structure

├── README.md                         <- You are here
│
├── app
│   ├── static/                       <- CSS, JS files that remain static 
│   ├── templates/                    <- HTML (or other code) that is templated and changes based on a set of inputs
│   ├── models.py                     <- Creates the data model for the database connected to the Flask app 
│   ├── __init__.py                   <- Initializes the Flask app and database connection
│
├── config                            <- Directory for yaml configuration files for model training, scoring, etc
│   ├── logging/                      <- Configuration files for python loggers
│
├── data                              <- Folder that contains data used or generated. Only the external/ and sample/ subdirectories are tracked by git. 
│   ├── archive/                      <- Place to put archived data that is no longer used. Not synced with git. 
│   ├── external/                     <- External data sources, will be synced with git
│   ├── sample/                       <- Sample data used for code development and testing, will be synced with git
│
├── docs                              <- A default Sphinx project; see sphinx-doc.org for details.
│
├── figures                           <- Generated graphics and figures to be used in reporting.
│
├── models                            <- Trained model objects (TMOs), model evaluation, and/or model summaries
│   ├── archive                       <- No longer current models. This directory is included in the .gitignore and is not tracked by git
│
├── notebooks
│   ├── wordCloud                     <- Notebooks that generate the word cloud pictures used in the HTML.
│   ├── deliver                       <- Notebooks shared with others. 
│   ├── archive                       <- Development notebooks no longer being used.
│   ├── template.ipynb                <- Template notebook for analysis with useful imports and helper functions. 

│
├── src                               <- Source code for the project 
│   ├── archive/                      <- No longer current scripts.
│   ├── helpers/                      <- Helper scripts used in main src files 
│   ├── sql/                          <- SQL source code
│   ├── tables.py                     <- Script for creating a (temporary) MySQL database and adding songs to it 
│   ├── download.py                   <- Script for downloading data from S3
│   ├── read_data.py                  <- Script for cleaning and transforming data for use in training and scoring.
│   ├── model.py                      <- Script for training machine learning model(s)
│   ├── evaluate.py                   <- Script for evaluating model performance 
│
├── test                              <- Files necessary for running model tests (see documentation below) 

├── run.py                            <- Simplifies the execution of one or more of the src scripts
├── app                               <- Web app source code 
│   ├── app.py                        <- Flask wrapper for running the model
├── config.py                         <- Configuration file for Flask app
├── requirements.txt                  <- Python package dependencies 

This project structure was partially influenced by the Cookiecutter Data Science project.

Documentation

  • Open docs/build/html/index.html to see the Sphinx documentation.
  • See docs/README.md for keeping docs up to date with additions to the repository.

Running the application on RDS. To run the code locally, please switch to the "local" branch.

Run all commands from the root folder.

1. Set up environment

Please cd into the SongAnalytics folder first to create the environment, then run the following steps.

The requirements.txt file contains the packages required to run the model code. An environment can be set up in two ways. See bottom of README for exploratory data analysis environment setup.

Make sure that the Python used by the virtual environment is located at /home/ubuntu/miniconda3/.

With virtualenv

pip install virtualenv

virtualenv pennylane --python=python3.7

source pennylane/bin/activate

pip install -r requirements.txt

With conda

conda create -n pennylane python=3.7
conda activate pennylane
pip install -r requirements.txt

2. Initialize the database

If this is the first time running the code, please run the following command in the terminal window:

export SQLALCHEMY_DATABASE_URI="{conn_type}://{user}:{password}@{host}:{port}/{DATABASE_NAME}"
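
For reference, a minimal sketch of how this connection string could be assembled and used in Python. The driver, environment variable names, host, and credentials below are placeholders, assuming a MySQL RDS instance and the pymysql driver; they are not the project's actual values.

# Illustrative only: build and use the SQLAlchemy connection URI.
import os
import sqlalchemy

conn_type = "mysql+pymysql"  # assumes the pymysql driver is installed
user = os.environ.get("MYSQL_USER", "msia423")
password = os.environ.get("MYSQL_PASSWORD", "")
host = os.environ.get("MYSQL_HOST", "my-rds-instance.us-east-2.rds.amazonaws.com")
port = os.environ.get("MYSQL_PORT", "3306")
database_name = "msia423"

uri = "{}://{}:{}@{}:{}/{}".format(conn_type, user, password, host, port, database_name)
engine = sqlalchemy.create_engine(uri)  # same URI that SQLALCHEMY_DATABASE_URI should hold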

To download the data files used by the model, run:

python src/download.py --fileName=<FILE> --output_path=<PATH>

By default, all three data files that the model uses are downloaded: lyrics.csv, wiki_hot_100s.csv, and spotify_track_data.csv. Alternatively, the user can specify a single file to download with --fileName.

To run with the default settings: python src/download.py --config=config/config.yml

The output path is the directory, ending with "/", in which to save the downloaded file(s).

To upload data to the S3 bucket, run:

python src/upload.py --input_file_path=<INPUT> --bucket_name=<BUCKET> --output_file_path=<OUTPUT>
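
Under the hood, the upload (and the analogous download) reduces to a boto3 call. A minimal sketch, assuming boto3 is installed and AWS credentials are configured; the file paths and bucket name are placeholders.

# Illustrative only: copy a local file to S3 with boto3.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="data/external/lyrics.csv",  # local input file (placeholder)
    Bucket="my-songanalytics-bucket",     # placeholder bucket name
    Key="raw/lyrics.csv",                 # destination key in the bucket
)

# Downloading works the same way in reverse:
# s3.download_file(Bucket="my-songanalytics-bucket", Key="raw/lyrics.csv",
#                  Filename="data/external/lyrics.csv")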

To create the SQL database in RDS, run:

python src/tables.py --config=config/config.yml

To run these steps with the Makefile, simply use

make download
make database

If running the code locally, the database should instead be created with:

python src/tables.py --config=config/config.yml --flag=false
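
For reference, a minimal sketch of the kind of thing src/tables.py does: declare a songs table with SQLAlchemy and create it either in RDS or in a local SQLite file, depending on the flag. The column names and types below are placeholders, not the project's actual schema, and SQLAlchemy 1.4+ is assumed.

# Illustrative only: create a songs table in RDS or in a local SQLite database.
import os
from sqlalchemy import Column, Float, Integer, String, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Song(Base):
    """Placeholder data model for a song and its danceability."""
    __tablename__ = "songs"
    id = Column(Integer, primary_key=True)
    title = Column(String(100), nullable=False)
    lyrics = Column(String(10000), nullable=True)
    danceability = Column(Float, nullable=True)

def create_db(rds=True):
    """Create the table in RDS (rds=True) or in a local SQLite file (rds=False)."""
    if rds:
        uri = os.environ["SQLALCHEMY_DATABASE_URI"]  # set in step 2 above
    else:
        uri = "sqlite:///data/songs.db"  # placeholder local path
    engine = create_engine(uri)
    Base.metadata.create_all(engine)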

3. Clean data

To clean the data downloaded from S3 and build the training and testing data for the model, run:

python src/read_data.py --config=config/config.yml

or with Makefile

make read_data

The resulting CSV files will be written to the data folder.
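
A minimal sketch of the shape of this step, assuming pandas and placeholder file paths and column names; the real script reads its settings from config/config.yml.

# Illustrative only: clean the raw data and write train/test CSVs to data/.
import pandas as pd
from sklearn.model_selection import train_test_split

songs = pd.read_csv("data/external/lyrics.csv")  # placeholder input path
songs = songs.dropna(subset=["lyrics"])          # placeholder cleaning step

train, test = train_test_split(songs, test_size=0.3, random_state=423)
train.to_csv("data/train.csv", index=False)
test.to_csv("data/test.csv", index=False)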

4. Build the model

Once the data folder contains the train and test data, the model can be built by running

python src/model.py --config=config/config.yml

or with Makefile

make model

The model will be saved as a .sav file in the models folder.
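
A minimal sketch of how the trained model object could be persisted as a .sav file with pickle; the feature columns and model type below are placeholders, since the real choices come from config/config.yml.

# Illustrative only: fit a model and save it to models/model.sav.
import pickle
import pandas as pd
from sklearn.linear_model import LinearRegression

train = pd.read_csv("data/train.csv")            # placeholder path
X_train = train[["word_count", "genre_code"]]    # placeholder feature columns
y_train = train["danceability"]                  # placeholder target column

model = LinearRegression().fit(X_train, y_train)
with open("models/model.sav", "wb") as f:
    pickle.dump(model, f)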

5. Evaluate the model

The model evaluation can be generated by running

python src/evaluate.py --config=config/config.yml

or with Makefile

make evaluate

The evaluation results will be saved as a .txt file in the models folder.

6. Unit tests

Unit tests of the model functions can be run with

pytest

or with Makefile

make test
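
For reference, pytest collects any test_*.py file under the test folder. A self-contained sketch of the pattern follows; the function is defined inline purely for illustration and is not one of the project's helpers.

# Illustrative only: a minimal pytest-style unit test (e.g. test/test_example.py).
import pytest

def clip_danceability(score):
    """Clip a raw prediction into the valid 0-to-1 danceability range."""
    return min(max(score, 0.0), 1.0)

def test_clip_danceability_bounds():
    assert clip_danceability(1.7) == 1.0
    assert clip_danceability(-0.2) == 0.0
    assert clip_danceability(0.5) == 0.5

def test_clip_danceability_rejects_non_numeric():
    with pytest.raises(TypeError):
        clip_danceability("not a number")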

All of the steps above, from environment setup to unit testing, can be run with

make all

7. Run the application

python app/app.py
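
Conceptually, app/app.py loads the saved model and serves predictions through Flask. A minimal sketch is below; the route, form field, and single word-count feature are placeholders rather than the app's actual interface (the real app renders the HTML templates under app/templates/).

# Illustrative only: a bare-bones Flask wrapper around the saved model.
import pickle
from flask import Flask, request

app = Flask(__name__)
with open("models/model.sav", "rb") as f:
    model = pickle.load(f)

@app.route("/", methods=["GET", "POST"])
def predict():
    if request.method == "POST":
        lyrics = request.form.get("lyrics", "")   # placeholder form field
        features = [[len(lyrics.split())]]        # placeholder single feature
        score = float(model.predict(features)[0])
        return "Predicted danceability: {:.3f}".format(score)
    return "POST lyrics to this endpoint to get a danceability prediction."

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=3000)  # port matches the URL in step 8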

8. Interact with the application

Go to http://18.216.140.226:3000/ to interact with the current version of the app.

Running the application locally

Config

In config.py in the root folder, comment out lines 11 to 25 and use the code from lines 28 to 39. The rest of the running procedure is the same as for RDS.
