Classifies music into 10 genres using audio features extracted with librosa and an XGBoost classifier trained on the GTZAN dataset.
Supported Genres: Blues, Classical, Country, Disco, Hip-Hop, Jazz, Metal, Pop, Reggae, Rock
git clone https://github.com/<your-username>/tracktype.git
cd tracktype
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt

The pre-trained model is included in the repo, so you can start using it right away:
Web app (upload any song and see the prediction):
streamlit run app/app.py

CLI (predict from the terminal):
python -m src.predict path/to/song.mp3

========================================
Predicted Genre : jazz
Confidence : 87.3%
========================================
Top predictions:
1. jazz 87.3%
2. blues 6.2%
3. classical 3.1%
If you want to reproduce the training or experiment with the model yourself:
# download the GTZAN dataset from Kaggle (requires a free Kaggle account)
python -m src.download_data
# train the model (~91% accuracy, saves artifacts to models/)
python -m src.train

The download script also places one sample audio file per genre into data/sample_audio/ for quick testing.
tracktype/
├── app/
│ └── app.py # Streamlit web interface
├── data/ # Dataset CSVs and sample audio (auto-downloaded)
├── models/ # Pre-trained model, scaler, label encoder
├── src/
│ ├── config.py # Paths, feature names, constants
│ ├── download_data.py # Downloads GTZAN dataset from Kaggle
│ ├── feature_extractor.py # Extracts audio features with librosa
│ ├── model.py # Loads model and runs inference
│ ├── predict.py # CLI prediction tool
│ └── train.py # Training script
├── requirements.txt
└── README.md
The audio file gets split into 3-second segments to match the training data. For each segment, librosa extracts 57 features (chroma, spectral centroid, bandwidth, rolloff, zero-crossing rate, MFCCs, etc.). These features are scaled with MinMaxScaler and fed to an XGBoost classifier. For files with multiple segments, the final genre is decided by majority vote across all segments.
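For a feel of what that extraction step involves, here is a rough librosa sketch. It is illustrative only: it computes the mean and variance of a representative subset of features for one 3-second segment, not the exact 57-feature list or column order produced by src/feature_extractor.py, and the file path is just a placeholder.

```python
import numpy as np
import librosa

def extract_segment_features(y, sr):
    """Mean/variance summaries of common librosa features for one segment.

    Illustrative subset only -- the real extractor emits 57 features in a
    fixed order matching the GTZAN CSV columns.
    """
    feats = {}

    def mean_var(name, values):
        feats[f"{name}_mean"] = float(np.mean(values))
        feats[f"{name}_var"] = float(np.var(values))

    mean_var("chroma_stft", librosa.feature.chroma_stft(y=y, sr=sr))
    mean_var("rms", librosa.feature.rms(y=y))
    mean_var("spectral_centroid", librosa.feature.spectral_centroid(y=y, sr=sr))
    mean_var("spectral_bandwidth", librosa.feature.spectral_bandwidth(y=y, sr=sr))
    mean_var("rolloff", librosa.feature.spectral_rolloff(y=y, sr=sr))
    mean_var("zero_crossing_rate", librosa.feature.zero_crossing_rate(y))

    # 20 MFCCs, each summarized by mean and variance
    mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    for i, coeff in enumerate(mfccs, start=1):
        mean_var(f"mfcc{i}", coeff)

    return feats

# Summarize the first 3-second window of a file
y, sr = librosa.load("path/to/song.mp3", sr=22050)
features = extract_segment_features(y[: 3 * sr], sr)
```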
The model was trained on the GTZAN features_3_sec.csv with a 70/30 split and reaches ~91% accuracy.
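For reference, a minimal sketch of that training setup is shown below. It assumes the standard Kaggle layout of features_3_sec.csv (a filename column, a length column, 57 feature columns, and a label column) and a data/ path chosen for illustration; the actual src/train.py also persists the fitted scaler and label encoder to models/.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Load the 3-second feature CSV and separate features from labels
df = pd.read_csv("data/features_3_sec.csv")
X = df.drop(columns=["filename", "length", "label"])
y = LabelEncoder().fit_transform(df["label"])

# Scale features to [0, 1] and hold out 30% for evaluation
X_scaled = MinMaxScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42, stratify=y
)

# Train an XGBoost classifier and report held-out accuracy
model = XGBClassifier(n_estimators=500, learning_rate=0.05)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```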
The GTZAN dataset comes with two CSVs: features_30_sec.csv (~1,000 rows, one per full clip) and features_3_sec.csv (~10,000 rows, each clip split into 3-second chunks). I initially trained on the 30-sec version and got ~71% accuracy. Switching to the 3-sec version bumped it to ~91% simply because the model gets 10x more training examples from the same data.
That choice created a mismatch problem at inference time though. Users upload full-length songs, not 3-second clips. If you extract features from a whole song and feed that to a model trained on 3-second windows, the feature distributions don't match and predictions are unreliable. So the prediction pipeline splits the input audio into the same 3-second segments, classifies each one independently, and uses majority voting to pick the final genre. This keeps the inference distribution aligned with what the model saw during training.
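A simplified sketch of that inference loop, reusing the hypothetical extract_segment_features helper from the extraction sketch above and assuming an already-loaded model, scaler, and label encoder (the real logic lives in src/model.py and src/predict.py):

```python
import numpy as np
import librosa
from collections import Counter

def predict_genre(path, model, scaler, label_encoder, segment_seconds=3, sr=22050):
    """Split a full-length file into 3-second segments, classify each one,
    and pick the final genre by majority vote across segments."""
    y, _ = librosa.load(path, sr=sr)
    samples_per_segment = segment_seconds * sr

    votes = []
    for start in range(0, len(y) - samples_per_segment + 1, samples_per_segment):
        segment = y[start : start + samples_per_segment]
        feats = extract_segment_features(segment, sr)  # see extraction sketch above
        X = scaler.transform(np.array(list(feats.values())).reshape(1, -1))
        votes.append(label_encoder.inverse_transform(model.predict(X))[0])

    # The most common per-segment prediction wins
    return Counter(votes).most_common(1)[0][0]
```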