Surrey CVSSP DCASE 2018 Task 2 system

This is the source code for CVSSP's system used in DCASE 2018 Task 2.

The accompanying technical report can be found here.

Requirements

This software requires Python 3. To install the dependencies, run:

pipenv install

Alternatively, using pip:

pip install -r requirements.txt

The main functionality of this software also requires the DCASE 2018 Task 2 datasets, which may be downloaded here. After acquiring the datasets, modify the dataset paths in task2/config/ accordingly.

For example:

import os

_root_dataset_path = ('/path/to/datasets')
"""str: Path to root directory containing input audio clips."""

training_set = Dataset(
    path=os.path.join(_root_dataset_path, 'audio_train'),
    # ... other arguments, as in the existing config file
)
"""Dataset instance for the training dataset."""

You will also want to change the work path in task2/config/, for example:

work_path = '/path/to/workspace'
"""str: Path to parent directory containing program output."""

Usage

This section describes the available commands. Using this software, you can apply preprocessing (silence removal), extract feature vectors, train the network, generate predictions, and evaluate the predictions.

Preprocessing

Our implementation of preprocessing involves extracting the non-silent sections of audio clips and saving these to disk separately. A new metadata file is then created with entries corresponding to the new files.

To apply preprocessing, run:

python task2/ preprocess <training/test>

Refer to task2/ for the relevant code.
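As a rough illustration of silence removal, the pure-Python sketch below finds the non-silent stretches of a clip by thresholding frame energy. It is not the repository's implementation: the RMS criterion, the 1024-sample frames, the relative threshold of 0.01 and the `<name>_<k>.wav` naming of the new files are all assumptions, and every function name is hypothetical.

```python
import math


def frame_rms(samples, frame_length):
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    rms = []
    for start in range(0, len(samples), frame_length):
        frame = samples[start:start + frame_length]
        rms.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return rms


def non_silent_segments(samples, frame_length=1024, rel_threshold=0.01):
    """Return (start, end) sample indices of non-silent runs of frames.

    Hypothetical criterion: a frame is non-silent if its RMS exceeds
    rel_threshold times the largest frame RMS in the clip.
    """
    rms = frame_rms(samples, frame_length)
    if not rms:
        return []
    threshold = rel_threshold * max(rms)
    segments = []
    start = None
    for i, value in enumerate(rms):
        if value > threshold and start is None:
            start = i  # a non-silent run begins at frame i
        elif value <= threshold and start is not None:
            segments.append((start * frame_length, i * frame_length))
            start = None
    if start is not None:  # the clip ends while still non-silent
        segments.append((start * frame_length, len(samples)))
    return segments


def segment_entries(fname, label, segments):
    """Build one metadata row per new segment file (hypothetical naming)."""
    stem = fname.rsplit('.', 1)[0]
    return [(f'{stem}_{k}.wav', label) for k in range(len(segments))]
```

For example, a clip made of 2048 zeros, 2048 samples of signal and 2048 more zeros yields the single segment (2048, 4096). Each segment would then be written out as its own audio file, with one new metadata row per file.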

Feature Extraction

To extract feature vectors, run:

python task2/ extract <training/test> [--recompute]

If --recompute is set, feature vectors that already exist are recomputed. This implementation extracts log-mel spectrogram features. See task2/config/ to tweak the extraction parameters.
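For reference, the sketch below shows the two ingredients that distinguish log-mel features from a plain spectrogram: band edges spaced evenly on the mel scale, and logarithmic compression of the band energies. It assumes the HTK mel formula m = 2595 log10(1 + f/700); the repository may use a different mel variant, and the function names and the eps value are hypothetical. The actual number of bands and frequency range are set in task2/config/.

```python
import math


def hz_to_mel(f):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_mels, fmin, fmax):
    """Return n_mels + 2 frequencies (Hz) equally spaced on the mel scale.

    In a triangular mel filterbank, band k rises from edges[k],
    peaks at edges[k + 1] and falls back to zero at edges[k + 2].
    """
    mel_min, mel_max = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (mel_max - mel_min) / (n_mels + 1)
    return [mel_to_hz(mel_min + i * step) for i in range(n_mels + 2)]


def log_compress(band_energies, eps=1e-10):
    """Take the natural log of mel band energies; eps avoids log(0)."""
    return [math.log(e + eps) for e in band_energies]
```

For example, mel_band_edges(64, 0.0, 22050.0) returns 66 edge frequencies for a 64-band filterbank covering 0 Hz to 22050 Hz; the bands are narrow at low frequencies and widen towards the top of the range.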

Training

To train a model, run:

python task2/ train [--model MODEL] [--fold n] [--sample_weight x] [--class_weight]

The --model option can be one of the following:

  • vgg13
  • gcnn
  • crnn
  • gcrnn

The training set is assumed to be split into several folds; the --fold option specifies which fold to use as the validation set. If it is set to -1, the program trains on the entire training set. The --sample_weight option sets the weight applied to unverified (noisy) examples. Finally, the --class_weight flag weights each example according to the class it belongs to.
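The fold selection and example weighting described above can be sketched as follows. This is a hypothetical illustration, not the code in task2/: it assumes each example is a dict with 'fold', 'label' and 'verified' keys, and for --class_weight it assumes inverse class-frequency weights (the "balanced" heuristic used by scikit-learn), which may differ from the scheme the repository uses.

```python
from collections import Counter


def split_by_fold(examples, fold):
    """Split examples into (train, validation) using their 'fold' field.

    fold == -1 trains on everything and leaves the validation set empty.
    """
    if fold == -1:
        return list(examples), []
    train = [ex for ex in examples if ex['fold'] != fold]
    val = [ex for ex in examples if ex['fold'] == fold]
    return train, val


def example_weights(examples, sample_weight=1.0, class_weight=False):
    """Compute one training weight per example (hypothetical scheme).

    Unverified (noisy) examples are scaled by sample_weight. If class_weight
    is set, each example is also scaled by total / (n_classes * count of its
    class), so the class weights alone average to 1 over the examples.
    """
    weights = [1.0 if ex['verified'] else sample_weight for ex in examples]
    if class_weight:
        counts = Counter(ex['label'] for ex in examples)
        n_classes = len(counts)
        total = len(examples)
        for i, ex in enumerate(examples):
            weights[i] *= total / (n_classes * counts[ex['label']])
    return weights
```

Because the class weights average to 1, --sample_weight keeps controlling the relative influence of noisy examples independently of the class balancing.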

See task2/config/ for tweaking the parameters or task2/ for further modifications.

Prediction

To generate predictions, run:

python task2/ predict <training/test> [--fold n]

The --fold option specifies which fold-specific model to use.

See task2/config/ to modify which epochs are selected for generating the predictions. By default, the four models (epochs) with the highest mean average precision (MAP) on the validation set are chosen.
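Selecting the best epochs and combining their outputs might look like the following sketch. The README only states that the top four epochs by validation MAP are chosen; averaging their predictions is an assumption, as are the function names and the dict-based inputs.

```python
def top_epochs(val_scores, k=4):
    """Return the k epochs with the highest validation MAP, best first.

    val_scores maps epoch number -> validation MAP score.
    """
    return sorted(val_scores, key=val_scores.get, reverse=True)[:k]


def average_predictions(predictions, epochs):
    """Average per-clip class probability vectors over the given epochs.

    predictions maps epoch -> list of probability vectors (one per clip).
    Averaging is an assumed way of combining the selected models.
    """
    n_clips = len(predictions[epochs[0]])
    averaged = []
    for clip in range(n_clips):
        vectors = [predictions[e][clip] for e in epochs]
        averaged.append([sum(col) / len(vectors) for col in zip(*vectors)])
    return averaged
```

For example, with validation scores {1: 0.80, 2: 0.85, 3: 0.83, 4: 0.90, 5: 0.70, 6: 0.88}, top_epochs picks epochs 4, 6, 2 and 3, in that order.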

Evaluation

To evaluate the predictions, run:

python task2/ evaluate <fold>
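DCASE 2018 Task 2 scored systems with mean average precision over the top three predicted labels (MAP@3), where each clip has a single ground-truth label. Assuming the evaluate command reports a metric of this kind, it can be sketched as below; the function names are hypothetical and this is not the repository's evaluation code.

```python
def rank_labels(probabilities, class_names):
    """Sort class names by predicted probability, most likely first."""
    order = sorted(range(len(class_names)),
                   key=lambda i: probabilities[i], reverse=True)
    return [class_names[i] for i in order]


def average_precision_at_k(true_label, ranked_labels, k=3):
    """AP@k for a single-label clip: 1/rank if the true label is among
    the top k predictions, otherwise 0."""
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label == true_label:
            return 1.0 / rank
    return 0.0


def map_at_k(true_labels, ranked_predictions, k=3):
    """Mean of AP@k over all clips."""
    scores = [average_precision_at_k(t, p, k)
              for t, p in zip(true_labels, ranked_predictions)]
    return sum(scores) / len(scores)
```

A correct label ranked first scores 1, ranked second 0.5, ranked third 1/3, and anything lower scores 0, so MAP@3 rewards a confident correct top prediction most.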

Stacking

Stacking is an ensembling technique that involves creating meta-features based on the predictions of a number of base classifiers. These meta-features are then used to train a second-level classifier and generate new predictions. We provide scripts to do this.

To generate meta-features, run:

python scripts/ <pred_path> <pred_type> <output_path>

The argument pred_path refers to the parent directory in which the predictions of the base classifiers are stored. pred_type must be either training or test, depending on which dataset the meta-features are for. output_path specifies the path of the output HDF5 file.

To give an example, assume that the directory structure looks like this:

workspace
└── predictions
    ├── classifier1
    ├── classifier2
    └── classifier3

In this case, you might run:

python scripts/ workspace/predictions training training.h5
python scripts/ workspace/predictions test test.h5

For the time being, the script must be edited to select the classifiers.
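One common way to build meta-features is to concatenate, for each clip, the class probability vectors produced by every selected base classifier. The sketch below assumes that construction, which the README does not spell out; its classifiers argument stands in for the selection currently made by editing the script, and the function name is hypothetical. Writing the result to HDF5 is omitted.

```python
def meta_features(predictions, classifiers):
    """Concatenate the base classifiers' probability vectors for each clip.

    predictions maps classifier name -> list of per-clip probability
    vectors; every classifier must cover the same clips in the same order.
    """
    n_clips = len(predictions[classifiers[0]])
    for name in classifiers:
        if len(predictions[name]) != n_clips:
            raise ValueError(f'{name} has a different number of clips')
    # One row per clip: classifier1's vector, then classifier2's, and so on.
    return [
        [p for name in classifiers for p in predictions[name][clip]]
        for clip in range(n_clips)
    ]
```

With C classes and M selected classifiers, each clip gets a meta-feature vector of length M * C, and the order of the classifiers argument fixes the column layout.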

To then generate predictions using a second-level classifier, run:

python scripts/ --test_path test.h5 training.h5 <metadata_path> <output_path>

The argument metadata_path is the path to the training set metadata file. See the script itself for more details.

Pseudo-labeling

To relabel or promote training examples, run:

python scripts/ <metadata_path> <pred_path> <output_path> [--relabel_threshold t1] [--promote_threshold t2]

The argument metadata_path is the path to the training set metadata file containing the original labels. pred_path is the path to the predictions file used for pseudo-labeling. output_path is the path of the new metadata file to be written. The threshold options allow constraining which examples are relabeled or promoted.
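A possible reading of the two thresholds is sketched below: an unverified example is relabeled when the model assigns a different class a probability of at least t1, and promoted to verified when the probability of its current label is at least t2; manually verified examples are left untouched. This interpretation, the dict-based metadata format and the function name are assumptions; see the script itself for the actual rules.

```python
def pseudo_label(entries, predictions, relabel_threshold=None,
                 promote_threshold=None):
    """Return new metadata entries after relabeling/promotion (hypothetical).

    entries: list of dicts with 'label' and 'verified' keys.
    predictions: list of dicts mapping class name -> probability,
    aligned with entries.
    """
    updated = []
    for entry, probs in zip(entries, predictions):
        entry = dict(entry)  # copy so the original metadata is untouched
        if not entry['verified']:
            best = max(probs, key=probs.get)
            if (relabel_threshold is not None and best != entry['label']
                    and probs[best] >= relabel_threshold):
                entry['label'] = best  # confident disagreement: relabel
            elif (promote_threshold is not None
                  and probs[entry['label']] >= promote_threshold):
                entry['verified'] = True  # confident agreement: promote
        updated.append(entry)
    return updated
```

Under this reading, a higher t1 makes relabeling more conservative, while t2 controls how confident the model must be before a noisy label is trusted as if it had been verified.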