Surrey CVSSP DCASE 2018 Task 2 System
This is the source code for CVSSP's system used in DCASE 2018 Task 2.
The accompanying technical report can be found here.
This software requires Python 3. To install the dependencies, run:
pip install -r requirements.txt
The main functionality of this software also requires the DCASE 2018 Task 2
datasets, which may be downloaded here. After acquiring the datasets,
_root_dataset_path = ('/path/to/datasets') """str: Path to root directory containing input audio clips.""" training_set = Dataset( name='training', path=os.path.join(_root_dataset_path, 'audio_train'), metadata_path='metadata/training.csv', ) """Dataset instance for the training dataset."""
You will also want to change the work path in
work_path = '/path/to/workspace' """str: Path to parent directory containing program output."""
In this section, the various commands are described. Using this software, the user is able to apply preprocessing (silence removal), extract feature vectors, train the network, generate predictions, and evaluate the predictions.
Our implementation of preprocessing involves extracting the non-silent sections of audio clips and saving these to disk separately. A new metadata file is then created with entries corresponding to the new files.
To apply preprocessing, run:
python task2/main.py preprocess <training/test>
task2/silence.py for the relevant code.
To extract feature vectors, run:
python task2/main.py extract <training/test> [--recompute]
--recompute is enabled, the program will recompute existing feature
vectors. This implementaion extracts log-mel spectrogram features. See
task2/config/logmel.py for tweaking the parameters.
To train a model, run:
python task2/main.py train [--model MODEL] [--fold n] [--sample_weight x] [--class_weight]
--model option can be one of the following:
The training set is assumed to be split into several folds, so the
option specifies which one to use as the validation set. If set to
program trains on the entire dataset. The
--sample_weight option allows
setting a sample weight to be used for unverified (noisy) examples. Finally,
--class_weight flag indicates that examples should be weighted
based on the class that they belong to.
task2/config/training.py for tweaking the parameters or
task2/training.py for further modifications.
To generate predictions, run:
python task2/main.py predict <training/test> [--fold n]
--fold option specifies which fold-specific model to use.
task2/config/predictions.py to modify which epochs are selected for
generating the predictions. By default, the top four models based on their MAP
score on the validation set are chosen.
To evaluate the predictions, run:
python task2/main.py evaluate <fold>
Stacking is an ensembling technique that involves creating meta-features based on the predictions of a number of base classifiers. These meta-features are then used to train a second-level classifier and generate new predictions. We provide scripts to do this.
To generate meta-features, run:
python scripts/meta_features.py <pred_path> <pred_type> <output_path>
pred_path refers to the parent directory in which the
predictions of the base classifiers are stored.
pred_type must be either
test, depending on which dataset the meta-features are for.
output_path specifies the path of the output HDF5 file.
To give an example, assume that the directory structure looks like this:
workspace ├── predictions │ ├── classifier1 │ ├── classifier2 │ ├── classifier3
In this case, you might run:
python scripts/meta_features.py workspace/predictions training training.h5 python scripts/meta_features.py workspace/predictions test test.h5
For the time being, the script must be edited to select the classifiers.
To then generate predictions using a second-level classifier, run:
python scripts/predict_stack.py --test_path test.h5 training.h5 <metadata_path> <output_path>
metadata_path is the path to the training set metadata file.
See the script itself for more details.
To relabel or promote training examples, run:
python scripts/relabel.py <metadata_path> <pred_path> <output_path> [--relabel_threshold t1] [--promote_threshold t2]
metadata_path is the path to the training set metadata file
containing the original labels.
pred_path is the path to the predictions
file used for pseudo-labeling.
output_path is the path of the new metadata
file to be written. The threshold options allow constraining which examples are
relabeled or promoted.