A convolutional neural network (CNN) for classifying drum instruments (hi-hats, kicks, and snares).

DrumClassifier is the classifier model used for my DrumTracker MIDI transcription program, built with PyTorch and TorchAudio.
Installation instructions for the drum-tracker environment are the same as the ones found here.
The classifier is trained on percussion samples that I frequently use in hip-hop production. The samples are low-fidelity and typical of what you would hear in old-school or boom-bap hip-hop tracks.

To achieve classification that is reasonably accurate and generalizes beyond the training samples, some data preprocessing is required. A flowchart of the preprocessing steps can be found below.
- The data is first converted from stereo to mono to reduce dimensionality.
- It is then down-sampled from 44.1 kHz to 16 kHz to decrease memory usage while still retaining the relevant sonic information.
- Each sample is then segmented into $\frac{1}{10}$-of-a-second chunks to synthetically increase the total amount of training data.
- Any chunk whose average decibel level falls below a threshold is discarded to remove quiet samples.
- All data transformed up to this point is now considered *clean* data.
- A mel-spectrogram transformation is then applied to the clean data, which is fed to a PyTorch dataloader for the model.