Audio-Based Event Detection


📖 Notion page


Easy way

conda env create --file environment.yml

Manual way

⚠️ Note: This is not the recommended way. Use the environment.yml file instead to create the environment.

conda create -n eye-audio python=3.10 
conda activate eye-audio
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install easydict 
pip install opencv-python
pip install tqdm
pip install matplotlib
pip install scikit-learn
pip install PyYAML
pip install wandb
pip install hydra-core
pip install pandas
pip install seaborn
pip install librosa
pip install moviepy
pip install tabulate
pip install git+
pip install rich
pip install torch_audiomentations
pip install audiomentations
pip install "qpsolvers[open_source_solvers]"

Project structure

Create and populate data/video and data/csv folders

mkdir -p data/video
mkdir -p data/csv

# cvut dataset
mkdir -p data/video/cvut
mkdir -p data/csv/cvut
ln -s ~/data/MultiDo/CVUTFD/copy/*.{MP4,MOV,mov,mp4,mts,MTS} data/video/cvut/
ln -s ~/data/MultiDo/CVUTFD/result/*.csv data/csv/cvut/

# eyedea dataset
mkdir -p data/video/eyedea
mkdir -p data/csv/eyedea
ln -s ~/data/MultiDo/videa_prujezdy/*.{MP4,MOV,mov,mp4,mts,MTS} data/video/eyedea/
ln -s ~/data/MultiDo/videa_prujezdy/*.csv data/csv/eyedea/
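One caveat with the brace patterns above: in bash with default settings, a pattern that matches nothing (for example no `*.MOV` files) is passed to `ln` literally, which leaves a dangling link named `*.MOV` in the target folder. The following is a minimal Python sketch of the same linking step that avoids this by matching suffixes case-insensitively. The helper name `link_files` and the `VIDEO_SUFFIXES` set are illustrative assumptions, not part of the repository.

```python
from pathlib import Path

# Suffixes taken from the brace patterns above, compared in lower case.
VIDEO_SUFFIXES = {".mp4", ".mov", ".mts"}


def link_files(src_dir, dst_dir, suffixes):
    """Symlink every file in src_dir whose suffix (case-insensitive) is in suffixes.

    Hypothetical helper: creates dst_dir if needed and skips links that already exist.
    """
    src_dir = Path(src_dir).expanduser()
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for src in sorted(src_dir.iterdir()):
        if src.is_file() and src.suffix.lower() in suffixes:
            dst = dst_dir / src.name
            # exists() is False for dangling links, so check is_symlink() as well.
            if not (dst.exists() or dst.is_symlink()):
                dst.symlink_to(src.resolve())
            linked.append(dst)
    return linked
```

For the cvut dataset this would be called as `link_files("~/data/MultiDo/CVUTFD/copy", "data/video/cvut", VIDEO_SUFFIXES)` and `link_files("~/data/MultiDo/CVUTFD/result", "data/csv/cvut", {".csv"})`.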

Preprocess files

Preprocessing generates the files in data/audio, data/audio_tensors, data/labels and data/intervals.

Example: config/dataset/000_debug.yaml

where config/dataset/dataset.yaml is the path to a YAML list of the files to be preprocessed

Converting videos with ffmpeg:

ffmpeg -i input_video.mts -c:v copy -c:a aac -b:a 256k output_video.mp4
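When a whole folder of .mts recordings needs converting, the command above can be wrapped in a short loop. This is a sketch under assumptions: the helper names `ffmpeg_command` and `convert_all` are hypothetical, `ffmpeg` is assumed to be on the PATH, and the flags are exactly those of the single-file command (video stream copied, audio re-encoded to AAC at 256k).

```python
import subprocess
from pathlib import Path


def ffmpeg_command(src, dst):
    """Build the same ffmpeg invocation as the single-file command above."""
    return ["ffmpeg", "-i", str(src),
            "-c:v", "copy", "-c:a", "aac", "-b:a", "256k", str(dst)]


def convert_all(folder, run=subprocess.run):
    """Convert every .mts/.MTS file in folder to an .mp4 next to it.

    Existing outputs are skipped. The run argument exists only so the
    function can be exercised without invoking ffmpeg.
    """
    converted = []
    for src in sorted(Path(folder).iterdir()):
        if src.suffix.lower() != ".mts":
            continue
        dst = src.with_suffix(".mp4")
        if dst.exists():
            continue
        run(ffmpeg_command(src, dst), check=True)
        converted.append(dst)
    return converted
```

Because `check=True` is passed, a failing conversion raises `subprocess.CalledProcessError` and stops the loop rather than silently continuing.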

Wandb account

To visualize training curves, create a wandb account and add a new project. Then add your wandb project name and account name to config/wandb/wandb.yaml.

Neural Network Training

Debug training

The following command runs training for a few epochs and saves the results to the outputs/000_debug folder:

python experiment=000_debug cuda=1

Best Model

Change training configurations in config/model/default.yaml

To override run configuration, use:

python experiment=047_october

where 047_october is the name of the experiment defined in the config/experiment/047_october.yaml file
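The `key=value` arguments in the commands above look like Hydra-style overrides (hydra-core is among the installed dependencies). As a rough illustration of the layering idea only, and not Hydra's actual implementation, the sketch below merges command-line overrides onto a defaults dictionary. The keys `model.lr` and `model.epochs` are hypothetical placeholders; the real keys live in the repository's YAML files.

```python
def parse_overrides(args):
    """Turn ["cuda=1", "model.epochs=2"] into a nested dict; values stay strings."""
    result = {}
    for arg in args:
        key, _, value = arg.partition("=")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


def deep_merge(base, override):
    """Return a new dict where values from override replace those in base, recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical defaults standing in for config/model/default.yaml.
defaults = {"cuda": "0", "model": {"lr": "0.001", "epochs": "10"}}
config = deep_merge(defaults, parse_overrides(["cuda=1", "model.epochs=2"]))
```

Later layers win: anything not mentioned in an override keeps its default value, which is why an experiment file only needs to list the settings it changes.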


Download the pretrained model here and unzip it in the outputs folder.


Demo 1


It takes the audio track extracted from a video, applies the multi-head audio predictor, and outputs predictions for individual time windows together with a summary.


Inputs:

  1. videos
  2. model

Outputs:

  1. predictions for each time window
  2. counts for each head


python -v 71_Samsung -m 047_october/0

Note: the 71_Samsung video file must be located somewhere in the subdirectories of data/video/**. The full model path is "outputs/047_october/0/rvce.pth".
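The way the -v and -m arguments map to files can be sketched as follows. This is an assumption about the lookup, not the repository's code: `find_video` and `model_path` are hypothetical helpers, the video is matched by file stem at any depth under data/video, and the checkpoint name rvce.pth is taken from the full path quoted above.

```python
from pathlib import Path

# Video suffixes used elsewhere in this README, compared in lower case.
VIDEO_SUFFIXES = {".mp4", ".mov", ".mts"}


def find_video(name, root="data/video"):
    """Return the first video under root (any depth) whose stem equals name, else None."""
    for path in sorted(Path(root).rglob("*")):
        if path.stem == name and path.suffix.lower() in VIDEO_SUFFIXES:
            return path
    return None


def model_path(model, root="outputs"):
    """Map a model argument such as "047_october/0" to its checkpoint file."""
    return Path(root) / model / "rvce.pth"
```

With the layout from the project structure section, `find_video("71_Samsung")` would look through data/video/cvut and data/video/eyedea, and `model_path("047_october/0")` yields outputs/047_october/0/rvce.pth.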

Demo 2

Prediction and evaluation. The same as demo_1, but it uses ground-truth labels to evaluate prediction accuracy.


Inputs:

  1. videos
  2. model
  3. csv files with annotations

Outputs:

  1. rvce for each head
  2. fault detection visualization


python -v 71_Samsung -m 047_october/0

Note: the 71_Samsung video file must be located somewhere in the subdirectories of data/video/** and its annotations in data/csv/**. The full model path is "outputs/047_october/0/rvce.pth".
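This README does not define rvce. Assuming it stands for a relative count error, that is, the absolute difference between the predicted and the annotated count divided by the annotated count, the per-head metric could be computed as in the sketch below. Both the definition and the head names in the usage note are assumptions to be checked against the repository.

```python
def rvce(predicted_count, true_count):
    """Relative count error |predicted - true| / true (assumed definition)."""
    if true_count == 0:
        raise ValueError("rvce is undefined when the annotated count is zero")
    return abs(predicted_count - true_count) / true_count


def rvce_per_head(predicted, annotated):
    """Compute rvce for every head present in both dictionaries."""
    return {head: rvce(predicted[head], annotated[head])
            for head in predicted if head in annotated}
```

For example, with hypothetical heads, `rvce_per_head({"n_incoming": 45, "n_outgoing": 52}, {"n_incoming": 50, "n_outgoing": 50})` gives an error of 10% for the first head and 4% for the second.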

Demo 3

It splits a (long) input video into two parts. The beginning part is used for fine-tuning the prediction model. The trailing part of the video is used for prediction and evaluation.


Inputs:

  1. videos
  2. model
  3. csv files with annotations
  4. fine-tuning length (training part)

Outputs:

  1. rvce for each head on test part


python -v 71_Samsung -m 047_october/0 --device cpu --training_hours 0.15

Note: the 71_Samsung video file must be located somewhere in the subdirectories of data/video/** and its annotations in data/csv/**. The full model path is "outputs/047_october/0/rvce.pth". The first 0.15 hours of the video are used for training and the rest for evaluation, and the run uses the CPU only.
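To make the split concrete: 0.15 hours is 0.15 × 3600 = 540 seconds, so annotations before the 9-minute mark go to fine-tuning and the rest to evaluation. A minimal sketch of that split, assuming annotations are available as event timestamps in seconds from the start of the video (the function name `split_events` is hypothetical):

```python
def split_events(event_times, training_hours):
    """Split event timestamps (seconds from video start) at the training boundary.

    Events strictly before the boundary go to the training part,
    all others to the test part.
    """
    boundary = training_hours * 3600.0
    train = [t for t in event_times if t < boundary]
    test = [t for t in event_times if t >= boundary]
    return train, test


train_events, test_events = split_events([10.0, 500.0, 600.0, 1000.0], 0.15)
```

Here the events at 10 s and 500 s fall into the training part, while those at 600 s and 1000 s are left for evaluation.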