Use the environment.yml file to create the environment:
conda env create --file environment.yml
Alternatively, create the environment and install the dependencies manually:
conda create -n eye-audio python=3.10
conda activate eye-audio
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install easydict
pip install opencv-python
pip install tqdm
pip install matplotlib
pip install scikit-learn
pip install PyYAML
pip install wandb
pip install hydra-core
pip install pandas
pip install seaborn
pip install librosa
pip install moviepy
pip install tabulate
pip install git+https://github.com/yermandy/pyrootutils.git
pip install rich
pip install torch_audiomentations
pip install audiomentations
pip install qpsolvers[open_source_solvers]
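For reference, the environment.yml shipped with the repository is authoritative; a sketch of an equivalent file assembled from the packages listed above (the channel list and pin layout here are assumptions) might look like:

name: eye-audio
channels:
  - pytorch
  - nvidia
  - defaults
dependencies:
  - python=3.10
  - pytorch
  - torchvision
  - torchaudio
  - pytorch-cuda=11.7
  - pip
  - pip:
      - easydict
      - opencv-python
      - tqdm
      - matplotlib
      - scikit-learn
      - PyYAML
      - wandb
      - hydra-core
      - pandas
      - seaborn
      - librosa
      - moviepy
      - tabulate
      - git+https://github.com/yermandy/pyrootutils.git
      - rich
      - torch_audiomentations
      - audiomentations
      - "qpsolvers[open_source_solvers]"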
Create and populate the data/video and data/csv folders:
mkdir -p data/video
mkdir -p data/csv
# cvut dataset
mkdir -p data/video/cvut
mkdir -p data/csv/cvut
ln -s ~/data/MultiDo/CVUTFD/copy/*.{MP4,MOV,mov,mp4,mts,MTS} data/video/cvut/
ln -s ~/data/MultiDo/CVUTFD/result/*.csv data/csv/cvut/
# eyedea dataset
mkdir -p data/video/eyedea
mkdir -p data/csv/eyedea
ln -s ~/data/MultiDo/videa_prujezdy/*.{MP4,MOV,mov,mp4,mts,MTS} data/video/eyedea/
ln -s ~/data/MultiDo/videa_prujezdy/*.csv data/csv/eyedea/
Use preprocess_data.py to generate files in data/audio, data/audio_tensors, data/labels, and data/intervals. Example:
python preprocess_data.py config/dataset/000_debug.yaml
where config/dataset/000_debug.yaml is the path to a YAML list of files to be preprocessed.
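The exact schema of these dataset files is defined by the repository; assuming the YAML is simply a list of recording names, a hypothetical config/dataset/000_debug.yaml could look like:

# Hypothetical sketch only: a plain YAML list of files to be preprocessed.
# Entry names are placeholders; 71_Samsung is used as an example later in this guide.
- 71_Samsung
- some_other_recording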
Convert videos with ffmpeg:
ffmpeg -i input_video.mts -c:v copy -c:a aac -b:a 256k output_video.mp4
To visualize training curves, create a wandb account and add a new project. Add your wandb project name and account name to config/wandb/wandb.yaml.
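A minimal sketch of config/wandb/wandb.yaml, assuming it uses the usual wandb field names for the project and account (entity):

# Assumed key names; adjust to whatever keys the repository's config actually expects.
project: your_project_name
entity: your_wandb_account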
The following command runs training for a few epochs and saves the results to the outputs/000_debug folder:
python cross_validation.py experiment=000_debug cuda=1
Change the training configuration in config/model/default.yaml.
To override the run configuration, use:
python cross_validation.py experiment=047_october
where 047_october is the name of the experiment defined in the config/experiment/047_october.yaml file.
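Experiment files under config/experiment/ are plain Hydra configs; a hypothetical sketch with placeholder overrides (the real keys come from config/model/default.yaml and the other config groups) might look like:

# Hypothetical config/experiment/047_october.yaml; keys and values below are placeholders.
# @package _global_
model:
  epochs: 50
  learning_rate: 0.0001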
Download the pretrained model here and unzip it in the outputs folder.
Prediction. It takes audio extracted from a video, applies the multi-head audio predictor, and outputs predictions for individual time windows together with a summary.
Input:
- videos
- model
Output:
- predictions for each time window
- counts for each head
Usage:
python demo_1.py -v 71_Samsung -m 047_october/0
Note that the 71_Samsung video file should be somewhere in the subdirectories of data/video/**. The full model path is "outputs/047_october/0/rvce.pth".
Prediction and evaluation. The same as demo_1, but it uses ground-truth labels to evaluate prediction accuracy.
Input:
- videos
- model
- csv files with annotations
Output:
- rvce for each head
- fault detection visualization
Usage:
python demo_2.py -v 71_Samsung -m 047_october/0
Note that the 71_Samsung video file should be somewhere in the subdirectories of data/video/** and its annotations in data/csv/**. The full model path is "outputs/047_october/0/rvce.pth".
Fine-tuning and evaluation. It splits the input (long) video into two parts. The beginning part is used for fine-tuning the prediction model, and the trailing part is used for prediction and evaluation.
Input:
- videos
- model
- csv files with annotations
- fine-tuning length (training part)
Output:
- rvce for each head on test part
Usage:
python demo_3.py -v 71_Samsung -m 047_october/0 --device cpu --training_hours 0.15
Note that the 71_Samsung video file should be somewhere in the subdirectories of data/video/** and its annotations in data/csv/**. The full model path is "outputs/047_october/0/rvce.pth". The first 0.15 hours of the video are used for training and the rest for evaluation; with --device cpu, the demo runs on the CPU only.