RibbitRadar is a Python-based application designed to accurately identify the presence of specific frog species in audio recordings. Leveraging my fine-tuned version of the Audio Spectrogram Transformer (AST), RibbitRadar processes audio data, performs inference, and generates reports with detailed information on each detection.
RibbitRadar is part of a broader project focused on automated frog call recognition. The application performs the following key tasks:
- Preprocessing: Converts audio files into a format suitable for model inference.
- Inference: Uses pre-trained models to identify frog species in the recordings.
- Reporting: Generates results in various report formats, providing both detailed and summary-level information.
- Configuration: Adjustable prediction mode, detection thresholds, and report formatting.
A more detailed flowchart of the application logic is below.
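The README does not show how preprocessing, thresholding, and reporting fit together in code. The sketch below is a hypothetical, stdlib-only illustration of those three stages, not RibbitRadar's actual implementation. It assumes recordings are scored in fixed 10-second windows (matching the 10-second evaluation clips), replaces the fine-tuned AST with precomputed per-segment scores, and uses illustrative names (`split_wav`, `detect`, `summarize`, `SEGMENT_SECONDS`, `THRESHOLDS`) that do not come from the project.

```python
import wave

# Assumption: audio is scored in fixed 10-second windows, matching the
# 10-second clips used for evaluation. RibbitRadar's real windowing may differ.
SEGMENT_SECONDS = 10

# Hypothetical per-species thresholds, not RibbitRadar's documented defaults.
THRESHOLDS = {"Rana draytonii": 0.5, "Rana catesbeiana": 0.5}


def split_wav(source, segment_seconds=SEGMENT_SECONDS):
    """Preprocessing: split a WAV file (path or file object) into raw PCM chunks.

    Each chunk holds segment_seconds of audio; the last chunk may be shorter.
    """
    segments = []
    with wave.open(source, "rb") as wav:
        frames_per_segment = wav.getframerate() * segment_seconds
        # readframes returns b"" once the file is exhausted, ending the loop.
        while chunk := wav.readframes(frames_per_segment):
            segments.append(chunk)
    return segments


def detect(segment_scores, thresholds=THRESHOLDS):
    """Thresholding: for each segment, list species whose score meets its threshold.

    segment_scores is a list of {species: score} dicts, one per segment,
    standing in for the model's outputs. Species without a threshold are ignored.
    """
    return [
        sorted(
            sp for sp, score in scores.items()
            if sp in thresholds and score >= thresholds[sp]
        )
        for scores in segment_scores
    ]


def summarize(detections, segment_seconds=SEGMENT_SECONDS):
    """Summary report: positive-segment count and first detection time per species."""
    summary = {}
    for index, species in enumerate(detections):
        for sp in species:
            entry = summary.setdefault(
                sp, {"segments": 0, "first_seen_s": index * segment_seconds}
            )
            entry["segments"] += 1
    return summary


# Usage with made-up scores for two consecutive 10-second segments.
example_scores = [
    {"Rana draytonii": 0.9, "Rana catesbeiana": 0.1},
    {"Rana draytonii": 0.2, "Rana catesbeiana": 0.7},
]
print(summarize(detect(example_scores)))
# {'Rana draytonii': {'segments': 1, 'first_seen_s': 0}, 'Rana catesbeiana': {'segments': 1, 'first_seen_s': 10}}
```

Keeping thresholds in a per-species mapping is one way to make them adjustable independently, which may matter when, as the metrics suggest, recall differs noticeably between species.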
Model performance per species:
- Rana draytonii: accuracy 96.52%, precision 96.09%, recall 91.87%
- Rana catesbeiana: accuracy 94.60%, precision 95.61%, recall 82.43%
These figures are based on a test set of 10-second audio clips: 455 Rana draytonii, 370 Rana catesbeiana, and 1,111 negative samples.
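The README does not state exactly how the per-species figures were computed. One common approach, assumed here, is a one-vs-rest evaluation: each species is scored as the positive class and every other label (including negatives) counts as negative. The helper name `one_vs_rest_metrics` and the toy labels are illustrative only and are not the project's evaluation data.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, precision and recall for one class, treating all other labels as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correct detections
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed calls
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels, one per 10-second clip (not real results).
y_true = ["Rana draytonii", "Rana draytonii", "Rana catesbeiana", "Negative"]
y_pred = ["Rana draytonii", "Rana catesbeiana", "Rana catesbeiana", "Negative"]
print(one_vs_rest_metrics(y_true, y_pred, "Rana draytonii"))
# {'accuracy': 0.75, 'precision': 1.0, 'recall': 0.5}
```

Under this reading, a recall well below precision (as for Rana catesbeiana) means the model misses more calls than it falsely reports.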
To use RibbitRadar, download the latest release from the Releases page. The release includes a packaged application for macOS and Windows.
Requirements:
- macOS or Windows operating system.
- Audio recordings in WAV format to analyze.
Installation:
- Extract the RibbitRadar.zip file.
- Navigate to the RibbitRadar directory.
- Double-click main.exe to run the application.
If you use RibbitRadar in your research, please consider citing the original AST paper and any subsequent works this project builds on.
The first paper below proposes the Audio Spectrogram Transformer; the second describes the training pipeline that its authors applied to AST to achieve a new state of the art on AudioSet.
@inproceedings{gong21b_interspeech,
author={Yuan Gong and Yu-An Chung and James Glass},
title={{AST: Audio Spectrogram Transformer}},
year={2021},
booktitle={Proc. Interspeech 2021},
pages={571--575},
doi={10.21437/Interspeech.2021-698}
}
@ARTICLE{gong_psla,
author={Gong, Yuan and Chung, Yu-An and Glass, James},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
title={PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation},
year={2021},
doi={10.1109/TASLP.2021.3120633}
}
If you have a question, would like to develop something similar for another species, or just want to share how you have used this, send me an email at tylerschwenk1@yahoo.com.