
Animal Vocalization Generative Network (AVGN)

Tim Sainburg (PhD student, UCSD, Gentner Laboratory)

This project is a work in progress; some features are not yet complete.

This project takes audio recordings of animal vocalizations and learns a generative model of segments of those vocalizations (e.g. syllables) using modern machine learning techniques. Specifically, the package takes a dataset of wav files, segments them into units (e.g. syllables of birdsong), and trains a generative model on those segments. The learned latent representations can be used to cluster syllables in an unsupervised manner, generate novel syllables, visualize sequences, or perform several other analyses.
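As a rough illustration of the segmentation step, the sketch below splits a recording into units by thresholding an amplitude envelope. It is not the segmentation code used in AVGN: the function name `segment_by_threshold`, its parameters, and the toy envelope are all assumptions made for this example.

```python
from typing import List, Tuple


def segment_by_threshold(
    envelope: List[float], threshold: float, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs where the envelope exceeds the threshold.

    `end` is exclusive. Runs shorter than `min_len` samples are dropped.
    Hypothetical helper for illustration only.
    """
    segments = []
    start = None
    for i, value in enumerate(envelope):
        if value > threshold and start is None:
            start = i  # a loud run begins here
        elif value <= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None  # the run has ended
    # Close a run that lasts until the end of the recording.
    if start is not None and len(envelope) - start >= min_len:
        segments.append((start, len(envelope)))
    return segments


# Toy amplitude envelope: two loud bursts separated by near-silence.
toy_envelope = [0.0, 0.1, 0.8, 0.9, 0.7, 0.1, 0.0, 0.6, 0.9, 0.05]
toy_segments = segment_by_threshold(toy_envelope, threshold=0.5, min_len=2)
print(toy_segments)
```

In practice the envelope would be computed from the wav file and the threshold tuned per species and recording condition.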

Overview of the package


Latent space generative modelling of song

Below is an example of a variational autoencoder trained on birdsong. In each example, points are selected from a low-dimensional latent space and passed through a decoder, which turns them into syllable spectrograms. Other notebook examples show how to invert these spectrograms into waveforms (currently using Griffin and Lim inversion).

an example grid sampling from Bengalese Finch song (from a 2D Multidimensional Scaling Autoencoder)


an example interpolation of Bengalese finch song (from a 16D Variational Autoencoder)


an example interpolation of Cassin's vireo song (from a 16D Variational Autoencoder)
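The grid sampling and interpolation in the figures above amount to choosing latent points and decoding each one. The sketch below shows only the point-selection part in plain Python. The function names `interpolate_latent` and `latent_grid` are hypothetical, and the trained decoder is not shown because it depends on the model.

```python
from typing import List, Sequence


def interpolate_latent(
    z_start: Sequence[float], z_end: Sequence[float], n_steps: int
) -> List[List[float]]:
    """Linearly interpolate n_steps latent points from z_start to z_end, inclusive."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimensionality")
    path = []
    for step in range(n_steps):
        t = step / (n_steps - 1)  # runs from 0.0 to 1.0
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


def latent_grid(low: float, high: float, n_per_axis: int) -> List[List[float]]:
    """Evenly spaced 2D grid of latent points covering [low, high] on each axis."""
    if n_per_axis < 2:
        raise ValueError("n_per_axis must be at least 2")
    axis = [low + (high - low) * i / (n_per_axis - 1) for i in range(n_per_axis)]
    return [[x, y] for y in axis for x in axis]


# Each point would then be passed through the trained decoder
# to obtain a syllable spectrogram (decoder not shown here).
toy_path = interpolate_latent([0.0, 0.0], [1.0, 2.0], n_steps=3)
toy_grid = latent_grid(-1.0, 1.0, n_per_axis=3)
```

Linear interpolation is one simple choice; other paths through the latent space are possible.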


An example of transcribed Bengalese Finch song

Below is an example combining the UMAP and HDBSCAN algorithms: UMAP first reduces the dimensionality of the syllables, and HDBSCAN then clusters them into discrete categories.


(left) Distribution of syllables in the UMAP dimensionality reduction, labelled using HDBSCAN. Each dot is a syllable from the same finch. (right) The same plot, with the dots replaced by line segments connecting syllables, representing syllable transitions.


The entire sequence dataset from Katahira et al., for the same Bengalese finch as above. Each vertical bar represents one song, and each color represents one syllable.
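Once each song is a sequence of discrete syllable labels, transitions between syllables can be counted directly. The sketch below is a minimal illustration with made-up labels. The names `count_transitions` and `transition_probabilities` are hypothetical and are not part of AVGN.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_transitions(songs: List[List[str]]) -> Counter:
    """Count how often each syllable label is directly followed by another."""
    counts: Counter = Counter()
    for song in songs:
        # Pair each syllable with its successor within the same song.
        counts.update(zip(song, song[1:]))
    return counts


def transition_probabilities(counts: Counter) -> Dict[Tuple[str, str], float]:
    """Normalize counts so that transitions out of each syllable sum to 1."""
    totals: Counter = Counter()
    for (src, _dst), n in counts.items():
        totals[src] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}


# Toy data: two short songs with invented syllable labels.
toy_songs = [["a", "b", "c"], ["a", "b", "b", "c"]]
toy_counts = count_transitions(toy_songs)
toy_probs = transition_probabilities(toy_counts)
```

Transitions are counted within songs only, so the last syllable of one song is never paired with the first syllable of the next.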


(top) Syllabic transcriptions of the same bird. (bottom) the same syllables, segmented, normalized, and padded.
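Normalizing and padding give every syllable spectrogram the same value range and the same width, so they can be batched for training. The sketch below assumes min-max normalization and centered zero-padding along the time axis. Both choices, and the names `normalize` and `pad_to_width`, are assumptions for illustration; the notebooks may do this differently.

```python
from typing import List

Spectrogram = List[List[float]]  # rows are frequency bins, columns are time frames


def normalize(spec: Spectrogram) -> Spectrogram:
    """Rescale all values to the range [0, 1] (min-max normalization)."""
    flat = [v for row in spec for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant spectrogram carries no contrast; map it to zeros.
        return [[0.0 for _ in row] for row in spec]
    return [[(v - lo) / (hi - lo) for v in row] for row in spec]


def pad_to_width(spec: Spectrogram, width: int, fill: float = 0.0) -> Spectrogram:
    """Center the spectrogram along the time axis and pad it to `width` frames."""
    n_frames = len(spec[0])
    if n_frames > width:
        raise ValueError("spectrogram is wider than the target width")
    left = (width - n_frames) // 2
    right = width - n_frames - left
    return [[fill] * left + list(row) + [fill] * right for row in spec]


# Toy spectrogram with 2 frequency bins and 2 time frames.
toy_spec = [[1.0, 3.0], [2.0, 5.0]]
toy_padded = pad_to_width(normalize(toy_spec), width=5)
```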


Examples for different songbirds are located in the notebooks/birdsong folder. There is no explicit documentation yet, but we will add better docstrings to functions as we clean them up, and more notes to the example notebooks.

Currently there are two example birds: Cassin's vireo and Bengalese finch. The Cassin's vireo example dataset compares hand-labelled syllables to syllable labels learned using our method, and thus uses the same segmentations as the manual method. The Bengalese finch song is segmented automatically. I'm currently working on adding a few more species (both songbirds and other species).

To use these notebooks on your own dataset, clone this repo and copy the methods from one of the examples. You will need to change the parameters as well as parse date/time information in 1.0-segment-song-from-wavs.ipynb yourself.

The GAIA autoencoder is not currently implemented in AVGN. I have a GAIA-specific repo with that implementation, which will probably need some adjustments to work with AVGN. Feel free to try to pull them together and make a PR.

Some of these functions use a lot of RAM (for example loading your whole dataset into RAM). If RAM is an issue for you, try using the data_interator from
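A general way to reduce memory use is to stream the data in batches instead of holding everything at once. The sketch below is a generic generator and is not AVGN's own iterator. The name `iter_batches` and its parameters are assumptions for illustration.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of up to `batch_size` items from any iterable.

    Only one batch is held in memory at a time.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


# Toy usage: in practice `items` could lazily load one syllable at a time.
toy_batches = list(iter_batches(range(5), batch_size=2))
```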


To install, run python install

Data references

Hedley, Richard (2016): Data used in PLoS One article “Complexity, Predictability and Time Homogeneity of Syntax in the Songs of Cassin’s Vireo (Vireo cassini)” by Hedley (2016). figshare.

Katahira K, Suzuki K, Kagawa H, Okanoya K (2013) A simple explanation for the evolution of complex song syntax in Bengalese finches. Biology Letters 9(6): 20130842.

Katahira K, Suzuki K, Kagawa H, Okanoya K (2013) Data from: A simple explanation for the evolution of complex song syntax in Bengalese finches. Dryad Digital Repository.

Arriaga, J. G., Cody, M. L., Vallejo, E. E., & Taylor, C. E. (2015). Bird-DB: A database for annotated bird song sequences. Ecological Informatics, 27, 21-25.


  • rewrite functions and add docstrings
  • make less RAM heavy
  • add other animal vocalization datasets
  • ...

Project based on the cookiecutter data science project template. #cookiecutterdatascience
