AudioSet Tutorial

This repository contains notebooks and code snippets for working with Google AudioSet, a large-scale dataset of manually annotated audio events.

tutorial_reading_audioset_tfrecords.ipynb explains how to open a TFRecord file that contains AudioSet feature vectors (embeddings), video IDs, and labels.
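Below is a minimal sketch of parsing such a record with TensorFlow 2, assuming the standard AudioSet SequenceExample layout: 'video_id' and 'labels' in the context, and one 'audio_embedding' byte string per second in the feature lists. The shard path is a placeholder; the notebook walks through the same steps in more detail.

```python
import numpy as np
import tensorflow as tf

def parse_example(serialized):
    # Each record is a tf.train.SequenceExample: video metadata and labels
    # live in the context, per-second embeddings in the feature lists.
    context, sequence = tf.io.parse_single_sequence_example(
        serialized,
        context_features={
            'video_id': tf.io.FixedLenFeature([], tf.string),
            'labels': tf.io.VarLenFeature(tf.int64),
        },
        sequence_features={
            'audio_embedding': tf.io.FixedLenSequenceFeature([], tf.string),
        })
    return context['video_id'], context['labels'], sequence['audio_embedding']

# 'bal_train/00.tfrecord' is a placeholder for one of the downloaded shards.
dataset = tf.data.TFRecordDataset('bal_train/00.tfrecord').map(parse_example)

for video_id, labels, embeddings in dataset.take(1):
    # Each sequence step is a byte string holding 128 8-bit quantized values.
    frames = np.stack([np.frombuffer(e, dtype=np.uint8)
                       for e in embeddings.numpy()])
    print(video_id.numpy().decode('utf-8'),
          tf.sparse.to_dense(labels).numpy(),
          frames.shape)  # e.g. (10, 128) for a 10-second clip
```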

About AudioSet

AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. The ontology is specified as a hierarchical graph of event categories, covering a wide range of human and animal sounds, musical instruments and genres, and common everyday environmental sounds.

Instead of raw audio, every example is released as a sequence of 128-dimensional embeddings (one embedding per second of audio). The embeddings were obtained by passing the log-scaled mel spectrogram of the audio waveform through VGGish, a deep convolutional neural network trained on millions of audio clips.
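The stored values are 8-bit quantized versions of the post-processed VGGish outputs. If you want approximate floating-point embeddings back, you can undo the quantization; the sketch below assumes the clip range of roughly [-2.0, +2.0] used by the released VGGish post-processor, so treat it as an approximation rather than an exact inverse.

```python
import numpy as np

# Assumed quantization range from the released VGGish post-processor:
# values are clipped to [-2.0, +2.0] before being mapped to [0, 255].
QUANTIZE_MIN_VAL = -2.0
QUANTIZE_MAX_VAL = 2.0

def dequantize(quantized_frames):
    """Map uint8 embeddings of shape (num_seconds, 128) back to approximate floats."""
    q = quantized_frames.astype(np.float32)
    scale = (QUANTIZE_MAX_VAL - QUANTIZE_MIN_VAL) / 255.0
    return q * scale + QUANTIZE_MIN_VAL

# Example: 'frames' as produced by the reading sketch above.
# floats = dequantize(frames)   # shape (num_seconds, 128), values in ~[-2, 2]
```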

Where to find further information

A discussion group for creators and users of the dataset can be found at audioset-users.
