yamnet architecture for audioset label extractor #379

rbroc · 2020-02-17T21:58:02Z

Wrapper for https://github.com/tensorflow/models/blob/master/research/audioset/yamnet/yamnet.py that extracts label probabilities for 521 audio events from the AudioSet hierarchical ontology.

coveralls · 2020-02-17T22:17:01Z

Coverage decreased (-5.2%) to 73.502% when pulling afbe485 on rbroc:audioset into 7e92ad5 on tyarkoni:master.


rbroc · 2020-02-19T00:14:18Z

@tyarkoni @adelavega the basic logic should be in place and working now, could you take a quick first look to see if you have any major objections?

pliers/extractors/audio.py

adelavega · 2020-02-19T18:07:41Z

pliers/extractors/audio.py

+    def _extract(self, stim):
+
+        data = stim.data
+        self.params['SAMPLE_RATE'] = stim.sampling_rate


Are you sure it's OK to do this? You explained to me that it would only change how the spectrogram is created, so if you're positive, sounds good to me. I'm just noting this comment in params.py:

The following hyperparameters (except PATCH_HOP_SECONDS) were used to train YAMNet,
so expect some variability in performance if you change these. The patch hop can
be changed arbitrarily: a smaller hop should give you more patches from the same
clip and possibly better performance at a larger computational cost.

Might be worth testing the same audio clip with two sampling rates

will make sure to test it thoroughly (and look into an issue with output values I just noticed) and call another round of review :)

okay, I've looked into this and I think I'm on top of things.
Tl;dr -> it's okay to use a SAMPLE_RATE other than 16000Hz, as a) the size of input patches will still be the same; b) results should not be significantly affected. The caveat is that SAMPLE_RATE has to be at least twice as large as MEL_MAX_HZ. Writing down a lengthy explanation below, mostly as a reminder for myself.

(Why) the model is compatible with different sampling rates

Yamnet is compatible with different sampling rates, although it's been trained with audio sampled at 16000Hz.

The input sampling rate is only relevant in the preprocessing step, where the waveform is first passed through an STFT to extract a spectrogram, which is then converted into a mel-scale spectrogram.

The sampling rate parameter is used to compute how many samples of the stimulus go into an STFT window, but the duration of the window is independent of SAMPLE_RATE and explicitly encoded in a separate parameter. Therefore, different sampling rates do not affect how the waveform is binned in the temporal domain.

What SAMPLE_RATE does implicitly influence, though, is the (number, values and resolution of) frequency bins used for the Fourier transform. A higher SAMPLE_RATE generally means more frequency bins in the spectrogram, spanning a wider frequency range (up to SAMPLE_RATE / 2).
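To make the arithmetic concrete, here is a minimal sketch of how the STFT layout changes with the sampling rate. It assumes a 25 ms STFT window and assumes the FFT length is rounded up to the next power of two; `stft_layout` is a hypothetical helper written for this comment, not part of yamnet or pliers.

```python
import math

# Assumed value for the STFT window length in seconds.
STFT_WINDOW_SECONDS = 0.025


def stft_layout(sample_rate):
    """Return (window_samples, fft_length, n_freq_bins, bin_spacing_hz)."""
    # Samples per window scale with the sampling rate; the duration does not.
    window_samples = int(round(sample_rate * STFT_WINDOW_SECONDS))
    # Assumption: FFT length is the next power of two >= window_samples.
    fft_length = 2 ** math.ceil(math.log2(window_samples))
    # A real-input FFT yields fft_length // 2 + 1 bins from 0 Hz to Nyquist.
    n_freq_bins = fft_length // 2 + 1
    bin_spacing_hz = sample_rate / fft_length
    return window_samples, fft_length, n_freq_bins, bin_spacing_hz


for rate in (16000, 22050, 44100):
    print(rate, stft_layout(rate))
```

Under these assumptions the number of frequency bins grows with the sampling rate, while the window always covers the same stretch of time.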

The second step transforms the spectrogram into a mel-scale spectrogram. The trick here is that, regardless of how many and which frequency bins you have in the spectrogram, the number of frequency bands in the mel spectrogram is fixed by the parameter MEL_BANDS (= number of mel-frequency bins, set to 64 for training) and its range is determined by MEL_MIN_HZ / MEL_MAX_HZ (set to 125/7500 Hz for training). Input spectrograms of size [n_freq_bins, n_time_bins] will always yield outputs of size [n_mel_bands, n_time_bins], where n_mel_bands is independent of n_freq_bins. That's why audio with different sampling rates still fits the model's requirements in terms of input size.
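A rough illustration of the shape argument, using plain NumPy rather than the actual yamnet feature code. The triangular filterbank and the HTK-style mel formula are assumptions of this sketch, and `mel_weights` / `to_mel` are hypothetical names; the 96 time bins are arbitrary.

```python
import numpy as np

MEL_BANDS = 64
MEL_MIN_HZ = 125.0
MEL_MAX_HZ = 7500.0


def hz_to_mel(hz):
    """Convert Hz to mel (HTK-style formula; an assumption of this sketch)."""
    return 1127.0 * np.log1p(np.asarray(hz, dtype=float) / 700.0)


def mel_weights(n_freq_bins, sample_rate):
    """Triangular filterbank of shape [n_freq_bins, MEL_BANDS]."""
    # Mel value of every linear-frequency bin between 0 Hz and Nyquist.
    bin_mel = hz_to_mel(np.linspace(0.0, sample_rate / 2.0, n_freq_bins))
    # MEL_BANDS triangles need MEL_BANDS + 2 equally spaced mel edges.
    edges = np.linspace(float(hz_to_mel(MEL_MIN_HZ)),
                        float(hz_to_mel(MEL_MAX_HZ)), MEL_BANDS + 2)
    lower, center, upper = edges[:-2], edges[1:-1], edges[2:]
    rising = (bin_mel[:, None] - lower) / (center - lower)
    falling = (upper - bin_mel[:, None]) / (upper - center)
    return np.maximum(0.0, np.minimum(rising, falling))


def to_mel(spectrogram, sample_rate):
    """Map [n_freq_bins, n_time_bins] to [MEL_BANDS, n_time_bins]."""
    weights = mel_weights(spectrogram.shape[0], sample_rate)
    return weights.T @ spectrogram


rng = np.random.default_rng(0)
# Two spectrograms with different numbers of frequency bins.
narrow = to_mel(rng.random((257, 96)), sample_rate=16000)
wide = to_mel(rng.random((1025, 96)), sample_rate=44100)
print(narrow.shape, wide.shape)
```

Both outputs have MEL_BANDS rows, whatever the number of input frequency bins, which is the property the model input relies on.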

Different sampling rates should have no impact on the results

Although the input shape is not affected, using different sampling rates for the same waveform might still yield slight differences in the spectrograms (as the FFT frequency resolution will differ).
But as long as the mel frequency range (MEL_MIN_HZ and MEL_MAX_HZ) is kept close to that used for training, these differences shouldn't have any impact.
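One way to sanity-check this without running the model is to compare the mel spectrum of the same pure tone sampled at two rates. This is only a sketch in plain NumPy: the Hann window, the power-of-two FFT length, the HTK-style mel formula and the helper name `mel_frame` are assumptions, and a single synthetic tone is far from a thorough test.

```python
import numpy as np

MEL_BANDS, MEL_MIN_HZ, MEL_MAX_HZ = 64, 125.0, 7500.0
WINDOW_SECONDS = 0.025  # assumed STFT window length


def hz_to_mel(hz):
    """HTK-style Hz-to-mel conversion (an assumption of this sketch)."""
    return 1127.0 * np.log1p(np.asarray(hz, dtype=float) / 700.0)


def mel_frame(sample_rate, tone_hz=1000.0):
    """Normalized mel spectrum of one Hann-windowed frame of a pure tone."""
    n = int(round(sample_rate * WINDOW_SECONDS))
    fft_length = 2 ** int(np.ceil(np.log2(n)))
    t = np.arange(n) / sample_rate
    frame = np.sin(2 * np.pi * tone_hz * t) * np.hanning(n)
    magnitude = np.abs(np.fft.rfft(frame, fft_length))
    # Triangular mel filterbank over the available FFT bins.
    bin_mel = hz_to_mel(np.linspace(0.0, sample_rate / 2.0, magnitude.size))
    edges = np.linspace(float(hz_to_mel(MEL_MIN_HZ)),
                        float(hz_to_mel(MEL_MAX_HZ)), MEL_BANDS + 2)
    lower, center, upper = edges[:-2], edges[1:-1], edges[2:]
    rising = (bin_mel[:, None] - lower) / (center - lower)
    falling = (upper - bin_mel[:, None]) / (upper - center)
    weights = np.maximum(0.0, np.minimum(rising, falling))
    mel = magnitude @ weights
    # Normalize so frames with different sample counts are comparable.
    return mel / mel.sum()


low = mel_frame(16000)
high = mel_frame(44100)
print("peak bands:", int(low.argmax()), int(high.argmax()))
print("largest per-band difference:", float(np.abs(low - high).max()))
```

The same comparison on real clips, followed by a comparison of the predicted label probabilities, would be the more convincing check.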

Which of the params can be changed?

MEL_BANDS has to be 64 as the model expects this in its input

MEL_MIN_HZ and MEL_MAX_HZ can be changed, with the caveat that model performance can actually be affected and that MEL_MAX_HZ has to be less than half of the sampling rate (Nyquist freq).

SAMPLE_RATE can be modified and adapted to actual sampling rate, with the constraint that it has to be at least twice as large as MEL_MAX_HZ.

Re: pliers extractor

Would probably add a warning if SAMPLE_RATE ≠ 16000Hz, telling the user that the model was trained on 16000Hz audio, that different sampling rates may have a minimal impact on the results, and that resampling could be an option;

Would also add a pliers error if SAMPLE_RATE < 2 * MEL_MAX_HZ, warning that such a low sampling rate may be a bad idea and suggesting either upsampling or changing MEL_MAX_HZ to a suitable value, although this may reduce the accuracy of the predictions.
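A minimal sketch of what these two checks could look like. The names `MelFrequencyError` and `check_sample_rate` are hypothetical and are not the ones used in pliers; the messages are placeholders.

```python
import warnings

TRAINING_SAMPLE_RATE = 16000  # rate the model was trained on
DEFAULT_MEL_MAX_HZ = 7500.0   # training value of MEL_MAX_HZ


class MelFrequencyError(ValueError):
    """Hypothetical error for a sampling rate / MEL_MAX_HZ mismatch."""


def check_sample_rate(sample_rate, mel_max_hz=DEFAULT_MEL_MAX_HZ):
    """Raise on a Nyquist violation, warn on a non-training rate."""
    if sample_rate < 2 * mel_max_hz:
        raise MelFrequencyError(
            f"Sampling rate {sample_rate} Hz is below 2 * MEL_MAX_HZ "
            f"({2 * mel_max_hz} Hz). Upsample the audio or lower MEL_MAX_HZ; "
            "lowering MEL_MAX_HZ may reduce prediction accuracy.")
    if sample_rate != TRAINING_SAMPLE_RATE:
        warnings.warn(
            f"The model was trained on {TRAINING_SAMPLE_RATE} Hz audio; "
            f"results at {sample_rate} Hz may differ slightly. "
            "Consider resampling.")
    return sample_rate


# Demo: a 44.1 kHz stimulus triggers one warning, 8 kHz raises the error.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_sample_rate(44100)
n_warnings = len(caught)

try:
    check_sample_rate(8000)
    raised = False
except MelFrequencyError:
    raised = True
```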

pliers/extractors/audio.py

adelavega · 2020-02-19T18:17:10Z

Oh, and this works in TF 1 & 2 right?

Co-Authored-By: Alejandro de la Vega <aleph4@gmail.com>

rbroc · 2020-02-20T00:18:12Z

Oh, and this works in TF 1 & 2 right?

Yes, found a workaround for the yamnet versioning issue by calling tf graph explicitly, which works both in tf 1 and 2. Will keep ensuring that it's compatible with both. Also raised an issue on yamnet repo, something might happen on that front too (tensorflow/models#8157).

rbroc · 2020-02-20T01:05:46Z

next to dos:

subset by category name
mixin option or add length warning
add tests
docs on how to download and setup yamnet
maybe migrate to models


rbroc · 2020-03-05T01:48:08Z

We determined this bug is due to the fact that one more onset is being included than should be. This does not occur when creating the data, thus the shorter length.

A hacky workaround is to index the onsets by the length of the data:
        onsets = onsets[:preds.shape[0]]

there might be a more general timing issue.
Will look into that asap! :)

rbroc · 2020-03-05T23:21:56Z

We determined this bug is due to the fact that one more onset is being included than should be. This does not occur when creating the data, thus the shorter length.

A hacky workaround is to index the onsets by the length of the data:
        onsets = onsets[:preds.shape[0]]

Onsets/duration issue fixed (the length of the window for which predictions are made is actually slightly longer than patch_window_seconds).
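For the record, a sketch of the timing arithmetic behind that fix. The parameter values (25 ms STFT window, 10 ms STFT hop, 0.96 s patch window) are assumed yamnet defaults, and `patch_duration` / `patch_timing` are hypothetical helpers, not the code in the extractor.

```python
import numpy as np

# Assumed yamnet defaults.
STFT_WINDOW_SECONDS = 0.025
STFT_HOP_SECONDS = 0.010
PATCH_WINDOW_SECONDS = 0.96


def patch_duration():
    """Audio actually spanned by one patch of spectrogram frames."""
    n_frames = int(round(PATCH_WINDOW_SECONDS / STFT_HOP_SECONDS))
    # n_frames frames cover (n_frames - 1) hops plus one full STFT window.
    return (n_frames - 1) * STFT_HOP_SECONDS + STFT_WINDOW_SECONDS


def patch_timing(n_predictions, hop_size=0.1):
    """One onset and one duration per prediction row."""
    # Deriving onsets from the number of predictions keeps lengths aligned.
    onsets = np.arange(n_predictions) * hop_size
    durations = np.full(n_predictions, patch_duration())
    return onsets, durations


onsets, durations = patch_timing(5)
print(patch_duration(), onsets, durations)
```

Under these assumptions each patch covers slightly more audio than the nominal 0.96 s, and building the onsets from the prediction count avoids the extra onset described above.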

tyarkoni

Overall, looks good! I left some minor comments.

pliers/extractors/audio.py

tyarkoni · 2020-03-17T15:38:00Z

pliers/extractors/audio.py

+
+    def __init__(self, hop_size=0.1, top_n=None, labels=None,
+                 weights_path=None, yamnet_path=None, **yamnet_kwargs):
+        if yamnet_path is None:


Is there a reason to allow the user to specify a custom yamnet path? I'm assuming this would be an alternate way to import yamnet... the risk there is that users probably won't want to use this unless they have reason to think something has changed in the yamnet implementation. And if something has changed in the yamnet implementation, there's a good chance it will imply an API change, which could result in things breaking for us. So I guess my inclination would be to remove this check and always point to our own installer.

Agreed, unless we think sometimes our install would fail. But I think that should be considered a bug to fix, so no need to build a workaround yet.

removed the option to specify a custom yamnet_path (that was only to avoid downloading extra stuff in case the user already has yamnet or the whole TF repo somewhere else).

Probably what we should ultimately do is settle on a unified way of handling downloaded models/files, so that users can do something like pliers install yamnet --dir=/path/to/put/files from the command line, where the storage location is optional (and would default to what we're currently using). Might be worth opening an issue. But since this is the only model like this right now (all the others have their own packages that handle setup themselves), let's not get sidetracked with that right now. This seems fine.

pliers/extractors/audio.py

tyarkoni · 2020-03-17T15:48:21Z

pliers/support/setup_yamnet.py

+import runpy
+
+DOWNLOAD_PATH = Path.home() / 'pliers_data'
+YAMNET_PATH = DOWNLOAD_PATH / 'models-master' / 'research' / 'audioset' / 'yamnet'


I assume this mirrors the structure in the yamnet repo, but it's a lot of nesting. Unless the yamnet code requires that structure, I'd probably just store this under .pliers/models/yamnet or something like that (note that I believe we already use the convention of storing data in .pliers in other places, which is what many other packages do as well).

I think it's that the whole TF repo is being downloaded. That said, in the download script you could download a subset, or download to a tmp dir, and only copy over what you need to .pliers

The download script now fetches the TF repo's zip archive (no way, as far as I can see, to download only a subset), extracts it all into a yamnet_temp folder in pliers_data, moves the yamnet part to a flat directory pliers_data/yamnet and deletes yamnet_temp.
We could in principle avoid extracting the whole thing and just loop through the files in the zip archive so as to extract only the yamnet ones. That would however add a bunch of lines of code, as we would: 1) have to loop; 2) have to take all the extracted files and move them to a flat folder, as zipfile keeps the file structure when using extract. And after all, the whole thing is only 125 MB.
Just to make sure though, you are talking about pliers_data (the same default folder in the user's home directory that text dictionaries are dumped to), right? Not .pliers...
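For reference, the selective-extraction alternative could look roughly like the sketch below. It is not what the install script does; `extract_flat` is a hypothetical helper, and the demo builds a tiny in-memory archive instead of downloading the real one. The prefix mirrors the models-master/research/audioset/yamnet layout.

```python
import io
import tempfile
import zipfile
from pathlib import Path, PurePosixPath

YAMNET_PREFIX = "models-master/research/audioset/yamnet/"


def extract_flat(archive, prefix, target_dir):
    """Copy only members under `prefix` into a flat `target_dir`."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Skip unrelated members and directory entries.
            if not name.startswith(prefix) or name.endswith("/"):
                continue
            # Writing the bytes ourselves drops the nested folder structure.
            destination = target_dir / PurePosixPath(name).name
            destination.write_bytes(zf.read(name))
            extracted.append(destination.name)
    return sorted(extracted)


# Demo archive with two yamnet files and one unrelated file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(YAMNET_PREFIX + "yamnet.py", "# model code")
    zf.writestr(YAMNET_PREFIX + "params.py", "# hyperparameters")
    zf.writestr("models-master/research/other/model.py", "# unrelated")
buffer.seek(0)

with tempfile.TemporaryDirectory() as tmp:
    demo_files = extract_flat(buffer, YAMNET_PREFIX, Path(tmp) / "yamnet")
print(demo_files)
```

Note that flattening this way would silently overwrite files that share a name in different subfolders, which is one more reason the extract-then-move approach is simpler.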

Oh, I was under the impression we're putting stuff in .pliers, but if we're currently putting it in pliers_data, then yeah, putting this there as well makes sense.

pliers/tests/extractors/test_audio_extractors.py

rbroc · 2020-03-18T13:50:27Z

@tyarkoni @adelavega thanks a lot for your comments! The last few commits should address all of them, you're welcome to take a look and make sure it is so.
I've made some slightly more substantial changes to the setup_yamnet.py script, so that it keeps only the relevant folder and deletes all the other TF models (see my reply to Tal's comment).

tyarkoni

LGTM! Okay to merge?

rbroc · 2020-03-23T17:02:25Z

LGTM! Okay to merge?

Should be, unless @adelavega has further comments.

adelavega · 2020-03-23T17:32:13Z

Nope, LGTM. Merging!

rbroc added 4 commits February 14, 2020 15:01

add extractor to base and create extractor class

1276175

add imports (in progress

2a4eeef

add extractor logic

d308a6d

fix syntax error

22bfe44

rbroc added 4 commits February 17, 2020 22:41

add extraction

e2c9051

fix label/feature matching

26dad23

added onsets, durations, order; order labels by probability; fix top_n logic; log_attributes as dicts

3b3d334

minor fix

d62e2f1

adelavega requested changes Feb 19, 2020


rbroc and others added 5 commits February 19, 2020 17:44

delete blank line

a00f3fa

Co-Authored-By: Alejandro de la Vega <aleph4@gmail.com>

spectrogram as extractor field

80f9e0f

Merge branch 'audioset' of https://github.com/rbroc/pliers into audioset

38f1424

only log yamnet_kwargs

a9249f1

fix replace parameters with kwargs logic

5b250d5

fix init issues

e8e273d

rbroc added 10 commits February 19, 2020 19:14

comment

a8c92cc

fix array flipping, check sampling rate and fix log_attributes

1067380

subset by label list

985ff93

add warnings for changed sample rate and mel freq defaults + add custom exception for sampling rate / max mel freq mismatch

da7b2ca

Merge branch 'master' into audioset

4fefa35

use predict_on_batch to solve size issue

97c44a3

pMerge branch 'master' into audioset

8e4e7d9

convert output to numpy and remove spectrogram

43354ee

add warning for top_n and non-existing labels; enable custom weights

084cba2

added tests and fixed audio filter

9ffb95a

rbroc added 4 commits March 4, 2020 18:21

add install helpers

081d368

helper func

89b99a5

move attempt to import statement and fix params dict subsetting

08654c0

fix timing issue

93be456

rbroc added 9 commits March 4, 2020 21:11

adapt window size

22f8541

timing edits

ebb869b

check patch_window_seconds argument

524a55b

f

c9e0826

fix param check

57d313e

adapt tests and travis file to new install routine

3ab7465

move to before_script

2d250e2

adapt tests to recent changes

c54ac40

remove conditional statement

35c55d5

rbroc added 3 commits March 5, 2020 17:48

test onset/duration mismatch in all extractors

5b62038

edit assert

c4adfa3

add verify_dependencies to tests and revert to_df

3011fb6

tyarkoni reviewed Mar 17, 2020


rbroc added 6 commits March 18, 2020 08:52

move tensorflow import ahead of yamnet

6409b6d

list index generator to list

6369f66

adapt install script

fa5e446

remove yamnet_path from tests

91e6b37

fix mkdir

ad21f04

add newline to setup_yamnet log

afbe485

tyarkoni approved these changes Mar 23, 2020


adelavega merged commit 5dedbe7 into PsychoinformaticsLab:master Mar 23, 2020
