
Inconsistent syllable error rate between vak eval and predict #697

Closed
zhileiz1992 opened this issue Sep 17, 2023 · 15 comments
@zhileiz1992

I’ve been using vak to train the TweetyNet model on my own vocalization data for an annotation task. I’m a little confused about the results: it seems that the syllable error rate is not correctly calculated by the vak eval function. Here is one example output from vak eval:
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_acc: 0.85201
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_acc_tfm: 0.85201
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_levenshtein: 45.52941
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_levenshtein_tfm: 45.52941
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_segment_error_rate: 0.78289
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_segment_error_rate_tfm: 0.78289
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_loss: 0.52653

If I understand the results correctly, my model achieves 85% frame accuracy, but the syllable-level metrics look pretty bad. However, when I use vak predict to generate predicted labels, the results aren’t that bad: I compared the predicted labels to the ground-truth labels and calculated the Levenshtein distance myself with the metrics.Levenshtein() function, and the average syllable error rate is only 26.8%, not 78.2% as reported by vak eval. So vak eval and vak predict seem to give very different syllable error rates on the same dataset and trained model.
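
(For reference, a minimal sketch of this kind of manual check -- illustrative, not the poster's exact code. It assumes each file's labels are a string of per-syllable characters and uses vak.metrics.Levenshtein, the callable the poster mentions; segment error rate is the edit distance normalized by ground-truth length.)

from vak import metrics

levenshtein = metrics.Levenshtein()

# toy stand-ins for per-file predicted and ground-truth label sequences
predicted = ["abcdd", "aabb"]
ground_truth = ["abcd", "abab"]

# segment error rate per file: edit distance / length of ground truth
rates = [
    levenshtein(pred, gt) / len(gt)
    for pred, gt in zip(predicted, ground_truth)
]
print(f"avg segment error rate: {sum(rates) / len(rates):.3f}")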

Desktop (please complete the following information):

  • Operating System: Ubuntu 20.04
  • vak version: 0.8.1
zhileiz1992 added the BUG (Something isn't working) label on Sep 17, 2023
@NickleDave (Collaborator)

Hi @zhileiz1992, thank you for raising this issue as we discussed in the forum -- sorry I haven't replied to you sooner.

I think what's going on here is the same issue that was fixed in this commit: 1724d9e

You can see that the default is still a dict in the branch for the version you are using (the "maintenance" branch):

default={}, # empty dict so we can pass into transform with **kwargs expansion

To fix it, I think we just need to change the default from a dict to None for the post_tfm_kwargs attribute of both the EvalConfig and the LearnCurveConfig classes. Basically, the same thing that I did in releasing version 1.0.
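
(For context, a sketch of the change being described, assuming the attrs-based config classes vak uses; everything except the post_tfm_kwargs attribute is elided:)

import attr

@attr.s
class EvalConfig:
    # ... other attributes elided ...
    # was: default={}  (empty dict so it could be passed into the
    # transform with **kwargs expansion; see the line quoted above)
    post_tfm_kwargs = attr.ib(default=None)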

For your own work, I think a quick fix for you now would be to upgrade to version 1.0 if you can. Please note that there have been some changes to the config file, see the example in the docs: https://vak.readthedocs.io/en/latest/_downloads/817549abdfad84278f9184d30b61c03d/gy6or6_train.toml

It would be good to also fix this bug in the maintenance branch.

Would you like to contribute that fix?
You'd want to do the following:

Please let me know if you'd like to do that and whether the fix I'm describing makes sense. I'm happy to help further if you'd like to contribute but the way I explained it wasn't clear. I'm also happy to do the fix myself if you don't want to or can't right now! Thank you again for spotting this bug! 🙏

NickleDave added this to To Do in BUG/MAINT on Sep 20, 2023
@zhileiz1992 (Author)

Hi @NickleDave

Thank you so much for the detailed instructions!
However, the issue doesn't seem to be resolved.
In vak 0.8.1, I tried changing the default from a dict to None for the post_tfm_kwargs attribute of both the EvalConfig and LearnCurveConfig classes in the vak code. However, I still get an incorrect syllable error rate when using vak eval.
2023-09-20 22:47:24,504 - vak.core.eval - INFO - Finished evaluating. Logging computing metrics.
2023-09-20 22:47:24,504 - vak.core.eval - INFO - avg_acc: 0.85903
2023-09-20 22:47:24,504 - vak.core.eval - INFO - avg_levenshtein: 62.76471
2023-09-20 22:47:24,504 - vak.core.eval - INFO - avg_segment_error_rate: 1.08055
2023-09-20 22:47:24,504 - vak.core.eval - INFO - avg_loss: 1.04066

If I calculate the syllable error rate manually on the predicted labels, it's around 0.33.
So it seems the inconsistency is not caused by post_tfm_kwargs?
If you think it would help, I can share my test dataset with you to confirm this. Maybe I made some mistake in using vak.

@NickleDave (Collaborator)

Hi @zhileiz1992!

Thank you for taking on the task of changing the code and testing whether it fixes things.

🤔 From the output you provided, it looks like the eval function is no longer computing the metric with a post-processing transformation applied. I think this is the expected behavior, so that part is good.

If I calculate the syllable error rate manually on the predicted labels, it's around 0.33

If you're getting a different number when you calculate it manually, that part is not good 🙁

If you think it's helpful, I can share my test dataset with you to confirm this? Maybe I made some mistake in using vak.

Yes, could you please share so I can test?

  • Please include:
    • the directory containing the prepared dataset,
    • the directory containing the results from running vak train,
    • and the config files you are using.

I think you shouldn't need to include anything else (e.g. the original source audio + annotations). You might be able to attach it here as a zip, if you're comfortable doing that, or you could share via Google Drive or Dropbox (or some other cloud storage service like Box) to my email: nicholdav at gmail dot com

  • If you can, please also include the code you are using to calculate the syllable error rate manually, e.g. by attaching a Jupyter notebook (the .ipynb file itself, compressed into a .tar.gz or .zip so GitHub will let you attach it) or by writing a code snippet in a reply.

NickleDave self-assigned this on Sep 23, 2023
@NickleDave (Collaborator)

I'm updating here that @zhileiz1992 provided data to test by email, and we think we've gotten to the root of the problem.

The quick version is that the difference we're seeing is expected: vak predict was run with a config file that applies a post-processing transform, while vak eval was run without that transform.

But there might still be a bug in version 0.8.x that prevents @zhileiz1992 from running vak eval with the post-processing transforms applied.

@zhileiz1992 please reply with the full traceback of the bug you're getting now when you try to run eval with post_tfm_kwargs, and please attach the config file you are using for eval.

Please also let me know whether you get the error when you use the development version where you made the fix as discussed above, or whether it is instead the version you installed from conda or PyPI.

Thanks!

@zhileiz1992 (Author)

Thanks @NickleDave for helping figure out the issue!
This is the eval toml I'm using:
[PREP]
data_dir = "/home/zz367/ProjectsU/WarbleAnalysis/Data/JulieMirror/Batch1Nov2022/JulieMirrorTweetyNet2022/JulieMirrorTweetyNet2022_eval"
output_dir = "/home/zz367/ProjectsU/WarbleAnalysis/Data/JulieMirror/Batch1Nov2022/JulieMirrorTweetyNet2022/JulieMirrorTweetyNet2022_eval_prepared"
audio_format = "wav"
annot_format = "simple-seq"
labelset = "fshvpqcdamoxebg"

[SPECT_PARAMS]
fft_size = 1024
step_size = 119
transform_type = "log_spect"

[DATALOADER]
window_size = 370

[EVAL]
checkpoint_path = "/home/zz367/ProjectsU/WarbleAnalysis/TweetyNetResults/JulieMirrorTweetyNet2022/results_230922_164204/TweetyNet/checkpoints/max-val-acc-checkpoint.pt"
labelmap_path = "/home/zz367/ProjectsU/WarbleAnalysis/TweetyNetResults/JulieMirrorTweetyNet2022/results_230922_164204/labelmap.json"
models = "TweetyNet"
batch_size = 1
num_workers = 4
device = "cuda"
output_dir = "/home/zz367/ProjectsU/WarbleAnalysis/TweetyNetResults/JulieMirrorTweetyNet2022"
csv_path = "/home/zz367/ProjectsU/WarbleAnalysis/Data/JulieMirror/Batch1Nov2022/JulieMirrorTweetyNet2022/JulieMirrorTweetyNet2022_eval_prepared/JulieMirrorTweetyNet2022_eval_prep_230922_203259.csv"

[TweetyNet.optimizer]
lr = 0.001

If I add the post-processing options 'majority_vote' and 'min_segment_dur', running vak eval generates this error:
Traceback (most recent call last):
  File "/home/zz367/miniconda3/envs/TweetyNet/bin/vak", line 8, in <module>
    sys.exit(main())
  File "/home/zz367/miniconda3/envs/TweetyNet/lib/python3.8/site-packages/vak/__main__.py", line 48, in main
    cli.cli(command=args.command, config_file=args.configfile)
  File "/home/zz367/miniconda3/envs/TweetyNet/lib/python3.8/site-packages/vak/cli/cli.py", line 49, in cli
    COMMAND_FUNCTION_MAP[command](toml_path=config_file)
  File "/home/zz367/miniconda3/envs/TweetyNet/lib/python3.8/site-packages/vak/cli/cli.py", line 3, in eval
    eval(toml_path=toml_path)
  File "/home/zz367/miniconda3/envs/TweetyNet/lib/python3.8/site-packages/vak/cli/eval.py", line 28, in eval
    cfg = config.parse.from_toml_path(toml_path)
  File "/home/zz367/miniconda3/envs/TweetyNet/lib/python3.8/site-packages/vak/config/parse.py", line 200, in from_toml_path
    return from_toml(config_toml, toml_path, sections)
  File "/home/zz367/miniconda3/envs/TweetyNet/lib/python3.8/site-packages/vak/config/parse.py", line 149, in from_toml
    are_options_valid(config_toml, section_name, toml_path)
  File "/home/zz367/miniconda3/envs/TweetyNet/lib/python3.8/site-packages/vak/config/validators.py", line 121, in are_options_valid
    raise ValueError(err_msg)
ValueError: the following options from EVAL section in the config file 'eval_JulieMirrorTweetyNet2022.toml' are not valid:
{'majority_vote', 'min_segment_dur'}
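
(For anyone reading along: the ValueError comes from vak's config validation, which checks the option names in each section against a fixed set of valid names, so majority_vote and min_segment_dur are rejected as top-level [EVAL] options. Roughly, as an illustrative sketch rather than vak's exact code:)

def are_options_valid(section_options, valid_options, section_name):
    # reject any option name that is not in the set of valid names
    invalid = set(section_options) - set(valid_options)
    if invalid:
        raise ValueError(
            f"the following options from {section_name} section "
            f"in the config file are not valid:\n{invalid}"
        )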

@NickleDave (Collaborator)

Hi @zhileiz1992, thank you for your quick reply.

I'm afraid the way we've designed the config files is confusing you.

You need to specify those options differently in the [EVAL] section, using a post_tfm_kwargs option.
Your [EVAL] table in the config file should look like this:

[EVAL]
checkpoint_path = "/home/zz367/ProjectsU/WarbleAnalysis/TweetyNetResults/JulieMirrorTweetyNet2022/results_230922_164204/TweetyNet/checkpoints/max-val-acc-checkpoint.pt"
labelmap_path = "/home/zz367/ProjectsU/WarbleAnalysis/TweetyNetResults/JulieMirrorTweetyNet2022/results_230922_164204/labelmap.json"
models = "TweetyNet"
batch_size = 1
num_workers = 4
device = "cuda"
output_dir = "/home/zz367/ProjectsU/WarbleAnalysis/TweetyNetResults/JulieMirrorTweetyNet2022"
csv_path = "/home/zz367/ProjectsU/WarbleAnalysis/Data/JulieMirror/Batch1Nov2022/JulieMirrorTweetyNet2022/JulieMirrorTweetyNet2022_eval_prepared/JulieMirrorTweetyNet2022_eval_prep_230922_203259.csv"
post_tfm_kwargs = {majority_vote = true, min_segment_dur = 0.2}

Please see this attached file for an example:
TweetyNet_eval_audio_cbin_annot_notmat.zip
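
(To clarify what those two options do, a toy illustration -- not vak's implementation: majority_vote relabels every frame in a segment with that segment's most frequent frame label, and min_segment_dur removes segments shorter than a threshold in seconds.)

from collections import Counter

def majority_vote(frame_labels, segments):
    # relabel the frames of each (start, stop) segment with the modal label
    out = list(frame_labels)
    for start, stop in segments:
        modal = Counter(out[start:stop]).most_common(1)[0][0]
        out[start:stop] = [modal] * (stop - start)
    return out

def drop_short_segments(labeled_segments, min_segment_dur):
    # keep only (onset, offset, label) segments at least min_segment_dur seconds long
    return [
        (onset, offset, label)
        for onset, offset, label in labeled_segments
        if (offset - onset) >= min_segment_dur
    ]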

Can you please test whether making that change to your file fixes the problem?
If not, can you please post the new error (assuming it changes)?
Can you please also let me know what code you are testing with -- is it the installed version that originally gave you the error, or the local development install where you made the changes described above?

Thanks!
Again, I realize it's a bit confusing that you specify these options a different way for [EVAL]; that's one thing we need to fix when we revamp the config file format as in #345. I'm also realizing we need to add examples of this to the tutorial.

Just let me know if that's not clear; happy to answer questions or jump on a Zoom call to troubleshoot if needed.

@NickleDave (Collaborator)

@all-contributors please add @zhileiz1992 for bug

@allcontributors (Contributor)

@NickleDave

I've put up a pull request to add @zhileiz1992! 🎉

@zhileiz1992 (Author)

Hi @NickleDave
Sorry for the delayed response! I was caught up in other things last week.
But I just tried adding the line 'post_tfm_kwargs = {majority_vote = true, min_segment_dur = 0.01}' to my eval toml file.
And now the syllable error rate is consistent with my manual calculation!
Thanks a lot for helping me troubleshoot.
2023-10-01 12:26:06,429 - vak.core.eval - INFO - avg_acc: 0.85761
2023-10-01 12:26:06,429 - vak.core.eval - INFO - avg_acc_tfm: 0.86033
2023-10-01 12:26:06,429 - vak.core.eval - INFO - avg_levenshtein: 49.52941
2023-10-01 12:26:06,429 - vak.core.eval - INFO - avg_levenshtein_tfm: 16.76471
2023-10-01 12:26:06,430 - vak.core.eval - INFO - avg_segment_error_rate: 0.83433
2023-10-01 12:26:06,430 - vak.core.eval - INFO - avg_segment_error_rate_tfm: 0.28196
2023-10-01 12:26:06,430 - vak.core.eval - INFO - avg_loss: 0.84197

@NickleDave (Collaborator)

NickleDave commented Oct 1, 2023

🙌 awesome, that's great @zhileiz1992, thank you for letting me know

And thanks also for spotting the bug and helping us see where we can document things better and make the config file format more consistent.

Like I said, I'd be happy to walk you through contributing the code to fix this, but I also totally understand if you're busy with other things. If that's the case, I can finish it myself now that you've helped me see this will indeed fix the issue for the 0.x maintenance branch. Just let me know!

@zhileiz1992 (Author)

Yes, I'd love to contribute!
Let me know what steps I need to take @NickleDave

@NickleDave (Collaborator)

NickleDave commented Oct 1, 2023

🚀 excellent!

I think you might have done this already but just in case, you will want to set up a development environment as described here:
https://vak.readthedocs.io/en/latest/development/contributors.html#setting-up-a-development-environment

And then, at a high level, you'll want to commit the changes you made in a feature branch with git, and then submit that as a pull request to the 0.8 branch.

I'm not sure how familiar you are with version control and collaboration via git and GitHub. If you still need a little help, I can definitely point you at some resources, e.g. this video from Data Umbrella, this Carpentries course, or this chapter from Research Software Engineering with Python.

We could also jump on a Zoom call and I'll walk you through your first contribution.

In more detail, quoting myself from above (not to chide you -- again, I know you're busy with other things, I'm just copy-pasting because I'm lazy 😛):

You'd want to do the following:

zhileiz1992 pushed a commit to zhileiz1992/vak that referenced this issue Oct 1, 2023
@zhileiz1992 (Author)

Hi @NickleDave
Thanks for the detailed instructions!
I just made the changes and created a pull request to the 0.8 branch.
Let me know if it works.

@NickleDave (Collaborator)

well that was easy 🙂

🚀 excellent, thank you @zhileiz1992!!!

zhileiz1992 pushed a commit to zhileiz1992/vak that referenced this issue Oct 2, 2023
zhileiz1992 pushed a commit to zhileiz1992/vak that referenced this issue Oct 2, 2023
NickleDave pushed a commit that referenced this issue Oct 2, 2023
* BUG: fix default for post_tfm_kwargs, fixes Inconsistent syllable error rate between vak eval and predict #697

---------

Co-authored-by: zz367 <zz367@cornell.edu>
@NickleDave (Collaborator)

Closing as fixed by #710 (thank you @zhileiz1992!)
