
Inconsistent syllable error rate between vak eval and predict #697

Closed
zhileiz1992 opened this issue Sep 17, 2023 · 15 comments
@zhileiz1992

I’ve been using vak to train the TweetyNet model on my own vocalization data for an annotation task. I’m a little confused about the results: it seems that the syllable error rate is not correctly calculated by the vak eval function. Here is one example output from vak eval:
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_acc: 0.85201
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_acc_tfm: 0.85201
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_levenshtein: 45.52941
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_levenshtein_tfm: 45.52941
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_segment_error_rate: 0.78289
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_segment_error_rate_tfm: 0.78289
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_loss: 0.52653

If I understand the results correctly, my model achieves 85% frame accuracy, but the syllable-level metrics look pretty bad. However, when I use vak predict to generate predicted labels, the results aren’t that bad: I compared the predicted labels to the ground-truth labels and calculated the Levenshtein distance myself with the metrics.Levenshtein() function, and the average syllable error rate is only 26.8%, not 78.2% as reported by vak eval. So vak eval and vak predict seem to give very different syllable error rates on the same dataset and trained model.
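
(For reference, a minimal sketch of this kind of manual check -- illustrative, not the poster's exact code. It assumes each file's labels are a string of per-syllable characters and uses vak.metrics.Levenshtein, the callable the poster mentions; segment error rate is the edit distance normalized by ground-truth length.)

from vak import metrics

levenshtein = metrics.Levenshtein()

# toy stand-ins for per-file predicted and ground-truth label sequences
predicted = ["abcdd", "aabb"]
ground_truth = ["abcd", "abab"]

# segment error rate per file: edit distance / length of ground truth
rates = [
    levenshtein(pred, gt) / len(gt)
    for pred, gt in zip(predicted, ground_truth)
]
print(f"avg segment error rate: {sum(rates) / len(rates):.3f}")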

Desktop (please complete the following information):

  • Operating System: Ubuntu 20.04
  • vak version: 0.8.1
zhileiz1992 added the BUG (Something isn't working) label on Sep 17, 2023
@NickleDave (Collaborator)

Hi @zhileiz1992, thank you for raising this issue as we discussed in the forum -- sorry I haven't replied to you sooner.

I think what's going on here is the same issue that was fixed in this commit: 1724d9e

You can see that the default is still a dict in the branch for the version you are using (the "maintenance" branch):

default={}, # empty dict so we can pass into transform with **kwargs expansion

To fix it, I think we just need to change the default from a dict to None for the post_tfm_kwargs attribute of both the EvalConfig and the LearnCurveConfig classes. Basically, the same thing that I did in releasing version 1.0.
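
(For context, a sketch of the change being described, assuming the attrs-based config classes vak uses; everything except the post_tfm_kwargs attribute is elided:)

import attr

@attr.s
class EvalConfig:
    # ... other attributes elided ...
    # was: default={}  (empty dict so it could be passed into the
    # transform with **kwargs expansion; see the line quoted above)
    post_tfm_kwargs = attr.ib(default=None)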

For your own work, I think a quick fix for you now would be to upgrade to version 1.0 if you can. Please note that there have been some changes to the config file, see the example in the docs: https://vak.readthedocs.io/en/latest/_downloads/817549abdfad84278f9184d30b61c03d/gy6or6_train.toml

It would be good to also fix this bug in the maintenance branch.

Would you like to contribute that fix?
You'd want to do the following:

Please let me know if you'd like to do that and whether the fix I'm describing makes sense. I'm happy to help further if you'd like to contribute but the way I explained it wasn't clear. I'm also happy to do the fix myself if you don't want to or can't right now! Thank you again for spotting this bug! 🙏

NickleDave added this to To Do in BUG/MAINT on Sep 20, 2023
@zhileiz1992 (Author)

Hi @NickleDave

Thank you so much for the detailed instructions!
However, the issue doesn't seem to be resolved.
In vak 0.8.1, I tried changing the default from a dict to None for the post_tfm_kwargs attribute of both the EvalConfig and LearnCurveConfig classes in the vak code. However, I still get an incorrect syllable error rate when using vak eval.
2023-09-20 22:47:24,504 - vak.core.eval - INFO - Finished evaluating. Logging computing metrics.
2023-09-20 22:47:24,504 - vak.core.eval - INFO - avg_acc: 0.85903
2023-09-20 22:47:24,504 - vak.core.eval - INFO - avg_levenshtein: 62.76471
2023-09-20 22:47:24,504 - vak.core.eval - INFO - avg_segment_error_rate: 1.08055
2023-09-20 22:47:24,504 - vak.core.eval - INFO - avg_loss: 1.04066

If I calculate the syllable error rate manually on the predicted labels, it's around 0.33.
So it seems the inconsistency is not caused by post_tfm_kwargs?
If you think it would help, I can share my test dataset with you to confirm this. Maybe I made some mistake in using vak.

@NickleDave (Collaborator)

Hi @zhileiz1992!

Thank you for taking on the task of changing the code and testing whether it fixes things.

🤔 From the output you provided, it looks like the eval function is no longer computing the metric with a post-processing transformation applied. I think this is the expected behavior, so that part is good.

If I calculate the syllable error rate manually on the predicted labels, it's around 0.33

If you're getting a different number when you calculate it manually, that part is not good 🙁

If you think it's helpful, I can share my test dataset with you to confirm this? Maybe I made some mistake in using vak.

Yes, could you please share so I can test?

  • Please include:
    • the directory containing the prepared dataset,
    • the directory containing the results from running vak train,
    • and the config files you are using.

I think you shouldn't need to include anything else (e.g. the original source audio + annotations). You might be able to attach it here as a zip, if you're comfortable doing that, or you could share via Google Drive or Dropbox (or some other cloud storage service like Box) to my email: nicholdav at gmail dot com

  • If you can, please also include the code you are using to calculate the syllable error rate manually, e.g. by attaching a Jupyter notebook (the .ipynb file itself, compressed into a .tar.gz or .zip so GitHub will let you attach it) or by writing a code snippet in a reply.

NickleDave self-assigned this on Sep 23, 2023
@NickleDave (Collaborator)

I'm updating here that @zhileiz1992 provided data to test by email, and we think we've gotten to the root of the problem.

The quick version is that the difference we're seeing is expected: vak predict was run with a config file that applies a post-processing transform, while vak eval was run without that transform.

But there might still be a bug in version 0.8.x that prevents @zhileiz1992 from running vak eval with the post-processing transforms applied.

@zhileiz1992 please reply with the full traceback of the bug you're getting now when you try to run eval with post_tfm_kwargs, and please attach the config file you are using for eval.

Please also let me know whether you get the error when you use the development version where you made the fix as discussed above, or whether it is instead the version you installed from conda or PyPI.

Thanks!

@zhileiz1992 (Author)

Thanks @NickleDave for helping figure out the issue!
This is the eval toml I'm using:
[PREP]
data_dir = "/home/zz367/ProjectsU/WarbleAnalysis/Data/JulieMirror/Batch1Nov2022/JulieMirrorTweetyNet2022/JulieMirrorTweetyNet2022_eval"
output_dir = "/home/zz367/ProjectsU/WarbleAnalysis/Data/JulieMirror/Batch1Nov2022/JulieMirrorTweetyNet2022/JulieMirrorTweetyNet2022_eval_prepared"
audio_format = "wav"
annot_format = "simple-seq"
labelset = "fshvpqcdamoxebg"

[SPECT_PARAMS]
fft_size = 1024
step_size = 119
transform_type = "log_spect"

[DATALOADER]
window_size = 370

[EVAL]
checkpoint_path = "/home/zz367/ProjectsU/WarbleAnalysis/TweetyNetResults/JulieMirrorTweetyNet2022/results_230922_164204/TweetyNet/checkpoints/max-val-acc-checkpoint.pt"
labelmap_path = "/home/zz367/ProjectsU/WarbleAnalysis/TweetyNetResults/JulieMirrorTweetyNet2022/results_230922_164204/labelmap.json"
models = "TweetyNet"
batch_size = 1
num_workers = 4
device = "cuda"
output_dir = "/home/zz367/ProjectsU/WarbleAnalysis/TweetyNetResults/JulieMirrorTweetyNet2022"
csv_path = "/home/zz367/ProjectsU/WarbleAnalysis/Data/JulieMirror/Batch1Nov2022/JulieMirrorTweetyNet2022/JulieMirrorTweetyNet2022_eval_prepared/JulieMirrorTweetyNet2022_eval_prep_230922_203259.csv"

[TweetyNet.optimizer]
lr = 0.001

If I add the post-processing options 'majority_vote' and 'min_segment_dur', running vak eval generates this error:
Traceback (most recent call last):
  File "/home/zz367/miniconda3/envs/TweetyNet/bin/vak", line 8, in <module>
    sys.exit(main())
  File "/home/zz367/miniconda3/envs/TweetyNet/lib/python3.8/site-packages/vak/__main__.py", line 48, in main
    cli.cli(command=args.command, config_file=args.configfile)
  File "/home/zz367/miniconda3/envs/TweetyNet/lib/python3.8/site-packages/vak/cli/cli.py", line 49, in cli
    COMMAND_FUNCTION_MAP[command](toml_path=config_file)
  File "/home/zz367/miniconda3/envs/TweetyNet/lib/python3.8/site-packages/vak/cli/cli.py", line 3, in eval
    eval(toml_path=toml_path)
  File "/home/zz367/miniconda3/envs/TweetyNet/lib/python3.8/site-packages/vak/cli/eval.py", line 28, in eval
    cfg = config.parse.from_toml_path(toml_path)
  File "/home/zz367/miniconda3/envs/TweetyNet/lib/python3.8/site-packages/vak/config/parse.py", line 200, in from_toml_path
    return from_toml(config_toml, toml_path, sections)
  File "/home/zz367/miniconda3/envs/TweetyNet/lib/python3.8/site-packages/vak/config/parse.py", line 149, in from_toml
    are_options_valid(config_toml, section_name, toml_path)
  File "/home/zz367/miniconda3/envs/TweetyNet/lib/python3.8/site-packages/vak/config/validators.py", line 121, in are_options_valid
    raise ValueError(err_msg)
ValueError: the following options from EVAL section in the config file 'eval_JulieMirrorTweetyNet2022.toml' are not valid:
{'majority_vote', 'min_segment_dur'}
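
(For anyone reading along: the ValueError comes from vak's config validation, which checks the option names in each section against a fixed set of valid names, so majority_vote and min_segment_dur are rejected as top-level [EVAL] options. Roughly, as an illustrative sketch rather than vak's exact code:)

def are_options_valid(section_options, valid_options, section_name):
    # reject any option name that is not in the set of valid names
    invalid = set(section_options) - set(valid_options)
    if invalid:
        raise ValueError(
            f"the following options from {section_name} section "
            f"in the config file are not valid:\n{invalid}"
        )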

@NickleDave (Collaborator)

Hi @zhileiz1992, thank you for your quick reply.

I'm afraid the way we've designed the config files is confusing you.

You need to specify those options differently in the [EVAL] section, using a post_tfm_kwargs option.
Your [EVAL] table in the config file should look like this:

[EVAL]
checkpoint_path = "/home/zz367/ProjectsU/WarbleAnalysis/TweetyNetResults/JulieMirrorTweetyNet2022/results_230922_164204/TweetyNet/checkpoints/max-val-acc-checkpoint.pt"
labelmap_path = "/home/zz367/ProjectsU/WarbleAnalysis/TweetyNetResults/JulieMirrorTweetyNet2022/results_230922_164204/labelmap.json"
models = "TweetyNet"
batch_size = 1
num_workers = 4
device = "cuda"
output_dir = "/home/zz367/ProjectsU/WarbleAnalysis/TweetyNetResults/JulieMirrorTweetyNet2022"
csv_path = "/home/zz367/ProjectsU/WarbleAnalysis/Data/JulieMirror/Batch1Nov2022/JulieMirrorTweetyNet2022/JulieMirrorTweetyNet2022_eval_prepared/JulieMirrorTweetyNet2022_eval_prep_230922_203259.csv"
post_tfm_kwargs = {majority_vote = true, min_segment_dur = 0.2}

Please see this attached file for an example:
TweetyNet_eval_audio_cbin_annot_notmat.zip
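
(To clarify what those two options do, a toy illustration -- not vak's implementation: majority_vote relabels every frame in a segment with that segment's most frequent frame label, and min_segment_dur removes segments shorter than a threshold in seconds.)

from collections import Counter

def majority_vote(frame_labels, segments):
    # relabel the frames of each (start, stop) segment with the modal label
    out = list(frame_labels)
    for start, stop in segments:
        modal = Counter(out[start:stop]).most_common(1)[0][0]
        out[start:stop] = [modal] * (stop - start)
    return out

def drop_short_segments(labeled_segments, min_segment_dur):
    # keep only (onset, offset, label) segments at least min_segment_dur seconds long
    return [
        (onset, offset, label)
        for onset, offset, label in labeled_segments
        if (offset - onset) >= min_segment_dur
    ]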

Can you please test whether making that change to your file fixes the problem?
If not, can you please post the new error (assuming it changes)?
Can you please also let me know what code you are testing with -- is it the installed version that originally gave you the error, or the local development install where you made the changes described above?

Thanks!
Again, I realize it's a bit confusing that you specify these options a different way for [EVAL]; that's one thing we need to fix when we revamp the config file format as in #345. I'm also realizing we need to add examples of this to the tutorial.

Just let me know if that's not clear; happy to answer questions or jump on a Zoom call to troubleshoot if needed.

@NickleDave (Collaborator)

@all-contributors please add @zhileiz1992 for bug

@allcontributors (Contributor)

@NickleDave

I've put up a pull request to add @zhileiz1992! 🎉

@zhileiz1992 (Author)

Hi @NickleDave
Sorry for the delayed response! I was caught up in other things last week.
But I just tried adding the line 'post_tfm_kwargs = {majority_vote = true, min_segment_dur = 0.01}' to my eval toml file.
And now the syllable error rate is consistent with my manual calculation!
Thanks a lot for helping me troubleshoot.
2023-10-01 12:26:06,429 - vak.core.eval - INFO - avg_acc: 0.85761
2023-10-01 12:26:06,429 - vak.core.eval - INFO - avg_acc_tfm: 0.86033
2023-10-01 12:26:06,429 - vak.core.eval - INFO - avg_levenshtein: 49.52941
2023-10-01 12:26:06,429 - vak.core.eval - INFO - avg_levenshtein_tfm: 16.76471
2023-10-01 12:26:06,430 - vak.core.eval - INFO - avg_segment_error_rate: 0.83433
2023-10-01 12:26:06,430 - vak.core.eval - INFO - avg_segment_error_rate_tfm: 0.28196
2023-10-01 12:26:06,430 - vak.core.eval - INFO - avg_loss: 0.84197

@NickleDave (Collaborator)

NickleDave commented Oct 1, 2023

🙌 awesome, that's great @zhileiz1992, thank you for letting me know

And thanks also for spotting the bug and helping us see where we can document things better and make the config file format more consistent.

Like I said, I'd be happy to walk you through contributing the code to fix this, but I also totally understand if you're busy with other things. If that's the case, I can finish it myself now that you've helped me see this will indeed fix the issue for the 0.x maintenance branch. Just let me know!

@zhileiz1992 (Author)

Yes, I'd love to contribute!
Let me know what steps I need to take @NickleDave

@NickleDave (Collaborator)

NickleDave commented Oct 1, 2023

🚀 excellent!

I think you might have done this already but just in case, you will want to set up a development environment as described here:
https://vak.readthedocs.io/en/latest/development/contributors.html#setting-up-a-development-environment

And then, at a high level, you'll want to commit the changes you made in a feature branch with git, and then submit that as a pull request to the 0.8 branch.

I'm not sure how familiar you are with version control and collaboration via git and GitHub. If you still need a little help, I can definitely point you at some resources, e.g. this video from Data Umbrella, this Carpentries course, or this chapter from Research Software Engineering with Python.

We could also jump on a Zoom call and I'll walk you through your first contribution.

In more detail, quoting myself from above (not to chide you -- again, I know you're busy with other things, I'm just copy-pasting because I'm lazy 😛):

You'd want to do the following:

zhileiz1992 pushed a commit to zhileiz1992/vak that referenced this issue Oct 1, 2023
@zhileiz1992 (Author)

Hi @NickleDave
Thanks for the detailed instructions!
I just made the changes and created a pull request to the 0.8 branch.
Let me know if it works.

@NickleDave (Collaborator)

well that was easy 🙂

🚀 excellent, thank you @zhileiz1992!!!

zhileiz1992 pushed a commit to zhileiz1992/vak that referenced this issue Oct 2, 2023
zhileiz1992 pushed a commit to zhileiz1992/vak that referenced this issue Oct 2, 2023
NickleDave pushed a commit that referenced this issue Oct 2, 2023
* BUG: fix default for post_tfm_kwargs, fixes Inconsistent syllable error rate between vak eval and predict #697

---------

Co-authored-by: zz367 <zz367@cornell.edu>
@NickleDave (Collaborator)

Closing as fixed by #710 (thank you @zhileiz1992!)
