Inconsistent syllable error rate between vak eval and predict #697
Comments
Hi @zhileiz1992, thank you for raising this issue as we discussed in the forum -- sorry I haven't replied to you sooner. I think what's going on here is the same issue that was fixed in commit 1724d9e. You can see that the old default is still in place at line 131 in bb410ed.
To fix it, I think we just need to change that default.

For your own work, I think a quick fix for now would be to upgrade to version 1.0 if you can. Please note that there have been some changes to the config file; see the example in the docs: https://vak.readthedocs.io/en/latest/_downloads/817549abdfad84278f9184d30b61c03d/gy6or6_train.toml

It would be good to also fix this bug in the maintenance branch. Would you like to contribute that fix?
Please let me know if you'd like to do that and whether the fix I'm describing makes sense. Happy to help you further if you'd like to contribute but my explanation wasn't clear. I'm also happy to do the fix myself if you don't want to or can't right now! Thank you again for spotting this bug! 🙏
Hi @NickleDave, thank you so much for the detailed instructions! If I calculate the syllable error rate manually on the predicted labels, it's around 0.33.
Hi @zhileiz1992! Thank you for taking on the task of changing the code and testing whether it fixes things. 🤔 From the output you provided, it looks like the eval function is no longer computing the metric with a post-processing transformation applied. I think this is the expected behavior, so that part is good.
If you're getting a different number when you calculate it manually, that part is not good 🙁
Yes, could you please share so I can test?
I think you shouldn't need to include anything else (e.g. the original source audio + annotations). You might be able to attach it here as a zip, if you're comfortable doing that, or you could share it via Google Drive or Dropbox (or some other cloud storage service like Box) to my email: nicholdav at gmail dot com
I'm updating here that @zhileiz1992 provided data to test by email, and we think we've got to the root of the problem. The quick version is that the difference we're seeing is expected. But there might still be a bug in version 0.8.x that prevents @zhileiz1992 from running vak eval with the post-processing transform applied.

@zhileiz1992, please reply with the full traceback of the error you're getting now when you try to run vak eval. Please also let me know whether you get the error when you use the development version where you made the fix as discussed above, or whether it is instead the version you have installed from conda or PyPI. Thanks!
Thanks @NickleDave for helping figure out the issue! My config file has [SPECT_PARAMS], [DATALOADER], [EVAL], and [TweetyNet.optimizer] sections. If I add the post-processing options 'majority_vote' and 'min_segment_dur', running vak eval generates this error:
Hi @zhileiz1992, thank you for your quick reply. I am afraid the way we've designed the config files is confusing you. You need to specify those options differently in the EVAL section, using a post_tfm_kwargs option, like this:

[EVAL]
checkpoint_path = "/home/zz367/ProjectsU/WarbleAnalysis/TweetyNetResults/JulieMirrorTweetyNet2022/results_230922_164204/TweetyNet/checkpoints/max-val-acc-checkpoint.pt"
labelmap_path = "/home/zz367/ProjectsU/WarbleAnalysis/TweetyNetResults/JulieMirrorTweetyNet2022/results_230922_164204/labelmap.json"
models = "TweetyNet"
batch_size = 1
num_workers = 4
device = "cuda"
output_dir = "/home/zz367/ProjectsU/WarbleAnalysis/TweetyNetResults/JulieMirrorTweetyNet2022"
csv_path = "/home/zz367/ProjectsU/WarbleAnalysis/Data/JulieMirror/Batch1Nov2022/JulieMirrorTweetyNet2022/JulieMirrorTweetyNet2022_eval_prepared/JulieMirrorTweetyNet2022_eval_prep_230922_203259.csv"
post_tfm_kwargs = {majority_vote = true, min_segment_dur = 0.2}

Please see the attached file for an example. Can you please test whether making that change to your file fixes the problem? Thanks! Just let me know if that's not clear; happy to answer questions or jump on a Zoom call to troubleshoot if needed.
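For readers unsure what those two options do: majority_vote relabels every frame in a predicted segment with the segment's most common label, and min_segment_dur removes predicted segments shorter than a minimum duration in seconds. Below is a rough, self-contained sketch of that idea. It is not vak's actual implementation; the function name, the background label value, and the example time-bin duration are made up for illustration.

```python
import numpy as np
from collections import Counter

def postprocess(frame_labels, timebin_dur, unlabeled=0,
                majority_vote=True, min_segment_dur=0.2):
    """Toy version of the cleanup selected via post_tfm_kwargs.

    frame_labels: 1-D array of per-time-bin integer labels, where
    ``unlabeled`` marks background (no syllable); timebin_dur is the
    duration of one time bin in seconds (hypothetical value below).
    """
    labels = np.asarray(frame_labels).copy()
    # find boundaries between background runs and segment runs
    is_segment = labels != unlabeled
    boundaries = np.flatnonzero(np.diff(is_segment.astype(int))) + 1
    starts = [0] + boundaries.tolist()
    stops = boundaries.tolist() + [len(labels)]
    for start, stop in zip(starts, stops):
        if not is_segment[start]:
            continue  # background run, leave as-is
        dur = (stop - start) * timebin_dur
        if min_segment_dur is not None and dur < min_segment_dur:
            labels[start:stop] = unlabeled  # drop segments that are too short
        elif majority_vote:
            # relabel the whole segment with its most common label
            majority = Counter(labels[start:stop].tolist()).most_common(1)[0][0]
            labels[start:stop] = majority
    return labels

# a stray "2" inside the first segment is overwritten by majority vote,
# and the 2-bin segment (0.1 s < 0.2 s) is removed entirely
frames = np.array([0, 1, 1, 2, 1, 1, 0, 0, 3, 3, 0])
print(postprocess(frames, timebin_dur=0.05))
# -> [0 1 1 1 1 1 0 0 0 0 0]
```

With the config above, a cleanup of this kind would be applied before the eval metrics that carry the _tfm suffix are computed, which is why those metrics can differ from the un-transformed ones.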
@all-contributors please add @zhileiz1992 for bug
I've put up a pull request to add @zhileiz1992! 🎉
Hi @NickleDave
🙌 Awesome, that's great @zhileiz1992, thank you for letting me know. And thanks also for spotting the bug and helping us see where we can document things better and make the config file format more consistent. Like I said, I'd be happy to walk you through contributing the code to fix this, but I also totally understand if you're busy with other things. If that's the case, I can finish it now that you've helped me see this will indeed fix the issue for the 0.x maintenance branch. Just let me know.
Yes, I'd love to contribute!
🚀 Excellent! I think you might have done this already, but just in case: you will want to set up a development environment as described here.

Then, at a high level, you'll want to commit the changes you made on a feature branch with git and submit that as a pull request to the 0.8 branch. I'm not sure how familiar you are with version control and collaborating with git and GitHub. If you still need a little help, I can definitely point you at some resources, e.g. this video from Data Umbrella, this Carpentries course, or this chapter from Research Software Engineering with Python. We could also jump on a Zoom call and I'll walk you through your first contribution.

In more detail, quoting myself from above (not saying that to chide you; again, I know you're busy with other things, I'm just copy-pasting myself because I'm lazy 😛)
Hi @NickleDave
Well, that was easy 🙂 🚀 Excellent, thank you @zhileiz1992!!!
Closing as fixed by #710 (thank you @zhileiz1992!)
I’ve been using vak to train a TweetyNet model on my own vocalization data for the annotation task. I’m a little confused about the results. It seems that the syllable accuracy is not correctly calculated by the vak eval function. Here is one example output from vak eval:
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_acc: 0.85201
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_acc_tfm: 0.85201
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_levenshtein: 45.52941
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_levenshtein_tfm: 45.52941
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_segment_error_rate: 0.78289
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_segment_error_rate_tfm: 0.78289
2023-09-14 19:37:19,402 - vak.core.eval - INFO - avg_loss: 0.52653
If I understand the results correctly, my model is able to achieve 85% frame accuracy, but the syllable accuracy is pretty bad. However, if I use vak predict to generate predicted labels, the results aren’t that bad. I compared the predicted labels to the ground-truth labels and calculated the Levenshtein distance myself using the metrics.Levenshtein() function; the average syllable error rate is only 26.8%, instead of the 78.2% reported by the vak eval function. So vak eval and vak predict seem to give very different syllable error rates on the same dataset and trained model.
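For reference, the manual calculation described here is the edit (Levenshtein) distance between the predicted and ground-truth label sequences, normalized by the length of the ground-truth sequence; the metrics.Levenshtein() function mentioned above computes that same edit distance. Here is a minimal sketch of the computation, not necessarily the exact code used here, with made-up example sequences:

```python
def levenshtein(source: str, target: str) -> int:
    """Edit distance between two label sequences, treated as strings."""
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        curr = [i]
        for j, t in enumerate(target, start=1):
            curr.append(min(prev[j] + 1,              # delete s
                            curr[j - 1] + 1,          # insert t
                            prev[j - 1] + (s != t)))  # substitute s -> t
        prev = curr
    return prev[-1]

def segment_error_rate(predicted: str, ground_truth: str) -> float:
    """Levenshtein distance normalized by the ground-truth length."""
    return levenshtein(predicted, ground_truth) / len(ground_truth)

# hypothetical example: two substitutions across 8 syllables -> 0.25
print(segment_error_rate("abccdeff", "abcbdefg"))
```

On this scale, a value of 0.268 means roughly 27 edits are needed per 100 ground-truth syllables to turn the predicted sequence into the reference.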