Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: predict empty file handling #634

Closed
nhoglen opened this issue Feb 25, 2023 · 4 comments
Closed

BUG: predict empty file handling #634

nhoglen opened this issue Feb 25, 2023 · 4 comments
Assignees
Labels
BUG Something isn't working

Comments

@nhoglen
Copy link

nhoglen commented Feb 25, 2023

Describe the bug
predict breaks when converting segments to crowsetta seq if no segments were found in the file.

To Reproduce

vak prep check_this_recording.toml runs fine.

vak predict check_this_recording.toml runs until vak.core.predict - INFO - converting predictions to annotations then throws this error:

Traceback (most recent call last):
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\Scripts\vak-script.py", line 9, in <module>
    sys.exit(main())
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\lib\site-packages\vak\__main__.py", line 48, in main
    cli.cli(command=args.command, config_file=args.configfile)
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\lib\site-packages\vak\cli\cli.py", line 49, in cli
    COMMAND_FUNCTION_MAP[command](toml_path=config_file)
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\lib\site-packages\vak\cli\cli.py", line 18, in predict
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\lib\site-packages\vak\cli\predict.py", line 50, in predict
    core.predict(
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\lib\site-packages\vak\core\predict.py", line 244, in predict
    seq = crowsetta.Sequence.from_keyword(
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\lib\site-packages\crowsetta\sequence.py", line 382, in from_keyword
    labels) = cls._validate_onsets_offsets_labels(onsets_s,
  File "C:\Users\Nerissa\anaconda3\envs\vak-env4\lib\site-packages\crowsetta\sequence.py", line 224, in _validate_onsets_offsets_labels
    raise ValueError('must provide either onset_inds and offset_inds, or '
ValueError: must provide either onset_inds and offset_inds, or onsets_s and offsets_s

I did a little bit of digging and determined that audio segments without any calls are tripping up the crowsetta conversion process. Specifically, at line 239 of the vak predict.py file, there is a check for empty segments that is going awry because labels is turning up an empty list rather than a None object.

if labels is None and onsets_s is None and offsets_s is None:
                # handle the case when all time bins are predicted to be unlabeled
                # see https://github.com/NickleDave/vak/issues/383
                continue
seq = crowsetta.Sequence.from_keyword(
                labels=labels, onsets_s=onsets_s, offsets_s=offsets_s
            )

Inspecting the variables at this step shows that onset_s and offset_s are None as expected, but not labels.

(Pdb) print(len(labels))
0
(Pdb) print(len(onsets_s))
*** TypeError: object of type 'NoneType' has no len()
(Pdb) print(len(offsets_s))
*** TypeError: object of type 'NoneType' has no len()

Expected behavior
In my previous vak install (version 0.4.0, I think), this ran to completion and generated a full set of predictions.

Desktop (please complete the following information):

  • Operating System: Windows 10
  • Versions: vak 0.8.0, TweetyNet 0.9.0, crowsetta 3.4.1

Additional context
I will email the files to reproduce; the zip is too big to upload here.

@nhoglen nhoglen added the BUG Something isn't working label Feb 25, 2023
@NickleDave NickleDave self-assigned this Feb 25, 2023
@NickleDave
Copy link
Collaborator

NickleDave commented Feb 26, 2023

Sorry you had this issue @nhoglen and thank you for this very thorough bug report.

I'll know for sure when I test with the data but my hunch is you have correctly identified the issue; we do not correctly handle the case where there are no predicted segments because we don't look for the empty list

This was supposed to be fixed by #394 but clearly it wasn't!

Could you please share the data some other way? I wasn't able to open the sharepoint link. Will ask by email too but wanted to reply here (and thank you profusely for making this easy for me to fix 😼 😁 )

edit: Box link you sent worked, nvm!

second edit: just to cross-link, this was originally posted on the forum: https://forum.vocalpy.org/t/error-in-last-step-of-generating-predictions/54

@NickleDave
Copy link
Collaborator

NickleDave commented Feb 26, 2023

🤔 I'm not able to replicate this bug -- vak predict runs fine for me on Linux using the files you provided.

Could you please reply with the following files attached?

  • the .toml configuration file you're using
  • the output of running the following command in your conda environment: conda env export > environment.yml
  • as well as conda list --explicit > spec-file.txt (also in the conda environment you are using when you get this bug)

I'm attaching the .toml file I used as well as the csv of annotations that was produced, in a .tar

predict-results.tar.gz

fwiw I have the same versions of vak, tweetynet, and crowsetta installed. I can try making an env on Windows with conda files just in case for some reason the platform is the source of the issue.

Thank you! 🙏

@nhoglen
Copy link
Author

nhoglen commented Feb 28, 2023

Oops, sorry I missed this -- I'm not used to looking out for things on GitHub.

I have found that getting the environment set up right on Windows takes some finagling, so I would not be surprised if there are weird version issues baked in there.

The attached zip has the listed files. environment-etc.zip

I'm happy to keep poking at this and will keep a closer eye on GitHub/figure out how to get it to send more notifications.

@NickleDave
Copy link
Collaborator

Thank you @nhoglen! That helped.
It was an option in the TOML config file that I was missing, the min_segment_dur.
Once I added that to the config I had, I was able to replicate your error.

I now also get the same exception that you first reported on the forum:

Traceback (most recent call last):
  File "/home/pimienta/miniconda3/envs/vak8-env/bin/vak", line 10, in <module>
    sys.exit(main())
  File "/home/pimienta/miniconda3/envs/vak8-env/lib/python3.8/site-packages/vak/__main__.py", line 48, in main
    cli.cli(command=args.command, config_file=args.configfile)
  File "/home/pimienta/miniconda3/envs/vak8-env/lib/python3.8/site-packages/vak/cli/cli.py", line 49, in cli
    COMMAND_FUNCTION_MAP[command](toml_path=config_file)
  File "/home/pimienta/miniconda3/envs/vak8-env/lib/python3.8/site-packages/vak/cli/cli.py", line 18, in predict
    predict(toml_path=toml_path)
  File "/home/pimienta/miniconda3/envs/vak8-env/lib/python3.8/site-packages/vak/cli/predict.py", line 50, in predict
    core.predict(
  File "/home/pimienta/miniconda3/envs/vak8-env/lib/python3.8/site-packages/vak/core/predict.py", line 243, in predict
    seq = crowsetta.Sequence.from_keyword(
  File "/home/pimienta/miniconda3/envs/vak8-env/lib/python3.8/site-packages/crowsetta/sequence.py", line 382, in from_keyword
    labels) = cls._validate_onsets_offsets_labels(onsets_s,
  File "/home/pimienta/miniconda3/envs/vak8-env/lib/python3.8/site-packages/crowsetta/sequence.py", line 224, in _validate_onsets_offsets_labels
    raise ValueError('must provide either onset_inds and offset_inds, or '
ValueError: must provide either onset_inds and offset_inds, or onsets_s and offsets_s
(vak8-env)  ✘ pimienta@pop-os  ~/Documents/data/vocal/bug-report

It does look like this is in fact a bug I introduced when I added a new feature here:
8c6aee1
where for some reason that's not clear to me now I changed this function to return an empty string when there were no predicted segments after post-processing (e.g. with a min. segment dur):


whereas the same condition would have caused the previous version of this function would have returned all Nones:
return None, None, None

Not sure if changing it back will break any other changes I made--will find out when I test--but my plan is to change it back for now and release a bugfix version.

NickleDave added a commit that referenced this issue Mar 3, 2023
BUG: Have `to_segments` return all Nones for no segments, fix #634
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BUG Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants