Training PREP error: AttributeError: 'Annotation' object has no attribute 'seq' #223

Closed
sfcooke96 opened this issue Jan 19, 2024 · 4 comments

@sfcooke96

Hi there,

First time using TweetyNet and vak. I'm attempting to use vak to train TweetyNet on a small dataset of Raven .txt files and .wav files, and I can't seem to get past

vak prep gy6or6_train.toml

I keep getting the error:

Traceback (most recent call last):
  File "/User/miniconda3/envs/vak-env/bin/vak", line 10, in <module>
    sys.exit(main())
  File "/User/miniconda3/envs/vak-env/lib/python3.9/site-packages/vak/__main__.py", line 48, in main
    cli.cli(command=args.command, config_file=args.configfile)
  File "/User/miniconda3/envs/vak-env/lib/python3.9/site-packages/vak/cli/cli.py", line 54, in cli
    COMMAND_FUNCTION_MAP[command](toml_path=config_file)
  File "/User/miniconda3/envs/vak-env/lib/python3.9/site-packages/vak/cli/cli.py", line 28, in prep
    prep(toml_path=toml_path)
  File "/User/miniconda3/envs/vak-env/lib/python3.9/site-packages/vak/cli/prep.py", line 122, in prep
    dataset_df, dataset_path = prep_module.prep(
  File "/User/miniconda3/envs/vak-env/lib/python3.9/site-packages/vak/prep/prep_.py", line 216, in prep
    dataset_df, dataset_path = prep_parametric_umap_dataset(
  File "/User/miniconda3/envs/vak-env/lib/python3.9/site-packages/vak/prep/parametric_umap/parametric_umap.py", line 214, in prep_parametric_umap_dataset
    dataset_df, shape = prep_unit_dataset(
  File "/User/miniconda3/envs/vak-env/lib/python3.9/site-packages/vak/prep/unit_dataset/unit_dataset.py", line 345, in prep_unit_dataset
    annot_labelset = set(annot.seq.labels)
AttributeError: 'Annotation' object has no attribute 'seq'

I have attached an example of the training .txt file as well as a copy of the ....train.toml file for reference.

Any help is appreciated!

Stephen

copyof_train.toml.pdf

[example_data.pdf](https://github.com/yardencsGitHub/tweetynet/files/13982834/example_data.pdf)

@NickleDave
Collaborator

NickleDave commented Jan 19, 2024

Hi @sfcooke96, glad to hear you are trying TweetyNet and vak.
Sorry you're having this issue.

You've done everything the right way.
What's happening is this:

  • The library we use to load annotations, crowsetta, has two types of formats: "sequence" and "bbox".
    • (I think you might have figured this out already, I'm just over-explaining here in case some future person runs into the same problem and finds this issue.)
  • Currently in vak we always assume you are using a sequence type format; this is why you get the error AttributeError: 'Annotation' object has no attribute 'seq'. When vak loads a bbox-type annotation with crowsetta, the annotated regions are stored in a bboxes attribute, and the annotation has no seq attribute.
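
For anyone who finds this issue later, the mismatch described in the list above can be sketched without crowsetta installed. This is a minimal illustration and not vak's code: `annotation_kind` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for crowsetta's `Annotation`, under the assumption (consistent with the traceback) that an annotation carries either a `seq` or a `bboxes` attribute but not both.

```python
from types import SimpleNamespace


def annotation_kind(annot):
    """Return 'seq' or 'bbox' depending on which attribute the annotation carries.

    Hypothetical helper for illustration; not part of crowsetta or vak.
    """
    if hasattr(annot, "seq"):
        return "seq"
    if hasattr(annot, "bboxes"):
        return "bbox"
    raise ValueError("annotation has neither a 'seq' nor a 'bboxes' attribute")


# Stand-ins for crowsetta.Annotation objects (assumption: only one of the
# two attributes is set on any given annotation).
seq_annot = SimpleNamespace(seq=SimpleNamespace(labels=["a", "b"]))
bbox_annot = SimpleNamespace(bboxes=[SimpleNamespace(label="bird_a")])

print(annotation_kind(seq_annot))   # seq
print(annotation_kind(bbox_annot))  # bbox

# Accessing .seq on the bbox-style stand-in raises AttributeError,
# which is the same kind of failure vak hit during prep.
try:
    bbox_annot.seq.labels
except AttributeError as err:
    print(f"AttributeError: {err}")
```

A check like this before running `vak prep` would tell you up front whether your annotations need converting to a sequence format.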

The fix for now is to convert your annotations to a sequence.
A high-level description of how to do that is here: https://vak.readthedocs.io/en/latest/howto/howto_user_annot.html#howto-user-annot

Here's an example of how you would do this with a Raven selection table.

import crowsetta
import numpy as np

# Load the example Raven selection table that ships with crowsetta
example = crowsetta.data.get('raven')
raven = crowsetta.formats.bbox.Raven.from_file(example.annot_path, annot_col='Species')
annot = raven.to_annot()

# Keep only the onset, offset, and label of each bounding box
onsets_s = []
offsets_s = []
labels = []
for bbox in annot.bboxes:
    onsets_s.append(bbox.onset)
    offsets_s.append(bbox.offset)
    labels.append(bbox.label)
onsets_s = np.array(onsets_s)
offsets_s = np.array(offsets_s)
labels = np.array(labels)

# Build a simple-seq annotation and save it as a csv file
simpleseq = crowsetta.formats.seq.SimpleSeq(
    onsets_s=onsets_s,
    offsets_s=offsets_s,
    labels=labels,
    annot_path='/dummy/path'
)
simpleseq.to_file('example-data.csv')

Please test this out for me and let me know if it works.
(Using your data in place of the example data built in to crowsetta.)
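
A possible sanity check before converting real data: bounding boxes in a selection table can overlap in time, whereas TweetyNet assigns a single label to each time bin, so overlapping segments cannot both be represented in a sequence. The sketch below uses plain lists and a hypothetical helper, `find_overlaps`; it is not something crowsetta or vak provides, and the example times are made up.

```python
def find_overlaps(onsets_s, offsets_s):
    """Return index pairs of neighboring segments (in onset order) that overlap.

    An empty list means no two segments overlap in time.
    Hypothetical helper for illustration; not part of crowsetta or vak.
    """
    order = sorted(range(len(onsets_s)), key=lambda i: onsets_s[i])
    overlaps = []
    for prev, curr in zip(order, order[1:]):
        # The later segment starts before the earlier one has ended.
        if onsets_s[curr] < offsets_s[prev]:
            overlaps.append((prev, curr))
    return overlaps


# Made-up times: segment 2 overlaps both segment 0 and segment 1.
onsets_s = [0.0, 1.0, 0.5]
offsets_s = [0.75, 2.0, 1.25]
print(find_overlaps(onsets_s, offsets_s))  # [(0, 2), (2, 1)]
```

If the list comes back non-empty, the overlapping selections would need to be merged or dropped before training; which choice is right depends on the data.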
I could also test it for you if you reply with your files attached in a different format, e.g. if you compress them both into a .zip or .tar.gz file. I couldn't easily get the annotations out of the PDF into the right format.

I'm afraid you might actually get an error if you do this with your data, because of the extra columns, such as the extracted features. If so, this would be a bug we need to fix in crowsetta; we want you to be able to work with any selection table that Raven saves as a txt file. If you do get such a bug, please report it at https://github.com/vocalpy/crowsetta/issues. I really appreciate it--we tried to make things easy for Raven users but I have to admit I haven't spent a lot of time with it yet.

There is a workaround, which would be to load the txt file directly with pandas and then convert it to a simple-seq annotation the same way I did above. I can reply with a snippet showing you how if we need to.
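
A rough sketch of that workaround follows. It uses the standard-library `csv` module in place of pandas so that it runs without extra dependencies, and it is not the snippet offered above. Assumptions to check against your own files: the selection table is tab-delimited with Raven's default `Begin Time (s)` and `End Time (s)` columns, the labels are in a `Species` column, and the output uses the `onset_s`, `offset_s`, `label` columns described in the vak how-to linked earlier. The helper names and the sample table are made up for illustration.

```python
import csv
import io


def raven_rows_to_simple_seq(raven_text, annot_col="Species"):
    """Parse a tab-delimited Raven selection table into simple-seq rows.

    Extra columns (e.g. extracted features) are ignored.
    Hypothetical helper; not part of crowsetta or vak.
    """
    reader = csv.DictReader(io.StringIO(raven_text), delimiter="\t")
    seen = set()
    rows = []
    for row in reader:
        # Raven can list a selection once per view; keep the first copy.
        selection_id = row.get("Selection")
        if selection_id is not None:
            if selection_id in seen:
                continue
            seen.add(selection_id)
        rows.append(
            {
                "onset_s": float(row["Begin Time (s)"]),
                "offset_s": float(row["End Time (s)"]),
                "label": row[annot_col],
            }
        )
    # A sequence is ordered in time, so sort by onset.
    rows.sort(key=lambda r: r["onset_s"])
    return rows


def write_simple_seq_csv(rows, csv_path):
    """Write rows to a csv file with the columns onset_s, offset_s, label."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["onset_s", "offset_s", "label"])
        writer.writeheader()
        writer.writerows(rows)


# Made-up table for illustration only; real files are exported by Raven.
EXAMPLE = (
    "Selection\tView\tChannel\tBegin Time (s)\tEnd Time (s)\t"
    "Low Freq (Hz)\tHigh Freq (Hz)\tSpecies\n"
    "1\tSpectrogram 1\t1\t1.50\t2.25\t2000.0\t6000.0\tsp_a\n"
    "2\tSpectrogram 1\t1\t0.25\t0.75\t1500.0\t5000.0\tsp_b\n"
)

rows = raven_rows_to_simple_seq(EXAMPLE)
print(rows)
```

Calling `write_simple_seq_csv(rows, 'annotations.csv')` would then save the result. Note that the low and high frequency columns are dropped on purpose, since a sequence annotation has no place for them.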

I am happy to help you get this figured out here on this issue, but just for future reference, this repo mainly exists for the paper, and TweetyNet is now built into vak. You can ask questions in the VocalPy forum, and you can report bugs / request features on the vak issue tracker.

@sfcooke96
Author

sfcooke96 commented Jan 20, 2024

Hi @NickleDave, thanks for getting back! I will post any future questions in the repositories you mentioned.

After a few edits I have the following Python script:

import crowsetta
import numpy as np


example = crowsetta.data.get('raven')
raven = crowsetta.formats.bbox.raven.Raven.from_file(example.annot_path, annot_col='Species')
annot = raven.to_annot()
onsets_s = []
offsets_s = []
labels = []
for bbox in annot.bboxes:
    onsets_s.append(bbox.onset)
    offsets_s.append(bbox.offset)
    labels.append(bbox.label)
onsets_s = np.array(onsets_s)
offsets_s = np.array(offsets_s)
labels = np.array(labels)
simpleseq = crowsetta.formats.seq.SimpleSeq(
    onsets_s=onsets_s,
    offsets_s=offsets_s, 
    labels=labels,
    annot_path='/User/training_data'
)
simpleseq.to_csv('annotations.csv')

What I'm running into is the following error:

AttributeError: 'SimpleSeq' object has no attribute 'to_csv'

Would the appropriate solution be to create a dataframe using pandas and write this to a .csv file following the instructions here: https://vak.readthedocs.io/en/latest/howto/howto_user_annot.html#howto-user-annot?

Another question while we're here: will training the model on simple-seq annotations restrict the predicted annotations to onset - offset borders without including high and low frequency bounds? I'm interested because I was hoping to estimate frequency ranges with the output data. Apologies if I'm misunderstanding how prediction output will be formatted.

Thanks again!

@NickleDave
Collaborator

NickleDave commented Jan 22, 2024

> What I'm running into is the following error:
>
> AttributeError: 'SimpleSeq' object has no attribute 'to_csv'

Whoops, my fault, that should have been to_file, not to_csv. Fixed above.

> will training the model on simple-seq annotations restrict the predicted annotations to onset - offset borders without including high and low frequency bounds? I'm interested because I was hoping to estimate frequency ranges with the output data

That was my next question for you.
TweetyNet only predicts start and stop times, not low and high frequencies.
It was originally developed for audio recordings with a single acoustically-isolated individual, e.g. a songbird in a soundbox, or a human speaking into a microphone. TweetyNet has been used on field recordings (which I'm guessing is what you might have) but unfortunately it won't give you frequency limits.
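
To make the "start and stop times only" point concrete, here is a simplified sketch of how a label per time bin turns into segments. It illustrates the general idea and is not vak's actual post-processing; the function name, the `"unlabeled"` background label, and the 0.5 s time bin are all assumptions chosen for the example.

```python
def frames_to_segments(frame_labels, timebin_dur, background="unlabeled"):
    """Collapse per-time-bin labels into (onset_s, offset_s, label) segments.

    Hypothetical helper for illustration; not part of vak.
    """
    segments = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        # Close the current run at the end of the list or when the label changes.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            label = frame_labels[start]
            if label != background:
                segments.append((start * timebin_dur, i * timebin_dur, label))
            start = i
    return segments


# One label per time bin: there is no frequency axis in this output.
frame_labels = ["unlabeled", "a", "a", "unlabeled", "b", "b", "b", "unlabeled"]
print(frames_to_segments(frame_labels, timebin_dur=0.5))
# [(0.5, 1.5, 'a'), (2.0, 3.5, 'b')]
```

Because each time bin gets exactly one label covering the whole spectrogram column, nothing in the output says which frequencies the sound occupied.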

We have it on the to-do list for vak version 1.0 to add object detection models, which would give you frequency bounds. But those will only get added after some other development work that is in progress. AFAIK the main model people use when they want low/high frequency bounds is DeepSqueak, but it's only in MATLAB. I haven't seen any Python implementations yet (but see my notes in the linked issue about OD models for some ideas).

There are several deep learning frameworks that are meant for more general bioacoustics, but AFAIK they only output "detections" as defined here.
For example:

You might also look at Tessa's repo for other options: https://github.com/rhine3/bioacoustics-software

Sorry we can't help you more right now! 🙁
Please let me know if I've answered your questions

@NickleDave
Collaborator

Closing this based on your reply here @sfcooke96
vocalpy/crowsetta#261 (comment)

Please don't hesitate to reach out on the forums if you have more questions, and of course if you have a bug or need a feature, feel free to raise an issue on the appropriate repo.
