ENH: Deprecate WindowDataset #555

Closed
Tracked by #612
NickleDave opened this issue Jul 29, 2022 · 3 comments
NickleDave commented Jul 29, 2022

Currently training is tightly coupled to the WindowDataset abstraction.
A clear drawback of this is that it prevents us from training in any other way, e.g., feeding in an entire vocalization (such as a birdsong bout) as one sample in a batch.
The logic wrapped up in WindowDataset is also very hard to read: there is a lot of array-oriented programming that can lead to subtle errors; see for example #169 #213 #217 #219 #220

So this issue formalizes the idea of deprecating WindowDataset.
Instead, each sample in a dataset will be one vocalization, however that is defined for the particular dataset. Typically this will map to one audio file or one spectrogram, e.g., for birdsong, one bout of song. In other words, one sample is one row of the dataframe representing a dataset, as produced by vocles.
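As a rough illustration (not the actual vak API), each `__getitem__` could load one whole spectrogram from one dataframe row. The class name `VocalizationDataset`, the `spect_path` column, and the `"s"` array key below are assumptions for the sketch, not things defined in this issue:

```python
# Minimal sketch, assuming the dataset dataframe has a 'spect_path' column
# pointing to .npz files with the spectrogram array stored under the key 's'.
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset


class VocalizationDataset(Dataset):
    """Each sample is one whole vocalization, i.e. one row of the dataframe."""

    def __init__(self, df: pd.DataFrame, transform=None):
        self.df = df
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        # load the whole spectrogram for this vocalization: (n_freq_bins, n_time_bins)
        spect = np.load(row["spect_path"])["s"]
        spect = torch.from_numpy(spect).float()
        if self.transform is not None:
            spect = self.transform(spect)
        return spect
```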

@NickleDave NickleDave changed the title ENH: Replace WindowDataset with RandomCrop transform ENH: Deprecate WindowDataset; add RandomCrop transform Jul 29, 2022
@NickleDave NickleDave changed the title ENH: Deprecate WindowDataset; add RandomCrop transform ENH: Deprecate WindowDataset Jul 29, 2022
@NickleDave
Collaborator Author

For more detail on what we get from the WindowDataset abstraction, and how we will achieve the same thing while deprecating it, see #556 and #557.

Key things are:

@NickleDave
Collaborator Author

Not going to do this for now, as discussed in #651 and the issues linked there. Closing.

@NickleDave
Collaborator Author

> A clear drawback of this is that it prevents us from training in any other way, e.g., feeding in an entire vocalization (such as a birdsong bout) as one sample in a batch.
> The logic wrapped up in WindowDataset is also very hard to read: there is a lot of array-oriented programming that can lead to subtle errors; see for example #169 #213 #217 #219 #220
>
> So this issue formalizes the idea of deprecating WindowDataset.
> Instead, each sample in a dataset will be one vocalization, however that is defined for the particular dataset. Typically this will map to one audio file or one spectrogram, e.g., for birdsong, one bout of song. In other words, one sample is one row of the dataframe representing a dataset, as produced by vocles.

We will still add some sort of abstraction, like a FileDataset, where the file can be either an audio file or a spectrogram file.
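The windowing behavior that WindowDataset provided could then move into a transform applied to each whole-vocalization sample, along the lines of the RandomCrop transform mentioned in the title history. A hypothetical sketch, not a decision recorded in this thread: the name RandomWindowCrop, a window size measured in time bins, and right-padding of short samples are all assumptions.

```python
# Hypothetical sketch of a RandomCrop-style transform that recovers the
# windowing behavior of WindowDataset by cropping a fixed number of time
# bins from a whole-spectrogram sample.
import torch


class RandomWindowCrop:
    """Randomly crop `window_size` time bins from a (n_freq_bins, n_time_bins) spectrogram."""

    def __init__(self, window_size: int):
        self.window_size = window_size

    def __call__(self, spect: torch.Tensor) -> torch.Tensor:
        n_time_bins = spect.shape[-1]
        if n_time_bins <= self.window_size:
            # pad short vocalizations on the right so every sample has the same width
            pad = self.window_size - n_time_bins
            return torch.nn.functional.pad(spect, (0, pad))
        # pick a random start index so the crop fits entirely inside the spectrogram
        start = torch.randint(0, n_time_bins - self.window_size + 1, (1,)).item()
        return spect[..., start : start + self.window_size]
```

Such a transform could then be passed as the `transform` argument of a per-file dataset like the one sketched above.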
