version 0.3 alpha #88

NickleDave · 2020-02-09T16:15:00Z

This pull request consists of a giant "feature branch": the rough draft for version 0.3.

Since @yardencsGitHub and I are the only ones really interacting with the code base right now, I am merging this 200-something commit "feature" even though that is not really best practice for open-source development. I think the structure of the codebase is actually now clean enough that future feature branches can be more targeted.

This way we can get to an alpha version that we and others can actually use and test, and move forward with other things that depend on that.

Major changes:

- switch to PyTorch
  - because Tensorflow 1.0 is deprecated
  - and because after experience with both frameworks, I think PyTorch is more suited to our needs
    - main plus is Dataset + Dataloader abstractions that seem like a better fit for "doing things with spectrograms / audio that don't fit neatly into what people do for computer vision tasks"
    - plus Python-first, 'mid-level' abstraction of framework lets code be readable but flexible
    - those are the pros, con is that not all the batteries are included, e.g. metrics computed by keras
- remove Dataset class
  - choose to not maintain a giant class with many methods including those specialized for saving and loading
  - better to have lightweight system that leverages other libraries wherever possible
    - i.e. we just create a simple .csv that represents the dataset, pointing to data files in the rows, and then use Pandas to work with this .csv "dataset abstraction"
    - no need for saving, loading, etc. -- giant .json file including arrays and things that take up disk space with things that are already in other files
  - downside is metadata that user needs is not all encapsulated in one file, things like mapping from integer outputs of class predictions from network to the labels that the user assigns those classes in the annotation
    - but it is "encapsulated" by being saved all together in the results directory
    - could take the Audacity approach and append a "file extension" to this directory, .vak and treat it like a "project"
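The csv-as-dataset idea above can be sketched roughly as follows. This is only an illustration, not vak's actual schema: the column names `audio_path`, `spect_path`, and `duration` and the file names are assumptions; only the `split` column is mentioned in this pull request.

```python
import io

import pandas as pd

# Hypothetical dataset csv: each row points at files on disk instead of storing arrays.
# Every column name except 'split' is an assumption made for this sketch.
CSV_TEXT = """audio_path,spect_path,duration,split
bird1/song1.wav,bird1/song1.npz,2.5,train
bird1/song2.wav,bird1/song2.npz,3.0,train
bird1/song3.wav,bird1/song3.npz,1.5,val
bird1/song4.wav,bird1/song4.npz,2.0,test
"""

# In practice this would be pd.read_csv(csv_path); StringIO keeps the sketch self-contained
dataset_df = pd.read_csv(io.StringIO(CSV_TEXT))

# The 'split' column says which splits exist and which rows belong to each one
splits = sorted(dataset_df["split"].unique())
train_df = dataset_df[dataset_df["split"] == "train"]
dur_per_split = dataset_df.groupby("split")["duration"].sum().to_dict()
```

Because the "dataset" is just a table of paths, nothing needs custom save/load logic: the csv can be opened, filtered, or edited with any tool that reads csv files.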


NickleDave added 30 commits December 17, 2019 09:58

fix bug in annotation.py -- wrong variable name

dd8605d

update setup.py for this feature branch

57cfb47

- add the following to REQUIRED: pandas, torch, torchvision, dask
- bump version of crowsetta in REQUIRED to >=2.1.0

add vak/dataset/classes/sequence.py

c92c667

with keras.utils.Sequence class for Vak datasets

rewrite dataset.spect.from_files as dataframe.from_files

c694abc

fix import statement in vak/dataset/__init__.py

0411e12

fix import statement in vak/dataset/prep.py

536041b

change 'spect' to 'dataframe' where required in tests

9bd63d1

after changing vak.dataset.spect to vak.dataset.dataframe

rewrite tests for from_files now that it returns dataframe

56d3935

fix import in tests/unit_tests/test_dataset/__init__.py

7d7abd1

rewrite vak.dataset.prep to use dataframe.from_files

d00f1d0

rewrite dataset.split.train_test_dur_split

af9b2be

so it adds a 'split' column to a DataFrame, instead of splitting a Dataset into subsets and then returning each subset as a separate Dataset.
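A rough sketch of what "adds a 'split' column" could look like. The helper `add_split_column`, its greedy assignment rule, the `"unused"` label, and the `spect_path` / `duration` columns are all hypothetical; this is not the actual `train_test_dur_split` implementation.

```python
import pandas as pd


def add_split_column(df, train_dur, test_dur):
    """Return a copy of df with a 'split' column, assigned greedily by duration.

    Hypothetical helper for illustration: rows are taken in order, labeled
    'train' until train_dur seconds are reached, then 'test' until test_dur
    seconds are reached; any remaining rows are labeled 'unused'.
    """
    df = df.copy()
    labels = []
    train_total, test_total = 0.0, 0.0
    for duration in df["duration"]:
        if train_total < train_dur:
            labels.append("train")
            train_total += duration
        elif test_total < test_dur:
            labels.append("test")
            test_total += duration
        else:
            labels.append("unused")
    df["split"] = labels
    return df


files_df = pd.DataFrame(
    {
        "spect_path": ["a.npz", "b.npz", "c.npz", "d.npz", "e.npz"],
        "duration": [2.0, 2.0, 2.0, 2.0, 2.0],
    }
)
# One DataFrame in, one DataFrame out; the subsets are just values in a column
split_df = add_split_column(files_df, train_dur=4.0, test_dur=2.0)
```

The point of the design is that a single table is returned, so downstream code selects a subset with an ordinary filter such as `split_df[split_df["split"] == "train"]` rather than juggling separate dataset objects.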

rewrite vak/cli/prep.py to work with DataFrames

9e68eab

fix test_prep.py tests after rewriting dataset/prep.py

ae3a50d

remove tests/test_data/vds/

9d56822

fix setup_scripts/rerun_prep.py so it makes .csv files

df24ea8

fixup cli/prep.py so it adds .csv path to .ini file

aadf4dd

modify TrainConfig so csv_path is only path attribute

6334066

- instead of having 'train_vds_path', 'val_vds_path', etc.
- the 'split' column in the csv tells vak what splits there are and which samples belong to them

modify LearncurveConfig so csv_path is only path attribute

b848063

modify PredictConfig so csv_path is only path attribute

236416e

rename vak.dataset sub-package to vak.io

1eed6e3

change vak.dataset to vak.io in tests/

f1cc299

change vak.dataset to vak.io in conda.recipe/meta.yaml

26fcf8e

remove unused vak/io/mat.py

4d04599

rename dataframe.from_files, now is spect.to_dataframe

d1f549f

change dataframe.from_files to spect.to_dataframe in tests

96470e6

rename io.prep.prep to io.dataset.from_files

54ded99

to emphasize this is a higher-level function that makes a dataset from **either** audio or spectrogram files

fix tests after renaming prep.prep to dataset.from_files

651e82f

move dataset splitting functions to utils sub-package

2ac0bdc

because they're not file input-output functions, so they don't belong in .io

fix tests after moving dataset split functions to utils

26876fa

rename test_dataset to test_io

c2c8e68

Embeddave and others added 22 commits February 9, 2020 10:45

rewrite cli/train.py without core

561fc77

rewrite cli/cli.py to use toml

d297c99

fix bug in converters.comma_separated_list

8287844

make comma_separated_list converter just return the input value if it is already a list, or convert to list if it is a str, else raise a TypeError instead of crashing by trying to convert a list to a list
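The behavior described in this commit message could be sketched like this. It is a minimal version written from the description, not the code in the commit; stripping whitespace around each element and the wording of the error message are assumptions.

```python
def comma_separated_list(value):
    """Convert a comma-separated string to a list of strings.

    Returns the input unchanged if it is already a list,
    and raises TypeError for any other type.
    """
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        # assumption: strip whitespace around each element, so "a, b" -> ["a", "b"]
        return [element.strip() for element in value.split(",")]
    raise TypeError(
        f"unexpected type when converting to comma-separated list: {type(value)}"
    )
```

Returning a list unchanged makes the converter safe to apply to values that were already parsed into lists, e.g. by a config format with native list syntax.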

remove core from cli/predict.py

bc91dc9

remove core from cli/learncurve

907c8aa

remove results_dirname attribute from TrainConfig

df914e1

because no other functions actually use this attribute. User can just look at the path they specified as "root_results_dir" to figure out where the results are.

remove batch_size parameter from Spectrogram dataset

464a08e

make model name validators test with 'Model' appended

78d3f03

make map_from_config deal with undeclared params

295277a

e.g. if there are no args passed to the loss function when instantiating it, just map the 'loss' key to an empty dictionary which will then get passed as the kwargs
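The idea can be illustrated with a small sketch. The function `fill_undeclared_params`, the class `DummyLoss`, and the key names `network` and `optimizer` are hypothetical stand-ins; only the `'loss'` key and the empty-dictionary default come from the commit message, and the real `map_from_config` may work differently.

```python
def fill_undeclared_params(model_config, expected_keys=("network", "loss", "optimizer")):
    """Return a copy of model_config in which every expected key maps to a dict.

    Keys missing from the config get an empty dict, so that
    ``SomeClass(**config[key])`` still works when no params were declared.
    """
    filled = dict(model_config)
    for key in expected_keys:
        filled.setdefault(key, {})
    return filled


class DummyLoss:
    """Stand-in for a loss class that takes only optional keyword arguments."""

    def __init__(self, reduction="mean"):
        self.reduction = reduction


config = {"optimizer": {"lr": 0.001}}  # no 'loss' section declared by the user
filled = fill_undeclared_params(config)
loss = DummyLoss(**filled["loss"])  # an empty dict unpacks to no kwargs
```

With the default in place, code that instantiates each component can always unpack `filled[key]` without first checking whether the user declared that section.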

fix how SpectrogramWindowDataset shape is determined

35acda0

remove 'config' dependency in Makefile

2225649

from 'results' command, because 'config' command doesn't exist anymore

modify environment.yml for general use

2bc71f7

add metrics sub-package, move metrics module into it

75a63dd

add metrics entry point in setup.py

76a4625

add entry_points module to util sub-package

9d18fd2

with iter_ helper function, that was in vak/models/util.py

have models/util use entry_points.iter_ from vak/util

9d5ddaa

add util module to metrics sub-package

db74a6a

add 'get_default_device' function to util.general

34b9c11

add 'num_workers' and 'device' attribs to TrainConfig

65d8700

use new TrainConfig attribs in cli/train

8938b28

add new train attribs to config/valid.toml

4fe9cde

fix bug in cli/train: missing mkdir

2fb6641

NickleDave mentioned this pull request Feb 9, 2020

Convert TweetyNet to Pytorch model yardencsGitHub/tweetynet#30

Merged

add tweetynet>=0.3.0 to REQUIRED in setup.py

6fe4285

NickleDave merged commit 18bea97 into master Feb 9, 2020

This was referenced Feb 17, 2020

add separate config section for learncurve #21

Closed

use crowsetta 2.0 #53

Closed

NickleDave mentioned this pull request Mar 7, 2020

switch Dataset to pandas Dataframe + torch Dataset #84

Closed

yardencsGitHub pushed a commit to yardencsGitHub/vak that referenced this pull request Aug 17, 2020

Merge pull request vocalpy#88 from NickleDave/0.3a-torch

5331050

version 0.3 alpha

NickleDave added a commit that referenced this pull request Sep 8, 2020

Merge pull request #88 from NickleDave/0.3a-torch

5f677c0

version 0.3 alpha
