Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

version 0.3 alpha #88

Merged
merged 212 commits into from
Feb 9, 2020
Merged

version 0.3 alpha #88

merged 212 commits into from
Feb 9, 2020

Conversation

NickleDave
Copy link
Collaborator

@NickleDave NickleDave commented Feb 9, 2020

This pull request consists of a giant "feature branch: the rough draft for version 0.3.

Since @yardencsGitHub and I are the only ones really interacting with the code base right now, I am merging this 200-something commit "feature" even though that is not really best practice for OS development. I think the structure of the codebase is actually now clean enough that future feature branches can be more targeted.

This way we can get to an alpha version that we and others can actually use and test, and move forward with other things that depend on that.

Major changes:

  • switch to PyTorch
    • because Tensorflow 1.0 is deprecated
    • and because after experience with both frameworks, I think PyTorch is more suited to our needs
      • main plus is Dataset + Dataloader abstractions that seem like a better fit for
        "doing things with spectrograms / audio that don't fit neatly into what people do
        for computer vision tasks"
      • plus Python-first, 'mid-level' abstraction of framework lets code be readable but flexible
      • those are the pros, con is that not all the batteries are included, e.g. metrics computed by keras
  • remove Dataset class
  • choose to not maintain a giant class with many methods including those specialized
    for saving and loading
  • better to have lightweight system that leverages other libraries wherever possible
  • i.e. we just create a simple .csv that represents the dataset, pointing to data files in the rows
  • and then use Pandas to work with this .csv "dataset abstraction"
  • no need for saving, loading, etc. -- giant .json file including arrays and things that take up
    disk space with things that are already in other files
  • downside is metadata that user needs is not all encapsulated in one file, things like mapping
    from integer outputs of class predictions from network to the labels that the user assigns those
    classes in the annotation
    • but it is "encapsulated" by being saved all together in the results directory
    • could take the Audacity approach and append a "file extension" to this directory, .vak and
      treat it like a "project"

- add the following to REQUIRED: pandas, torch, torchvision, dask
- bump version of crowsetta in REQUIRED to >=2.1.0
with keras.utils.Sequence class for Vak datasets
after changing vak.dataset.spect to vak.dataset.dataframe
so it adds a 'split' column to a DataFrame, instead of splitting a
Dataset into subsets and then returning each subset as a separate
Dataset.
- instead of having 'train_vds_path', 'val_vds_path', etc.
- the 'split' column in the csv tells vak what splits there are
  and which samples belong to them
to emphasize this is a higher-level function that makes a dataset
from **either** audio or spectrogram files
because they're not file input-output functions,
so they don't belong in .io
Embeddave and others added 22 commits February 9, 2020 10:45
make comma_separated_list converter just return the input value
if it is already a list, or convert to list if it is a str,
else raise a TypeError

instead of crashing by trying to convert a list to a list
because no other functions actually use this attribute.
User can just look at the path they specified as "root_results_dir"
to figure out where the results are.
e.g. if there are no args passed to the loss function
when instantiating it, just map the 'loss' key to
an empty dictionary which will the get passed
as the kwargs
from 'results' command, because 'config' command
doesn't exist anymore
with iter_ helper function, that was in vak/models/util.py
@NickleDave NickleDave merged commit 18bea97 into master Feb 9, 2020
yardencsGitHub pushed a commit to yardencsGitHub/vak that referenced this pull request Aug 17, 2020
NickleDave added a commit that referenced this pull request Sep 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants