0.7.0

@NickleDave NickleDave released this 23 Nov 17:57

vak 0.7.0 release notes

vak 0.7.0 is a maintenance release that also includes some new features and bug fixes.
Highlights:

  • For annotation formats that have one annotation file per annotated file, vak can now recognize when
    the annotation files are named by removing the annotated file extension (e.g., .wav or .npz)
    and replacing it with the annotation format extension, e.g. .txt or .csv. (Other ways of relating annotations
    and annotated files are still valid, e.g. by including the original source audio file in both filenames.)
  • The transform that normalizes spectrograms is now fit only to the training set. Previously no split was specified, and in some cases the entire dataset was used. This could artificially reduce the error on the test set through data leakage: the model "knows" about the distribution of the test set because the parameters used to normalize the spectrograms take it into account. For training sets large enough to achieve good performance with current models, the difference between their distribution and that of the test set is probably too small for this to seriously impact evaluation, but we have not tested this extensively.
  • Several other cleanups, additional unit tests, and minor bug fixes that should not have impacted performance but do make the library more efficient and robust.

Added

  • Add unit tests for csv.has_unlabeled
    #541.
    Fixes #102.
  • Add unit tests for __main__
    #542.
    Fixes #337.
  • Add validation of labels argument to vak.split.algorithms.brute_force,
    to prevent conditions where the algorithm can fail to converge
    because of bad input
    #562.
    Fixes #288.
  • Add a "Frequently Asked Questions" page to the documentation,
    and a page to the "Reference" section on file naming conventions
    #564.
    Fixes #524
    and #424.
  • Add a new way for vak to map annotation files to annotated files
    when preparing datasets, e.g. for training models.
    For annotation formats that have one annotation file per
    annotated file, vak can now recognize when
    the annotation files are named by removing the
    annotated file extension (e.g., .wav or .npz)
    and replacing it with the annotation format extension,
    e.g. .txt or .csv. (Other ways of relating annotations
    and annotated files are still valid, e.g. by including
    the original source audio file in both filenames.)
    #572.
    Fixes #563.
  • Log the vak version to the logfile for runs started from the command-line interface
    #587.
    Fixes #216.
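
The extension-replacement naming convention described above can be sketched in a few lines. This is a minimal illustration, not vak's implementation: the function name map_by_replaced_extension, its parameters, and the example paths are all hypothetical.

```python
from pathlib import Path


def map_by_replaced_extension(annotated_files, annot_files, annot_ext=".csv"):
    """Pair each annotated file with the annotation file whose name is the
    annotated file's name with its extension replaced by ``annot_ext``.

    Hypothetical helper for illustration only; not part of vak's API.
    """
    # Index annotation files by bare filename, so the match does not
    # depend on which directory the annotation files live in.
    annot_by_name = {Path(annot).name: annot for annot in annot_files}

    pairs = {}
    for annotated in annotated_files:
        # e.g. "bird1 song.wav" -> "bird1 song.csv"
        expected = Path(annotated).with_suffix(annot_ext).name
        if expected not in annot_by_name:
            raise ValueError(
                f"no annotation file named '{expected}' found for '{annotated}'"
            )
        pairs[annotated] = annot_by_name[expected]
    return pairs


# Made-up paths, including one with a space in the filename
pairs = map_by_replaced_extension(
    ["data/bird1 song.wav", "data/bird2.wav"],
    ["annot/bird1 song.csv", "annot/bird2.csv"],
)
```

Raising an error when no match is found, rather than silently skipping the file, mirrors the intent of the clearer error messages mentioned in these notes, but the exact behavior of vak may differ.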

Changed

  • Rewrite unit tests in tests/test_cli/ to use mocks for vak.core functions
    #544.
    Fixes #543.
  • It is now possible to load configuration files
    and work with them programmatically even if the paths
    they point to do not exist.
    The core functions handle validation instead.
    E.g., the PrepConfig class does not check whether
    output_dir exists and is a directory, but vak.core.prep does.
    #550.
    Fixes #459.
  • Refactor and speed up logic for determining whether a
    dataset with sequence annotations has unlabeled segments
    that should be assigned a "background" label
    #559.
    Fixes #243.
    • Adds a new sub-sub-package, datasets.seq
      with a validators module, which is where the
      re-written has_unlabeled function now lives.
      Replaces the vak.csv module which was not well named.
    • Also adds a has_unlabeled function to vak.annotation
      that is used by vak.datasets.seq.validators.has_unlabeled;
      this function handles edge cases outlined in
      #243.
  • Rename and refactor functions in vak.annotation
    that map annotations to the files that they annotate,
    so that the purpose of the functions is clearer,
    and add clearer error messages with links to documentation
    about file naming conventions
    #566.
    Fixes #525.
  • Revise "autoannotate" tutorial to use .wav audio and .csv
    annotation files from new release of Bengalese Finch Song
    Repository, and to suggest that Windows users unpack
    archives with tar, not other programs such as WinZip
    #578.
    Fixes #560
    and #576.
  • Change vak.files.find_fname and vak.files.spect.find_audio_fname
    so they work when the filename and/or path contain spaces
    #594.
    Fixes #589.
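
To make the idea of "unlabeled segments" concrete, here is a minimal sketch of one way to check whether a sequence annotation leaves part of a file unlabeled. It does not reproduce vak.annotation.has_unlabeled; the function name, its parameters, and the handling of files with no segments are assumptions made for illustration.

```python
def has_unlabeled_segments(onsets_s, offsets_s, duration_s):
    """Return True if any part of [0, duration_s] is not covered by a segment.

    Hypothetical helper for illustration only; not vak's implementation.
    Assumes segments are sorted by onset time and do not overlap.
    """
    if len(onsets_s) != len(offsets_s):
        raise ValueError("onsets_s and offsets_s must have the same length")

    if len(onsets_s) == 0:
        # Assumption: a file with no annotated segments is entirely
        # unlabeled, as long as it has non-zero duration.
        return duration_s > 0

    # Unlabeled period before the first segment or after the last one
    if onsets_s[0] > 0 or offsets_s[-1] < duration_s:
        return True

    # Gap between the end of one segment and the start of the next
    return any(
        next_onset > prev_offset
        for prev_offset, next_onset in zip(offsets_s[:-1], onsets_s[1:])
    )
```

If a check like this returns True, the dataset would need a "background" label for the uncovered periods.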

Fixed

  • Fix how vak.core.prep handles labelset parameter.
    Add pre-condition that raises a ValueError
    when labelset is None but the .toml config is one of
    {'train', 'learncurve', 'eval'}
    #545.
    Avoids running computationally expensive step of generating
    and validating spectrograms before crashing when trying to
    split the dataset using labelset. Also avoids silent
    failures for datasets that do not require splitting,
    e.g., an 'eval' set that could contain labels not in the
    training set.
    Fixes #468.
  • Fix how cli and core functions that have the csv_path parameter
    handle it. The parameter points to a dataset .csv generated by vak prep
    that other core/cli functions use: train, learncurve, eval, predict.
    They now validate that it exists, and if it doesn't, the cli functions
    politely suggest running vak prep first; the core functions
    raise a FileNotFoundError.
    #546.
    Fixes #469.
  • Fix bug where labelmap_path parameter was ignored by core.train.
    Change function so that either labelmap_path or labelset must
    be passed in, but passing in both will raise an error.
    Also change cli.train to only pass in one of those and set the other
    to None.
    #552.
    Fixes #547.
  • Fix vak.annotation.has_unlabeled to handle the edge case where an
    annotation file has no annotated segments
    #583.
    Fixes #378.
  • Fix StandardizeSpect method fit_df so that it computes
    parameters for standardization from a specific
    split of the dataset--the training split, by default--instead
    of using the entire dataset, which could technically give rise
    to data leakage
    #584.
    Fixes #575.
  • Fix error message in vak.core.eval
    #589.
    Fixes #588.
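
The data leakage fix for standardization can be illustrated with a simplified sketch. It is not the StandardizeSpect.fit_df implementation: it standardizes plain lists of numbers rather than spectrograms, and the names fit_standardizer and standardize, along with the row layout, are hypothetical.

```python
from statistics import fmean, pstdev


def fit_standardizer(rows, split="train"):
    """Compute mean and standard deviation from one split only.

    Hypothetical helper for illustration only; not vak's API.
    Each row is a dict with a "split" name and a list of "values".
    """
    values = [v for row in rows if row["split"] == split for v in row["values"]]
    if not values:
        raise ValueError(f"no rows found for split '{split}'")
    return fmean(values), pstdev(values)


def standardize(values, mean, std):
    """Apply parameters fit on the training split to any split."""
    return [(v - mean) / std for v in values]


# Made-up data: the test split has a very different distribution
rows = [
    {"split": "train", "values": [1.0, 3.0]},
    {"split": "train", "values": [5.0, 7.0]},
    {"split": "test", "values": [100.0, 200.0]},
]

# Parameters come from the training rows only; test values are ignored
mean, std = fit_standardizer(rows)
test_standardized = standardize(rows[2]["values"], mean, std)
```

Because the test rows never enter fit_standardizer, the normalization parameters carry no information about the test distribution, which is the property the fix is meant to guarantee.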