vak 0.7.0 release notes

vak 0.7.0 is a maintenance release, but it does include some new features and bug fixes.
Highlights:

For annotation formats that have one annotation file per annotated file, vak can now recognize when
the annotation files are named by removing the annotated file extension (e.g., .wav or .npz)
and replacing it with the annotation format extension, e.g. .txt or .csv. (Other ways of relating annotations
and annotated files are still valid, e.g. by including the original source audio file in both filenames.)
The transform that normalizes spectrograms is now fit only to the training set. Previously no split was specified, and in some cases the entire dataset was used, which could artificially reduce the error on the test set because of data leakage (the model "knows" about the distribution of the test set because the parameters used to normalize the spectrograms take it into account). For training sets large enough to achieve good performance with current models, the difference between their distribution and that of the test set is probably too small for this to seriously affect evaluation, but we have not tested this extensively.
Several other cleanups, additional unit tests, and minor bug fixes that should not have affected performance but do make the library more efficient and robust.

Added

Add unit tests for csv.has_unlabeled
#541.
Fixes #102.
Add unit tests for __main__
#542.
Fixes #337.
Add validation of labels argument to vak.split.algorithms.brute_force,
to prevent conditions where the algorithm can fail to converge
because of bad input
#562.
Fixes #288.
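As a rough illustration of this kind of input check, the sketch below rejects input where the labels found in the data do not match the requested set of labels, since no split could then contain every label. The function name validate_labels and the exact condition are assumptions made for this example; they are not taken from vak's source.

```python
def validate_labels(labels, labelset):
    """Raise ValueError if the labels in the data do not match labelset.

    Hypothetical helper, not vak's implementation.
    ``labels`` is a list of label sequences, one per annotated file.
    """
    found = set()
    for seq in labels:
        found.update(seq)
    labelset = set(labelset)
    if found != labelset:
        missing = sorted(labelset - found)
        extra = sorted(found - labelset)
        raise ValueError(
            f"labels do not match labelset; missing: {missing}, unexpected: {extra}"
        )
```

Checking this before the search starts turns a silent failure to converge into an immediate, readable error.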
Add a "Frequently Asked Questions" page to the documentation,
and a page to the "Reference" section on file naming conventions
#564.
Fixes #524
and #424.
Add a new way for vak to map annotation files to annotated files
when preparing datasets, e.g. for training models.
For annotation formats that have one annotation file per
annotated file, vak can now recognize when
the annotation files are named by removing the
annotated file extension (e.g., .wav or .npz)
and replacing it with the annotation format extension,
e.g. .txt or .csv. (Other ways of relating annotations
and annotated files are still valid, e.g. by including
the original source audio file in both filenames.)
#572.
Fixes #563.
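The naming convention described above can be sketched with pathlib. The helper names annot_path_for and map_by_extension are hypothetical and exist only for this example; they are not part of vak's API, and vak's own mapping logic may differ.

```python
from pathlib import Path


def annot_path_for(annotated_path, annot_ext):
    """Return the annotation path obtained by swapping the file extension.

    Hypothetical helper for illustration; not part of vak's API.
    """
    # Path.with_suffix replaces only the final extension, e.g. ".wav" -> ".csv"
    return Path(annotated_path).with_suffix(annot_ext)


def map_by_extension(annotated_paths, annot_paths, annot_ext):
    """Pair each annotated file with the annotation file named after it."""
    available = {Path(p) for p in annot_paths}
    pairs = {}
    for annotated in annotated_paths:
        candidate = annot_path_for(annotated, annot_ext)
        if candidate not in available:
            raise ValueError(f"no annotation file found for {annotated}")
        pairs[Path(annotated)] = candidate
    return pairs
```

For example, under this sketch a file named bird1_0001.wav would be paired with bird1_0001.csv in the same directory.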
Have runs from command-line interface log version to logfile
#587.
Fixes #216.

Changed

Rewrite unit tests in tests/test_cli/ to use mocks for vak.core functions
#544.
Fixes #543.
It is now possible to load configuration files
and work with them programmatically even if the paths
they point to do not exist.
The core functions handle validation instead.
E.g., the PrepConfig class does not check whether
output_dir exists and is a directory, but vak.core.prep does.
#550.
Fixes #459.
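The division of responsibility can be pictured roughly as follows. PrepOptions and prep here are hypothetical stand-ins written for this example; they only mirror the idea that the config object stores paths as given while the core function checks them when it runs, and they do not reproduce the attributes or signature of PrepConfig or vak.core.prep.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PrepOptions:
    """Hypothetical stand-in for a config class; stores paths without checking them."""

    data_dir: Path
    output_dir: Path


def prep(options):
    """Hypothetical stand-in for a core function; validates paths when it runs."""
    if not options.output_dir.is_dir():
        raise NotADirectoryError(
            f"output_dir not found, or not a directory: {options.output_dir}"
        )
    return options.output_dir
```

With this split, a config that points at paths on another machine can still be loaded and inspected; the error only appears when the core function is actually called.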
Refactor and speed up logic for determining whether a
dataset with sequence annotations has unlabeled segments
that should be assigned a "background" label
#559.
Fixes #243.
- Adds a new sub-sub-package, datasets.seq
  with a validators module, which is where the
  re-written has_unlabeled function now lives.
  Replaces the vak.csv module which was not well named.
- Also adds a has_unlabeled function to vak.annotation
  that is used by vak.datasets.seq.validators.has_unlabeled;
  this function handles edge cases outlined in
  #243.
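A minimal sketch of the idea behind such a check is shown below, including the case of an annotation with no segments. It assumes onsets and offsets in seconds, sorted and non-overlapping, plus a known file duration; the signature is hypothetical and is not the one used by vak.annotation.has_unlabeled.

```python
def has_unlabeled(onsets_s, offsets_s, duration_s):
    """Return True if any part of the file lies outside an annotated segment.

    Hypothetical sketch; assumes segments are sorted and do not overlap.
    """
    if len(onsets_s) != len(offsets_s):
        raise ValueError("onsets_s and offsets_s must have the same length")
    if len(onsets_s) == 0:
        # edge case: no annotated segments, so the whole file is unlabeled
        return duration_s > 0
    if onsets_s[0] > 0 or offsets_s[-1] < duration_s:
        # unlabeled time before the first segment or after the last one
        return True
    # a gap between consecutive segments is also unlabeled
    return any(
        next_on > prev_off
        for prev_off, next_on in zip(offsets_s[:-1], onsets_s[1:])
    )
```

When a check like this returns True, the unlabeled periods are the ones that would receive the "background" label.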
Rename and refactor functions in vak.annotation
that map annotations to the files that they annotate,
so that the purpose of the functions is clearer,
and add clearer error messages with links to documentation
about file naming conventions
#566.
Fixes #525.
Revise "autoannotate" tutorial to use .wav audio and .csv
annotation files from new release of Bengalese Finch Song
Repository, and to suggest that Windows users unpack
archives with tar, not other programs such as WinZip
#578.
Fixes #560
and #576.
Change vak.files.find_fname and vak.files.spect.find_audio_fname
so they work when spaces are in filename and/or path
#594.
Fixes #589.
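One way to tolerate spaces when extracting a filename with a given extension is a regular expression whose character class explicitly allows them. The function below is a hypothetical sketch of that approach, named find_fname_with_spaces to make clear it is not the actual vak.files.find_fname.

```python
import re


def find_fname_with_spaces(text, ext):
    """Return the leading part of ``text`` that ends with ``.ext``, or None.

    Hypothetical sketch; not the implementation in vak.files.
    """
    ext = ext.lstrip(".")
    # [\S ]* matches non-whitespace characters and plain spaces,
    # so spaces in the path or filename do not end the match
    match = re.match(rf"[\S ]*\.{re.escape(ext)}", text)
    return match.group() if match else None
```

Given "my song 01.wav.npz" and the extension "wav", this sketch returns "my song 01.wav".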

Fixed

Fix how vak.core.prep handles labelset parameter.
Add pre-condition that raises a ValueError
when labelset is None but the .toml config is one of
{'train', 'learncurve', 'eval'}
#545.
Avoids running computationally expensive step of generating
and validating spectrograms before crashing when trying to
split the dataset using labelset. Also avoids silent
failures for datasets that do not require splitting,
e.g., an 'eval' set that could contain labels not in the
training set.
Fixes #468.
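The pre-condition amounts to a fail-fast check that runs before any spectrograms are generated. A minimal sketch, assuming a hypothetical helper check_labelset and a purpose string that names the kind of config:

```python
def check_labelset(labelset, purpose):
    """Fail fast when a purpose that needs labels has no labelset.

    Hypothetical helper; the names are assumptions for this example.
    """
    needs_labelset = {"train", "learncurve", "eval"}
    if purpose in needs_labelset and labelset is None:
        raise ValueError(
            f"labelset is None, but it is required when purpose is '{purpose}'"
        )
```

Placing the check first means the error surfaces in seconds rather than after the expensive spectrogram step.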
Fix how cli and core functions that have the csv_path parameter
handle it. The parameter points to a dataset .csv generated by vak prep
that other core/cli functions use: train, learncurve, eval, predict.
They now validate that it exists, and if it doesn't, the cli functions
politely suggest running vak prep first; the core functions
raise a FileNotFoundError.
#546.
Fixes #469.
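The two layers of behavior might look roughly like the sketch below: a core-style function raises FileNotFoundError, and a cli-style wrapper turns that into a suggestion. Both train_core and train_cli are hypothetical names for this example, and the message wording is an assumption; vak's cli may perform its own check rather than catching the exception.

```python
from pathlib import Path


def train_core(csv_path):
    """Hypothetical core-style function: raise if the dataset .csv is missing."""
    csv_path = Path(csv_path)
    if not csv_path.exists():
        raise FileNotFoundError(f"csv_path not found: {csv_path}")
    return csv_path


def train_cli(csv_path):
    """Hypothetical cli-style wrapper: suggest a fix instead of a traceback."""
    try:
        train_core(csv_path)
    except FileNotFoundError:
        return (
            f"Could not find the dataset file {csv_path}. "
            "Please run 'vak prep' first to generate it."
        )
    return "ok"
```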
Fix bug where labelmap_path parameter was ignored by core.train.
Change function so that either labelmap_path or labelset must
be passed in, but passing in both will raise an error.
Also change cli.train to only pass in one of those and set the other
to None.
#552.
Fixes #547.
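The "exactly one of two arguments" rule can be sketched as below. The function resolve_label_source is hypothetical and only illustrates the argument check; it is not the signature of vak.core.train.

```python
def resolve_label_source(labelset=None, labelmap_path=None):
    """Require exactly one of labelset or labelmap_path (hypothetical helper)."""
    if labelset is None and labelmap_path is None:
        raise ValueError("must specify one of labelset or labelmap_path")
    if labelset is not None and labelmap_path is not None:
        raise ValueError("specify only one of labelset or labelmap_path, not both")
    if labelset is not None:
        return "labelset", labelset
    return "labelmap_path", labelmap_path
```

A caller such as a command-line wrapper would then pass one of the two and leave the other as None.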
Fix vak.annotation.has_unlabeled to handle the edge case where an
annotation file has no annotated segments
#583.
Fixes #378.
Fix StandardizeSpect method fit_df so that it computes
parameters for standardization from a specific
split of the dataset--the training split, by default--instead
of using the entire dataset, which could technically give rise
to data leakage
#584.
Fixes #575.
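To make the leakage concern concrete, the sketch below computes standardization parameters from a single split and then applies them to all data. It is deliberately simplified to scalar values with the standard library, whereas the real transform works on spectrograms; fit_standardizer and standardize are hypothetical names, not methods of StandardizeSpect.

```python
from statistics import fmean, pstdev


def fit_standardizer(values, splits, split="train"):
    """Compute mean and standard deviation from one split only (hypothetical helper)."""
    selected = [v for v, s in zip(values, splits) if s == split]
    if not selected:
        raise ValueError(f"no samples found in split '{split}'")
    return fmean(selected), pstdev(selected)


def standardize(values, mean, std):
    """Apply already-fitted parameters to any split, including val and test."""
    return [(v - mean) / std for v in values]


values = [1.0, 3.0, 5.0, 100.0]
splits = ["train", "train", "train", "test"]
# the outlying test value does not influence the fitted parameters
mean, std = fit_standardizer(values, splits)
scaled = standardize(values, mean, std)
```

Had the parameters been fit on all four values, the test sample would have shifted the mean and inflated the standard deviation, which is the leakage the fix avoids.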
Fix error message in vak.core.eval
#589.
Fixes #588.
