Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Rewrite formats as sub-package of classes, rewrite API #161

Merged
merged 49 commits into from
May 6, 2022

Conversation

NickleDave
Copy link
Collaborator

Rewrites formats as classes, and closes many related issues.
Big step towards version 4.0 as described in #146

- add `interface` sub-pacakge with base.py
  + defines `BaseFormat`
- add seq sub-package in `interface` with base.py
  + defines `SeqLike` interface
- add bbox sub-package in `interface` with base.py
  + definfes `BBoxLike` interface
- import BaseFormat and sub-packages in interface/__init__.py
- import inside __init__ in Transcriber to avoid circular imports
- add formats/__init__.py
- with `as_list` and `by_name` functions
  that use module-level constant FORMAT
  constant built dynamically at imported-time
  from the `__all__`s of `crowsetta.formats.seq`
  and `crowsetta.formats.bbox`
because current function that converts everything to a list
of paths has unexpected side effects.
Just want a validator that tests whether the extension
is what we expect it to be,
that can be used by `Annotation.from_file` methods

- no longer accept a list
- instead accept "PathLike" as defined in `crowsetta.typing`
- no longer return a list, or anything; no side effects
- simply check that the paths `endswith` one of the
  string extensions passed in
- revise comment explaining why we use `endswith` not `suffix`
  for clarity
goal is parametrize `a_notmat` so that any test
using it runs multiple test cases with every .not.mat file.
https://hynek.me/til/parametrized-pytest-fixtures/

define paths as constants outside fixtures
so other modules can access them outside fixtures,
and then finally pass one of the constants
into a fixture `a_notmat_path` to parametrize.
Calling Sequence.__eq__
raises a FutureWarning due to comparing numpy arrays:
FutureWarning: elementwise comparison failed; returning scalar instead,
but in the future will perform elementwise comparison

The method as it was written was pretty verbose,
iterating over each array `_labels`, `_onsets_s`, ..., etc.,
and doing array comparisons on each of those.

It's much more concise to just do:
- if `other` is not a Sequence, return False
- if `other` Sequence doesn't have same number of segments,
  return False
- finally, zip the segments and compare each of them directly,
  and return boolean of all([self_seg == other_seg])

The Segment.__eq__ works well on its own,
so not only is this less wordy but it lets us rely on
a method that `attrs` writes for us
- Remove tests/data_for_tests/koumura/Bird0/Annotation.csv
- delete tests/data_for_tests/koumura/Bird0/ExpectedError
@NickleDave NickleDave force-pushed the rewrite-formats-as-classes-fix-#99 branch from d299628 to 695dc6a Compare May 6, 2022 02:49
- add .csv files generated by tests/scripts/remake_test_csv.py
  - rename / update with correct formatting:
    tests/data_for_tests/csv/gy6or6_032312.csv ->
    notmat_gy6or6_032312.csv
- update test .csv formatting so it raises correct error:
  tests/data_for_tests/csv/missing_fields_in_header.csv
- add test data .csv:
  tests/data_for_tests/csv/invalid_fields_in_header.csv
- remove test data .csv (wrong format, etc.):
  unrecognized_fields_in_header.csv
- rename + reformat:
  example_annotation_with_onset_inds_offset_inds_None.csv
  + remove invalid on/offset_ind columns full of 'None' str
  + rename: tests/data_for_tests/example_user_annotation.csv
- add onset_s_column_but_no_offset_s_column.csv
- add tests/data_for_tests/csv/no_onset_or_offset_column.csv
with fixtures that returns paths to .csv files
used for different tests
- Rename data_for_tests/audio_wav_annot_phn/ -> timit_kaggle/
- Rename data_for_tests/audio_wav_nist_annot_phn/ -> timit_nist/
- Reorganize data in tests/data_for_tests/timit_nist and timit_kaggle/
- rewrite fixtures in fixtures/timit.py

mainly so we can parametrize a fixture
with paths, i.e. the `a_phn_path` fixture
is parametrized with `ALL_PHN_PATHS`;
to have access to this list outside of the fixture,
to pass it into the parameter argument,
needed to declare constants outside the fixtures.
- to parametrize
- and add other fixtures, e.g. das_csv_paths
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment