Engine + core/cli re-design #359

Closed
NickleDave opened this issue May 15, 2021 · 6 comments
Assignees
Labels
DEV: development development, not source code: e.g. change dependencies, bump version ENH: enhancement enhancement; new feature or request

Comments

@NickleDave
Collaborator

NickleDave commented May 15, 2021

edit: this issue was originally about considering backends but I am hijacking it to collect my thoughts on how to (re)design everything


it would really be nice to not be in the business of maintaining a deep learning library, esp since whole teams of ppl are doing that already

would prefer scope for vak to be:

  • reference implementations of algos specific to vocal learning community
  • associated tools for benchmarking (e.g., WindowDataset)

question is what backends could be used that would handle training / eval / etc.
Starting this issue to keep track of thoughts

current list in my head:

  • https://github.com/explosion/thinc
    • pros:
      • framework agnostic, can use torch or TF <-- for me a big pro
      • already used for spacy
      • plays well with FastAPI, pydantic, and friends
      • TOML-like configuration file format (I think, need to check back)
    • (possible) cons:
      • don't have a feeling for how easily abstractions will work outside of NLP. Need to experiment with this in a branch
  • https://github.com/speechbrain/speechbrain
    • pros:
      • there appear to be related abstractions we could use, e.g. loss functions
    • cons:
      • pytorch only
      • YAML config format, gross
@NickleDave NickleDave added the ENH: enhancement enhancement; new feature or request label May 15, 2021
@NickleDave
Collaborator Author

NickleDave commented Dec 1, 2021

Still thinking about this.

rn feeling like the best compromise is still to write a very lightweight keras-like API for pytorch

mainly because we need the dataset tooling from torch, and we don't need all the massive lumbering technical debt of tf
https://www.youtube.com/watch?v=XHyASP49ses

the best way forward I think will be to write examples of what the interface should look like, then do the refactor around that. From Design Patterns (as quoted in Fluent Python):

“Program to an interface, not an implementation” and “Favor object composition over class inheritance.”

Basically the Engine class should be refactored to accept callbacks (as described in #405).
Logic for looping over multiple models, if kept, should be moved "up" from the core methods into the cli methods
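To make the two points above concrete, here is a minimal sketch of what a callback-accepting Engine could look like. This is not vak's actual code: `Callback`, `History`, `on_epoch_end`, `step_fn`, and `train_all` are all names made up for illustration, and plain floats stand in for torch batches and losses so the sketch runs with no dependencies.

```python
from typing import Callable, Iterable, Protocol


class Callback(Protocol):
    """Hypothetical hook interface; the method name is illustrative only."""

    def on_epoch_end(self, epoch: int, logs: dict) -> None: ...


class History:
    """Example callback that records the logs from each epoch."""

    def __init__(self) -> None:
        self.epochs = []

    def on_epoch_end(self, epoch: int, logs: dict) -> None:
        self.epochs.append({"epoch": epoch, **logs})


class Engine:
    """Runs a training loop and delegates side effects to callbacks."""

    def __init__(self, step_fn: Callable, callbacks: Iterable = ()) -> None:
        self.step_fn = step_fn  # assumed: takes one batch, returns a loss
        self.callbacks = list(callbacks)

    def train(self, batches: Iterable, num_epochs: int) -> None:
        batches = list(batches)
        for epoch in range(num_epochs):
            losses = [self.step_fn(batch) for batch in batches]
            mean_loss = sum(losses) / len(losses) if losses else float("nan")
            logs = {"mean_loss": mean_loss}
            # The engine knows nothing about logging or checkpointing;
            # it only notifies whatever callbacks it was given.
            for callback in self.callbacks:
                callback.on_epoch_end(epoch, logs)


def train_all(step_fns: dict, batches: list, num_epochs: int) -> dict:
    """Looping over several models lives "up" here, as a cli function would do it."""
    results = {}
    for name, step_fn in step_fns.items():
        history = History()
        Engine(step_fn, callbacks=[history]).train(batches, num_epochs)
        results[name] = history.epochs
    return results
```

With this shape, `Engine.train` only ever sees one model, and anything like logging, checkpointing, or early stopping would be another object passed in through `callbacks` rather than another conditional inside the loop.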

Would be nice to provide a pytorch_lightning-like Model class too, as described in #406.
I like the idea of declaring "this is a model" in code. I do not like the verbosity of pytorch_lightning

other refactoring notes:
The main thing I know right now is there are way too many conditionals within both core and cli methods.
A hand-wired conditional is just a hidden interface crying out: https://www.youtube.com/watch?v=OMPfEXIlTVE

The key idea for any cli function should be "what is the most common / generic workflow for a user; capture that in code".
So if it doesn't look very much like something a user would want to write, re-factoring / abstraction needs to happen

@NickleDave NickleDave changed the title consider options for a "backend" version 0.5 refactor Dec 1, 2021
@NickleDave NickleDave added the DEV: development development, not source code: e.g. change dependencies, bump version label Dec 1, 2021
@NickleDave NickleDave pinned this issue Dec 1, 2021
@NickleDave NickleDave changed the title version 0.5 refactor version 0.5 design Dec 1, 2021
@NickleDave NickleDave added this to To Do in DEV (roadmap) Dec 1, 2021
@NickleDave NickleDave changed the title version 0.5 design Engine + core/cli re-design Mar 20, 2022
@NickleDave NickleDave self-assigned this Mar 20, 2022
@NickleDave
Collaborator Author

NickleDave commented Jul 1, 2022

So package structure will be something like this

vak.models  # <-- vak.models.sed.tweetynet, vak.models.gen.ava, or whatever schema makes sense
vak.datasets  # <-- bfrepo, etc.
vak.transforms  # <-- clip, denoise, etc.
vak.engine  # <-- vak.engine.Engine.train, vak.engine.Engine.eval, etc.

as in #207 and #405

So the models module/sub-package will be equivalent to torchvision.models with models for task 1, task 2, etc., and then engine will live in its own separate module

@NickleDave
Collaborator Author

NickleDave commented Jul 8, 2022

See also #536 -- Model should be an attrs class with: {network, loss, optimizer, metrics}, having defaults for each.
This can be used in each module in vak.models, i.e. there will be a TweetyNetModel in vak/models/sed/tweetynet.py

This is preferable to making a new Model class that users have to subclass, as proposed in #406, although the idea is actually similar (in that I called it an "interface").
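A rough sketch of the "model as a declarative class with defaults" idea, so it is easier to argue about. To keep it runnable without dependencies it uses the standard-library `dataclasses` as a stand-in; with attrs the same shape would presumably be written with `attrs.define` and `attrs.field(factory=...)`. The class body, the placeholder network, `squared_error`, and the `evaluate` method are assumptions for illustration, not the design from #536.

```python
from dataclasses import dataclass, field
from typing import Callable


def default_network() -> Callable:
    # Placeholder: a real version would build the actual network here.
    return lambda x: 2.0 * x


def squared_error(y_pred: float, y_true: float) -> float:
    return (y_pred - y_true) ** 2


@dataclass
class TweetyNetModel:
    """Hypothetical model declaration: four parts, each with a default."""

    network: Callable = field(default_factory=default_network)
    loss: Callable = squared_error
    # A dict stands in for a real optimizer object in this sketch.
    optimizer: dict = field(default_factory=lambda: {"name": "adam", "lr": 1e-3})
    metrics: dict = field(default_factory=lambda: {"squared_error": squared_error})

    def evaluate(self, x: float, y_true: float) -> dict:
        """Run the network once and compute every declared metric."""
        y_pred = self.network(x)
        return {name: fn(y_pred, y_true) for name, fn in self.metrics.items()}
```

Nothing is subclassed: `TweetyNetModel()` gives a fully working default, and overriding one part is just a keyword argument, e.g. `TweetyNetModel(optimizer={"name": "sgd", "lr": 0.1})`.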

@NickleDave
Collaborator Author

Note that whatever we do should fix #362 -- this is the core problem to address here

By making it an attrs model with defaults we can easily instantiate the model by just saying ModelDataClass()

we can have a vak.models.register decorator similar to the one in crowsetta for formats
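One possible shape for such a decorator, as a sketch only. `MODEL_REGISTRY`, `register`, and `from_name` are hypothetical names here, and this does not claim to mirror how crowsetta implements its format registration.

```python
MODEL_REGISTRY = {}


def register(cls):
    """Class decorator that records a model class under its own name."""
    name = cls.__name__
    if name in MODEL_REGISTRY:
        raise ValueError(f"model already registered: {name}")
    MODEL_REGISTRY[name] = cls
    return cls  # return the class unchanged so it can be used as usual


@register
class TweetyNetModel:
    """Stand-in for a real model class with defaults for every field."""

    def __init__(self, lr: float = 1e-3) -> None:
        self.lr = lr


def from_name(name: str, **kwargs):
    """Look a model up by the name a user would put in a config file."""
    try:
        cls = MODEL_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(f"unknown model {name!r}; registered: {known}") from None
    return cls(**kwargs)
```

A cli function could then call `from_name(model_name)` instead of branching on the model name, which would also remove one family of the hand-wired conditionals mentioned earlier in this thread.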

@NickleDave NickleDave moved this from To Do to In progress in DEV (roadmap) Aug 23, 2022
@NickleDave
Collaborator Author

NickleDave commented Nov 23, 2022

Picking this up again.
The order of operations needs to be:

@NickleDave
Collaborator Author

Closed by #605
