CLN: Remove unused 'multiple model' functionality #538

Closed
Tracked by #614
NickleDave opened this issue Jul 8, 2022 · 4 comments
Labels
CLN: clean / refactor code cleanup, e.g. refactoring, maintenance, typos

Comments

@NickleDave
Collaborator

In theory, the config allows instantiating multiple models for training / prediction, but this was never fully implemented, and I think making it really work is more trouble than it's worth.

I thought I had raised an issue about this before, but I'm not finding it.

This should be one of the clean-up steps we do before adding new functionality.

NickleDave added the CLN: clean / refactor label Jul 8, 2022
@NickleDave
Collaborator Author

Doing this should fix #353

Read the discussion on that issue to be sure.

@NickleDave
Collaborator Author

We should probably do this in 1.0, because changing the option "models" -> "model" would be a breaking change.
We might as well just introduce the new config format at the same time.
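
For illustration, here's what the rename would look like in a config file. The section and model names are just examples (tweetynet is the model used elsewhere in this thread); the exact 1.0 config format is still to be decided.

```toml
[TRAIN]
# models = "tweetynet"  # current option: a string that gets parsed into a list
model = "tweetynet"     # proposed 1.0 option: a single model name
```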

@NickleDave
Collaborator Author

Started to work on this while working on #624 to fix #601, but there are a couple of different things tied up in it.

One is just removing the functionality.
The other is what to replace it with.

The latter is tied up with how we pass models from the cli functions to their corresponding core functions, which I'll describe now, since it shows how the replacement could get complicated down the road when we try to add more functionality.

How we do things now

Currently, the way we do this is quite convoluted; see the sketch in code after this list.

  • Currently we accept the option models in the train, eval, learncurve sections. We convert this option to a list (which always has just one element, the one model name).
    • There's a converter, comma_separated_list, that takes the string we always pass in and makes it into such a list, e.g. "tweetynet" -> ["tweetynet",] (even though, as stated above, the logic for iterating over more than one model never actually worked)
  • Then, inside the cli functions that involve models (i.e., train, eval, learncurve, predict, but not prep), we pass this "list" of model names (a single name) to config.models.map_from_path, along with the toml path. This is because we need to find the corresponding config section, since we don't leverage the TOML format very well.
  • We get back a dict mapping model names to a config in the form of a dict, with keys 'network', 'loss_func', etc. But again, this dict only has one key, a single model name, that maps to a single config.
  • We then pass this dict into a core function from its respective cli function.
  • Then, inside the core function, we need to call another function to actually get the model instance: vak.models.models.from_model_config_map.
  • This is because there is logic inside the core functions that creates things we actually need to instantiate the model; in other words, there are required arguments for vak.models.models.from_model_config_map that we won't have until we've done some set-up.
    • E.g., we need to know num_classes, and the way we get it is by creating a labelmap from the labelset of the config.
  • So we can't just call vak.models.models.from_model_config_map outside of the core function and then pass in an already-instantiated model.
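
To make the flow concrete, here's a rough sketch in Python. The function names are the ones described above, but the signatures and return values are approximations, not the exact vak source:

```python
# 1. Config parsing: the 'models' option goes through a converter that always
#    produces a list, even though we only ever put one model name in it.
def comma_separated_list(value: str) -> list:
    return [item.strip() for item in value.split(",")]

comma_separated_list("tweetynet")  # -> ["tweetynet"]

# 2. Inside cli/train.py (and eval, learncurve, predict): map the "list" of
#    names back to the matching section(s) of the same TOML file
#    (signature approximate):
# model_config_map = config.models.map_from_path(toml_path, cfg.train.models)
# -> {"tweetynet": {"network": {...}, "optimizer": {...}, "loss_func": {...}}}

# 3. The cli function hands that one-key dict to its core function, e.g.:
# core.train(model_config_map, ...)

# 4. Only inside core.train, after set-up produces num_classes etc., can we
#    finally instantiate the model (arguments approximate):
# models_map = vak.models.models.from_model_config_map(
#     model_config_map, num_classes=len(labelmap), input_shape=input_shape,
# )
```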

What to replace this with?

  • we could factor out all the set-up logic inside cli; basically, the inside of each cli function should read like a script written for the most common workflow. I thought I had written this down somewhere, but the closest quote I can find is:

The key idea for any cli function should be "what is the most common / generic workflow for a user; capture that in code".
So if it doesn't look very much like something a user would want to write, re-factoring / abstraction needs to happen

from #359 (comment)

There are two issues with just factoring out all the set-up logic, though:

  • What happens when we have more than one workflow? We're going to want to abstract that out into other functions somehow, e.g. a windowed_frame_classification_model_setup.
  • If we factor everything that's set-up out of core, there actually won't be anything left in core! One would pass in a model along with a dataset and then call trainer(model, dataset). So basically, instantiating a trainer would be all that's left in there? And maybe not even that should be in there: if we're going to have multiple workflows, each will need its own trainer set-up logic. (There's a sketch of this after the list.)
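
As a thought experiment, here's what a fully factored-out cli/train.py might reduce to. Every helper name here is hypothetical, invented for illustration; none of them exist in vak. The point is just that once set-up moves out of core, there's nothing left for a core.train to do:

```python
# Hypothetical cli/train.py after factoring all set-up logic out of core.
# Each helper stands in for set-up currently buried inside core.train.
def train(toml_path):
    cfg = parse_toml(toml_path)                # hypothetical config parsing
    dataset = make_dataset(cfg)                # hypothetical dataset set-up
    labelmap = make_labelmap(cfg.labelset)     # where num_classes comes from
    model = get_model(cfg.model, num_classes=len(labelmap))
    trainer = make_trainer(cfg)                # workflow-specific set-up
    trainer.fit(model, dataset)                # ...and core.train is empty
```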

So, I need to think about this more. I'll probably raise a separate issue planning it out.

@NickleDave
Collaborator Author

NickleDave commented Feb 12, 2023

Okay, so here's a way to get this done for now:

  • Rename the option "models" in config sections to "model", and validate it. We can handle the new config format later.
  • Rewrite vak.models.models.from_model_config_map as just vak.models.get(name, config). For now we can keep the vak.config.models.from_toml_path logic that finds the corresponding model section in a config file.
  • Add parameters model_name and model_config to the core functions; then we call vak.models.get with model_name and model_config. (There's a sketch of this after the list.)
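
Here's a sketch of what that interim API could look like. The shape of vak.models.get and the new core-function parameters follow this comment and the commit message below; the validation detail and exact signatures are assumptions:

```python
MODEL_NAMES = {"tweetynet"}  # illustrative; the commit below adds this in _api.py

# Sketch of vak.models.get, replacing models.from_model_config_map;
# the body is illustrative, not the merged code.
def get(name: str, config: dict, num_classes: int, input_shape: tuple):
    """Return a single model instance, given its name and its config."""
    if name not in MODEL_NAMES:
        raise ValueError(f"Invalid model name: {name}")
    ...  # build network, optimizer, loss from `config`; return the model

# Core functions take the name and config instead of a one-key dict:
# def train(model_name: str, model_config: dict, ...):
#     ...  # set-up that produces labelmap, input_shape, etc.
#     model = vak.models.get(model_name, model_config,
#                            num_classes=len(labelmap), input_shape=input_shape)
```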

Then we worry about all the refactoring stuff in the comment above, later.

NickleDave added a commit that referenced this issue Feb 13, 2023
> The config allows instantiating multiple models
  for training / prediction but this was never fully implemented

Squash commits:
- Make config just use "model" option, not "models":
  - Remove `comma_separated_list` from converters
  - Change option name 'models' -> 'model' in config/valid.toml
  - Rewrite is_valid_model_name to test a single string,
    not a list of strings
  - Change attribute `models` -> `model` in config/eval.py
  - Change attribute `models` -> `model` in config/learncurve.py
  - Change attribute `models` -> `model` in config/predict.py
  - Change attribute `models` -> `model` in config/train.py
  - Rewrite/rename config.models -> model.config_from_toml_path,
    model.config_from_toml_dict
  - Fix option 'models' -> model = 'str',
    in all .toml files in tests/data_for_tests/configs
- Rewrite `models.from_model_config_map` as `models.get`:
  - Add src/vak/models/_api.py with BUILTIN_MODELS and
    MODEL_NAMES to use for validation in `models.get`
  - Rewrite models/models.py `from_model_config_map` as `models.get`
  - Import get and _api in vak/models/__init__.py
  - Rewrite core/train.py to take model_name and model_config
    then use models.get
  - Fix cli/train.py to pass model_name and model_config into core.train
  - Rewrite core/eval.py to take model_name and model_config
    then use models.get
  - Fix cli/eval.py to pass model_name and model_config into core.eval
  - Rewrite core/learncurve.py to take model_name and model_config
  - Fix cli/learncurve.py to pass model_name and model_config
    into core.learncurve
  - Rewrite core/predict.py to take model_name and model_config
    then use models.get
  - Fix cli/predict.py to pass model_name and model_config
    into core.predict
  - Make 'model' not 'models' required in src/vak/config/parse.py
  - Use models.MODEL_NAMES in src/vak/config/validators.py
  - Use models.MODEL_NAMES in config/model.py
- Fix tests
  - Fix tests to use vak.config.model.config_from_toml_path:
    - tests/test_models/test_windowed_frame_classification_model.py
    - tests/test_core/test_train.py
    - tests/test_core/test_predict.py
    - tests/test_core/test_learncurve.py
    - tests/test_core/test_eval.py
  - Fix test to use 'model' option not 'models' in
    tests/test_config/test_parse.py
  - Fix assert helper function in tests/test_core
    - test_eval.py
    - test_learncurve.py
    - test_predict.py
    - test_prep.py
    - test_train.py
  - Rewrite fixture module with constants we can import
    in test modules to parametrize: tests/fixtures/config.py
  - Add tests/test_config/test_model.py
NickleDave added a commit that referenced this issue Feb 13, 2023
Remove multiple models logic, fix #538
NickleDave added a commit that referenced this issue Feb 24, 2023
NickleDave added a commit that referenced this issue Feb 24, 2023