CLN: Remove unused 'multiple model' functionality #538

Closed
Tracked by #614
NickleDave opened this issue Jul 8, 2022 · 4 comments
Labels
CLN: clean / refactor code cleanup, e.g. refactoring, maintenance, typos

Comments

@NickleDave
Collaborator

In theory, the config allows instantiating multiple models for training / prediction, but this was never fully implemented, and I think making it really work is more trouble than it's worth.

I thought I had raised an issue about this before, but I'm not finding it.

This should be one of the clean-up steps we do before adding new functionality.

NickleDave added the CLN: clean / refactor label Jul 8, 2022
@NickleDave
Collaborator Author

Doing this should fix #353

Read the discussion on that issue to be sure.

@NickleDave
Collaborator Author

We should probably do this in 1.0, because changing the option "models" -> "model" would be a breaking change.
We might as well just introduce the new config format at the same time.
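
For illustration, here's what the rename would look like in a config file. The section and model names are just examples (tweetynet is the model used elsewhere in this thread); the exact 1.0 config format is still to be decided.

```toml
[TRAIN]
# models = "tweetynet"  # current option: a string that gets parsed into a list
model = "tweetynet"     # proposed 1.0 option: a single model name
```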

@NickleDave
Collaborator Author

Started to work on this while working on #624 to fix #601, but there are a couple of different things tied up in it.

One is just removing the functionality.
The other is what to replace it with.

The latter is tied up with how we pass models from the cli functions to their corresponding core functions, which I'll describe now, since it shows how the replacement could get complicated down the road when we try to add more functionality.

How we do things now

Currently, the way we do this is quite convoluted; see the sketch in code after this list.

  • Currently we accept the option models in the train, eval, learncurve sections. We convert this option to a list (which always has just one element, the one model name).
    • There's a converter, comma_separated_list, that takes the string we always pass in and makes it into such a list, e.g. "tweetynet" -> ["tweetynet",] (even though, as stated above, the logic for iterating over more than one model never actually worked)
  • Then, inside the cli functions that involve models (i.e., train, eval, learncurve, predict, but not prep), we pass this "list" of model names (a single name) to config.models.map_from_path, along with the toml path. This is because we need to find the corresponding config section, since we don't leverage the TOML format very well.
  • We get back a dict mapping model names to a config in the form of a dict, with keys 'network', 'loss_func', etc. But again, this dict only has one key, a single model name, that maps to a single config.
  • We then pass this dict into a core function from its respective cli function.
  • Then, inside the core function, we need to call another function to actually get the model instance: vak.models.models.from_model_config_map.
  • This is because there is logic inside the core functions that creates things we actually need to instantiate the model; in other words, there are required arguments for vak.models.models.from_model_config_map that we won't have until we've done some set-up.
    • E.g., we need to know num_classes, and the way we get it is by creating a labelmap from the labelset of the config.
  • So we can't just call vak.models.models.from_model_config_map outside of the core function and then pass in an already-instantiated model.
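
To make the flow concrete, here's a rough sketch in Python. The function names are the ones described above, but the signatures and return values are approximations, not the exact vak source:

```python
# 1. Config parsing: the 'models' option goes through a converter that always
#    produces a list, even though we only ever put one model name in it.
def comma_separated_list(value: str) -> list:
    return [item.strip() for item in value.split(",")]

comma_separated_list("tweetynet")  # -> ["tweetynet"]

# 2. Inside cli/train.py (and eval, learncurve, predict): map the "list" of
#    names back to the matching section(s) of the same TOML file
#    (signature approximate):
# model_config_map = config.models.map_from_path(toml_path, cfg.train.models)
# -> {"tweetynet": {"network": {...}, "optimizer": {...}, "loss_func": {...}}}

# 3. The cli function hands that one-key dict to its core function, e.g.:
# core.train(model_config_map, ...)

# 4. Only inside core.train, after set-up produces num_classes etc., can we
#    finally instantiate the model (arguments approximate):
# models_map = vak.models.models.from_model_config_map(
#     model_config_map, num_classes=len(labelmap), input_shape=input_shape,
# )
```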

What to replace this with?

  • we could factor out all the set-up logic inside cli; basically, the inside of each cli function should read like a script written for the most common workflow. I thought I had written this down somewhere, but the closest quote I can find is:

The key idea for any cli function should be "what is the most common / generic workflow for a user; capture that in code".
So if it doesn't look very much like something a user would want to write, re-factoring / abstraction needs to happen

from #359 (comment)

There are two issues with just factoring out all the set-up logic, though:

  • What happens when we have more than one workflow? We're going to want to abstract that out into other functions somehow, e.g. a windowed_frame_classification_model_setup.
  • If we factor everything that's set-up out of core, there actually won't be anything left in core! One would pass in a model along with a dataset and then call trainer(model, dataset). So basically, instantiating a trainer would be all that's left in there? And maybe not even that should be in there: if we're going to have multiple workflows, each will need its own trainer set-up logic. (There's a sketch of this after the list.)
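
As a thought experiment, here's what a fully factored-out cli/train.py might reduce to. Every helper name here is hypothetical, invented for illustration; none of them exist in vak. The point is just that once set-up moves out of core, there's nothing left for a core.train to do:

```python
# Hypothetical cli/train.py after factoring all set-up logic out of core.
# Each helper stands in for set-up currently buried inside core.train.
def train(toml_path):
    cfg = parse_toml(toml_path)                # hypothetical config parsing
    dataset = make_dataset(cfg)                # hypothetical dataset set-up
    labelmap = make_labelmap(cfg.labelset)     # where num_classes comes from
    model = get_model(cfg.model, num_classes=len(labelmap))
    trainer = make_trainer(cfg)                # workflow-specific set-up
    trainer.fit(model, dataset)                # ...and core.train is empty
```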

So, I need to think about this more. I'll probably raise a separate issue planning it out.

@NickleDave
Collaborator Author

NickleDave commented Feb 12, 2023

Okay, so here's a way to get this done for now:

  • Rename the option "models" in config sections to "model", and validate it. We can handle the new config format later.
  • Rewrite vak.models.models.from_model_config_map as just vak.models.get(name, config). For now we can keep the vak.config.models.from_toml_path logic that finds the corresponding model section in a config file.
  • Add parameters model_name and model_config to the core functions; then we call vak.models.get with model_name and model_config. (There's a sketch of this after the list.)
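
Here's a sketch of what that interim API could look like. The shape of vak.models.get and the new core-function parameters follow this comment and the commit message below; the validation detail and exact signatures are assumptions:

```python
MODEL_NAMES = {"tweetynet"}  # illustrative; the commit below adds this in _api.py

# Sketch of vak.models.get, replacing models.from_model_config_map;
# the body is illustrative, not the merged code.
def get(name: str, config: dict, num_classes: int, input_shape: tuple):
    """Return a single model instance, given its name and its config."""
    if name not in MODEL_NAMES:
        raise ValueError(f"Invalid model name: {name}")
    ...  # build network, optimizer, loss from `config`; return the model

# Core functions take the name and config instead of a one-key dict:
# def train(model_name: str, model_config: dict, ...):
#     ...  # set-up that produces labelmap, input_shape, etc.
#     model = vak.models.get(model_name, model_config,
#                            num_classes=len(labelmap), input_shape=input_shape)
```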

Then we worry about all the refactoring stuff in the comment above, later.

NickleDave added a commit that referenced this issue Feb 13, 2023
> The config allows instantiating multiple models
  for training / prediction but this was never fully implemented

Squash commits:
- Make config just use "model" option, not "models":
  - Remove `comma_separated_list` from converters
  - Change option name 'models' -> 'model' in config/valid.toml
  - Rewrite is_valid_model_name to test a single string,
    not a list of strings
  - Change attribute `models` -> `model` in config/eval.py
  - Change attribute `models` -> `model` in config/learncurve.py
  - Change attribute `models` -> `model` in config/predict.py
  - Change attribute `models` -> `model` in config/train.py
  - Rewrite/rename config.models -> model.config_from_toml_path,
    model.config_from_toml_dict
  - Fix option 'models' -> model = 'str',
    in all .toml files in tests/data_for_tests/configs
- Rewrite `models.from_model_config_map` as `models.get`:
  - Add src/vak/models/_api.py with BUILTIN_MODELS and
    MODEL_NAMES to use for validation in `models.get`
  - Rewrite models/models.py `from_model_config_map` as `models.get`
  - Import get and _api in vak/models/__init__.py
  - Rewrite core/train.py to take model_name and model_config
    then use models.get
  - Fix cli/train.py to pass model_name and model_config into core.train
  - Rewrite core/eval.py to take model_name and model_config
    then use models.get
  - Fix cli/eval.py to pass model_name and model_config into core.eval
  - Rewrite core/learncurve.py to take model_name and model_config
  - Fix cli/learncurve.py to pass model_name and model_config
    into core.learncurve
  - Rewrite core/predict.py to take model_name and model_config
    then use models.get
  - Fix cli/predict.py to pass model_name and model_config
    into core.predict
  - Make 'model' not 'models' required in src/vak/config/parse.py
  - Use models.MODEL_NAMES in src/vak/config/validators.py
  - Use models.MODEL_NAMES in config/model.py
- Fix tests
  - Fix tests to use vak.config.model.config_from_toml_path:
    - tests/test_models/test_windowed_frame_classification_model.py
    - tests/test_core/test_train.py
    - tests/test_core/test_predict.py
    - tests/test_core/test_learncurve.py
    - tests/test_core/test_eval.py
  - Fix test to use 'model' option not 'models' in
    tests/test_config/test_parse.py
  - Fix assert helper function in tests/test_core
    - test_eval.py
    - test_learncurve.py
    - test_predict.py
    - test_prep.py
    - test_train.py
  - Rewrite fixture module with constants we can import
    in test modules to parametrize: tests/fixtures/config.py
  - Add tests/test_config/test_model.py
NickleDave added a commit that referenced this issue Feb 13, 2023
Remove multiple models logic, fix #538
NickleDave added a commit that referenced this issue Feb 24, 2023
NickleDave added a commit that referenced this issue Feb 24, 2023