
Don't overwrite logs when resuming training #153

Open
peastman opened this issue Nov 18, 2022 · 3 comments

Comments

@peastman
Collaborator

I often want to resume training from a checkpoint with --load-model. When I do, I don't want to lose the information in the existing log and metrics.csv files. The obvious approach is to create a new log directory for the continuation and use --log-dir and --redirect to direct all new files there. But that doesn't work: those options are ignored, and the same log directory as the original training run is used, deleting and overwriting the existing logs in the process. The only way to prevent that is to first copy the existing log directory to a new location. I've lost work several times by forgetting to do so.

How about making it so that --load-model does not override --log-dir and --redirect? --load-model just says which model to load; it shouldn't prevent you from saving logs to a different directory.

@PhilippThoelke
Collaborator

I'm not sure why overriding log_dir doesn't work properly when load_model is set. The arguments are parsed in the order in which they are defined in train.py. Since --load-model is the first argument there, the loaded model's hparams should always be overwritten by any further arguments specified via config file or CLI.

@peastman
Collaborator Author

Is it because of this line in LoadFromCheckpoint?

```python
namespace.__dict__.update(config)
```

namespace contains the options that were passed on the command line. That line overwrites them with values from the checkpoint before they've had a chance to be processed.
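A minimal, self-contained sketch of the failure mode (the config dict and paths here are placeholders, not the actual TorchMD-Net code): a custom argparse Action that dumps the checkpoint's saved hyperparameters into the namespace will clobber any option the user already specified, because argparse processes actions in command-line order and has already written the user's values into the namespace.

```python
import argparse

# Hypothetical stand-in for LoadFromCheckpoint: loads a config dict from a
# checkpoint and dumps it into the namespace wholesale.
class LoadFromCheckpoint(argparse.Action):
    def __call__(self, parser, namespace, values, option_string=None):
        # Pretend this dict came from the checkpoint's saved hyperparameters.
        config = {"log_dir": "/runs/original", "lr": 1e-4}
        # This overwrites every matching option already in the namespace,
        # including ones the user passed explicitly on the command line.
        namespace.__dict__.update(config)

parser = argparse.ArgumentParser()
parser.add_argument("--load-model", action=LoadFromCheckpoint)
parser.add_argument("--log-dir", default="/runs/default")

# The user explicitly asks for a new log directory, but the checkpoint's
# value wins because the action runs after --log-dir has been processed.
args = parser.parse_args(["--log-dir", "/runs/new", "--load-model", "ckpt.pt"])
print(args.log_dir)  # "/runs/original" — the user's --log-dir was lost
```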

@PhilippThoelke
Collaborator

I somehow thought that arguments are parsed in the order in which they are defined in the code, but a quick test showed that this is clearly not true: argparse processes them in the order they appear on the command line. So yes, that line is the problem. We should probably only update the namespace with arguments that were not explicitly specified by the user.
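One way the suggested fix could look, sketched with stock argparse APIs (the config dict and option names are placeholders, not the actual TorchMD-Net code): before copying a checkpoint value into the namespace, check whether the option still holds its parser default via parser.get_default(), and skip it if the user overrode it.

```python
import argparse

# Hypothetical fix: only copy checkpoint hyperparameters into the namespace
# when the corresponding option still holds its default value, i.e. it was
# not explicitly set by the user on the command line.
class LoadFromCheckpoint(argparse.Action):
    def __call__(self, parser, namespace, values, option_string=None):
        config = {"log_dir": "/runs/original", "lr": 1e-4}  # from checkpoint
        for key, value in config.items():
            # Keep the user's explicit value; fill in the rest from the checkpoint.
            if getattr(namespace, key, None) == parser.get_default(key):
                setattr(namespace, key, value)
        # Record the checkpoint path itself under this action's dest.
        setattr(namespace, self.dest, values)

parser = argparse.ArgumentParser()
parser.add_argument("--load-model", action=LoadFromCheckpoint)
parser.add_argument("--log-dir", default="/runs/default")
parser.add_argument("--lr", type=float, default=1e-4)

args = parser.parse_args(["--log-dir", "/runs/new", "--load-model", "ckpt.pt"])
print(args.log_dir)  # "/runs/new" — the user's choice survives
```

One caveat of this approach: an option the user explicitly sets to its default value is indistinguishable from an unset one, so it would still be overwritten by the checkpoint; distinguishing those cases would need explicit tracking of which flags appeared on the command line.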
