Reformat, refactor save/load to include epsilon #4
Conversation
This reverts commit ddfa236.
Thanks for the refactor of the path operations and for removing unused libraries.
The load-path auto-discovery is very cool, but the convenience it adds does not justify the confusion it brings. Let's keep the load part explicit.
- self.net = self.net.float()
+ self.net = MarioNet(self.state_dim, self.action_dim).float()
Hmm, I forget why we have the net.float() logic to begin with. Do you know if we need the casting here?
I think it was defaulting to double, but I can't remember exactly.
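For context, a plausible explanation (an assumption, not confirmed in this thread): NumPy arrays default to float64, so observations converted with torch.from_numpy come in as double, while freshly constructed module parameters are float32. Casting the net (or the inputs) keeps the dtypes aligned:

import numpy as np
import torch
import torch.nn as nn

obs = np.zeros((1, 84))    # NumPy defaults to float64
x = torch.from_numpy(obs)  # -> a torch.float64 ("double") tensor

net = nn.Linear(84, 2)     # parameters default to torch.float32
# net(x) would raise a dtype mismatch error; cast one side:
q = net(x.float())         # cheap: cast the input down to float32
# q = net.double()(x)      # heavier alternative: cast the whole net up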
version_2/agent.py (Outdated)
if load_existing:
    self.load()
I think this is confusing.
For example, when you want to check your model result and replay the model, a new folder will be created to track your replay performance. Now if you load the model again to keep training, the replay folder will get picked up.
I think it'd be better to pass in an explicit load path here. It's much clearer, and users know exactly what model they are replaying/training. I'd change load_existing to load_dir=None.
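A sketch of the suggested signature (the surrounding __init__ body is assumed; it isn't quoted in this PR):

def __init__(self, state_dim, action_dim, save_dir, load_dir=None):
    ...
    # Explicit opt-in: load only when the caller names a checkpoint
    # directory, so a replay folder can never get picked up by accident.
    if load_dir is not None:
        self.load(load_dir)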
When we call replay, should we be saving anything at all? In replay, we're only using the model for inference/serving.
We should probably take a look at replay.py to ensure it's running in eval mode.
It should save the evaluation results. It's good practice for any evaluation.
Yep, I will double-check that it's in eval mode. I remember it is.
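For reference, a minimal sketch of what running in eval mode means here (mario, net, and state are assumed names; replay.py isn't quoted in this thread):

import torch

mario.net.eval()                  # freeze dropout / batch-norm statistics
with torch.no_grad():             # inference only, skip gradient tracking
    action_values = mario.net(state)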
version_2/agent.py (Outdated)
- save_path = os.path.join(self.save_dir, f"mario_net_{self.curr_step % self.save_total}.chkpt")
- torch.save(self.net.state_dict(), save_path)
+ save_path = self.save_dir / f"mario_net_{self.curr_step % self.save_total}.chkpt"
+ ckp = {'state_dict': self.net.state_dict(), 'exploration_rate': self.exploration_rate}
Thanks for saving exploration_rate as an additional parameter here! TIL. This is very helpful for resuming training.
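A sketch of the load-side counterpart (assumed from the keys in this hunk): restoring epsilon along with the weights, so a resumed run doesn't restart exploration from scratch.

ckp = torch.load(ckp_path)
self.net.load_state_dict(ckp['state_dict'])
self.exploration_rate = ckp['exploration_rate']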
version_2/agent.py (Outdated)
""" | ||
if not os.path.exists(load_path): | ||
def load(self, ckp_path=None): | ||
load_path = ckp_path or sorted(list(self.save_dir.iterdir()))[-1] # Latest checkpoint |
Sorry for pushing back, but I doubt this is useful. When one is loading a trained model, it's critical to know exactly which model you passed in. Auto-discovery here is a fancy feature, but it adds unnecessary confusion. load_path is better as a required field.
Keep pushing back, that's how we write better code!
My view is that if you interrupt training, you would almost always want to resume from the latest checkpoint. I don't see much utility in resuming training from 2,000 steps when I have a model that has trained for 10,000 steps. What do you say?
You could be training from an earlier checkpoint because the latest run has something wrong, e.g. a wrong exploration rate or bugs in the code. Also, we will have an evaluation log as well.
  import matplotlib.pyplot as plt

  class MetricLogger():
      def __init__(self, save_dir):
-         self.save_log = os.path.join(save_dir, "log")
+         self.save_log = save_dir / "log"
Fancy!
  import numpy as np
- import time
- import datetime
+ import time, datetime
  import matplotlib.pyplot as plt

  class MetricLogger():
      def __init__(self, save_dir):
I'd add save_dir: Path here to make it explicit that save_dir is a Path object.
Agreed
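That is, a one-line signature change on the class quoted above:

from pathlib import Path

class MetricLogger():
    def __init__(self, save_dir: Path):
        self.save_log = save_dir / "log"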
# possible loading path
# checkpoints/2020-10-13T00-53-30
# checkpoints/2020-10-15T00-12-19
# checkpoints/2020-10-17T01-44-25
# checkpoints/2020-10-19T16-32-36
This is what I mean. It's critical to pass in the exact model you are looking for, instead of relying on auto-discovery.
Can we keep these here for now?
Oh okay, I understand now: a previous folder may have trained for longer, so we would like to resume from there. My assumption was that the more probable case is that the latest folder/model would be the one we'd like to resume from, which is why I set the default to autodiscover. But I'm happy to be pushed back here :)
Yep, IMO there are cases where we wanna start from an earlier checkpoint. Thanks!
Commits look great. Please address all the comments and the PR is good to go!
Don't suppress exception if wrong path is passed
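A sketch of what that commit title implies (assumed; its diff isn't quoted here): validate the path and raise, rather than catching or ignoring the failure and silently training from scratch.

def load(self, ckp_path):
    ckp_path = Path(ckp_path)
    if not ckp_path.exists():
        # Fail loudly; a mistyped path should not be suppressed.
        raise ValueError(f"{ckp_path} does not exist")
    ckp = torch.load(ckp_path)
    ...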