
Reformat, refactor save/load to include epsilon #4

Merged: 4 commits, Oct 22, 2020

Conversation

subramen (Collaborator)

No description provided.

yfeng997 (Owner) left a comment:

Thanks for the refactor of the path operations and for removing the unused libraries.

The load-path auto-discovery is very cool, but the convenience it adds does not justify the confusion it brings. Let's keep the load part explicit.

Comment on lines -37 to +32

-self.net = self.net.float()
+self.net = MarioNet(self.state_dim, self.action_dim).float()
yfeng997 (Owner):

Hmm I forget why we have the net.float() logic to begin with. Do you know if we need the casting here?

subramen (Collaborator, Author):

I think it was defaulting to double, but I can't remember exactly
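A plausible explanation, sketched here as an assumption rather than a confirmed reading of the repo: NumPy observations arrive as float64, while PyTorch module parameters default to float32, so either each input or the network has to be cast once.

import numpy as np
import torch
import torch.nn as nn

# NumPy defaults to float64, and torch.from_numpy keeps that dtype.
obs = torch.from_numpy(np.zeros((1, 4 * 84 * 84)))
print(obs.dtype)  # torch.float64

# Module parameters default to float32, so a float64 input raises a dtype
# mismatch unless one side is cast.
net = nn.Linear(4 * 84 * 84, 2)
out = net(obs.float())   # cast the input on every call, or...
out = net.double()(obs)  # ...cast the network once; .float()/.double() are
                         # no-ops if the parameters already have that dtype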

Comment on lines 35 to 36

if load_existing:
    self.load()
yfeng997 (Owner):

I think this is confusing.

For example, when you want to check your model results and replay the model, a new folder is created to track your replay performance. Now if you load the model again to keep training, the replay folder will get picked up.

I think it'd be better to pass in an explicit load path here. It's much clearer, and users know exactly which model they are replaying/training. I'd change load_existing to load_dir=None.
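For concreteness, a minimal sketch of the suggested signature; MarioNet and self.load come from this PR's own snippets, and the rest of the constructor body is an assumption, not the merged code.

class Mario:
    def __init__(self, state_dim, action_dim, save_dir, load_dir=None):
        self.state_dim = state_dim
        self.action_dim = action_dim
        self.save_dir = save_dir
        self.net = MarioNet(state_dim, action_dim).float()  # MarioNet as in the diff above
        # Loading is an explicit opt-in: nothing is picked up unless a
        # directory is passed in, so a replay folder can't sneak in.
        if load_dir is not None:
            self.load(load_dir)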

subramen (Collaborator, Author):

When we call replay, should we be saving anything at all? In replay we're only using the model for inference/serving.

We should probably take a look at replay.py to ensure it's running in eval mode.

yfeng997 (Owner):

It should save the evaluation results. It's good practice for any evaluation.

Yep, I will double-check that it's in eval mode. I remember it is.
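For reference, a hedged sketch of what an eval-mode replay loop might look like; mario, env, and mario.act are assumed names following the repo's conventions, since replay.py itself is not shown in this PR.

import torch

mario.net.eval()             # inference mode for layers like dropout/batch norm
state = env.reset()
done = False
while not done:
    with torch.no_grad():    # no gradients needed when only serving the policy
        action = mario.act(state)
    state, reward, done, info = env.step(action)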

version_2/agent.py (outdated review comment, resolved)
-save_path = os.path.join(self.save_dir, f"mario_net_{self.curr_step % self.save_total}.chkpt")
-torch.save(self.net.state_dict(), save_path)
+save_path = self.save_dir / f"mario_net_{self.curr_step % self.save_total}.chkpt"
+ckp = {'state_dict': self.net.state_dict(), 'exploration_rate' : self.exploration_rate}
yfeng997 (Owner):

Thanks for saving exploration_rate as an additional parameter here! TIL. This is very helpful to resume training.
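As a hedged sketch of how the diff above fits together (attribute names mirror the snippets in this PR; the full method body is an assumption, not the merged code):

import torch

def save(self):
    save_path = self.save_dir / f"mario_net_{self.curr_step % self.save_total}.chkpt"
    ckp = {
        'state_dict': self.net.state_dict(),        # network weights
        'exploration_rate': self.exploration_rate,  # epsilon, so a resumed run keeps exploring at the right rate
    }
    torch.save(ckp, save_path)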

"""
if not os.path.exists(load_path):
def load(self, ckp_path=None):
load_path = ckp_path or sorted(list(self.save_dir.iterdir()))[-1] # Latest checkpoint
yfeng997 (Owner):

Sorry for pushing back, but I doubt this is useful. When one is loading a trained model, it's critical to know exactly which model you passed in. Auto-discovery here is a fancy feature, but it adds unnecessary confusion. load_path is better as a required field.

subramen (Collaborator, Author):

Keep pushing back, that's how we write better code!

My view is that if you interrupt training, you would almost always want to resume from the latest checkpoint. I don't see much utility in resuming training from 2000 steps when I have a model that has trained for 10000 steps. What do you say?

yfeng997 (Owner):

You could be training from an earlier checkpoint because the latest run has something wrong, e.g. a wrong exploration rate or a bug in the code. Also, we will have evaluation logs as well.
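For illustration, a hedged sketch of the explicit load the review is asking for, which also lines up with the later "Don't suppress exception if wrong path is passed" commit; the attribute names come from the snippets above, and the body is an assumption, not the merged code.

from pathlib import Path
import torch

def load(self, ckp_path: Path):
    # Require an explicit checkpoint path and fail loudly on a typo, instead
    # of silently falling back to auto-discovery.
    if not ckp_path.exists():
        raise FileNotFoundError(f"No checkpoint found at {ckp_path}")
    ckp = torch.load(ckp_path)
    self.net.load_state_dict(ckp['state_dict'])
    self.exploration_rate = ckp['exploration_rate']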

version_2/main.py (2 review comments, resolved)
import matplotlib.pyplot as plt

class MetricLogger():
    def __init__(self, save_dir):
-        self.save_log = os.path.join(save_dir, "log")
+        self.save_log = save_dir / "log"
yfeng997 (Owner):

Fancy!

import numpy as np
-import time
-import datetime
+import time, datetime
import matplotlib.pyplot as plt

class MetricLogger():
    def __init__(self, save_dir):
yfeng997 (Owner):

I'd add save_dir: Path here to make it explicit that save_dir is a Path object.

subramen (Collaborator, Author):
Agreed
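Applied to the snippet above, the suggestion would look roughly like this (a sketch, not the merged code):

from pathlib import Path

class MetricLogger():
    def __init__(self, save_dir: Path):
        # The annotation makes it explicit that save_dir is a Path, which the
        # `/` join below relies on.
        self.save_log = save_dir / "log"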

Comment on lines -37 to -41

-# possible loading path
-# checkpoints/2020-10-13T00-53-30
-# checkpoints/2020-10-15T00-12-19
-# checkpoints/2020-10-17T01-44-25
-# checkpoints/2020-10-19T16-32-36
yfeng997 (Owner):

This is what I mean. It's critical to pass in the exact model you are looking for, instead of relying on the auto-discovery.

Can we keep them here for now?

subramen (Collaborator, Author):

Oh okay, I understand now: a previous folder may have trained for longer, so we would like to resume from there. My assumption was that the more probable case is that the latest folder/model is the one we'd want to resume from, which is why I set the default to auto-discover. But I'm happy to be pushed back here :)

yfeng997 (Owner):

Yep, IMO there are cases where we wanna start from an earlier checkpoint. Thanks!

subramen (Collaborator, Author) commented Oct 21, 2020:

For save_every=1000 this is what it looks like:
[image]

Resuming looks like this:
[image]

yfeng997 closed this on Oct 22, 2020
yfeng997 reopened this on Oct 22, 2020
yfeng997 (Owner) left a comment:

Commits look great. Please address all the comments and the PR is good to go!

Commit: Don't suppress exception if wrong path is passed
subramen merged commit 7bb8f9d into master on Oct 22, 2020