Reformat, refactor save/load to include epsilon #4
Conversation
This reverts commit ddfa236.
Thanks for the refactor of the path operations and for removing unused libraries.
The load-path auto-discovery is very cool, but the convenience it adds does not justify the confusion it brings. Let's keep the load part explicit.
- self.net = self.net.float()
+ self.net = MarioNet(self.state_dim, self.action_dim).float()
Hmm, I forget why we have the net.float() logic to begin with. Do you know if we need the casting here?
I think it was defaulting to double, but I can't remember exactly.
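For context, a plausible explanation (an assumption, not confirmed in this thread): NumPy arrays default to float64, so observations converted with torch.from_numpy come in as double, while freshly constructed module parameters are float32. Casting the net (or the inputs) keeps the dtypes aligned:

import numpy as np
import torch
import torch.nn as nn

obs = np.zeros((1, 84))    # NumPy defaults to float64
x = torch.from_numpy(obs)  # -> a torch.float64 ("double") tensor

net = nn.Linear(84, 2)     # parameters default to torch.float32
# net(x) would raise a dtype mismatch error; cast one side:
q = net(x.float())         # cheap: cast the input down to float32
# q = net.double()(x)      # heavier alternative: cast the whole net up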
version_2/agent.py (Outdated)
if load_existing:
    self.load()
I think this is confusing.
For example, when you want to check your model result and replay the model, a new folder will be created to track your replay performance. Now if you load the model again to keep training, the replay folder will get picked up.
I think it'd be better to pass in an explicit load path here. It's much clearer, and users know exactly what model they are replaying/training. I'd change load_existing to load_dir=None.
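A sketch of the suggested signature (the surrounding __init__ body is assumed; it isn't quoted in this PR):

def __init__(self, state_dim, action_dim, save_dir, load_dir=None):
    ...
    # Explicit opt-in: load only when the caller names a checkpoint
    # directory, so a replay folder can never get picked up by accident.
    if load_dir is not None:
        self.load(load_dir)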
When we call replay, should we be saving anything at all? In replay, we're only using the model for inference/serving.
We should probably take a look at replay.py to ensure it's running in eval mode.
It should save the evaluation results. It's good practice for any evaluation.
Yep, I will double-check that it's in eval mode. I remember it is.
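For reference, a minimal sketch of what running in eval mode means here (mario, net, and state are assumed names; replay.py isn't quoted in this thread):

import torch

mario.net.eval()                  # freeze dropout / batch-norm statistics
with torch.no_grad():             # inference only, skip gradient tracking
    action_values = mario.net(state)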
version_2/agent.py (Outdated)
- save_path = os.path.join(self.save_dir, f"mario_net_{self.curr_step % self.save_total}.chkpt")
- torch.save(self.net.state_dict(), save_path)
+ save_path = self.save_dir / f"mario_net_{self.curr_step % self.save_total}.chkpt"
+ ckp = {'state_dict': self.net.state_dict(), 'exploration_rate': self.exploration_rate}
Thanks for saving exploration_rate as an additional parameter here! TIL. This is very helpful for resuming training.
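A sketch of the load-side counterpart (assumed from the keys in this hunk): restoring epsilon along with the weights, so a resumed run doesn't restart exploration from scratch.

ckp = torch.load(ckp_path)
self.net.load_state_dict(ckp['state_dict'])
self.exploration_rate = ckp['exploration_rate']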
version_2/agent.py (Outdated)
""" | ||
if not os.path.exists(load_path): | ||
def load(self, ckp_path=None): | ||
load_path = ckp_path or sorted(list(self.save_dir.iterdir()))[-1] # Latest checkpoint |
Sorry for pushing back, but I doubt this is useful. When one is loading a trained model, it's critical to know exactly which model you passed in. Auto-discovery here is a fancy feature, but it adds unnecessary confusion. load_path is better as a required field.
Keep pushing back, that's how we write better code!
My view is that if you interrupt training, you would almost always want to resume from the latest checkpoint. I don't see much utility in resuming training from 2,000 steps when I have a model that has trained for 10,000 steps. What do you say?
You could be training from an earlier checkpoint because the latest run has something wrong, e.g. a wrong exploration rate or bugs in the code. Also, we will have an evaluation log as well.
  import matplotlib.pyplot as plt

  class MetricLogger():
      def __init__(self, save_dir):
-         self.save_log = os.path.join(save_dir, "log")
+         self.save_log = save_dir / "log"
Fancy!
  import numpy as np
- import time
- import datetime
+ import time, datetime
  import matplotlib.pyplot as plt

  class MetricLogger():
      def __init__(self, save_dir):
I'd add save_dir: Path here to make it explicit that save_dir is a Path object.
Agreed
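That is, a one-line signature change on the class quoted above:

from pathlib import Path

class MetricLogger():
    def __init__(self, save_dir: Path):
        self.save_log = save_dir / "log"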
# possible loading path
# checkpoints/2020-10-13T00-53-30
# checkpoints/2020-10-15T00-12-19
# checkpoints/2020-10-17T01-44-25
# checkpoints/2020-10-19T16-32-36
This is what I mean. It's critical to pass in the exact model you are looking for, instead of relying on auto-discovery.
Can we keep these here for now?
Oh okay, I understand now: a previous folder may have trained for longer, so we would like to resume from there. My assumption was that the more probable case is that the latest folder/model would be the one we'd like to resume from, which is why I set the default to autodiscover. But I'm happy to be pushed back here :)
Yep, IMO there are cases where we wanna start from an earlier checkpoint. Thanks!
Commits look great. Please address all the comments and the PR is good to go!
Don't suppress exception if wrong path is passed
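A sketch of what that commit title implies (assumed; its diff isn't quoted here): validate the path and raise, rather than catching or ignoring the failure and silently training from scratch.

def load(self, ckp_path):
    ckp_path = Path(ckp_path)
    if not ckp_path.exists():
        # Fail loudly; a mistyped path should not be suppressed.
        raise ValueError(f"{ckp_path} does not exist")
    ckp = torch.load(ckp_path)
    ...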