Add SPS and q-values metrics for value-based methods #126

vwxyzjn · 2022-02-28T02:24:15Z

This PR adds SPS and q-values metrics for value methods.

This PR also uses F.mse_loss() instead of loss_fn = nn.MSELoss().

@dosssman do you think we should also use qf1() instead of qf1.forward(), since it calls forward() by default?
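
For readers unfamiliar with the two loss styles, here is a minimal sketch of the swap and of a scalar that could serve as a q-values metric. The tensors `td_target` and `old_val` are hypothetical stand-ins for a training batch, not values taken from the cleanrl scripts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for a batch of TD targets and predicted Q-values.
td_target = torch.randn(32)
old_val = torch.randn(32)

# Module style: a loss object is created first, then called.
loss_fn = nn.MSELoss()
loss_module = loss_fn(td_target, old_val)

# Functional style: a single call, with no extra object to keep around.
loss_functional = F.mse_loss(td_target, old_val)

# A plain Python float that could be logged as a q-values metric,
# for example through writer.add_scalar(<tag>, mean_q, global_step).
mean_q = old_val.mean().item()
```

Both calls use the default mean reduction, so the two losses should match.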

dosssman · 2022-02-28T09:51:10Z

@vwxyzjn Regarding qf1() versus qf1.forward(): I have a subjective preference for qf1() because it just looks "cleaner" to me, and it is what I use in my own code bases.

One potential downside that comes to mind, however, is that a newcomer who does not know this is PyTorch's default behavior might get confused at first.
With qf1.forward(), it could be easier for the reader to realize that this call executes the logic coded in the def forward(self, ...) of the Qnetwork class.

Although in SB3, they do seem to use the qf1() way I think:
https://github.com/DLR-RM/stable-baselines3/blob/cdaa9ab418aec18f41c7e8e12e0ad28f238553eb/stable_baselines3/common/torch_layers.py#L233

In any case, I floated this change in sac.py last time, but I am still a little on the fence about how appropriate it would be for cleanrl.
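
To make the difference between the two call styles concrete, here is a small sketch. The `QNetwork` below is a toy class with illustrative layer sizes, not the one from the cleanrl scripts. Calling the module goes through `nn.Module.__call__`, which runs any registered hooks and then `forward()`, whereas calling `forward()` directly skips those hooks.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Toy Q-network; the layer sizes are illustrative only."""

    def __init__(self, obs_dim=4, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 16),
            nn.ReLU(),
            nn.Linear(16, n_actions),
        )

    def forward(self, x):
        return self.net(x)


qf1 = QNetwork()

# Record every time the forward hook on qf1 fires.
hook_calls = []
handle = qf1.register_forward_hook(
    lambda module, inputs, output: hook_calls.append(output.shape)
)

obs = torch.zeros(3, 4)
out_call = qf1(obs)             # goes through __call__, so the hook fires
out_forward = qf1.forward(obs)  # bypasses __call__, so the hook is skipped
handle.remove()
```

The outputs are identical here; the two styles only diverge once hooks are involved, which is one reason the PyTorch documentation recommends calling the module instance.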

vwxyzjn · 2022-02-28T15:46:19Z

Fair enough, let's go ahead with this change: qf1() instead of qf1.forward().

vwxyzjn · 2022-02-28T16:09:46Z

@dosssman it's ready for review.

dosssman

Sorry for the late answer. The review request escaped me somehow.

First, I would like to confirm the meaning of SPS.
(Samples or steps per second?)

I have compiled a few minor tweaks that build on top of this PR in #130.
Hopefully I did not misunderstand the intent of this PR.

Some additional comments are:

This might just be my OCD, but when printing the SPS to the terminal and logging it to tensorboard / wandb, like here for example:

cleanrl/cleanrl/dqn_atari.py

Lines 213 to 214 in bb8dd13

    
           print("SPS:", int(global_step / (time.time() - start_time))) 
        
           writer.add_scalar("charts/SPS", int(global_step / (time.time() - start_time)), global_step)

how about doing something like this, so that the value is computed only once?

    SPS = int(global_step / (time.time() - start_time))
    print("SPS:", SPS)
    writer.add_scalar("charts/SPS", SPS, global_step)

Furthermore, could we also fuse the SPS and episodic return logging here so that we only print once?

cleanrl/cleanrl/dqn_atari.py

Lines 184 to 189 in bb8dd13

    
           for info in infos: 
        
               if "episode" in info.keys(): 
        
                   print(f"global_step={global_step}, episodic_return={info['episode']['r']}") 
        
                   writer.add_scalar("charts/episodic_return", info["episode"]["r"], global_step) 
        
                   writer.add_scalar("charts/epsilon", epsilon, global_step) 
        
                   break
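
One way the fused logging suggested above could look is sketched below. `RecordingWriter` and `log_episode` are hypothetical names introduced for illustration; the writer is a stand-in that only mimics the `add_scalar(tag, value, step)` call used in the snippets above, so the sketch runs without TensorBoard installed.

```python
import time


class RecordingWriter:
    """Hypothetical stand-in for a SummaryWriter; it only records calls."""

    def __init__(self):
        self.scalars = []

    def add_scalar(self, tag, value, step):
        self.scalars.append((tag, value, step))


def log_episode(writer, infos, global_step, start_time, epsilon):
    """Log return, epsilon, and SPS for the first finished episode in infos."""
    for info in infos:
        if "episode" in info:
            # Compute SPS once; max(...) guards against a zero elapsed time.
            sps = int(global_step / max(time.time() - start_time, 1e-9))
            episodic_return = info["episode"]["r"]
            print(f"global_step={global_step}, episodic_return={episodic_return}, SPS={sps}")
            writer.add_scalar("charts/episodic_return", episodic_return, global_step)
            writer.add_scalar("charts/epsilon", epsilon, global_step)
            writer.add_scalar("charts/SPS", sps, global_step)
            return True
    return False


writer = RecordingWriter()
start_time = time.time() - 2.0  # pretend training started two seconds ago
logged = log_episode(writer, [{}, {"episode": {"r": 21.0}}], 1000, start_time, 0.05)
```

A trade-off of this design is that SPS is then only reported when an episode finishes, which may be too sparse for environments with very long episodes.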

The update to gym=0.23.0 breaks a few scripts, namely rnd_ppo.py, apex_dqn_atari.py, and the `offline/*` scripts, due to the `Monitor` gym wrapper being removed from that version. This is the reason the tests in #130 (Tweaks for PR 'add-sps-qvalues') break.

Regarding `cleanrl/offline`

SPS logging is missing
Monitor cannot be imported anymore due to gym=0.23.0 update
Tests for those two scripts are missing
Unlike dqn_atari, the wrappers are not imported from SB3
The offline-env-id that is required to load the dataset does not seem to work anymore. Is there a dependency missing, such as d4rl or d4rl_atari?

Since the offline scripts are not really related to this PR, I did not go into too much detail.
We could merge this PR while opening an issue for the offline scripts, if not already done.

In any case, great job as always.

* Additional tweaks regarding SPS and Q values, added tests for Atari related scripts while at it
* test_atari.py: fixed the apex_dqn_atari.py test
* quick fix
* Fix pre-commit

Co-authored-by: Costa Huang <costa.huang@outlook.com>

vwxyzjn · 2022-03-09T15:41:37Z

Merging now.

Add SPS and q-values metrics for value methods

c0bf292

Apply to ddpg, td3, and sac

cd3c939

use F.mse_loss() instead of loss_fn = nn.MSELoss()

90e31de

vwxyzjn requested a review from dosssman February 28, 2022 02:50

Refactor forward

2328f0a

Quick fix

bb8dd13

dosssman mentioned this pull request Mar 9, 2022

Tweaks for PR 'add-sps-qvalues' #130

Merged

dosssman approved these changes Mar 9, 2022

dosssman mentioned this pull request Mar 9, 2022

Improving offline RL scripts #131

Closed

Tweaks for PR 'add-sps-qvalues' (#130)

8b87a50

Remove test cases

34f0324

vwxyzjn merged commit 2828e83 into master Mar 9, 2022
