Feature/param reset #328

Draft: wants to merge 12 commits into master

joshuaspear
Contributor

Closes #326

@joshuaspear
Contributor Author

@takuseno I am still working on the tests, but please let me know whether you think the implementation is a reasonable approach.

def _get_layers(self, q_func: nn.ModuleList) -> List[nn.Module]:
    all_modules = {nm: module for (nm, module) in q_func.named_modules()}
    q_func_layers = [
        *all_modules["_encoder._layers"],
Contributor Author

@takuseno Assuming you're happy with the general approach of using the epoch_callback to inject the parameter reset functionality, could you recommend a better way to obtain the encoder and FC layers that is compatible with static typing?
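
One possible direction for the static typing question, sketched under assumptions: rather than indexing `named_modules()` by string keys, the layers could be filtered with `isinstance`, which type checkers can narrow. The helper name `get_linear_layers` and the toy network below are hypothetical and do not come from d3rlpy. The sketch also does not tell encoder layers apart from the FC head; it only collects every `nn.Linear` in registration order.

```python
from typing import List

from torch import nn


def get_linear_layers(q_func: nn.Module) -> List[nn.Linear]:
    """Collect every nn.Linear submodule in registration order."""
    # isinstance narrows the element type for static checkers, so no
    # string keys such as "_encoder._layers" are needed.
    return [m for m in q_func.modules() if isinstance(m, nn.Linear)]


# Hypothetical stand-in for a Q-function: an encoder followed by an FC head.
toy_q_func = nn.Sequential(
    nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 8), nn.ReLU()),
    nn.Linear(8, 2),
)
layers = get_linear_layers(toy_q_func)
```

Because the return type is `List[nn.Linear]`, attributes such as `out_features` or `reset_parameters()` can be used on the result without casts.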

@takuseno
Owner

takuseno commented Sep 2, 2023

@joshuaspear Thanks for the proposal! For now, can we make this an experimental feature? I imagine something like this:

file location

Let's make an experimental directory:

d3rlpy/experimental/parameter_reset.py

usage

Just a rough illustration:

import d3rlpy

# e.g. 50% reset, every 1000 gradient steps
parameter_reset = d3rlpy.experimental.ParameterReset(reset_ratio=0.5, reset_interval=1000)

cql = d3rlpy.algos.CQLConfig().create()

def callback(algo, epoch, total_step):
    parameter_reset(algo.q_function, total_step)
    
cql.fit(..., callback=callback)

This way, we can use the existing callback to inject the reset operation.

Parameter resetting is still under investigation in the RL community. Once it matures, we can lift this out of experimental.
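
To make the proposed interface concrete, here is a minimal sketch of what such a callable might look like. It is not the d3rlpy implementation. It assumes that the callback receives `total_step` on every gradient step, and that `reset_ratio` means the fraction of `nn.Linear` layers to re-initialize, counted from the output side. The comment above does not pin down either point. The class body and the toy network are hypothetical.

```python
from typing import List

import torch
from torch import nn


class ParameterReset:
    """Hypothetical sketch of a periodic parameter reset."""

    def __init__(self, reset_ratio: float, reset_interval: int) -> None:
        if not 0.0 <= reset_ratio <= 1.0:
            raise ValueError("reset_ratio must be in [0, 1]")
        if reset_interval <= 0:
            raise ValueError("reset_interval must be positive")
        self.reset_ratio = reset_ratio
        self.reset_interval = reset_interval

    def __call__(self, q_func: nn.Module, total_step: int) -> int:
        """Reset layers when total_step hits the interval; return the count."""
        if total_step == 0 or total_step % self.reset_interval != 0:
            return 0
        linears: List[nn.Linear] = [
            m for m in q_func.modules() if isinstance(m, nn.Linear)
        ]
        # Assumption: reset_ratio is the fraction of linear layers to
        # re-initialize, taken from the output side of the network.
        n_reset = int(len(linears) * self.reset_ratio)
        for layer in linears[len(linears) - n_reset:]:
            layer.reset_parameters()  # PyTorch's default re-initialization
        return n_reset


# Toy network standing in for a Q-function (hypothetical).
q_func = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
parameter_reset = ParameterReset(reset_ratio=0.5, reset_interval=1000)
first_before = q_func[0].weight.detach().clone()
last_before = q_func[2].weight.detach().clone()
reset_counts = [parameter_reset(q_func, step) for step in (1, 999, 1000, 1001)]
```

With two linear layers and `reset_ratio=0.5`, only the output layer is re-initialized at step 1000, and the first layer keeps its weights. Optimizer state is left untouched here. Whether it should also be reset is a separate design question.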

@joshuaspear
Contributor Author

Makes sense - will have a go next week :)

Successfully merging this pull request may close these issues.

[REQUEST] Parameter resetting for improving replay ratio of off-policy Q functions