implement TD3+BC for offline RL #660

Merged (5 commits) on Jun 6, 2022

nuance1979 (Collaborator) commented Jun 4, 2022

  • I have marked all applicable categories:
    • exception-raising fix
    • algorithm implementation fix
    • documentation modification
    • new feature
  • I have reformatted the code using make format (required)
  • I have checked the code using make commit-checks (required)
  • If applicable, I have mentioned the relevant/related issue(s)
  • If applicable, I have listed every item in this pull request below:
    • implement TD3+BC for offline RL;
    • fix a bug in the trainer where the test reward was not logged because self.env_step is not set in the offline setting.
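For context on the first item: TD3+BC (Fujimoto & Gu, 2021) keeps TD3's critic update and adds a behavior-cloning term to the actor loss, roughly `-λ · mean Q(s, π(s)) + mean (π(s) - a)²`, where `λ = α / mean |Q|` over the batch and the paper's default is `α = 2.5`. The pure-Python sketch below evaluates that loss on precomputed numbers only. It is an illustration under those assumptions, not the `td3_bc.py` code from this PR, and the helper name `td3_bc_actor_loss` is hypothetical.

```python
from typing import Sequence


def td3_bc_actor_loss(
    q_values: Sequence[float],
    policy_actions: Sequence[Sequence[float]],
    dataset_actions: Sequence[Sequence[float]],
    alpha: float = 2.5,  # default from the TD3+BC paper (assumption for this sketch)
) -> float:
    """Hypothetical helper: TD3+BC actor loss for one batch.

    q_values[i]        -- critic estimate Q(s_i, pi(s_i))
    policy_actions[i]  -- pi(s_i), the action proposed by the actor
    dataset_actions[i] -- a_i, the action stored in the offline dataset
    """
    n = len(q_values)
    # lambda rescales the Q term so its magnitude is comparable to the BC term.
    # In an autodiff framework this factor would be detached from the graph.
    lmbda = alpha / (sum(abs(q) for q in q_values) / n)
    q_term = -lmbda * sum(q_values) / n
    # Behavior-cloning term: mean squared error over every action dimension.
    sq_errors = [
        (p - d) ** 2
        for pa, da in zip(policy_actions, dataset_actions)
        for p, d in zip(pa, da)
    ]
    bc_term = sum(sq_errors) / len(sq_errors)
    return q_term + bc_term
```

Because `λ` divides by the mean absolute Q-value, multiplying all Q-values by a positive constant leaves the loss unchanged, which is the point of the normalization: `α` alone controls the balance between exploiting the critic and staying close to the dataset actions.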

codecov-commenter commented Jun 4, 2022

Codecov Report

Merging #660 (ec95af3) into master (9ce0a55) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #660      +/-   ##
==========================================
+ Coverage   93.63%   93.66%   +0.03%     
==========================================
  Files          71       72       +1     
  Lines        4757     4786      +29     
==========================================
+ Hits         4454     4483      +29     
  Misses        303      303              
| Flag | Coverage Δ |
|---|---|
| unittests | 93.66% <100.00%> (+0.03%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| `tianshou/policy/__init__.py` | 100.00% <100.00%> (ø) |
| `tianshou/policy/imitation/td3_bc.py` | 100.00% <100.00%> (ø) |
| `tianshou/trainer/base.py` | 96.89% <100.00%> (+0.03%) ⬆️ |

nuance1979 changed the title from "implement TD3-BC for offline RL" to "implement TD3+BC for offline RL" on Jun 4, 2022
nuance1979 (Collaborator, Author) commented:

I tried turning observation normalization on and off with --norm-obs 1/0 and found that the differences were small but the improvement was consistent:

| Task | w/ norm-obs | w/o norm-obs |
|---|---|---|
| halfcheetah-medium-v2 | 5741.13 | 5724.41 |
| halfcheetah-expert-v2 | 11788.25 | 11665.77 |
| walker2d-medium-v2 | 4051.76 | 3985.59 |
| walker2d-expert-v2 | 5068.15 | 5027.75 |
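To put "small but consistent" in numbers, the snippet below computes the relative gain of --norm-obs 1 over --norm-obs 0 for each task, using the four scores exactly as reported in the comparison. The gains come out between about 0.3% and 1.7%, all positive. The names `RESULTS` and `relative_gains` are illustrative only.

```python
from typing import Dict, Tuple

# task: (score with --norm-obs 1, score with --norm-obs 0), as reported in this thread
RESULTS: Dict[str, Tuple[float, float]] = {
    "halfcheetah-medium-v2": (5741.13, 5724.41),
    "halfcheetah-expert-v2": (11788.25, 11665.77),
    "walker2d-medium-v2": (4051.76, 3985.59),
    "walker2d-expert-v2": (5068.15, 5027.75),
}


def relative_gains(results: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Relative improvement of the normalized run over the unnormalized one."""
    return {task: (on - off) / off for task, (on, off) in results.items()}


if __name__ == "__main__":
    for task, gain in relative_gains(RESULTS).items():
        print(f"{task}: {gain:+.2%}")
```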

Also note that the policy converges quickly, so the default number of epochs can be reduced from 200 to 100.
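The TD3+BC paper normalizes each state dimension with the mean and standard deviation computed over the offline dataset, adding a small constant (1e-3 in the paper) to the standard deviation. The sketch below shows that preprocessing in plain Python. It assumes --norm-obs 1 does something equivalent, which has not been checked against the example script, and the helpers `compute_obs_stats` and `normalize_obs` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def compute_obs_stats(
    observations: Sequence[Sequence[float]], eps: float = 1e-3
) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and (population) std of the dataset, with eps added to std."""
    n = len(observations)
    dim = len(observations[0])
    mean = [sum(o[j] for o in observations) / n for j in range(dim)]
    std = [
        math.sqrt(sum((o[j] - mean[j]) ** 2 for o in observations) / n) + eps
        for j in range(dim)
    ]
    return mean, std


def normalize_obs(
    obs: Sequence[float], mean: Sequence[float], std: Sequence[float]
) -> List[float]:
    """Apply dataset statistics to a single observation."""
    return [(x - m) / s for x, m, s in zip(obs, mean, std)]
```

The statistics come from the dataset only, and the same `mean`/`std` must also be applied to observations from the evaluation environment; otherwise the policy is tested on inputs from a different distribution than it was trained on. The `eps` term keeps constant dimensions (zero variance) from causing a division by zero.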

Trinkle23897 (Collaborator) commented:

Why not add the results to examples/offline/README.md?

Trinkle23897 merged commit df35718 into thu-ml:master on Jun 6, 2022
nuance1979 deleted the td3_bc branch on June 7, 2022 18:39
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
- implement TD3+BC for offline RL;
- fix a bug in trainer about test reward not logged because self.env_step is not set for offline setting;