implement TD3+BC for offline RL #660

nuance1979 · 2022-06-04T02:32:25Z

I have marked all applicable categories:
- exception-raising fix
- algorithm implementation fix
- documentation modification
- new feature
I have reformatted the code using make format (required)
I have checked the code using make commit-checks (required)
If applicable, I have mentioned the relevant/related issue(s)
If applicable, I have listed every item in this Pull Request below

- Implement TD3+BC for offline RL.
- Fix a trainer bug where the test reward was not logged because self.env_step was not set in the offline setting.
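
For readers unfamiliar with the algorithm, the actor objective that TD3+BC adds on top of TD3 can be sketched roughly as below. This is a plain-Python illustration of the published formulation (a Q-maximization term scaled by alpha / mean|Q|, plus a behavior-cloning MSE term), not the code in tianshou/policy/imitation/td3_bc.py. The function name `td3_bc_actor_loss` and its arguments are hypothetical, and `alpha=2.5` is the default suggested by the TD3+BC paper rather than a value stated in this PR.

```python
def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Illustrative TD3+BC actor loss for one batch (hypothetical helper).

    q_values:        list of Q(s, pi(s)), one float per sample
    policy_actions:  list of action vectors produced by the actor, pi(s)
    dataset_actions: list of action vectors stored in the offline dataset
    """
    n = len(q_values)
    # Scale factor lambda = alpha / mean(|Q|). In a real implementation this
    # is treated as a constant, i.e. no gradient flows through it.
    mean_abs_q = sum(abs(q) for q in q_values) / n
    lam = alpha / mean_abs_q
    # TD3 term: maximize Q, so it enters the loss with a negative sign.
    q_term = -lam * sum(q_values) / n
    # Behavior-cloning term: mean squared error over all action elements.
    squared_errors = [
        (p - a) ** 2
        for pa, da in zip(policy_actions, dataset_actions)
        for p, a in zip(pa, da)
    ]
    bc_term = sum(squared_errors) / len(squared_errors)
    return q_term + bc_term


loss = td3_bc_actor_loss(
    q_values=[1.0, 3.0],
    policy_actions=[[0.0], [1.0]],
    dataset_actions=[[0.0], [0.0]],
)
```

The sketch assumes mean|Q| is non-zero. A larger `alpha` weights the Q term more heavily, while a smaller one keeps the policy closer to the dataset actions.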

codecov-commenter · 2022-06-04T02:52:59Z

Codecov Report

Merging #660 (ec95af3) into master (9ce0a55) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #660      +/-   ##
==========================================
+ Coverage   93.63%   93.66%   +0.03%     
==========================================
  Files          71       72       +1     
  Lines        4757     4786      +29     
==========================================
+ Hits         4454     4483      +29     
  Misses        303      303

Flag	Coverage Δ
unittests	`93.66% <100.00%> (+0.03%)`	⬆️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
tianshou/policy/__init__.py	`100.00% <100.00%> (ø)`
tianshou/policy/imitation/td3_bc.py	`100.00% <100.00%> (ø)`
tianshou/trainer/base.py	`96.89% <100.00%> (+0.03%)`	⬆️

examples/offline/utils.py

nuance1979 · 2022-06-05T04:34:01Z

I tried toggling observation normalization with --norm-obs 1/0; the differences were small, but the improvement was consistent:

Task	w/ norm-obs	w/o norm-obs
halfcheetah-medium-v2	5741.13	5724.41
halfcheetah-expert-v2	11788.25	11665.77
walker2d-medium-v2	4051.76	3985.59
walker2d-expert-v2	5068.15	5027.75

Also note that the policy converges quickly, so the default of 200 epochs can be reduced to 100.
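
As a rough illustration of what a flag like --norm-obs typically toggles in offline RL examples, the sketch below normalizes observations with per-dimension statistics computed once over the dataset. It is an assumption that the example script works this way; the helper names `fit_obs_normalizer` and `normalize_obs` and the `eps=1e-3` constant are hypothetical and chosen only for illustration.

```python
from statistics import fmean, pstdev


def fit_obs_normalizer(observations, eps=1e-3):
    """Compute per-dimension mean and (std + eps) over a dataset of observations."""
    dims = list(zip(*observations))  # transpose: one tuple per observation dimension
    mean = [fmean(d) for d in dims]
    # eps keeps the divisor non-zero for dimensions that never vary.
    std = [pstdev(d) + eps for d in dims]
    return mean, std


def normalize_obs(obs, mean, std):
    """Apply the fixed dataset statistics to a single observation."""
    return [(o - m) / s for o, m, s in zip(obs, mean, std)]


dataset_obs = [[0.0, 10.0], [2.0, 10.0]]
obs_mean, obs_std = fit_obs_normalizer(dataset_obs)
normalized = normalize_obs([2.0, 10.0], obs_mean, obs_std)
```

The same statistics would need to be applied to observations from the test environment, since the policy is trained on normalized inputs.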

Trinkle23897 · 2022-06-05T16:41:38Z

Why not add the results to examples/offline/README.md?

implement TD3-BC for offline RL

9d0027b

Merge branch 'master' into td3_bc

a42bee1

Trinkle23897 reviewed Jun 4, 2022

examples/offline/utils.py (resolved)

Yi Su added 2 commits June 4, 2022 14:52

address review comments

633a2ea

fix a typo

5aa480d

nuance1979 changed the title from "implement TD3-BC for offline RL" to "implement TD3+BC for offline RL" Jun 4, 2022

update README.md with results on norm-obs

ec95af3

Trinkle23897 approved these changes Jun 6, 2022

Trinkle23897 merged commit df35718 into thu-ml:master Jun 6, 2022

nuance1979 deleted the td3_bc branch June 7, 2022 18:39

BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024

Implement TD3+BC for offline RL (thu-ml#660)

da4b598

- implement TD3+BC for offline RL;
- fix a bug in trainer about test reward not logged because self.env_step is not set for offline setting;
