
Add Trainers as generators #559

Merged · 46 commits · Mar 17, 2022

Conversation

jamartinh (Contributor)

The new proposed feature is to have trainers as generators.
The usage pattern is:

```python
trainer = onpolicy_trainer_generator(...)
for epoch, epoch_stat, info in trainer:
    print(f"Epoch: {epoch}")
    print(epoch_stat)
    print(info)
    do_something_with_policy()
    query_something_about_policy()
    make_a_plot_with(epoch_stat)
    display(info)
```

- `epoch` (int): the epoch number
- `epoch_stat` (dict): a large collection of metrics of the current epoch, including stat
- `info` (dict): the usual dict returned by the non-generator version of the trainer

You can even iterate on several different trainers at the same time:

```python
trainer1 = onpolicy_trainer_generator(...)
trainer2 = onpolicy_trainer_generator(...)
for result1, result2 in zip(trainer1, trainer2):
    compare_results(result1, result2)
```
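
A hedged sketch of why the generator form helps: stopping logic can live entirely outside the trainer. The `test_reward` key and the `reward_threshold` value below are assumptions for illustration only, not part of the proposed API.

```python
# Illustrative only: "test_reward" and reward_threshold are assumptions,
# not actual keys/parameters of the proposed trainer API.
reward_threshold = 195.0
trainer = onpolicy_trainer_generator(...)
for epoch, epoch_stat, info in trainer:
    if epoch_stat.get("test_reward", float("-inf")) >= reward_threshold:
        break  # stop training early without touching trainer internals
```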
- I have marked all applicable categories:
  - exception-raising fix
  - algorithm implementation fix
  - documentation modification
  - new feature
- I have reformatted the code using `make format` (required)
- I have checked the code using `make commit-checks` (required)
- If applicable, I have mentioned the relevant/related issue(s)
- If applicable, I have listed every item in this Pull Request below

@Trinkle23897 (Collaborator)

Nice suggestion! But could you please remove the duplicated code in a single trainer file? I'm thinking about the following approach (or something similar):

```python
class Trainer:
    def __init__(self, ...):
        ...
    def __iter__(self, ...):
        ...
    def run(self):
        ...

onpolicy_trainer = lambda *args, **kwargs: Trainer(*args, **kwargs).run()
onpolicy_trainer_gen = Trainer  # or another name
```
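
As a self-contained toy of this wrapper idea (the class name and dummy metrics are illustrative, not the real tianshou API), `run()` can simply exhaust the iterator and return the final info dict, so the classic function-style trainers stay thin wrappers:

```python
class ToyTrainer:
    """Toy stand-in for the proposed Trainer; yields (epoch, epoch_stat, info)."""

    def __init__(self, max_epoch: int = 3):
        self.max_epoch = max_epoch

    def __iter__(self):
        for epoch in range(1, self.max_epoch + 1):
            epoch_stat = {"loss": 1.0 / epoch}  # dummy per-epoch metrics
            info = {"best_epoch": epoch}        # dummy final summary
            yield epoch, epoch_stat, info

    def run(self) -> dict:
        info = {}
        for _, _, info in self:  # the classic API just consumes the generator
            pass
        return info


toy_trainer = lambda *args, **kwargs: ToyTrainer(*args, **kwargs).run()
print(toy_trainer(max_epoch=2))  # {'best_epoch': 2}
```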

@jamartinh (Contributor, Author)

Hi @Trinkle23897, let's see how offline.py looks.

@Trinkle23897 (Collaborator) left a comment

LGTM, great work!

tianshou/trainer/__init__.py (review thread resolved)
@Trinkle23897 (Collaborator)

And is it possible to create a BaseTrainer to further reduce code duplication?

@jamartinh (Contributor, Author)

jamartinh commented Mar 6, 2022 via email

@jamartinh (Contributor, Author)

I have now pushed the OffPolicyTrainer.

@jamartinh (Contributor, Author)

W.r.t. the common parts of the three Trainers:

- The `__init__` is almost identical apart from a few things.
- There are many common parameters, so those parameters can go into the base class.
- The `run()` function will be exactly the same for all three classes.

It is only `__next__` that differs, and even that has parts in common.
One way of refactoring is to extract the repeated code parts of `__next__` into common methods; a rough sketch of this factoring is shown below.
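
Purely for discussion, here is a rough sketch of that factoring (names such as `policy_update_fn` are illustrative, not necessarily the final API): shared parameters and the `run()` loop sit in a base class, while the trainer-specific part of `__next__` becomes an abstract hook.

```python
from abc import ABC, abstractmethod


class BaseTrainerSketch(ABC):
    """Illustrative base class: common state, iteration protocol and run() loop."""

    def __init__(self, max_epoch: int):
        self.max_epoch = max_epoch
        self.epoch = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.epoch >= self.max_epoch:
            raise StopIteration
        self.epoch += 1
        epoch_stat = self.policy_update_fn()  # the part that differs per trainer
        info = {"epoch": self.epoch}
        return self.epoch, epoch_stat, info

    @abstractmethod
    def policy_update_fn(self) -> dict:
        """On-policy / off-policy / offline specific update logic."""

    def run(self) -> dict:
        info = {}
        for _, _, info in self:  # the classic API just consumes the iterator
            pass
        return info


class OnPolicyTrainerSketch(BaseTrainerSketch):
    def policy_update_fn(self) -> dict:
        return {"loss": 0.0}  # dummy statistics for illustration
```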

jamartinh and others added 4 commits March 6, 2022 21:11
* All the procedures are so similar that separating them would create too much unnecessary, duplicated, complex code

* Included tests in test_ppo.py, test_cql.py and test_td3.py

* It can be simplified even more, but that would break backward API compatibility
@jamartinh (Contributor, Author)

Hi @Trinkle23897, I think I have done what I can.
There is a piece of code that no formatter seems to be able to fix to pass lint:

https://github.com/jamartinh/tianshou/runs/5469774938?check_suite_focus=true#step:7:27

But I cannot go further, so help is needed if this refactoring is useful.

Thanks,
JAMH

@codecov-commenter commented Mar 12, 2022

Codecov Report

Merging #559 (a62cf84) into master (2336a7d) will decrease coverage by 0.11%.
The diff coverage is 98.32%.

```diff
@@            Coverage Diff             @@
##           master     #559      +/-   ##
==========================================
- Coverage   93.62%   93.50%   -0.12%     
==========================================
  Files          64       65       +1     
  Lines        4392     4419      +27     
==========================================
+ Hits         4112     4132      +20     
- Misses        280      287       +7     
```

| Flag | Coverage Δ |
| --- | --- |
| unittests | 93.50% <98.32%> (-0.12%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| tianshou/trainer/base.py | 97.83% <97.83%> (ø) |
| tianshou/trainer/__init__.py | 100.00% <100.00%> (ø) |
| tianshou/trainer/offline.py | 100.00% <100.00%> (+3.77%) ⬆️ |
| tianshou/trainer/offpolicy.py | 100.00% <100.00%> (+2.50%) ⬆️ |
| tianshou/trainer/onpolicy.py | 100.00% <100.00%> (+6.17%) ⬆️ |
| tianshou/utils/logger/tensorboard.py | 73.80% <0.00%> (-21.43%) ⬇️ |
| tianshou/policy/modelfree/trpo.py | 88.52% <0.00%> (-4.92%) ⬇️ |
| tianshou/data/collector.py | 93.77% <0.00%> (-0.05%) ⬇️ |
| tianshou/env/worker/subproc.py | 94.03% <0.00%> (-0.04%) ⬇️ |


@jamartinh (Contributor, Author)

@Trinkle23897
I get success on my computer with GPU:

RHEL 7

```
test_sac_with_il.py::test_sac_with_il

============================== 1 passed in 35.95s ==============================

Process finished with exit code 0
PASSED                             [100%]Epoch #1: test_reward: -480.222050 ± 144.561932, best_reward: -480.222050 ± 144.561932 in #1
Epoch #2: test_reward: -223.095769 ± 115.835893, best_reward: -223.095769 ± 115.835893 in #2
Epoch #1:  75%|#######4  | 17991/24000 [00:24<00:08, 725.82it/s, alpha=0.634, env_step=17990, len=200, loss/actor=115.555, loss/alpha=-0.534, loss/critic1=14.512, loss/critic2=17.633, n/ep=0, n/st=10, rew=-511.42]
Epoch #1: 501it [00:00, 506.55it/s, env_step=500, len=0, loss=0.004, n/ep=0, n/st=10, rew=0.00]
Epoch #2: 501it [00:00, 576.68it/s, env_step=1000, len=0, loss=0.001, n/ep=0, n/st=10, rew=0.00]
```

CUDA Version: 11.4

```
pytorch                   1.10.0          cuda112py39h3ad47f5_1    conda-forge
pytorch-gpu               1.10.0          cuda112py39h0bbbad9_1    conda-forge
torchaudio                0.10.0               py39_cu113    pytorch
torchmetrics              0.7.2              pyhd8ed1ab_0    conda-forge
```

@Trinkle23897 (Collaborator)

[Screenshot from 2022-03-12 16-04-28]

```diff
     def stop_fn(mean_rewards):
+        print("stop_fn", mean_rewards, args.reward_threshold)
         return mean_rewards >= args.reward_threshold
```

The upper half of the screenshot is the original version and the lower half is this version. They don't seem to match exactly, though...

@jamartinh (Contributor, Author)

@Trinkle23897
It seems to be working for me; please report or help if you see any issue.
Thanks!

@jamartinh (Contributor, Author)

@Trinkle23897 Ok now what?

@Trinkle23897 (Collaborator)

> @Trinkle23897 Ok now what?

Sorry about the delay because I have a deadline this evening and another deadline two days later. I'll have a look right after finishing those tasks.

Trinkle23897 previously approved these changes Mar 17, 2022
@Trinkle23897 Trinkle23897 merged commit 10d9190 into thu-ml:master Mar 17, 2022
@jamartinh jamartinh deleted the trainers_as_generators branch November 20, 2022 09:21
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
The new proposed feature is to have trainers as generators.
The usage pattern is:

```python
trainer = OnPolicyTrainer(...)
for epoch, epoch_stat, info in trainer:
    print(f"Epoch: {epoch}")
    print(epoch_stat)
    print(info)
    do_something_with_policy()
    query_something_about_policy()
    make_a_plot_with(epoch_stat)
    display(info)
```

- `epoch` (int): the epoch number
- `epoch_stat` (dict): a large collection of metrics of the current epoch, including stat
- `info` (dict): the usual dict out of the non-generator version of the trainer

You can even iterate on several different trainers at the same time:

```python
trainer1 = OnPolicyTrainer(...)
trainer2 = OnPolicyTrainer(...)
for result1, result2, ... in zip(trainer1, trainer2, ...):
    compare_results(result1, result2, ...)
```

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>