Improve collector #125

youkaichao · 2020-07-10T14:20:59Z

This pull request keeps improving the collector.

* make fields an empty Batch rather than None after reset
* add reward_metric argument to collector to deal with MARL
* rewrite collector code, with the internal data self.data being a Batch
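
As a rough illustration of the first point, the sketch below shows why resetting fields to empty containers instead of None can simplify later updates. It uses plain dicts as a stand-in for Batch; `ToyCollectorState` and `FIELDS` are hypothetical names and not part of the actual Collector.

```python
from typing import Any, Dict

# Hypothetical field names, chosen only for this illustration.
FIELDS = ("obs", "act", "rew", "done", "info")


class ToyCollectorState:
    """Hypothetical stand-in for the collector's internal data store."""

    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, Any]] = {}
        self.reset()

    def reset(self) -> None:
        # Every field becomes an empty container instead of None.
        self.data = {key: {} for key in FIELDS}

    def update(self, **fields: Dict[str, Any]) -> None:
        # No `is None` checks are needed: merging into an empty dict just works.
        for key, value in fields.items():
            self.data[key].update(value)

    def is_empty(self, key: str) -> bool:
        return len(self.data[key]) == 0


state = ToyCollectorState()
state.update(info={"lives": 3})
```

With None as the reset value, `update` would need a branch per field before merging; with empty containers the same code path handles the first step and every later one.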

codecov-commenter · 2020-07-11T03:13:06Z

Codecov Report

❗ No coverage uploaded for pull request base (dev@d1a2037).
The diff coverage is n/a.

@@          Coverage Diff           @@
##             dev     #125   +/-   ##
======================================
  Coverage       ?   88.63%           
======================================
  Files          ?       31           
  Lines          ?     1997           
  Branches       ?        0           
======================================
  Hits           ?     1770           
  Misses         ?      227           
  Partials       ?        0

Flag	Coverage Δ
#unittests	`88.63% <0.00%> (?)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d1a2037...c519af8.

Trinkle23897 · 2020-07-11T04:34:12Z

> add get_final_reward_fn argument to collector to deal with marl

~~I don't agree with this point. I think the collector should return a np.ndarray of original reward data instead of processed data in this case.~~

youkaichao · 2020-07-11T05:29:02Z

> > add get_final_reward_fn argument to collector to deal with marl
>
> I don't agree with this point. I think the collector should return a np.ndarray of original reward data instead of processed data in this case.

I disagree with your disagreement. Even though the variable is named rew, it actually serves as a scalar metric that tracks training progress, just like accuracy or loss. It is natural for people to provide a function that returns a scalar metric here if rew is a vector.
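
A minimal sketch of the idea argued here, assuming the metric is just a callable that maps one episode reward (scalar or per-agent vector) to a float. The names `identity_metric`, `first_agent_metric`, and `episode_stat` are hypothetical and do not reflect the Collector's actual signature.

```python
from statistics import mean
from typing import Callable, Iterable, Sequence


def identity_metric(rew: float) -> float:
    """Default case: a scalar reward is already a scalar metric."""
    return float(rew)


def first_agent_metric(rew: Sequence[float]) -> float:
    """Hypothetical MARL metric: monitor only agent 0's reward."""
    return float(rew[0])


def episode_stat(
    episode_rewards: Iterable,
    reward_metric: Callable[..., float] = identity_metric,
) -> float:
    """Reduce each episode reward to a scalar, then average over episodes."""
    return mean(reward_metric(rew) for rew in episode_rewards)


single_agent = episode_stat([1.0, 3.0])
multi_agent = episode_stat([[1.0, -1.0], [0.0, 0.0]], first_agent_metric)
```

The raw reward vectors are left untouched; only the reported statistic goes through the user-supplied function.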

duburcqa · 2020-07-11T13:57:44Z

> It is natural for people to provide a function to return a scalar metric here if rew is a vector.

I agree that rew should always be a scalar. Still, I am not sure that handling a vector requires a handle when all it does is something like simple averaging. If users want to do something fancier, they can still wrap the environment themselves to apply any other aggregation formula.
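
The wrapper alternative mentioned here could look roughly like the sketch below. It assumes an environment whose `step` returns `(obs, rew, done, info)` with `rew` as a per-agent sequence; `TwoAgentToyEnv` and `MeanRewardWrapper` are hypothetical classes written only for this illustration.

```python
from typing import Any, Dict, List, Tuple


class TwoAgentToyEnv:
    """Hypothetical zero-sum environment that returns one reward per agent."""

    def reset(self) -> int:
        return 0

    def step(self, action: int) -> Tuple[int, List[float], bool, Dict[str, Any]]:
        # Agent 0 gains what agent 1 loses; the episode ends after one step.
        return 0, [1.0, -1.0], True, {}


class MeanRewardWrapper:
    """Collapse a per-agent reward vector into its mean."""

    def __init__(self, env: Any) -> None:
        self.env = env

    def reset(self) -> Any:
        return self.env.reset()

    def step(self, action: Any) -> Tuple[Any, float, bool, Dict[str, Any]]:
        obs, rew, done, info = self.env.step(action)
        # Keep the raw vector in info so the per-agent values are not lost.
        info = dict(info, raw_rew=list(rew))
        return obs, sum(rew) / len(rew), done, info


env = MeanRewardWrapper(TwoAgentToyEnv())
env.reset()
obs, rew, done, info = env.step(0)
```

The trade-off is that the aggregation happens before the data is stored, so anything downstream sees only the scalar unless it reads the copy kept in `info`.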

tianshou/data/collector.py

test/base/test_collector.py

tianshou/data/collector.py

Trinkle23897 · 2020-07-12T03:08:52Z

Should be okay now. Please have a review.

tianshou/data/collector.py

duburcqa · 2020-07-12T15:50:51Z

Good job! To me, the Collector module is now easier to understand and maintain. I like the better integration with Batch, and the addition of a new unit test.

* remove multibuf
* reward_metric
* make fields with empty Batch rather than None after reset
* many fixes and refactor

Co-authored-by: Trinkle23897 <463003665@qq.com>

This reverts commit 26fb874.

* make fields with empty Batch rather than None after reset
* dummy code
* remove dummy
* add reward_length argument for collector
* Improve Batch (#126)
* make sure the key type of Batch is string, and add unit tests
* add is_empty() function and unit tests
* enable cat of mixing dict and Batch, just like stack
* bugfix for reward_length
* add get_final_reward_fn argument to collector to deal with marl
* minor polish
* remove multibuf
* minor polish
* improve and implement Batch.cat_
* bugfix for buffer.sample with field impt_weight
* restore the usage of a.cat_(b)
* fix 2 bugs in batch and add corresponding unittest
* code fix for update
* update is_empty to recognize empty over empty; bugfix for len
* bugfix for update and add testcase
* add testcase of update
* make fields with empty Batch rather than None after reset
* dummy code
* remove dummy
* add reward_length argument for collector
* bugfix for reward_length
* add get_final_reward_fn argument to collector to deal with marl
* make sure the key type of Batch is string, and add unit tests
* add is_empty() function and unit tests
* enable cat of mixing dict and Batch, just like stack
* dummy code
* remove dummy
* add multi-agent example: tic-tac-toe
* move TicTacToeEnv to a separate file
* remove dummy MANet
* code refactor
* move tic-tac-toe example to test
* update doc with marl-example
* fix docs
* reduce the threshold
* revert
* update player id to start from 1 and change player to agent; keep coding
* add reward_length argument for collector
* Improve Batch (#128)
* minor polish
* improve and implement Batch.cat_
* bugfix for buffer.sample with field impt_weight
* restore the usage of a.cat_(b)
* fix 2 bugs in batch and add corresponding unittest
* code fix for update
* update is_empty to recognize empty over empty; bugfix for len
* bugfix for update and add testcase
* add testcase of update
* fix docs
* fix docs
* fix docs [ci skip]
* fix docs [ci skip]

Co-authored-by: Trinkle23897 <463003665@qq.com>

* refact
* re-implement Batch.stack and add testcases
* add doc for Batch.stack
* reward_metric
* modify flag
* minor fix
* reuse _create_values and refactor stack_ & cat_
* fix pep8
* fix reward stat in collector
* fix stat of collector, simplify test/base/env.py
* fix docs
* minor fix
* raise exception for stacking with partial keys and axis!=0
* minor fix
* minor fix
* minor fix
* marl-examples
* add condense; bugfix for torch.Tensor; code refactor
* marl example can run now
* enable tic tac toe with larger board size and win-size
* add test dependency
* Fix padding of inconsistent keys with Batch.stack and Batch.cat (#130)
* re-implement Batch.stack and add testcases
* add doc for Batch.stack
* reuse _create_values and refactor stack_ & cat_
* fix pep8
* fix docs
* raise exception for stacking with partial keys and axis!=0
* minor fix
* minor fix

Co-authored-by: Trinkle23897 <463003665@qq.com>

* stash
* let agent learn to play as agent 2 which is harder
* code refactor
* Improve collector (#125)
* remove multibuf
* reward_metric
* make fields with empty Batch rather than None after reset
* many fixes and refactor

Co-authored-by: Trinkle23897 <463003665@qq.com>

* marl for tic-tac-toe and general gomoku
* update default gamma to 0.1 for tic tac toe to win earlier
* fix name typo; change default game config; add rew_norm option
* fix pep8
* test commit
* mv test dir name
* add rew flag
* fix torch.optim import error and madqn rew_norm
* remove useless kwargs
* Vector env enable select worker (#132)
* Enable selecting worker for vector env step method.
* Update collector to match new vecenv selective worker behavior.
* Bug fix.
* Fix rebase

Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>

* show the last move of tictactoe by capital letters
* add multi-agent tutorial
* fix link
* Standardized behavior of Batch.cat and misc code refactor (#137)
* code refactor; remove unused kwargs; add reward_normalization for dqn
* bugfix for __setitem__ with torch.Tensor; add Batch.condense
* minor fix
* support cat with empty Batch
* remove the dependency of is_empty on len; specify the semantic of empty Batch by test cases
* support stack with empty Batch
* remove condense
* refactor code to reflect the shared / partial / reserved categories of keys
* add is_empty(recursive=False)
* doc fix
* docfix and bugfix for _is_batch_set
* add doc for key reservation
* bugfix for algebra operators
* fix cat with lens hint
* code refactor
* bugfix for storing None
* use ValueError instead of exception
* hide lens away from users
* add comment for __cat
* move the computation of the initial value of lens in cat_ itself.
* change the place of doc string
* doc fix for Batch doc string
* change recursive to recurse
* doc string fix
* minor fix for batch doc
* write tutorials to specify the standard of Batch (#142)
* add doc for len exceptions
* doc move; unify is_scalar_value function
* remove some issubclass check
* bugfix for shape of Batch(a=1)
* keep moving doc
* keep writing batch tutorial
* draft version of Batch tutorial done
* improving doc
* keep improving doc
* batch tutorial done
* rename _is_number
* rename _is_scalar
* shape property do not raise exception
* restore some doc string
* grammarly [ci skip]
* grammarly + fix warning of building docs
* polish docs
* trim and re-arrange batch tutorial
* go straight to the point
* minor fix for batch doc
* add shape / len in basic usage
* keep improving tutorial
* unify _to_array_with_correct_type to remove duplicate code
* delegate type conversion to Batch.__init__
* further delegate type conversion to Batch.__init__
* bugfix for setattr
* add a _parse_value function
* remove dummy function call
* polish docs

Co-authored-by: Trinkle23897 <463003665@qq.com>

* bugfix for mapolicy
* pretty code
* remove debug code; remove condense
* doc fix
* check before get_agents in tutorials/tictactoe
* tutorial
* fix
* minor fix for batch doc
* minor polish
* faster test_ttt
* improve tic-tac-toe environment
* change default epoch and step-per-epoch for tic-tac-toe
* fix mapolicy
* minor polish for mapolicy
* 90% to 80% (need to change the tutorial)
* win rate
* show step number at board
* simplify mapolicy
* minor polish for mapolicy
* remove MADQN
* fix pep8
* change legal_actions to mask (need to update docs)
* simplify maenv
* fix typo
* move basevecenv to single file
* separate RandomAgent
* update docs
* grammarly
* fix pep8
* win rate typo
* format in cheatsheet
* use bool mask directly
* update doc for boolean mask

Co-authored-by: Trinkle23897 <463003665@qq.com>
Co-authored-by: Alexis DUBURCQ <alexis.duburcq@gmail.com>
Co-authored-by: Alexis Duburcq <alexis.duburcq@wandercraft.eu>


youkaichao requested a review from Trinkle23897 July 10, 2020 14:21

make fields with empty Batch rather than None after reset

6e2a582

youkaichao force-pushed the collector branch from 6c83ff4 to 6e2a582 Compare July 10, 2020 14:25

youkaichao changed the base branch from master to dev July 10, 2020 15:13

youkaichao added 3 commits July 10, 2020 23:15

dummy code

2a2b887

remove dummy

cf32249

add reward_length argument for collector

62ac1d3

duburcqa previously approved these changes Jul 10, 2020

Trinkle23897 linked an issue Jul 11, 2020 that may be closed by this pull request

Refactoring of Buffer & Collector #105

Closed

Trinkle23897 changed the title ~~Improve collector~~ WIP: Improve collector Jul 11, 2020

Trinkle23897 self-assigned this Jul 11, 2020

youkaichao and others added 3 commits July 11, 2020 10:15

bugfix for reward_length

e1322c4

add get_final_reward_fn argument to collector to deal with marl

bc33b7c

minor polish

ddbaef4

Merge branch 'dev' into collector

1d5058b

remove multibuf

306dd68

Trinkle23897 dismissed duburcqa’s stale review via 306dd68 July 11, 2020 04:40

Trinkle23897 added 3 commits July 11, 2020 21:59

Merge branch 'dev' into collector

4ed64fd

refact

c519af8

reward_metric

9c4eb51

Trinkle23897 reviewed Jul 11, 2020

tianshou/data/collector.py (outdated, resolved)

modify flag

b6beb67

youkaichao commented Jul 11, 2020

test/base/test_collector.py (resolved)

youkaichao commented Jul 11, 2020

tianshou/data/collector.py (outdated, resolved)

youkaichao commented Jul 11, 2020

tianshou/data/collector.py (outdated, resolved)

duburcqa reviewed Jul 11, 2020

tianshou/data/collector.py (outdated, resolved)

minor fix

5ce2692

youkaichao force-pushed the collector branch from b14cd1b to 5ce2692 Compare July 11, 2020 16:35

youkaichao changed the title ~~WIP: Improve collector~~ Improve collector Jul 11, 2020

Trinkle23897 added 2 commits July 12, 2020 09:40

fix reward stat in collector

39c1b39

fix stat of collector, simplify test/base/env.py

96ee017

Trinkle23897 linked an issue Jul 12, 2020 that may be closed by this pull request

Inconsistent type annotation #127

Closed

8 tasks

minor fix

20ea6a1

minor fix

69181d8

youkaichao mentioned this pull request Jul 12, 2020

Add multi-agent example: tic-tac-toe #122

Merged

duburcqa reviewed Jul 12, 2020

tianshou/data/collector.py (outdated, resolved)

Merge branch 'dev' into collector

eb74a0a

duburcqa reviewed Jul 12, 2020

tianshou/data/collector.py (outdated, resolved)

modify _rew_metric

1a593b0

duburcqa approved these changes Jul 12, 2020

youkaichao merged commit 885fbc1 into thu-ml:dev Jul 12, 2020

Trinkle23897 pushed a commit that referenced this pull request Jul 13, 2020

Improve collector (#125)

26fb874

* remove multibuf
* reward_metric
* make fields with empty Batch rather than None after reset
* many fixes and refactor

Co-authored-by: Trinkle23897 <463003665@qq.com>

youkaichao deleted the collector branch July 13, 2020 13:24

youkaichao mentioned this pull request Jul 14, 2020

Working with agent dimension in multi-agent workflows based on single policy (parameter sharing) #136

Closed

ChenDRAG added a commit to ChenDRAG/tianshou that referenced this pull request Jul 17, 2020

update with thu-ml#125

4a5669b

youkaichao added a commit that referenced this pull request Jul 20, 2020

Revert "Improve collector (#125)"

066cfc9

This reverts commit 26fb874.
