Refactoring of Buffer & Collector #105
I think it is a prerequisite of #103.
I do not agree with the idea of a 2D buffer. The formulation of reinforcement learning only involves a single environment; we use multiple environments purely to accelerate simulation. So, conceptually, there is one buffer and one environment, and I see no significance in distinguishing each environment. I think we can maintain a main buffer with multiple cache buffers, which should be enough for almost all use cases.
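The main-buffer-plus-cache-buffers scheme described above could be sketched roughly as follows. This is a hypothetical illustration, not tianshou's actual implementation: the class `SimpleBuffer` and the function `on_step` are invented names, and real buffers would store full transitions, not single floats.

```python
import numpy as np

class SimpleBuffer:
    """Minimal ring buffer storing flat transitions (hypothetical sketch)."""
    def __init__(self, size):
        self.obs = np.zeros(size)
        self.rew = np.zeros(size)
        self.done = np.zeros(size, dtype=bool)
        self._size, self._index, self._len = size, 0, 0

    def add(self, obs, rew, done):
        self.obs[self._index] = obs
        self.rew[self._index] = rew
        self.done[self._index] = done
        self._index = (self._index + 1) % self._size
        self._len = min(self._len + 1, self._size)

    def extend(self, other):
        # flush another buffer's contents into this one, in order,
        # then reset the other buffer
        for i in range(other._len):
            self.add(other.obs[i], other.rew[i], other.done[i])
        other._index = other._len = 0

# one main buffer, plus one small cache buffer per environment
main = SimpleBuffer(10000)
caches = [SimpleBuffer(100) for _ in range(4)]

def on_step(env_id, obs, rew, done):
    # each env writes into its own cache; when an episode ends,
    # the cache is flushed so episodes land contiguously in `main`
    caches[env_id].add(obs, rew, done)
    if done:
        main.extend(caches[env_id])
```

The key property is that `main` only ever receives whole episodes, so data stays chronologically ordered per episode even though the envs step in lockstep.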
I get your point, and honestly I don't really care as long as it works in the end, but I disagree with your reasoning. There is no requirement for the implementation to match the theory of the current state of the art. The implementation must be designed to be easy to maintain and understand, and computationally efficient. I see no reason to stick that close to the theory: if it works, it works. To me these are extra constraints imposed only for the sake of elegance, which is the curse of the researcher, not the programmer. So maybe a multi-dimensional buffer is a bad idea, because perhaps it is neither easy to maintain and understand nor computationally efficient, I don't know; but if it is, it would be a very bad idea not to take advantage of it.
Yes, I think it is hard to maintain and conceptually hard to understand. So I vote for not supporting a multi-dimensional buffer.
Okay, I'll rewrite the current version's collector instead of implementing a 2D buffer. Thanks for the discussion!
In the current implementation, in order to store data chronologically, I use some cache buffers. However, they are hard to extend and maintain.

A better way is to use a 2D buffer. It still uses `Batch` to store data, but the first dimension is `env_num`. For example, if we collect data from 4 envs with a vector-env, the replay buffer will store them as `[env_num=4, timestep, ...]`.

This refactor would greatly simplify the collector code, but it would require changes to some other parts of the code (e.g., `buffer.sample()` and `collector.sample()`).
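For comparison, the proposed 2D layout could be sketched as below. This is an assumption-laden illustration, not the project's actual `Batch`/buffer API: the arrays, the `add` and `sample` functions, and the sizes are all hypothetical, and the sketch stores bare scalars rather than full transitions.

```python
import numpy as np

env_num, maxlen = 4, 1000  # hypothetical sizes

# one slot per (environment, timestep): the first axis is env_num
obs_buf = np.zeros((env_num, maxlen))
rew_buf = np.zeros((env_num, maxlen))
done_buf = np.zeros((env_num, maxlen), dtype=bool)
index = np.zeros(env_num, dtype=int)  # per-env write cursor

def add(env_id, obs, rew, done):
    i = index[env_id] % maxlen
    obs_buf[env_id, i] = obs
    rew_buf[env_id, i] = rew
    done_buf[env_id, i] = done
    index[env_id] += 1

def sample(batch_size, rng=None):
    # sampling must now draw (env, timestep) pairs over unevenly filled
    # rows -- the extra complexity the discussion above objects to
    rng = rng or np.random.default_rng()
    filled = np.minimum(index, maxlen)
    pairs = [(e, t) for e in range(env_num) for t in range(filled[e])]
    picks = rng.choice(len(pairs), size=batch_size)
    return np.array([obs_buf[pairs[p]] for p in picks])
```

Collection becomes a simple vectorized write (one `add` per env per step), which is the simplification the issue claims; the cost shows up in `sample()` and anywhere else that assumed a flat 1D index.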