Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring of Buffer & Collector #105

Closed
Trinkle23897 opened this issue Jun 29, 2020 · 6 comments · Fixed by #125
Closed

Refactoring of Buffer & Collector #105

Trinkle23897 opened this issue Jun 29, 2020 · 6 comments · Fixed by #125
Labels
discussion Discussion of a typical issue

Comments

@Trinkle23897
Copy link
Collaborator

Trinkle23897 commented Jun 29, 2020

In the current implementation, in order to store data chronologically, I use some cache-buffer. However, it is hard to extend and maintain.

A better way is to use a 2d buffer. It still uses Batch to store data, but the first dimension is env_num. For example, if we collect data from 4 envs by vector-env, the replay buffer will store them as [env_num=4, timestep, ...].

This refactor would greatly simplify the code of collector, but should change some other parts of code (e.g., buffer.sample() and collector.sample()).

@Trinkle23897
Copy link
Collaborator Author

I think it is a prerequisite of #103

@Trinkle23897 Trinkle23897 added the enhancement Feature that is not a new algorithm or an algorithm enhancement label Jun 29, 2020
@Trinkle23897 Trinkle23897 added this to TODO in v1.0 roadmap Jul 6, 2020
@youkaichao
Copy link
Collaborator

I do not agree with the idea of 2d buffer. The formulation of reinforcement learning only involves a single environment. We use multiple environments for the pure purpose of accelerating simulation. So, conceptually, there is one buffer, one environment. I see no significance in distinguishing each environment.

I think we can maintain a main buffer, with multiple cache buffers, which may be enough for almost all use-cases.

@youkaichao
Copy link
Collaborator

youkaichao commented Jul 9, 2020

If one really wants to distinguish between multiple environments, then trajectories from each enviroment should be sampled independently, which can be achieved by instantiating multiple collector.

That said, it seems one buffer for collector is enough. No need to support 2d buffer.

image

@duburcqa
Copy link
Collaborator

duburcqa commented Jul 9, 2020

I get your point, and honestly I don't really care if it works at the end, but I disagree with your reasoning. There is not requirement for the implementation to match the theory of the current state of the art. The implementation must be designed to be easy to maintain and to understand, and computationally efficient. I don't see any reason why to stick that close to the theory: if it works, it works. To me these are extra constraints only for the sake of elegance, which is the curse of the researcher not the programmer. So maybe multiple-dimensional buffer is a bad idea, because perhaps it is neither "easy to maintain and to understand, and computationally efficient", I don't know, but if it is, to me it is a very bad idea not to take advantage of this.

@Trinkle23897 Trinkle23897 mentioned this issue Jul 9, 2020
8 tasks
@Trinkle23897 Trinkle23897 linked a pull request Jul 11, 2020 that will close this issue
@youkaichao
Copy link
Collaborator

I get your point, and honestly I don't really care if it works at the end, but I disagree with your reasoning. There is not requirement for the implementation to match the theory of the current state of the art. The implementation must be designed to be easy to maintain and to understand, and computationally efficient. I don't see any reason why to stick that close to the theory: if it works, it works. To me these are extra constraints only for the sake of elegance, which is the curse of the researcher not the programmer. So maybe multiple-dimensional buffer is a bad idea, because perhaps it is neither "easy to maintain and to understand, and computationally efficient", I don't know, but if it is, to me it is a very bad idea not to take advantage of this.

Yes, I think it is hard to maintain and conceptually hard to understand. So I vote for not supporting multiple-dimensional buffer.

@Trinkle23897
Copy link
Collaborator Author

Okay, I'll rewrite the collector of the current version instead of 2d-buffer. Thanks for the discussion!

v1.0 roadmap automation moved this from TODO to Done Jul 12, 2020
@Trinkle23897 Trinkle23897 added discussion Discussion of a typical issue and removed enhancement Feature that is not a new algorithm or an algorithm enhancement labels Jul 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discussion of a typical issue
Projects
No open projects
v1.0 roadmap
  
Done
Development

Successfully merging a pull request may close this issue.

3 participants