Refactoring of Buffer & Collector #105
I think it is a prerequisite of #103.
I do not agree with the idea of a 2D buffer. The formulation of reinforcement learning only involves a single environment; we use multiple environments purely to accelerate simulation. So, conceptually, there is one buffer and one environment, and I see no significance in distinguishing each environment. I think we can maintain a main buffer with multiple cache buffers, which should be enough for almost all use cases.
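The main-buffer-plus-cache-buffers scheme described above could be sketched roughly as follows. This is a hypothetical illustration, not tianshou's actual implementation: the class `SimpleBuffer` and the function `on_step` are invented names, and real buffers would store full transitions, not single floats.

```python
import numpy as np

class SimpleBuffer:
    """Minimal ring buffer storing flat transitions (hypothetical sketch)."""
    def __init__(self, size):
        self.obs = np.zeros(size)
        self.rew = np.zeros(size)
        self.done = np.zeros(size, dtype=bool)
        self._size, self._index, self._len = size, 0, 0

    def add(self, obs, rew, done):
        self.obs[self._index] = obs
        self.rew[self._index] = rew
        self.done[self._index] = done
        self._index = (self._index + 1) % self._size
        self._len = min(self._len + 1, self._size)

    def extend(self, other):
        # flush another buffer's contents into this one, in order,
        # then reset the other buffer
        for i in range(other._len):
            self.add(other.obs[i], other.rew[i], other.done[i])
        other._index = other._len = 0

# one main buffer, plus one small cache buffer per environment
main = SimpleBuffer(10000)
caches = [SimpleBuffer(100) for _ in range(4)]

def on_step(env_id, obs, rew, done):
    # each env writes into its own cache; when an episode ends,
    # the cache is flushed so episodes land contiguously in `main`
    caches[env_id].add(obs, rew, done)
    if done:
        main.extend(caches[env_id])
```

The key property is that `main` only ever receives whole episodes, so data stays chronologically ordered per episode even though the envs step in lockstep.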
I get your point, and honestly I don't really care as long as it works in the end, but I disagree with your reasoning. There is no requirement for the implementation to match the theory of the current state of the art. The implementation must be designed to be easy to maintain and understand, and computationally efficient. I see no reason to stick that close to the theory: if it works, it works. To me these are extra constraints imposed only for the sake of elegance, which is the curse of the researcher, not the programmer. So maybe a multi-dimensional buffer is a bad idea, because perhaps it is neither easy to maintain and understand nor computationally efficient, I don't know; but if it is, it would be a very bad idea not to take advantage of it.
Yes, I think it is hard to maintain and conceptually hard to understand. So I vote for not supporting a multi-dimensional buffer.
Okay, I'll rewrite the current version's collector instead of implementing a 2D buffer. Thanks for the discussion!
In the current implementation, in order to store data chronologically, I use some cache buffers. However, they are hard to extend and maintain.

A better way is to use a 2D buffer. It still uses `Batch` to store data, but the first dimension is `env_num`. For example, if we collect data from 4 envs with a vector-env, the replay buffer will store them as `[env_num=4, timestep, ...]`.

This refactor would greatly simplify the collector code, but it would require changes to some other parts of the code (e.g., `buffer.sample()` and `collector.sample()`).
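For comparison, the proposed 2D layout could be sketched as below. This is an assumption-laden illustration, not the project's actual `Batch`/buffer API: the arrays, the `add` and `sample` functions, and the sizes are all hypothetical, and the sketch stores bare scalars rather than full transitions.

```python
import numpy as np

env_num, maxlen = 4, 1000  # hypothetical sizes

# one slot per (environment, timestep): the first axis is env_num
obs_buf = np.zeros((env_num, maxlen))
rew_buf = np.zeros((env_num, maxlen))
done_buf = np.zeros((env_num, maxlen), dtype=bool)
index = np.zeros(env_num, dtype=int)  # per-env write cursor

def add(env_id, obs, rew, done):
    i = index[env_id] % maxlen
    obs_buf[env_id, i] = obs
    rew_buf[env_id, i] = rew
    done_buf[env_id, i] = done
    index[env_id] += 1

def sample(batch_size, rng=None):
    # sampling must now draw (env, timestep) pairs over unevenly filled
    # rows -- the extra complexity the discussion above objects to
    rng = rng or np.random.default_rng()
    filled = np.minimum(index, maxlen)
    pairs = [(e, t) for e in range(env_num) for t in range(filled[e])]
    picks = rng.choice(len(pairs), size=batch_size)
    return np.array([obs_buf[pairs[p]] for p in picks])
```

Collection becomes a simple vectorized write (one `add` per env per step), which is the simplification the issue claims; the cost shows up in `sample()` and anywhere else that assumed a flat 1D index.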