Standardize the behavior of Batch aggregation (stack/cat) when dealing with reserved keys #139
The potential reason for explicitly managing all reserved keys is that they may be queried very often. But if we have …
I don't like that idea.
Well, if we use that, there should be some kind of explicit reservation. How about:

a = Batch()  # no reserved keys
b = Batch(reserve_dict={'b': Batch._})  # reserve key 'b'

Maybe we can keep the initialization as it is now, without explicitly using reserve_dict:

a = Batch()  # no reserved keys
b = Batch(b=Batch._)  # reserve key 'b'
c = Batch(c=[1, 2, Batch._])  # error
Yes, but it could easily be hidden using a class
I don't know; I would rather use something like …
I don't get it. Why could Batch.empty be either None or Batch()? I think it is like a static singleton object.
What's more, …
That's why I mentioned it should be …
So what's the fix now? Still use …
I would rather use a metaclass:
This way, the user can either manually specify …
The fix is whatever you think is better. But in my opinion, yes, I would replace …
I think a better way is to add an extra argument to Batch.is_empty, namely …
Yes, nice idea, but I would rather use the keyword …
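The idea of an extra argument on Batch.is_empty can be sketched in a few lines. This models a Batch as a plain nested dict rather than the real class, and the flag name `recurse` is an assumption for illustration, not necessarily the thread's final choice:

```python
# Sketch of is_empty with an extra recursion flag; a plain nested dict
# stands in for a Batch, and the keyword name `recurse` is an assumption.

def is_empty(batch, recurse=False):
    """With recurse=False, a batch holding only empty sub-batches still
    counts as non-empty (its keys are reserved); with recurse=True, a
    batch is empty when it contains nothing but nested empty sub-batches."""
    if not batch:
        return True
    if not recurse:
        return False
    return all(isinstance(v, dict) and is_empty(v, recurse=True)
               for v in batch.values())

print(is_empty({}))                             # True: truly empty
print(is_empty({'c': {}}))                      # False: key 'c' is reserved
print(is_empty({'c': {}}, recurse=True))        # True: recursively empty
print(is_empty({'c': {'d': 1}}, recurse=True))  # False: holds real data
```

This captures the distinction debated above: shallow emptiness answers "does this node reserve anything?", while recursive emptiness answers "is there any actual data anywhere below?".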
So you mean we can still use …
This comment is obsolete. I found another conceptual inconsistency: we want to disallow stack in some cases. It seems we still need to explicitly express key reservation, using a read-only static object. Or we can change the implementation; the main implementation lies in …
I would go for this solution.
This comment is obsolete. Then I think this issue is settled. The conclusions are:
This way, the issue can be solved with a minimal change to the code.
I would suggest also adding:
And do not close this issue, since I don't think this would be ideal; it is just a first step. We need to use it for a while to determine whether it is efficient and convenient. If not, there is the alternative solution based on dedicated …
I think I made a mistake about stack. This gives rise to the concept of incomplete/finalized Batch objects: a Batch object is incomplete if it contains reserved keys (empty sub-Batches), and finalized otherwise.
Then we need to define when Batch objects are compatible for aggregation (stack or concatenate).
In a word, an incomplete Batch object has a set of possible future structures, while a finalized Batch object has only one. Batch objects are compatible if and only if they share at least one common possible structure reachable by extending keys. So, extending keys is always OK; cases with no common possible structure are disallowed. I think this unified view of stack/concatenate compatibility is intuitive and elegant, clearing up much confusion. The implementation is also straightforward.
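As a rough sketch of this compatibility rule: nested dicts stand in for Batch objects, an empty dict plays the role of a reserved key, and the helper name `compatible` is hypothetical rather than Tianshou's actual API:

```python
# Sketch of the stack/cat compatibility rule: two batches are compatible
# iff they share at least one common structure reachable by extending keys.
# Nested dicts model Batch; {} marks a reserved key (hypothetical helper).

def compatible(a, b):
    a_dict, b_dict = isinstance(a, dict), isinstance(b, dict)
    if a_dict != b_dict:
        # a sub-batch on one side vs. a plain value on the other: only a
        # fully reserved (empty) sub-batch can still be extended to a value
        return not (a if a_dict else b)
    if not a_dict:
        return True  # two plain values are always compatible
    # both are batches: keys present on one side only are extensions and
    # always allowed; shared keys must be pairwise compatible
    return all(compatible(a[k], b[k]) for k in a.keys() & b.keys())

print(compatible({'a': 1.0}, {'b': 2.0}))           # True: extend keys
print(compatible(1.0, {}))                          # True: reserved -> value
print(compatible(1.0, {'c': {}}))                   # False: disallowed case
print(compatible({'x': {'c': 1}}, {'x': {'c': 2}})) # True: same structure
```

Note how the third call encodes the disallowed case from this thread (a value against a non-empty sub-Batch), while the first call matches the existing unit test where two batches with disjoint top-level keys are still concatenable.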
It is more restrictive than the current implementation, but I think it is indeed clearer and conceptually simpler, so it is better. The whole point of reserving keys was to do exactly what you are describing, so it makes sense. Still, you need to be careful: it must be possible to do this:
It seems the previous case would not be supported, but in practice being able to do this is necessary.
Oh, I'll think about how to specify this case. But our implementation actually supports it.
OK, good. So either the doc is wrong ("1. finalized Batch objects are compatible if and only if their structures are exactly the same."), or this is wrong ("Batch objects are compatible if and only if they share at least one commonly possible structure."). I would say: "1. finalized Batch objects are compatible if and only if there exists a way to extend reserved keys so that their structures are exactly the same." And we are good. (Make sure there is a unit test for this.)
There is already a unit test for this:

b1 = Batch(a=np.random.rand(3, 4), common=Batch(c=np.random.rand(3, 5)))
b2 = Batch(b=torch.rand(4, 3), common=Batch(c=np.random.rand(4, 5)))
test = Batch.cat([b1, b2])
Yes. Finalized Batch objects are compatible if and only if there exists a way to extend keys so that their structures are exactly the same. Batch objects are compatible if and only if they share at least one commonly possible structure by extending keys. So, extending keys is always OK. We are disallowing this case …
Nice if it is (I never implied it was not the case). So you just need to update the doc.
Here we are!
I think we all share this common idea; it is just that we finally figured out how to describe it clearly. So here we are, setting up standards for the behavior of Batch. From now on, the implementation will need to follow the standard, not the reverse. A tutorial on …
@youkaichao closing ? |
Currently, we use Batch() to indicate that a key reserves a place in a Batch and will get a value later on. It works fine in most cases, but now we have noticed that it is problematic. The critical problem is: how to treat Batch(c=Batch())? There are two opinions:

1. Batch(c=Batch()) means hierarchical key reservation, and should be treated as empty. But the problem is, we want to enable the concatenation of Batches with other empty Batches, which seems natural. Yet considering Batch.cat([Batch(a=np.random.rand(3, 4)), Batch(a=Batch(c=Batch()))]), we would want to treat Batch(c=Batch()) as non-empty.

2. Batch(c=Batch()) is not empty, and there is no need to support hierarchical key reservation. This makes the implementation straightforward, but does not support hierarchical key reservation. Unfortunately, there are some use cases of hierarchical key reservation in Tianshou.

I think the critical problem is how to reserve keys and support hierarchical key reservation.
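To make the two opinions concrete, here is a self-contained sketch using nested dicts in place of Batch (an empty dict plays the role of Batch(); both helper names are hypothetical, invented for illustration):

```python
# The two readings of Batch(c=Batch()), modeled with nested dicts.

def is_recursively_empty(b):
    """Opinion 1: hierarchical reservation, so Batch(c=Batch()) is empty."""
    return isinstance(b, dict) and all(is_recursively_empty(v)
                                       for v in b.values())

def is_shallowly_empty(b):
    """Opinion 2: Batch(c=Batch()) is non-empty (it reserves key 'c')."""
    return isinstance(b, dict) and len(b) == 0

reserved = {'c': {}}                   # stands for Batch(c=Batch())
print(is_recursively_empty(reserved))  # True under opinion 1
print(is_shallowly_empty(reserved))    # False under opinion 2
```

The cat example in opinion 1 is exactly where the two readings collide: if reserved counts as empty, it should merge with anything, yet its key 'c' clashes with a plain array under the same key 'a'.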
@Trinkle23897 @duburcqa