Currently it simply caches the first 100 values, an optimization aimed at avoiding point lookups for very small maps. A better approach would probably be to issue the state request directly from `MultimapSideInput.get()` in the bulk side input read block, iterate over the returned key/value-iterable pairs, and add each key and its corresponding (weighted) value iterable to the cache one at a time.
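To make the per-key idea concrete, here is a minimal sketch under stated assumptions. `WeightedCache` and `cacheBulkRead` are hypothetical names, not Beam APIs. The bulk-read result is modeled as an `Iterable` of (key, values) pairs, and an entry's "weight" is assumed to be its number of values. Each key is materialized and inserted on its own, so eviction can act on individual keys rather than on an arbitrary "first 100 values" prefix.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: WeightedCache and cacheBulkRead are illustrative
// stand-ins, not Beam APIs. The bulk-read result is consumed one key at a time.
public class BulkReadPerKeySketch {

  /** Tiny LRU cache bounded by total weight, where weight = number of cached values. */
  static final class WeightedCache<K, V> {
    private final long maxWeight;
    private long currentWeight = 0;
    // accessOrder = true: iteration runs from least to most recently accessed.
    private final LinkedHashMap<K, List<V>> entries = new LinkedHashMap<>(16, 0.75f, true);

    WeightedCache(long maxWeight) {
      this.maxWeight = maxWeight;
    }

    void put(K key, List<V> values) {
      List<V> previous = entries.put(key, values);
      if (previous != null) {
        currentWeight -= previous.size();
      }
      currentWeight += values.size();
      // Evict least-recently-used entries until the total weight fits the budget.
      Iterator<Map.Entry<K, List<V>>> it = entries.entrySet().iterator();
      while (currentWeight > maxWeight && it.hasNext()) {
        currentWeight -= it.next().getValue().size();
        it.remove();
      }
    }

    List<V> getIfPresent(K key) {
      return entries.get(key);
    }
  }

  /** Adds each key of the bulk-read result to the cache as its own weighted entry. */
  static <K, V> void cacheBulkRead(
      Iterable<? extends Map.Entry<K, ? extends Iterable<V>>> bulkRead, WeightedCache<K, V> cache) {
    for (Map.Entry<K, ? extends Iterable<V>> entry : bulkRead) {
      List<V> values = new ArrayList<>();
      entry.getValue().forEach(values::add);
      cache.put(entry.getKey(), values);
    }
  }

  public static void main(String[] args) {
    WeightedCache<String, Integer> cache = new WeightedCache<>(4);
    List<Map.Entry<String, List<Integer>>> bulkRead =
        List.of(Map.entry("a", List.of(1, 2)), Map.entry("b", List.of(3)), Map.entry("c", List.of(4, 5)));
    cacheBulkRead(bulkRead, cache);
    // Inserting "c" pushes the total weight to 5 > 4, so the oldest key "a" is evicted.
    System.out.println(cache.getIfPresent("a") + " " + cache.getIfPresent("b") + " " + cache.getIfPresent("c"));
  }
}
```

A real implementation would reuse the harness's existing cache and weigher rather than this toy LRU; the point is only that keys enter and leave the cache individually.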
We would probably also want to store some state indicating whether the bulk read was already attempted. If the bulk read returned the entire map, we could also store its set of keys (or at least a Bloom filter over them), so that a lookup for a key we have discovered is not in the map can return the empty iterable immediately. This distinguishes a key that is absent from the map from a key that is merely missing from the cache because it was evicted.
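The lookup logic this extra state enables might look roughly like the following sketch. All names (`NegativeLookupSketch`, `recordBulkRead`, `pointLookup`, `evict`) are hypothetical illustrations, not Beam APIs; a plain `HashMap` stands in for the evicting cache, and `evict` simulates eviction. An exact `HashSet` of keys is used here; a Bloom filter could replace it to save memory, since it has no false negatives and a false positive would only cost an unnecessary point lookup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: the class, its fields, and pointLookup are illustrative
// stand-ins for the harness's cache and per-key state request, not Beam APIs.
public class NegativeLookupSketch<K, V> {
  private final Map<K, List<V>> cache = new HashMap<>(); // stand-in for an evicting cache
  private final Function<K, List<V>> pointLookup;        // stand-in for a per-key state request
  private boolean bulkReadAttempted = false;              // lets callers avoid repeating the bulk read
  private Set<K> allKeys = null;                          // non-null only if the bulk read covered the whole map
  int pointLookups = 0;                                   // counts fallbacks, for illustration

  NegativeLookupSketch(Function<K, List<V>> pointLookup) {
    this.pointLookup = pointLookup;
  }

  /** Records a bulk-read result; isComplete says whether it returned the entire map. */
  void recordBulkRead(Map<K, List<V>> result, boolean isComplete) {
    bulkReadAttempted = true;
    cache.putAll(result);
    allKeys = isComplete ? new HashSet<>(result.keySet()) : null;
  }

  boolean wasBulkReadAttempted() {
    return bulkReadAttempted;
  }

  /** Simulates the cache evicting a key. */
  void evict(K key) {
    cache.remove(key);
  }

  List<V> get(K key) {
    List<V> cached = cache.get(key);
    if (cached != null) {
      return cached;
    }
    if (allKeys != null && !allKeys.contains(key)) {
      // Known to be absent from the map: answer without a state request.
      return Collections.emptyList();
    }
    // Either the key set is unknown or the key was evicted: fall back to a point lookup.
    pointLookups++;
    List<V> values = pointLookup.apply(key);
    cache.put(key, values);
    return values;
  }
}
```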
Alternatively, we could store the map (possibly only its first page) in the cache as a single entry.
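A rough sketch of that alternative, assuming the whole bulk-read map is held in one cache entry together with a flag saying whether it is complete. `SingleEntrySketch` and `CachedMap` are hypothetical names, not Beam APIs. The entry's weight is assumed to be the total number of values.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the alternative: cache the bulk-read map (or only its
// first page) as one entry. CachedMap is an illustrative name, not a Beam API.
public class SingleEntrySketch {

  static final class CachedMap<K, V> {
    private final Map<K, List<V>> map;
    private final boolean complete; // true if the map holds the entire side input

    CachedMap(Map<K, List<V>> map, boolean complete) {
      this.map = map;
      this.complete = complete;
    }

    /** Weight charged to the cache for the whole entry: the total number of values. */
    long weight() {
      return map.values().stream().mapToLong(List::size).sum();
    }

    /**
     * Returns the values for key, an empty list if the key is known to be absent,
     * or null if this entry cannot answer (only a first page was cached) and a
     * point lookup is still needed.
     */
    List<V> lookup(K key) {
      List<V> values = map.get(key);
      if (values != null) {
        return values;
      }
      return complete ? Collections.<V>emptyList() : null;
    }
  }
}
```

Compared with per-key entries, this keeps lookups to a single cache hit and gives negative lookups for free when the map is complete, at the cost of admitting and evicting the whole map together.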
Issue Priority
Priority: 3 (nice-to-have improvement)
Issue Components
Component: Python SDK
Component: Java SDK
Component: Go SDK
Component: Typescript SDK
Component: IO connector
Component: Beam YAML
Component: Beam examples
Component: Beam playground
Component: Beam katas
Component: Website
Component: Infrastructure
Component: Spark Runner
Component: Flink Runner
Component: Samza Runner
Component: Twister2 Runner
Component: Hazelcast Jet Runner
Component: Google Cloud Dataflow Runner