This error occurs when there are more than 32 (the default value of spark.sql.sources.parallelPartitionDiscovery.threshold) dynamic tables to process simultaneously.
First of all, the error is very confusing: if some safety limit prevents a query from executing, it should be clear which limit was exceeded and (ideally) which parameter controls that limit.
Second, it is not clear why dynamic tables are treated any differently from static tables here.
Third, it is unclear how this issue relates to the (opt-in) feature of using the partition_tables call for slicing both static and dynamic tables. Since that option erases any distinction between static and dynamic tables, it looks like enabling it should either fix both the static and dynamic cases, or break them simultaneously.
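For reference, a minimal sketch of a possible workaround, assuming the limit really is this Spark setting: raising the threshold above the number of tables being read should keep listing on the driver and avoid the executor-side path entirely. Whether that actually sidesteps the dynamic-table error is exactly what this issue questions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical workaround: raise the parallel partition discovery threshold
// above the number of tables in the query, so that listing stays on the
// driver (the default threshold is 32).
val spark = SparkSession.builder()
  .appName("dyntable-listing-workaround")
  .config("spark.sql.sources.parallelPartitionDiscovery.threshold", "64")
  .getOrCreate()
```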
This commit adds the ability to list dynamic tables (that is, determine their chunks/pivot keys) on executors. This happens when the number of tables being partitioned exceeds 32 (by default).
Listing tables on executors requires serializing paths. Raw paths are first sent to the executors, and after processing the results are returned to the driver. Each result must carry information about the read range, but the API is limited here: it expects the two sides to communicate via Path objects (roughly equivalent to plain strings).
For static tables only two integers are needed, the begin row and the row count, which serialize into a string trivially. Dynamic tables require more complex serialization, because their keys have a flexible representation.
In this commit, the range information is packed into a string as base64-encoded YSON and can be decoded on both the driver and executor sides.
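A minimal, self-contained sketch of the packing scheme described above; the type names (ReadRange, RowRange, KeyRange) and the exact string format are assumptions for illustration, not the actual SPYT classes:

```scala
import java.util.Base64

// Hypothetical range representations; the real SPYT types differ.
sealed trait ReadRange {
  def toPathSuffix: String // serialized form that can be embedded into a Path string
}

// Static tables: the read range is fully described by two integers.
case class RowRange(beginRow: Long, rowCount: Long) extends ReadRange {
  override def toPathSuffix: String = s"$beginRow:$rowCount"
}

// Dynamic tables: pivot keys have a flexible representation, so the raw
// YSON bytes of the lower/upper keys are base64-encoded. Base64 output
// never contains ':', so it is safe as a field separator.
case class KeyRange(lowerKeyYson: Array[Byte], upperKeyYson: Array[Byte]) extends ReadRange {
  override def toPathSuffix: String = {
    val enc = Base64.getEncoder
    s"${enc.encodeToString(lowerKeyYson)}:${enc.encodeToString(upperKeyYson)}"
  }
}

object KeyRange {
  // Decoding on the driver side after executors return the serialized paths.
  def fromPathSuffix(s: String): KeyRange = {
    val Array(lower, upper) = s.split(':')
    val dec = Base64.getDecoder
    KeyRange(dec.decode(lower), dec.decode(upper))
  }
}
```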
https://gist.github.com/zlobober/c5f782152cc19f10232730401b4b436f