single TupleDomain constraint for multiple splits #442

Closed
kabunchi opened this issue Mar 11, 2019 · 1 comment

Comments

@kabunchi
Contributor

When the constraint (TupleDomain) is large and identical for every split, sending it with each split is wasteful.
Our connector produces a relatively large number of small splits, and we ran into this when a single query filtered on 10K string keywords. We hacked/fixed it by adding the constraint to TaskSource and then injecting it into all of its splits.
Does this make sense? Is this approach acceptable? Would it be cleaner to add another kind of connectorHandle per TaskSource?
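To make the workaround concrete, here is a minimal sketch of the idea, assuming simplified stand-in types. `SharedConstraintTaskSource`, `LightSplit`, and `BoundSplit` are hypothetical names, not Presto classes, and a `Set<String>` stands in for the TupleDomain. The constraint travels once with the task source and is attached to each split by reference on the receiving side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical split that carries only split-specific data (no constraint).
final class LightSplit {
    final String path;

    LightSplit(String path) {
        this.path = path;
    }
}

// Hypothetical pairing of a split with the constraint shared by its task source.
final class BoundSplit {
    final LightSplit split;
    final Set<String> constraint; // shared reference, not a per-split copy

    BoundSplit(LightSplit split, Set<String> constraint) {
        this.split = split;
        this.constraint = constraint;
    }
}

// Hypothetical stand-in for a task source that holds the constraint once.
final class SharedConstraintTaskSource {
    final Set<String> constraint; // stands in for a large TupleDomain
    final List<LightSplit> splits;

    SharedConstraintTaskSource(Set<String> constraint, List<LightSplit> splits) {
        this.constraint = constraint;
        this.splits = splits;
    }

    // Attach the single shared constraint to every split after transfer.
    List<BoundSplit> bindSplits() {
        List<BoundSplit> bound = new ArrayList<>();
        for (LightSplit split : splits) {
            bound.add(new BoundSplit(split, constraint));
        }
        return bound;
    }
}

class SharedConstraintDemo {
    public static void main(String[] args) {
        SharedConstraintTaskSource source = new SharedConstraintTaskSource(
                Set.of("alpha", "beta"),
                List.of(new LightSplit("part-0"), new LightSplit("part-1")));
        for (BoundSplit bound : source.bindSplits()) {
            System.out.println(bound.split.path + " uses " + bound.constraint.size() + " keywords");
        }
    }
}
```

With this shape, the cost of shipping the constraint depends on the number of task sources rather than the number of splits.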

@martint
Member

martint commented Mar 13, 2019

The information is contained in the TableHandle associated with the TableScanNode. What we might need to do is change the PageSourceProvider API to take the TableHandle in addition to the Split. That way, anything that's common for the whole query doesn't need to be included in every split (only what's relevant to a split).
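As a rough illustration of the proposed API shape, the sketch below passes a table handle alongside the split when creating a page source. All types here (`SketchTableHandle`, `SketchSplit`, `SketchPageSourceProvider`, `KeywordPageSourceProvider`) are hypothetical stand-ins rather than Presto's actual SPI, and a plain list of strings stands in for a page source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical table handle: holds what is common to the whole query.
final class SketchTableHandle {
    final String table;
    final Set<String> keywords; // stands in for a large TupleDomain constraint

    SketchTableHandle(String table, Set<String> keywords) {
        this.table = table;
        this.keywords = keywords;
    }
}

// Hypothetical split: holds only what is relevant to this split.
final class SketchSplit {
    final int shard;
    final List<String> rows;

    SketchSplit(int shard, List<String> rows) {
        this.shard = shard;
        this.rows = rows;
    }
}

// Hypothetical provider interface that receives the table handle and the split.
interface SketchPageSourceProvider {
    List<String> createPageSource(SketchTableHandle table, SketchSplit split);
}

// Example provider that applies the query-wide constraint to one split's rows.
final class KeywordPageSourceProvider implements SketchPageSourceProvider {
    @Override
    public List<String> createPageSource(SketchTableHandle table, SketchSplit split) {
        List<String> matching = new ArrayList<>();
        for (String row : split.rows) {
            if (table.keywords.contains(row)) {
                matching.add(row);
            }
        }
        return matching;
    }
}

class TableHandleSketchDemo {
    public static void main(String[] args) {
        SketchTableHandle handle = new SketchTableHandle("events", Set.of("a", "c"));
        SketchPageSourceProvider provider = new KeywordPageSourceProvider();
        // The same handle instance serves every split; splits stay small.
        System.out.println(provider.createPageSource(handle, new SketchSplit(0, List.of("a", "b"))));
        System.out.println(provider.createPageSource(handle, new SketchSplit(1, List.of("c", "d"))));
    }
}
```

The design point is that the split type has no constraint field at all, so nothing query-wide is duplicated per split.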

This is going to be important to support pushdown of complex operations and whole subplans into connectors. We certainly don't want connectors to have to embed a copy of the subplan in every split! (relates to https://github.com/prestosql/presto/wiki/Pushdown-of-complex-operations)
