Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic filtering #52

Closed
sopel39 opened this issue Jan 24, 2019 · 8 comments
Closed

Dynamic filtering #52

sopel39 opened this issue Jan 24, 2019 · 8 comments
Labels
enhancement New feature or request roadmap Top level issues for major efforts in the project

Comments

@sopel39
Copy link
Member

sopel39 commented Jan 24, 2019

fact_table - large fact table, partitioned on date_key
d_date - small dimension table, with significant correlation on it's columns and date_key.

Currently for the following query:

SELECT 
    count(*)
FROM 
    fact_table a 
JOIN d_date b 
ON a.date_key = b.date_key 
WHERE 
    b.year = 2017;

larger probe table is fully scanned. When presto after scaning build table (right) could extract information that for b.year = 2017 there are only small number of matching b.date_key values. This information could be used to narrow down the table scan. In other words, above query could be dynamically "rewritten" to equivalent of:

SELECT 
    count(*)
FROM 
    fact_table a 
JOIN d_date b 
ON a.date_key = b.date_key 
WHERE a.date_key IN (20170101,20170102, etc...)

Design doc: https://docs.google.com/document/d/1TOlxS8ZAXSIHR5ftHbPsgUkuUA-ky-odwmPdZARJrUQ/edit

@raunaqmorarka
Copy link
Member

@sopel39 I would like to pick up the work of merging dynamic filtering from where @amoghmargoor last left it. I will raise the PR again by rebasing original one. Can this issue be assigned to me ?

@sopel39
Copy link
Member Author

sopel39 commented Jan 28, 2019

I will raise the PR again by rebasing original one.

I think we should pick up from prestodb/presto#11604 instead

@raunaqmorarka
Copy link
Member

raunaqmorarka commented Mar 13, 2019

  1. Introduce config feature flag (disabled by default) which will guard new dynamic-filtering related code.
  2. Introduce collecting DynamicFilterSummary on build side of Join.
    prestodb/presto@ac6e9b4 - Introduce DynamicFilterSummary
    prestodb/presto@5717486 - Collect DynamicFilterSummary before HashBuilder
  3. Introduce DynamicFilterExpression and optimizer rule which adds it in Filter below probe side of INNER join.
    prestodb/presto@5622cb0 - Add DynamicFilter
    prestodb/presto@fdc44e0 - Add ApplyDynamicFilters
  4. Introduce a rule for pruning out stale DynamicFilterExpression for joins which are no longer in the plan. Or which are in unsupported place in plan.
    prestodb/presto@fdc44e0 - Add ApplyDynamicFilters
  5. Expose endpoint on coordinator for collecting DynamicFilterSummaries from worker nodes
    prestodb/presto@f5c5984 - Add DynamicFilterResource
  6. Feed coordinator with DynamicFilterSummaries from worker nodes.
    prestodb/presto@f5c5984 - Add DynamicFilterResource
    prestodb/presto@64ccbd3 - Add InMemoryDynamicFilterClient
  7. Use DynamicFilterSummaries collected on coordinator to filter out splits in SplitManager.
    prestodb/presto@b6d5318 - Register tasks in DynamicFilterService
    prestodb/presto@f705252 - Add DynamicFilterDescription
    prestodb/presto@9bae02f - Pass Future with DynamicFilterDescription to the SplitManager
    prestodb/presto@03fa054 - Add dynamic partition pruning to Hive connector

@findepi findepi added the roadmap Top level issues for major efforts in the project label Mar 13, 2019
@dain dain changed the title Land dynamic filtering Dynamic filtering Apr 22, 2019
@raunaqmorarka
Copy link
Member

Opened PR #1072 for implementing dynamic partition pruning in hive connector.
Dynamic row filtering will be implemented in a subsequent PR.

@sopel39
Copy link
Member Author

sopel39 commented Jun 10, 2020

Fixed via: #1072

@sopel39 sopel39 closed this as completed Jun 10, 2020
@findepi
Copy link
Member

findepi commented Jun 10, 2020

🎉 🚀

@jinwangapple
Copy link

Hi all,

I have encountered an issue using dynamic filtering and passing a list/tuple into the where clause, and I am not very experienced in python so please help me.

I am using conn_hms = trino.dbapi.connect to HMS table, and I define the read function below:

def read_hms(conn_hms, q, start_day, end_day, sf, query):
    cur = conn_hms.cursor()
    cur.execute(q, params=[start_day, end_day, sf, query])
    rows = cur.fetchall()
    column_names = [desc[0] for desc in cur.description]
    data = pd.DataFrame(rows, columns=column_names)
    return data

so i will pass in start/end day, sf, and query into the sql later (query parameter is the list/tuple i want to filter in sql)

sf_choice = str(market.value)
start = start_date.value.strftime("%Y-%m-%d")
end = end_date.value.strftime("%Y-%m-%d")

genre_converted = """
            SELECT query
                  ,kind_type
                  ,COUNT(*) AS frequencies
            FROM
                table abc
            WHERE
                DATE_TRUNC('day', date)>= from_iso8601_date(?) and DATE_TRUNC('day', date)<= from_iso8601_date(?)
                AND market = ?
                AND query in ?
            GROUP BY 1,2
            order by query, COUNT(*) desc
        """
top_genre = read_hms(conn_hms, genre_converted, start, end, sf_choice, queries)

Until the point i make query a list/ tuple, this works, but now they are returning syntex error like "TrinoUserError: TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 14:27: mismatched input 'in'. Expecting: 'AND', 'EXCEPT', 'FETCH', 'GROUP', 'HAVING', 'INTERSECT', 'LIMIT', 'OFFSET', 'OR', 'ORDER', 'UNION', 'WINDOW', ", query_id=20220228_004211_00817_et3jr)".

Could you please help me on this?

@hackeryang
Copy link
Member

hackeryang commented Jun 6, 2023

Related video: https://www.youtube.com/watch?v=-2SjSM73l8g
Related blog: https://trino.io/blog/2019/06/30/dynamic-filtering.html
https://trino.io/episodes/11.html

Use BloomFilter in dynamic filters in the future will reduce OOM errors in the coordinator(see config enable-large-dynamic-filters) and improve the filtration accuracy: #13917

PRs: https://github.com/trinodb/trino/pulls?page=9&q=is%3Apr+is%3Aclosed+dynamic+filter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request roadmap Top level issues for major efforts in the project
Development

No branches or pull requests

5 participants