feat: bitmap pushdown for filtering operation with column index #126

Closed
wants to merge 2 commits into from

silver-ymz
Member

WIP

close #116

The current problem is that building the plan fails with "ERROR: variable not found in subplan target list". This still needs debugging.

Signed-off-by: silver-ymz <yinmingzhuo@gmail.com>
@usamoi
Collaborator

usamoi commented Nov 10, 2023

How to create a query using bitmap pushdown?

@silver-ymz
Member Author

silver-ymz commented Nov 10, 2023

Example of using bitmap pushdown:

set vectors.enable_prefilter=on;
set vectors.enable_bitmap_pushdown=on;

CREATE TABLE products (
    id serial primary key, 
    price real,
    feature vector(3)
);

INSERT INTO products (price, feature) SELECT random(), ARRAY[random(), random(), random()]::real[] FROM generate_series(1, 5000);

CREATE INDEX ON products USING btree (price);

CREATE INDEX ON products USING vectors (feature l2_ops)
WITH (options = $$
capacity = 10000
[algorithm.hnsw]
$$);

SELECT id FROM products WHERE 
    price > 0.2 AND price <= 0.7
ORDER BY 
    feature <-> '[0.5, 0.5, 0.5]'
LIMIT 100;

This query can produce two possible query plans:

                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Limit  (cost=0.00..5.50 rows=100 width=12)
   ->  Index Scan using products_feature_idx on products  (cost=0.00..27.24 rows=495 width=12)
         Order By: (feature <-> '[0.5, 0.5, 0.5]'::vector)
         Filter: ((price > '0.2'::double precision) AND (price <= '0.7'::double precision))

                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Limit  (cost=11.99..12.00 rows=5 width=12)
   ->  Sort  (cost=11.99..12.00 rows=5 width=12)
         Sort Key: ((feature <-> '[0.5, 0.5, 0.5]'::vector))
         ->  Bitmap Heap Scan on products  (cost=4.33..11.93 rows=5 width=12)
               Recheck Cond: ((price > '0.2'::double precision) AND (price <= '0.7'::double precision))
               ->  Bitmap Index Scan on products_price_idx  (cost=0.00..4.33 rows=5 width=0)
                     Index Cond: ((price > '0.2'::double precision) AND (price <= '0.7'::double precision))

The ideal plan with bitmap pushdown would be:

                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Limit  (cost=0.00..5.61 rows=100 width=12)
   ->  Index Scan using products_feature_idx on products  (cost=0.00..27.21 rows=485 width=12)
         Order By: (feature <-> '[0.5, 0.5, 0.5]'::vector)
         Filter: ((price > '0.2'::double precision) AND (price <= '0.7'::double precision))
         ->  Bitmap Index Scan on products_price_idx  (cost=0.00..4.33 rows=5 width=0)
               Index Cond: ((price > '0.2'::double precision) AND (price <= '0.7'::double precision))

The current implementation combines the vector index scan path and the bitmap index scan path into a custom scan path in set_rel_pathlist_hook. This hook is called after all paths for the relation have been generated, but before the best path is selected and converted into a plan. When Postgres selects the custom scan path as the cheapest path, it generates a plan from both the vector index scan path and the bitmap index scan path. The bitmap scan is then executed and its result is injected into the index scan state.
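
To make the intended effect concrete, here is a minimal Python sketch of the idea, not the code in this PR. It assumes a sorted list of (price, id) pairs as a stand-in for the btree index and a brute-force distance sort as a stand-in for the HNSW index scan (which is approximate in reality). The names build_bitmap and vector_scan_with_bitmap are hypothetical and exist only for this illustration.

```python
import bisect
import random


def build_bitmap(price_index, low, high):
    """Collect the ids of rows with low < price <= high.

    price_index is a list of (price, row_id) tuples sorted by price.
    It stands in for the btree index on products.price; the returned
    set stands in for the bitmap produced by a Bitmap Index Scan.
    """
    prices = [price for price, _ in price_index]
    start = bisect.bisect_right(prices, low)   # first entry with price > low
    stop = bisect.bisect_right(prices, high)   # first entry with price > high
    return {row_id for _, row_id in price_index[start:stop]}


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_scan_with_bitmap(rows, query, bitmap, limit):
    """Visit rows in order of distance to query, skipping ids not in bitmap.

    Assumption: an exact brute-force ordering replaces the real vector
    index. The point is only that the filter is checked inside the scan.
    """
    ordered = sorted(rows, key=lambda r: l2_squared(r["feature"], query))
    result = []
    for row in ordered:
        if row["id"] not in bitmap:
            continue  # filtered out by the pushed-down bitmap
        result.append(row["id"])
        if len(result) == limit:
            break
    return result


# Toy data shaped like the products table in the example above.
random.seed(0)
rows = [
    {"id": i, "price": random.random(),
     "feature": [random.random() for _ in range(3)]}
    for i in range(1, 5001)
]
price_index = sorted((r["price"], r["id"]) for r in rows)

QUERY = [0.5, 0.5, 0.5]
bitmap = build_bitmap(price_index, 0.2, 0.7)
top = vector_scan_with_bitmap(rows, QUERY, bitmap, 100)
```

In this toy version the result matches filtering by price first and then sorting by distance. The difference is that the scan consults the bitmap while it walks the rows in distance order, instead of filtering before or after the scan.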

The problem now is that Postgres does additional work when generating the custom scan plan, e.g. setting plan references and handling the scan relation. It seems that we need to find a way to bypass this process.

@usamoi
Collaborator

usamoi commented Nov 10, 2023

The SQL does not generate 2 plans in my environment. bitmap_index_scan is None.

@silver-ymz
Member Author

set vectors.enable_vector_index=off;
set enable_seqscan=off;

With these settings, the planner is more likely to generate a bitmap scan.

@usamoi
Collaborator

usamoi commented Nov 10, 2023

Can we generate bitmap scan directly from filter quals?

@silver-ymz
Member Author

> Can we generate bitmap scan directly from filter quals?

It is theoretically possible, but we would need to reimplement almost all of the logic in build_index_paths, which seems quite complicated.

@VoVAllen
Member

I didn't get why "we need to find a way to bypass this process". Do you mean that another query plan may have a lower estimated cost?

@silver-ymz
Member Author

> I didn't get why "we need to find a way to bypass this process". Do you mean that another query plan may have a lower estimated cost?

I mean that we need to bypass the additional scan-related processing that happens when a custom scan plan is generated from a custom path. When Postgres selects our injected custom path as the cheapest path, it generates a custom plan from it. During that generation it handles many additional scan-related details. We don't have the proper parameters for Postgres to complete this process, so it errors out.

@VoVAllen
Member

VoVAllen commented Jan 2, 2024

This is hard to implement without modifying Postgres. Closing for now.

@VoVAllen VoVAllen closed this Jan 2, 2024
@silver-ymz silver-ymz deleted the bitmap-push branch January 2, 2024 12:28