Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] BNL queries on hash sharded inner tables can result in big memory usage #17762

Closed
1 task done
tanujnay112 opened this issue Jun 13, 2023 · 0 comments
Closed
1 task done
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@tanujnay112
Copy link
Contributor

tanujnay112 commented Jun 13, 2023

Jira Link: DB-6857

Description

Consider the following queries where g1 and g2 are hash sharded tables while g1range and g2range are range sharded.

create table g1(h int primary key);
create table g2(h int primary key);
insert into g1 select generate_series(1,100000);
insert into g2 select generate_series(1,100000)
create table g1range(r int, primary key(r asc));
create table g2range(r int, primary key(r asc));
insert into g1range select generate_series(1,100000);
insert into g2range select generate_series(1,100000);
analyze;
set yb_bnl_batch_size = 1024;
set yb_enable_optimizer_statistics=true;

We see that a BNL on the range sharded tables results in the following in a release build

/*+Set(yb_bnl_batch_size 1024) Leading((t1 t2)) NestLoop(t1 t2)**/explain analyze select count(*) from g1range t1, g2range t2 where t1.r = t2.r;
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=11909.46..11909.47 rows=1 width=8) (actual time=546.562..546.562 rows=1 loops=1)
   ->  YB Batched Nested Loop Join  (cost=8.00..11659.46 rows=100000 width=0) (actual time=5.035..533.499 rows=100000 loops=1)
         Join Filter: (t1.r = t2.r)
         ->  Seq Scan on g1range t1  (cost=4.00..10004.00 rows=100000 width=4) (actual time=1.398..23.949 rows=100000 loops=1)
         ->  Index Scan using g2range_pkey on g2range t2  (cost=4.00..4.11 rows=1 width=4) (actual time=3.126..3.542 rows=1020 loops=98)
               Index Cond: (r = ANY (ARRAY[t1.r, $1, $2, ..., $1023]))
 Planning Time: 9.215 ms
 Execution Time: 547.155 ms
 Peak Memory Usage: 1074 kB
(9 rows)

However, we do the same join on the hash sharded tables to result in being >15x slower and >100x more memory usage:

/*+Set(yb_bnl_batch_size 1024) Leading((t1 t2)) NestLoop(t1 t2)*/explain analyze select count(*) from g1 t1, g2 t2 where t1.h = t2.h;
                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=11909.46..11909.47 rows=1 width=8) (actual time=7761.028..7761.028 rows=1 loops=1)
   ->  YB Batched Nested Loop Join  (cost=8.00..11659.46 rows=100000 width=0) (actual time=145.027..7748.264 rows=100000 loops=1)
         Join Filter: (t1.h = t2.h)
         ->  Seq Scan on g1 t1  (cost=4.00..10004.00 rows=100000 width=4) (actual time=2.036..27.689 rows=100000 loops=1)
         ->  Index Scan using g2_pkey on g2 t2  (cost=4.00..4.11 rows=1 width=4) (actual time=76.028..76.442 rows=1020 loops=98)
               Index Cond: (h = ANY (ARRAY[t1.h, $1, $2, ..., $1023]))
 Planning Time: 6.510 ms
 Execution Time: 7761.942 ms
 Peak Memory Usage: 117611 kB
(9 rows)

It seems like the rescanned IN queries to the hash sharded tables take up way more memory and, in effect, time than the equivalent query with range sharded tables. This needs to be investigated.

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@tanujnay112 tanujnay112 added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Jun 13, 2023
@tanujnay112 tanujnay112 self-assigned this Jun 13, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Jun 13, 2023
tanujnay112 added a commit that referenced this issue Jun 14, 2023
…or IN conditions sent to hash sharded tables

Summary: Before this change we would create `min(permutation_count, FLAGS_ysql_request_limit)` clones of the original read request that contains an IN condition to send over to hash sharded tables. Having these many clones was relevant before hash batching introduced in fc57665 where we would send over one request per IN element. However, when we are batching our IN elements, we only need one op per remote partition. Anything extra results in a lot of unnecessary memory and cpu usage during cloning. This diff fixes this inefficiency.

Test Plan:
Jenkins

The scenario in the GHI has been tested. We get that the join on the hash sharded tables takes up 762ms and 1648kB while the equivalent one on range sharded tables takes up 800ms and 1074kB.

Reviewers: amartsinchyk

Reviewed By: amartsinchyk

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D26134
tanujnay112 added a commit that referenced this issue Jun 14, 2023
…of partitions for IN conditions sent to hash sharded tables

Summary:
Before this change we would create `min(permutation_count, FLAGS_ysql_request_limit)` clones of the original read request that contains an IN condition to send over to hash sharded tables. Having these many clones was relevant before hash batching introduced in fc57665 where we would send over one request per IN element. However, when we are batching our IN elements, we only need one op per remote partition. Anything extra results in a lot of unnecessary memory and cpu usage during cloning. This diff fixes this inefficiency.
Original commit: d87a628 / D26134

Test Plan:
Jenkins

The scenario in the GHI has been tested. We get that the join on the hash sharded tables takes up 762ms and 1648kB while the equivalent one on range sharded tables takes up 800ms and 1074kB.

Reviewers: amartsinchyk

Reviewed By: amartsinchyk

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D26197
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

2 participants