-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ysql] Cannot create index on "range" column. Cannot include "range" column in PK constraint. #7353
Comments
See
|
This seems to be related to #6606 |
Interesting. I wouldn't know (from the outside). |
Sadly there is also no good way (that I've found) to trick yugabyte into using an index with the range operators. You can create an index on the This is a real pain point if you want to speed up these kinds of queries because the workaround is to rebuild the well defined operators yourself and hope for the best. I'm using create table test (
range int4range
);
create index testr
on test (lower(range) desc, upper(range) desc);
insert into test (range)
select int4range(series.A, series.A + 5)
from generate_series(0, 1000000, 5) as series(A);
explain analyse select * from test where range @> 190423;
--+------------------------------------------------------------------------------------------------------+
--|QUERY PLAN |
--+------------------------------------------------------------------------------------------------------+
--|Seq Scan on test (cost=0.00..102.50 rows=1000 width=32) (actual time=111.671..753.082 rows=1 loops=1)|
--| Filter: (range @> 190423) |
--| Rows Removed by Filter: 200000 |
--|Planning Time: 0.032 ms |
--|Execution Time: 753.135 ms |
--+------------------------------------------------------------------------------------------------------+
explain analyse select * from test where lower(range) <= 190423 and upper(range) > 190423;
--+--------------------------------------------------------------------------------------------------------------+
--|QUERY PLAN |
--+--------------------------------------------------------------------------------------------------------------+
--|Index Scan using testr on test (cost=0.00..5.30 rows=10 width=32) (actual time=37.187..37.191 rows=1 loops=1)|
--| Index Cond: ((lower(range) <= 190423) AND (upper(range) > 190423)) |
--|Planning Time: 0.538 ms |
--|Execution Time: 37.251 ms |
--+--------------------------------------------------------------------------------------------------------------+
What is that like a 95% speedup for not using the operator? Easy choice.....but wait! Those aren't really equivalents, now, are they?? As far as I can currently think, this is a closer equivalent: explain analyse select * from test
where
-- cover all 4 inc/exclusive possibilities where both sides of the range are not null
(lower_inc(range) and not upper_inc(range) and lower(range) <= 190423 and upper(range) > 190423) -- [,)
or (lower_inc(range) and upper_inc(range) and lower(range) <= 190423 and upper(range) >= 190423) -- [,]
or (not lower_inc(range) and upper_inc(range) and lower(range) < 190423 and upper(range) >= 190423) -- (,]
or (not lower_inc(range) and not upper_inc(range) and lower(range) < 190423 and upper(range) > 190423) -- (,)
-- and the 2 where lower is null
or (not upper_inc(range) and lower(range) is null and upper(range) > 190423) -- ,)
or (upper_inc(range) and lower(range) is null and upper(range) >= 190423) -- ,]
-- and the 2 where upper is null
or (not lower_inc(range) and lower(range) < 190423 and upper(range) is null) -- (,
or (lower_inc(range) and lower(range) <= 190423 and upper(range) is null) -- [,
-- and where both are null
or (lower(range) is null and upper(range) is null); It would be funny if it wasn't so sad. Of course, this query no longer uses the index.
But that's okay, it's the explain analyse
select * from test where lower_inc(range) and not upper_inc(range) and lower(range) <= 190423 and upper(range) > 190423
union all
select * from test where lower_inc(range) and upper_inc(range) and lower(range) <= 190423 and upper(range) >= 190423
union all
select * from test where not lower_inc(range) and upper_inc(range) and lower(range) < 190423 and upper(range) >= 190423
union all
select * from test where not lower_inc(range) and not upper_inc(range) and lower(range) < 190423 and upper(range) > 190423
union all
select * from test where not upper_inc(range) and lower(range) is null and upper(range) > 190423
union all
select * from test where upper_inc(range) and lower(range) is null and upper(range) >= 190423
union all
select * from test where not lower_inc(range) and lower(range) < 190423 and upper(range) is null
union all
select * from test where lower_inc(range) and lower(range) <= 190423 and upper(range) is null
union all
select * from test where lower(range) is null and upper(range) is null;
Performance took a hit but we are still running circles around the non indexed operator. Sadly this doesn't even handle the case where you are checking if a range contains another range.That's okay let's keep going create or replace function contains(value int4)
returns setof test
language plpgsql
as
$$
begin
return query select * from test where lower_inc(range) and not upper_inc(range) and lower(range) <= value and upper(range) > value;
return query select * from test where lower_inc(range) and upper_inc(range) and lower(range) <= value and upper(range) >= value;
return query select * from test where not lower_inc(range) and upper_inc(range) and lower(range) < value and upper(range) >= value;
return query select * from test where not lower_inc(range) and not upper_inc(range) and lower(range) < value and upper(range) > value;
return query select * from test where not upper_inc(range) and lower(range) is null and upper(range) > value;
return query select * from test where upper_inc(range) and lower(range) is null and upper(range) >= value;
return query select * from test where not lower_inc(range) and lower(range) < value and upper(range) is null;
return query select * from test where lower_inc(range) and lower(range) <= value and upper(range) is null;
return query select * from test where lower(range) is null and upper(range) is null;
return;
end;
$$;
create or replace function contains(value int4range)
returns setof test
language plpgsql
as
$$
begin
return query select * from test where lower_inc(range) and not upper_inc(range) and lower(range) <= lower(value) and upper(range) > upper(value);
return query select * from test where lower_inc(range) and upper_inc(range) and lower(range) <= lower(value) and upper(range) >= upper(value);
return query select * from test where not lower_inc(range) and upper_inc(range) and lower(range) < lower(value) and upper(range) >= upper(value);
return query select * from test where not lower_inc(range) and not upper_inc(range) and lower(range) < lower(value) and upper(range) > upper(value);
return query select * from test where not upper_inc(range) and lower(range) is null and upper(range) > upper(value);
return query select * from test where upper_inc(range) and lower(range) is null and upper(range) >= upper(value);
return query select * from test where not lower_inc(range) and lower(range) < lower(value) and upper(range) is null;
return query select * from test where lower_inc(range) and lower(range) <= lower(value) and upper(range) is null;
return query select * from test where lower(range) is null and upper(range) is null;
return;
end;
$$;
Looking good except that you'd have to create one per table per range. Yuck! create or replace function contains(tbl anyelement, range text, value int4)
returns setof anyelement
language plpgsql
as
$$
declare
r text;
begin
r := quote_ident(range);
return query execute format('select * from %s where lower_inc(%s) and not upper_inc(%s) and lower(%s) <= %L and upper(%s) > %L', pg_typeof(tbl), r, r, r, value, r, value);
return query execute format('select * from %s where lower_inc(%s) and upper_inc(%s) and lower(%s) <= %L and upper(%s) >= %L', pg_typeof(tbl), r, r, r, value, r, value);
return query execute format('select * from %s where not lower_inc(%s) and upper_inc(%s) and lower(%s) < %L and upper(%s) >= %L', pg_typeof(tbl), r, r, r, value, r, value);
return query execute format('select * from %s where not lower_inc(%s) and not upper_inc(%s) and lower(%s) < %L and upper(%s) > %L', pg_typeof(tbl), r, r, r, value, r, value);
return query execute format('select * from %s where not upper_inc(%s) and lower(%s) is null and upper(%s) > %L', pg_typeof(tbl), r, r, r, value);
return query execute format('select * from %s where upper_inc(%s) and lower(%s) is null and upper(%s) >= %L', pg_typeof(tbl), r, r, r, value);
return query execute format('select * from %s where not lower_inc(%s) and lower(%s) < %L and upper(%s) is null', pg_typeof(tbl), r, r, value, r);
return query execute format('select * from %s where lower_inc(%s) and lower(%s) <= %L and upper(%s) is null', pg_typeof(tbl), r, r, value, r);
return query execute format('select * from %s where lower(%s) is null and upper(%s) is null', pg_typeof(tbl), r, r);
return;
end;
$$;
create or replace function contains(tbl anyelement, range text, value int4range)
returns setof anyelement
language plpgsql
as
$$
declare
r text;
begin
r := quote_ident(range);
return query execute format('select * from %s where lower_inc(%s) and not upper_inc(%s) and lower(%s) <= %L and upper(%s) > %L', pg_typeof(tbl), r, r, r, lower(value), r, upper(value));
return query execute format('select * from %s where lower_inc(%s) and upper_inc(%s) and lower(%s) <= %L and upper(%s) >= %L', pg_typeof(tbl), r, r, r, lower(value), r, upper(value));
return query execute format('select * from %s where not lower_inc(%s) and upper_inc(%s) and lower(%s) < %L and upper(%s) >= %L', pg_typeof(tbl), r, r, r, lower(value), r, upper(value));
return query execute format('select * from %s where not lower_inc(%s) and not upper_inc(%s) and lower(%s) < %L and upper(%s) > %L', pg_typeof(tbl), r, r, r, lower(value), r, upper(value));
return query execute format('select * from %s where not upper_inc(%s) and lower(%s) is null and upper(%s) > %L', pg_typeof(tbl), r, r, r, upper(value));
return query execute format('select * from %s where upper_inc(%s) and lower(%s) is null and upper(%s) >= %L', pg_typeof(tbl), r, r, r, upper(value));
return query execute format('select * from %s where not lower_inc(%s) and lower(%s) < %L and upper(%s) is null', pg_typeof(tbl), r, r, lower(value), r);
return query execute format('select * from %s where lower_inc(%s) and lower(%s) <= %L and upper(%s) is null', pg_typeof(tbl), r, r, lower(value), r);
return query execute format('select * from %s where lower(%s) is null and upper(%s) is null', pg_typeof(tbl), r, r);
return;
end;
$$;
explain analyse
select * from contains(null::test, 'range', 190423);
--+--------------------------------------------------------------------------------------------------------------+
--|QUERY PLAN |
--+--------------------------------------------------------------------------------------------------------------+
--|Function Scan on contains (cost=0.25..10.25 rows=1000 width=32) (actual time=236.467..236.468 rows=1 loops=1)|
--|Planning Time: 0.022 ms |
--|Execution Time: 236.508 ms |
--+--------------------------------------------------------------------------------------------------------------+
explain analyse
select * from contains(null::test, 'range', int4range(190420, 190424, '[)'));
--+--------------------------------------------------------------------------------------------------------------+
--|QUERY PLAN |
--+--------------------------------------------------------------------------------------------------------------+
--|Function Scan on contains (cost=0.25..10.25 rows=1000 width=32) (actual time=224.615..224.616 rows=1 loops=1)|
--|Planning Time: 0.031 ms |
--|Execution Time: 224.628 ms |
--+--------------------------------------------------------------------------------------------------------------+ And the includes operator is probably the easiest one to implement! 😿 When yuga is able to index range queries send me a postcard at the Mountains of Madness where me and the ghost of Richard Snodgrass will be working on hand written range queries |
@EmiPhil, thanks for doing this study! |
Jira Link: DB-2563
Creating this for tracking purposes.
Observed using single-node YB-2.4.0.0 cluster on macOS.
The "create index" attempt causes the 0A000 error. (This is mapped to the
feature_not_supported
exception.) The error text is "INDEX on column of type 'INT4RANGE' not yet supported". Corresponding attempts using the other range data types (int8range
,numrange
,tsrange
,tstzrange
, anddaterange
) all cause the 0A000 error with data-type-specific error texts.)Similarly, this attempt (and for the other range data types):
also cause the the 0A000 error.
All of these examples run without error on vanilla Postgress 11.2.
The text was updated successfully, but these errors were encountered: