Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise error on empty in_ #283

Closed
wants to merge 1 commit into from
Closed

Raise error on empty in_ #283

wants to merge 1 commit into from

Conversation

erebus1
Copy link

@erebus1 erebus1 commented Jun 5, 2016

I think it would be good idea to add error on empy in_ usage in version 1.1.*
Because current implementation is not fully correct:
Firstly, it's not optimised, because of sequential scan
Secondly, in current implementation null in () will return null, when in sql it should return False, try this:
select null in (select 1 from some_table where false); -> false but not null

I understant that this can break some legacy code, but current implementation is obscure and quite expensive


I think it would be good idea to add error on empy in_ usage in version 1.1.*
Because current implementation is not fully correct:
Firstly, it's not optimised, because of sequential scan
Secondly, in current implementation null in () will return null, when in sql it should return False, try this:
select null in (select 1 from some_table where false); -> false but not null

I understant that this can break some legacy code, but current implementation is obscure and quite expensive
@zzzeek
Copy link
Owner

zzzeek commented Jun 6, 2016

this raises a warning which you should promote to be an exception. how to do this is described at https://docs.python.org/3/library/warnings.html. Making this raise for everyone right now would be disruptive to small-data applications that aren't impacted by performance here, yet would require difficult reorganization of their queries to not include empty results within. Each major release e.g. 1.0, 1.1 etc already produces a lot of disruptions for people usually due to uncaught behavioral regressions and throwing in forcing this to be an error in all cases would be too much at this point. I would favor any number of ways to customize what empty in() does but all of them would require the user does something codewise to make it happen.

@zzzeek zzzeek closed this Jun 6, 2016
@erebus1
Copy link
Author

erebus1 commented Jun 6, 2016

Thanks, I've missed about Warning customization.

But one last question, why you decide that null in () = null,
Isn't it better to change expr != expr on 1!=1 or something like this.
I think you've meant that null can be an empty set and by boolean logic empty set is contained in each empty set.
But as I can see, databases do not share your opinion and return false instead of null.
I'll be very grateful if you provide some explanation why you have chosen such strategy or some links on such kind of explanation.

@zzzeek
Copy link
Owner

zzzeek commented Jun 6, 2016

there's no such thing as "empty in" on the database so I'm not sure how you can say what "NULL IN ()" should evaluate to. NULL means "unknown" which is why things like "NULL = NULL" return NULL, not False. For "NULL in ()", who knows, since relational databases refuse to even evaluate "1 IN ()". The expression we have here is the closest thing we could get to as many cases as possible which you can read about here: http://docs.sqlalchemy.org/en/rel_1_0/faq/sqlexpressions.html#why-does-col-in-produce-col-col-why-not-1-0 . I think beyond that we'd have to render an enormous CASE statement which we'd rather not get into. Feel free to suggest an expression that covers all the cases in that FAQ entry more effectively on every database. this is really old stuff.

@erebus1
Copy link
Author

erebus1 commented Jun 7, 2016

Yes, you are right, we can't write like this 1 in () in SQL
But we can run such code: select null in (select 1 from some_table where false); which is nearly the same.
And in such case db will return false.
I can't fully understand why you consider, that null in () should be equal to null (undefined)?

I will appreciate, if you can comment above sql expression.

I'd like to clarify some aspects of algebra:

The empty set is the unique set having no elements

The empty set is a trivial subset of itself, but not a member of itself

Cardinality of empty set is zero

All this means that there is no such element that could consist in an empty set.
So even consider that null can be any element it could not consist in an empty set.

If you will change expr!=expr on 1!=1 it will save all algebra rules and decrease DB load

Based on all of the above, I think that it would be a good idea to change expr!=expr on 1!=1
because current implementation breaks algebra rules and makes useless work.

And of course, emitting warnings is useful stuff.

@zzzeek
Copy link
Owner

zzzeek commented Jun 7, 2016

select null in (select 1 from some_table where false)

interestingly you might have a good way for us to get an empty set into IN there.

I can't fully understand why you consider, that null in () should be equal to null (undefined)?

I don't.

All this means that there is no such element that could consist in an empty set.
So even consider that null can be any element it could not consist in an empty set.

none of that matters because relational databases don't support this case anyway and we are only trying to approximate how close we can get.

If you will change expr!=expr on 1!=1 it will save all algebra rules and decrease DB load

it does not preserve algebra. Please read http://docs.sqlalchemy.org/en/rel_1_0/faq/sqlexpressions.html#why-does-col-in-produce-col-col-why-not-1-0.

@zzzeek
Copy link
Owner

zzzeek commented Jun 7, 2016

heres where this is at. I will see if i can find the original discussion from about 8 years ago:

test=# create table t (x integer, y integer);
CREATE TABLE

test=# insert into t (x, y) values (1, 1), (2, 2), (3, 3), (4, NULL);
INSERT 0 4


-- 1. integer version:
-- SELECT * FROM t WHERE y IN ()
test=# select * from t where 1 != 1;
 x | y 
---+---
(0 rows)

-- 2. column version:
-- SELECT * FROM t WHERE y IN () 
test=# select * from t where y != y;
 x | y 
---+---
(0 rows)

-- 3. integer version - is NULL NOT IN the empty set?
-- however, your idea of (SELECT 1 WHERE false) does suggest it is
-- SELECT * FROM t WHERE y NOT IN ()
test=# select * from t where not 1 != 1;
 x | y 
---+---
 1 | 1
 2 | 2
 3 | 3
 4 |  
(4 rows)

-- 4. column version - different answer.  
-- SELECT * FROM t WHERE y NOT IN () 
test=# select * from t where not y != y;
 x | y 
---+---
 1 | 1
 2 | 2
 3 | 3
(3 rows)

@zzzeek
Copy link
Owner

zzzeek commented Jun 7, 2016

the longest discussion on this is on this one: https://bitbucket.org/zzzeek/sqlalchemy/issues/1628

it comes down to:

"NULL NOT IN ()"

True, False, or NULL ?

@erebus1
Copy link
Author

erebus1 commented Jun 7, 2016

`

Mike, I've read this article (http://docs.sqlalchemy.org/en/rel_1_0/faq/sqlexpressions.html#why-does-col-in-produce-col-col-why-not-1-0.) after your previous comment.

Single example, that I've found was:
column not in ()
which SQL alchemy translate in
column != column

the case is when column = null
Algebra version:
null not in () -> True
SqlAlchemy version:
not null != null -> null
Naive version:
not 1 != 1 -> True

So, as I can see, changing expr != expr on 1!=1 will only repair algebra logic.

I'm sorry if I've missed something and will appreciate if you explain what

If you want to use (select x where false)
you should take into account that you should conform type of x to type of column.
Because select true in (select 1 where False)' will raise error, you should writeselect true in (select true where false)'
but any case this seems like over-engineering, and using 1!=1 with emitting warning seems pretty good (IMHO)

@zzzeek
Copy link
Owner

zzzeek commented Jun 7, 2016

and here is why it is NULL:

 # select NULL NOT IN (1, 2, 3);
 ?column? 
----------

(1 row)

this is why we ten consider NULL NOT IN () to also be NULL. But, perhaps that's wrong. If NULL means, "might be 1, 2 or 3, who knows?" and that's why it returns NULL, then maybe "NULL NOT IN ()" is true. I really don't think this is a cut and dry issue.

@zzzeek
Copy link
Owner

zzzeek commented Jun 7, 2016

if we decide "NULL NOT IN ()" is True then we really could say "1 != 1" and remove the warning, I think.

@erebus1
Copy link
Author

erebus1 commented Jun 7, 2016

Consider such example:
'select NULL NOT IN (1, 2, 3);'
It's wrong proving, and it seems I've provided enough mathematical and practical explanation why (empty set has no elements).

Consider warning:
Even if we change expr != expr on 1!=1 it will decrease DB load, but still emit useless query.
So warning can be useful for someone.

Anyway, you are the owner, and you'll decide how it should be, I've just found some misbehaviour, and tried to fix it.

As for me, I've promoted this warning to error in debug mode, and now debug code to remove all possible situations with empty in.
But I'll be glad to see a new version of SQL alchemy without such obscure bottleneck.

@zzzeek
Copy link
Owner

zzzeek commented Jun 7, 2016

I think the rationale for "NULL NOT IN ()" is NULL was because "NULL NOT IN (1, 2, 3)" is NULL.

Still, you get very different results with this change. I'm going to run it through a few openstack tests at https://gerrit.sqlalchemy.org/#/c/103/ and I doubt anything in openstack hits this because they likely avoid the warning.

@erebus1
Copy link
Author

erebus1 commented Jun 7, 2016

Ok, let's see what it'll be.
Thanks for the good discussion.

@zzzeek
Copy link
Owner

zzzeek commented Feb 8, 2017

@erebus1
Copy link
Author

erebus1 commented Feb 28, 2017

Cool
When do you plan to release 1.2 version?

@zzzeek
Copy link
Owner

zzzeek commented Feb 28, 2017

hopefully not any later than early summer, more hopefully sometime in the spring. the branch is not even set up yet.

@erebus1
Copy link
Author

erebus1 commented Feb 28, 2017

Ok, I'll looking forward it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants