Raise error on empty in_ #283
I think it would be a good idea to raise an error on empty in_ usage in version 1.1.*, because the current implementation is not fully correct:
Firstly, it's not optimised, because it causes a sequential scan.
Secondly, in the current implementation null in () will return null, when in SQL it should return False. Try this:
select null in (select 1 from some_table where false); -> false, not null
I understand that this can break some legacy code, but the current implementation is obscure and quite expensive.
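The query above can be tried without a database server. The following is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite as the backend (the post does not say which database was used), and `WHERE 1 = 0` stands in for `where false` so that it also runs on older SQLite versions. Other databases may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER)")
conn.execute("INSERT INTO some_table (id) VALUES (1)")

# The subquery matches no rows, so the right-hand side of IN is an empty set.
empty_subquery_result = conn.execute(
    "SELECT NULL IN (SELECT 1 FROM some_table WHERE 1 = 0)"
).fetchone()[0]

# SQLite has no boolean type: 0 stands for false, None stands for SQL NULL.
print(empty_subquery_result)
```

On SQLite the value comes back as 0 (false) rather than None, which matches the behaviour described in the post for an empty subquery.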
This raises a warning, which you should promote to an exception. How to do this is described at https://docs.python.org/3/library/warnings.html. Making this raise for everyone right now would be disruptive to small-data applications that aren't impacted by performance here, yet would require difficult reorganization of their queries so that they don't include empty results. Each major release (e.g. 1.0, 1.1) already produces a lot of disruption for people, usually due to uncaught behavioral regressions, and forcing this to be an error in all cases would be too much at this point. I would favor any number of ways to customize what empty in() does, but all of them would require that the user do something in code to make it happen. |
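Promoting a warning to an exception with the standard warnings module could look roughly like this. It is a sketch only: `EmptyInWarning` and `build_in_clause` are hypothetical stand-ins so that the snippet runs without SQLAlchemy installed. In a real application the category to filter on would presumably be the one SQLAlchemy emits (assumed here to be `sqlalchemy.exc.SAWarning`), and the filter would be installed once at startup.

```python
import warnings


class EmptyInWarning(Warning):
    """Hypothetical stand-in for the warning category the library emits."""


def build_in_clause(values):
    # Hypothetical stand-in for column.in_(values): warns on an empty list.
    if not values:
        warnings.warn("IN-predicate invoked with an empty sequence", EmptyInWarning)
    return "col IN (%s)" % ", ".join(str(v) for v in values)


def strict_build(values):
    # Inside this block, warnings of the given category are raised as exceptions.
    with warnings.catch_warnings():
        warnings.filterwarnings("error", category=EmptyInWarning)
        return build_in_clause(values)


print(strict_build([1, 2]))
try:
    strict_build([])
except EmptyInWarning as exc:
    print("raised:", exc)
```

Scoping the filter with `warnings.catch_warnings()` keeps the stricter behaviour local, for example to a debug mode or a test run, without changing what other users of the library see.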
Thanks, I missed the part about warning customization. But one last question: why did you decide that |
There's no such thing as "empty in" on the database, so I'm not sure how you can say what "NULL IN ()" should evaluate to. NULL means "unknown", which is why things like "NULL = NULL" return NULL, not False. For "NULL IN ()", who knows, since relational databases refuse to even evaluate "1 IN ()". The expression we have here is the closest thing we could get to covering as many cases as possible, which you can read about here: http://docs.sqlalchemy.org/en/rel_1_0/faq/sqlexpressions.html#why-does-col-in-produce-col-col-why-not-1-0 . I think beyond that we'd have to render an enormous CASE statement, which we'd rather not get into. Feel free to suggest an expression that covers all the cases in that FAQ entry more effectively on every database. This is really old stuff. |
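The three-valued logic described here, and the "col != col" expression named in the linked FAQ title, can be checked with a small sketch. It assumes SQLite via Python's sqlite3 module, and the table `t` and column `col` are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")
conn.executemany("INSERT INTO t (col) VALUES (?)", [(1,), (None,)])

# Comparing NULL with NULL yields NULL (None in Python), not false.
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# "col != col" is false for a non-NULL value and NULL for a NULL value.
results = {col: value for col, value in conn.execute("SELECT col, col != col FROM t")}

print(null_eq_null, results)
```

A row with a real value evaluates to 0 (false), while a row holding NULL evaluates to NULL, which is the behaviour the discussion about "NULL IN ()" is arguing over.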
Yes, you are right, we can't write it like this I would appreciate it if you could comment on the SQL expression above. I'd like to clarify some aspects of the algebra:
All this means that there is no element that could belong to an empty set. If you will change Based on all of the above, I think that it would be a good idea to change And of course, emitting warnings is useful. |
interestingly you might have a good way for us to get an empty set into IN there.
I don't.
none of that matters because relational databases don't support this case anyway and we are only trying to approximate how close we can get.
it does not preserve algebra. Please read http://docs.sqlalchemy.org/en/rel_1_0/faq/sqlexpressions.html#why-does-col-in-produce-col-col-why-not-1-0. |
Here's where this is at. I will see if I can find the original discussion from about 8 years ago:
|
The longest discussion on this is in this one: https://bitbucket.org/zzzeek/sqlalchemy/issues/1628 It comes down to: is "NULL NOT IN ()" True, False, or NULL? |
Mike, I read this article (http://docs.sqlalchemy.org/en/rel_1_0/faq/sqlexpressions.html#why-does-col-in-produce-col-col-why-not-1-0) after your previous comment. The only example that I found was: the case is when column = null So, as I can see, changing I'm sorry if I've missed something, and I would appreciate it if you could explain what If you want to use |
and here is why it is NULL:
This is why we then consider NULL NOT IN () to also be NULL. But perhaps that's wrong. If NULL means "might be 1, 2 or 3, who knows?" and that's why it returns NULL, then maybe "NULL NOT IN ()" is true. I really don't think this is a cut-and-dried issue. |
if we decide "NULL NOT IN ()" is True then we really could say "1 != 1" and remove the warning, I think. |
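To make the difference between the two candidate expressions concrete for the NOT IN case, here is a hedged sketch, again assuming SQLite via Python's sqlite3 module. It compares "NULL NOT IN (1, 2, 3)" with the negation of "col != col" for a NULL column and the negation of "1 != 1". The helper `scalar` and the alias `x` are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def scalar(sql):
    # Run a single-value query and return that value (None means SQL NULL).
    return conn.execute(sql).fetchone()[0]


# With a non-empty list, NOT IN against NULL is NULL.
not_in_list = scalar("SELECT NULL NOT IN (1, 2, 3)")

# Negating "col != col" for a NULL column stays NULL.
negated_col_expr = scalar("SELECT NOT (x != x) FROM (SELECT NULL AS x)")

# Negating the constant "1 != 1" is true no matter what the column holds.
negated_const_expr = scalar("SELECT NOT (1 != 1)")

print(not_in_list, negated_col_expr, negated_const_expr)
```

The first two values come back as None and the last as 1, so switching to "1 != 1" changes the result for NULL columns from NULL to true.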
Consider this example: Consider the warning: Anyway, you are the owner and you'll decide how it should be; I just found some misbehaviour and tried to fix it. As for me, I've promoted this warning to an error in debug mode, and I am now debugging the code to remove every possible case of an empty in_. |
I think the rationale for "NULL NOT IN ()" being NULL was that "NULL NOT IN (1, 2, 3)" is NULL. Still, you get very different results with this change. I'm going to run it through a few openstack tests at https://gerrit.sqlalchemy.org/#/c/103/ and I doubt anything in openstack hits this, because they likely avoid the warning. |
Ok, let's see what it'll be. |
Cool |
hopefully not any later than early summer, more hopefully sometime in the spring. the branch is not even set up yet. |
Ok, I'm looking forward to it. |