Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-filter produces incorrect HAVING clause on Pg #51

Closed
abaumhauer opened this issue Jun 30, 2014 · 4 comments
Closed

Multi-filter produces incorrect HAVING clause on Pg #51

abaumhauer opened this issue Jun 30, 2014 · 4 comments

Comments

@abaumhauer
Copy link

Non-compliant SQL is generated on filters with HAVING clause on PostgreSQL. The use of virtual column names with 'AS' in the HAVING clause results in an error:

column 'bar' does not exist. As an example:

SELECT ( "me"."type" ) AS "type", ( (SELECT COUNT( * ) FROM "bar" "me_alias" WHERE ( "me_alias"."typeid" = "me"."id" )) ) AS "bar", ( "me"."id" ) AS "id" FROM "foo" "me" GROUP BY "me"."id" HAVING "bar" > 100 LIMIT 25;

This works:

SELECT ( "me"."type" ) AS "type", ( (SELECT COUNT( * ) FROM "bar" "me_alias" WHERE ( "me_alias"."typeid" = "me"."id" )) ) AS "bar", ( "me"."id" ) AS "id" FROM "foo" "me" GROUP BY "me"."id" HAVING (SELECT COUNT( * ) FROM "bar" "me_alias" WHERE ( "me_alias"."typeid" = "me"."id" )) > 100 LIMIT 25;

It seems to be incorrectly handled in chain_Rs_req_multifilter() in Role/DbicLink2.pm

@vanstyn
Copy link
Owner

vanstyn commented Jul 4, 2014

So, this example query, with a sub-select alias identifier 'film_actors' works, including using it in the ORDER BY clause:

SELECT "me"."film_id", ( 
    SELECT COUNT( * ) 
      FROM "film_actor" "me_alias" 
    WHERE "me_alias"."film_id" = "me"."film_id"
   ) AS "film_actors" 
  FROM "film" "me" 
ORDER BY "film_actors" DESC 
  LIMIT '25'

So, Pg is fine with using it for ordering, but if you simply try to also use it in HAVING, it doesn't:

SELECT ( "me"."film_id" ) AS "film_id", ( 
    SELECT COUNT( * ) 
      FROM "film_actor" "me_alias" 
    WHERE "me_alias"."film_id" = "me"."film_id"
   ) AS "film_actors" 
  FROM "film" "me" 
GROUP BY "me"."film_id" 
HAVING "film_actors" > '2' 
ORDER BY "film_actors" DESC 
  LIMIT '25'

Barfing with:

ERROR:  column "film_actors" does not exist
LINE 8: HAVING "film_actors" > '2' 

The logic of the Pg people in deciding ORDER BY knows what it is, but HAVING doesn't, makes no sense to me. Feels like a Pg bug (since MySQL and SQLIte support this just fine). Obviously this is just my biased/limited perspective, and I'm sure the Pg people are following some SQL standards rule, but I think it is silly in this case.

So, it seems that the only way to make it do the right thing is to supply the whole sub-select twice in the same query, which I think is dumb:

SELECT ( "me"."film_id" ) AS "film_id", ( 
    SELECT COUNT( * ) 
      FROM "film_actor" "me_alias" 
    WHERE "me_alias"."film_id" = "me"."film_id"
   ) AS "film_actors" 
  FROM "film" "me" 
GROUP BY "me"."film_id" 
HAVING ( 
    SELECT COUNT( * ) 
      FROM "film_actor" "me_alias" 
    WHERE "me_alias"."film_id" = "me"."film_id"
   ) > '2' 
ORDER BY "film_actors" DESC 
  LIMIT '25'

[end of rant]

@scottwalters
Copy link
Contributor

In Pg's brain, the select happens after the from, group by, having, and order by.

There are a few solutions here:

Move the subquery into the from block. It can take a dependent value there too (me.film_id).

Use a 'with' block to set up a temporary table before the query starts: http://www.postgresql.org/docs/current/static/queries-with.html ... and then join to that table.

@ribasushi
Copy link
Contributor

@vanstyn The real question is - besides ranting about it, why do you care? Neither established RDBMS is at liberty to change its scoping rules. There doesn't seem to be a way for you to get around this apart doubling the subquery and moving on.

Now, why you are using HAVING there is an entirely different question... Will need to chat about this later (perhaps Sun.)

@vanstyn
Copy link
Owner

vanstyn commented Jul 5, 2014

Ok, so, just to give some background and explanation of what is going on here, these sub-selects are what are internally produced to represent "virtual columns" - which include both multi-rel columns (i.e. has_many, where the sub-select is a count) and other, arbitrary virtual columns which are supported by RapidApp.

http://www.rapidapp.info/demos/chinook/2_5 #<-- virtual columns demo

The reason I am doing a HAVING instead of a WHERE is to support conditions on these virtual-columns (i.e. built in the query-builder), because that is supposed to be (or so I thought) the whole point of HAVING -- applying additional conditions after the grouping/aggregations... Now that it appears I am going to be forced to supply the sub-select again anyway, i might as well just do it in the WHERE and stop using HAVING altogether.

I already wrote the logic to do this, but my current problem which I ran into is an inherent limitation of SQL::Abstract, which is that the subselect (literal sql, identified as a ref/ref, etc) needs to live in the column section of the query, which in SQLA is a hashkey, and hashkeys in perl can't directly contain a ref... After digging back into this, I think that this SQLA problem was the actual reason I did this whole HAVING thing to begin with, because it allowed me to assign an alias identifier to represent the sub-query (in the SELECT), and then use that identifier as a column name in HAVING (since it couldn't be used in WHERE). This worked great in MySQL and SQLite, and had the additional benefit of allowing me to supply the sub-select only once... win/win (in fact, I was under the impression that this was one of the main points that HAVING even existed). But, now I know that is not true in Pg, so I am back to this original problem...

This general problem domain has been in my sub-conscious for a while now, and in, fact, is what I was trying to ask about during riba's talk last week, before this bug was even discovered:

https://www.youtube.com/watch?v=63_MGcsXEzs#t=2153

(I actually didn't really know what I was asking, but now I realize was about this and I just didn't know it, which is why my question sort of didn't fully make sense at the time)

So, I haven't yet figured out what to do to solve this... This code is a number of years old, and it is planned to be refactored along with the rest of this "DbicLink" layer... But I was hoping to be able to wait to do that, and find a quick solution to this...

Regarding scrottie's suggestion, it would involve a major structural change as well (i.e. mucking with other parts of the query that were generated far away and dependent on many other moving parts). Also, I'm guessing that these other techniques such as WITH, etc, will have the same cross-compatibly issues that I'm experiencing now with HAVING... It works fine on MySQL/SQLT, but not Pg... writing something that works on Pg but not MySQL is just as bad... Of course, I could start introspecting the backend db and create different implementations for different RDBMS'es, but that will open a whole other can of worms, code to be supported/maintained, test coverages, etc etc etc. I'm sure we will be doing this eventually, but I want to put it off as long as I can...

So, my current thought is that the best route is to go ahead and do the sub-select in the condition. I just need to figure out how to get around this SQLA structural limitation...

Note: there has also been some convos about this SQLA thing with frew/mst in IRC in #dbix-class …

vanstyn added a commit that referenced this issue Oct 20, 2014
Per ribasushi:

 ...
 <ribasushi> change { ' ' => $r } to { '  ' => $r } (note the space)
 <ribasushi> it's fucking horrible and retarded but you already knew
 that
vanstyn added a commit that referenced this issue Oct 30, 2014
vanstyn added a commit that referenced this issue Jul 6, 2015
Grid Column Summaries are now available by default for all types of
columns, including virtual and multi-rel columns, as well as single rel
columns with virtual display columns. This has been structurally
possible for a long time, but disabled by rule since before other
significant refactoring, such as unfactor of special HAVING logic
(see #51) and other changes. Since that time, this has been much
easier to do.

Besides simply removing the rule which disallowed column summaries (
'no_summary' column property) some special value handling needed to be
added to handle the ref structure of virtual columns which has been
done, as well as cleanup and simplification of related code and now
dead codepaths.

Also did some minor changes to allow renderers for multi-rel columns to
behave better when rendered as a named summary func, and added new
column profile 'multirel' which is used internally set to multi-rel
column opts for the summary funcs (as number) as well as the other
multi-rel column opts in a cleaner manner.

The only type of column for which column summaries are still disabled
is m2m, which are a special case and probably make no sense to have a
column summary at all, however, this may still revisit in the future...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants