Multi-filter produces incorrect HAVING clause on Pg #51

abaumhauer · 2014-06-30T21:57:25Z

Non-compliant SQL is generated on filters with HAVING clause on PostgreSQL. The use of virtual column names with 'AS' in the HAVING clause results in an error:

column 'bar' does not exist. As an example:

SELECT ( "me"."type" ) AS "type", ( (SELECT COUNT( * ) FROM "bar" "me_alias" WHERE ( "me_alias"."typeid" = "me"."id" )) ) AS "bar", ( "me"."id" ) AS "id" FROM "foo" "me" GROUP BY "me"."id" HAVING "bar" > 100 LIMIT 25;

This works:

SELECT ( "me"."type" ) AS "type", ( (SELECT COUNT( * ) FROM "bar" "me_alias" WHERE ( "me_alias"."typeid" = "me"."id" )) ) AS "bar", ( "me"."id" ) AS "id" FROM "foo" "me" GROUP BY "me"."id" HAVING (SELECT COUNT( * ) FROM "bar" "me_alias" WHERE ( "me_alias"."typeid" = "me"."id" )) > 100 LIMIT 25;

It seems to be incorrectly handled in chain_Rs_req_multifilter() in Role/DbicLink2.pm

vanstyn · 2014-07-04T11:17:45Z

So, this example query, with a sub-select alias identifier 'film_actors' works, including using it in the ORDER BY clause:

SELECT "me"."film_id", ( 
    SELECT COUNT( * ) 
      FROM "film_actor" "me_alias" 
    WHERE "me_alias"."film_id" = "me"."film_id"
   ) AS "film_actors" 
  FROM "film" "me" 
ORDER BY "film_actors" DESC 
  LIMIT '25'

So, Pg is fine with using it for ordering, but if you simply try to also use it in HAVING, it doesn't:

SELECT ( "me"."film_id" ) AS "film_id", ( 
    SELECT COUNT( * ) 
      FROM "film_actor" "me_alias" 
    WHERE "me_alias"."film_id" = "me"."film_id"
   ) AS "film_actors" 
  FROM "film" "me" 
GROUP BY "me"."film_id" 
HAVING "film_actors" > '2' 
ORDER BY "film_actors" DESC 
  LIMIT '25'

Barfing with:

ERROR:  column "film_actors" does not exist
LINE 8: HAVING "film_actors" > '2'

The logic of the Pg people in deciding ORDER BY knows what it is, but HAVING doesn't, makes no sense to me. Feels like a Pg bug (since MySQL and SQLIte support this just fine). Obviously this is just my biased/limited perspective, and I'm sure the Pg people are following some SQL standards rule, but I think it is silly in this case.

So, it seems that the only way to make it do the right thing is to supply the whole sub-select twice in the same query, which I think is dumb:

SELECT ( "me"."film_id" ) AS "film_id", ( 
    SELECT COUNT( * ) 
      FROM "film_actor" "me_alias" 
    WHERE "me_alias"."film_id" = "me"."film_id"
   ) AS "film_actors" 
  FROM "film" "me" 
GROUP BY "me"."film_id" 
HAVING ( 
    SELECT COUNT( * ) 
      FROM "film_actor" "me_alias" 
    WHERE "me_alias"."film_id" = "me"."film_id"
   ) > '2' 
ORDER BY "film_actors" DESC 
  LIMIT '25'

[end of rant]

scottwalters · 2014-07-04T19:19:40Z

In Pg's brain, the select happens after the from, group by, having, and order by.

There are a few solutions here:

Move the subquery into the from block. It can take a dependent value there too (me.film_id).

Use a 'with' block to set up a temporary table before the query starts: http://www.postgresql.org/docs/current/static/queries-with.html ... and then join to that table.

ribasushi · 2014-07-05T12:31:11Z

@vanstyn The real question is - besides ranting about it, why do you care? Neither established RDBMS is at liberty to change its scoping rules. There doesn't seem to be a way for you to get around this apart doubling the subquery and moving on.

Now, why you are using HAVING there is an entirely different question... Will need to chat about this later (perhaps Sun.)

vanstyn · 2014-07-05T20:10:48Z

Ok, so, just to give some background and explanation of what is going on here, these sub-selects are what are internally produced to represent "virtual columns" - which include both multi-rel columns (i.e. has_many, where the sub-select is a count) and other, arbitrary virtual columns which are supported by RapidApp.

http://www.rapidapp.info/demos/chinook/2_5 #<-- virtual columns demo

The reason I am doing a HAVING instead of a WHERE is to support conditions on these virtual-columns (i.e. built in the query-builder), because that is supposed to be (or so I thought) the whole point of HAVING -- applying additional conditions after the grouping/aggregations... Now that it appears I am going to be forced to supply the sub-select again anyway, i might as well just do it in the WHERE and stop using HAVING altogether.

I already wrote the logic to do this, but my current problem which I ran into is an inherent limitation of SQL::Abstract, which is that the subselect (literal sql, identified as a ref/ref, etc) needs to live in the column section of the query, which in SQLA is a hashkey, and hashkeys in perl can't directly contain a ref... After digging back into this, I think that this SQLA problem was the actual reason I did this whole HAVING thing to begin with, because it allowed me to assign an alias identifier to represent the sub-query (in the SELECT), and then use that identifier as a column name in HAVING (since it couldn't be used in WHERE). This worked great in MySQL and SQLite, and had the additional benefit of allowing me to supply the sub-select only once... win/win (in fact, I was under the impression that this was one of the main points that HAVING even existed). But, now I know that is not true in Pg, so I am back to this original problem...

This general problem domain has been in my sub-conscious for a while now, and in, fact, is what I was trying to ask about during riba's talk last week, before this bug was even discovered:

https://www.youtube.com/watch?v=63_MGcsXEzs#t=2153

(I actually didn't really know what I was asking, but now I realize was about this and I just didn't know it, which is why my question sort of didn't fully make sense at the time)

So, I haven't yet figured out what to do to solve this... This code is a number of years old, and it is planned to be refactored along with the rest of this "DbicLink" layer... But I was hoping to be able to wait to do that, and find a quick solution to this...

Regarding scrottie's suggestion, it would involve a major structural change as well (i.e. mucking with other parts of the query that were generated far away and dependent on many other moving parts). Also, I'm guessing that these other techniques such as WITH, etc, will have the same cross-compatibly issues that I'm experiencing now with HAVING... It works fine on MySQL/SQLT, but not Pg... writing something that works on Pg but not MySQL is just as bad... Of course, I could start introspecting the backend db and create different implementations for different RDBMS'es, but that will open a whole other can of worms, code to be supported/maintained, test coverages, etc etc etc. I'm sure we will be doing this eventually, but I want to put it off as long as I can...

So, my current thought is that the best route is to go ahead and do the sub-select in the condition. I just need to figure out how to get around this SQLA structural limitation...

Note: there has also been some convos about this SQLA thing with frew/mst in IRC in #dbix-class …

Per ribasushi: ... <ribasushi> change { ' ' => $r } to { ' ' => $r } (note the space) <ribasushi> it's fucking horrible and retarded but you already knew that

See in-code comments

Grid Column Summaries are now available by default for all types of columns, including virtual and multi-rel columns, as well as single rel columns with virtual display columns. This has been structurally possible for a long time, but disabled by rule since before other significant refactoring, such as unfactor of special HAVING logic (see #51) and other changes. Since that time, this has been much easier to do. Besides simply removing the rule which disallowed column summaries ( 'no_summary' column property) some special value handling needed to be added to handle the ref structure of virtual columns which has been done, as well as cleanup and simplification of related code and now dead codepaths. Also did some minor changes to allow renderers for multi-rel columns to behave better when rendered as a named summary func, and added new column profile 'multirel' which is used internally set to multi-rel column opts for the summary funcs (as number) as well as the other multi-rel column opts in a cleaner manner. The only type of column for which column summaries are still disabled is m2m, which are a special case and probably make no sense to have a column summary at all, however, this may still revisit in the future...

vanstyn closed this as completed in 76d6b4a Jul 14, 2014

vanstyn added a commit that referenced this issue Oct 20, 2014

Fixed _binary_op_fuser for SQLA 1.80 - #51

870379c

Per ribasushi: ... <ribasushi> change { ' ' => $r } to { ' ' => $r } (note the space) <ribasushi> it's fucking horrible and retarded but you already knew that

vanstyn added a commit that referenced this issue Oct 30, 2014

multifilter_translate_cond use _binary_op_fuser - #51, fixes #69

d793844

See in-code comments

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi-filter produces incorrect HAVING clause on Pg #51

Multi-filter produces incorrect HAVING clause on Pg #51

abaumhauer commented Jun 30, 2014

vanstyn commented Jul 4, 2014

scottwalters commented Jul 4, 2014

ribasushi commented Jul 5, 2014

vanstyn commented Jul 5, 2014

Multi-filter produces incorrect HAVING clause on Pg #51

Multi-filter produces incorrect HAVING clause on Pg #51

Comments

abaumhauer commented Jun 30, 2014

vanstyn commented Jul 4, 2014

scottwalters commented Jul 4, 2014

ribasushi commented Jul 5, 2014

vanstyn commented Jul 5, 2014