Fixes #30555 - Authorizer uses subselect for joined_on #7877

Merged Aug 25, 2020 (1 commit)

ezr-ondrej
Member

@ezr-ondrej ezr-ondrej commented Aug 3, 2020

If we use the joined_on class, the where clause is applied to that class through the association.
This is very fragile and doesn't play well with the Host STI.

See the test file for an example failure.

The original idea for the fix came from @sufanek1, kudos! 👍

@theforeman-bot
Member

Issues: #30555

Member

@ShimShtein ShimShtein left a comment

I suppose it's better than failing, but can you share an explain plan for such a query? I think it can get quite resource-intensive on the DB side, especially when the sub-select is quite big.

# Get a subselect based on the scope search criteria
subselect = resource_class.joins(scope_components[:includes])
subselect = scope_components[:where].inject(subselect) { |scope_build, where| scope_build.where(where) }
scope = scope.where(assoc.foreign_key => subselect)
Member

Shouldn't we modify the SELECT part of the subquery to return only the assoc.primary_key column?

Member Author

Rails does that automatically if you pass a scope as the value for the id field.

@ezr-ondrej
Member Author

ezr-ondrej commented Aug 10, 2020

The test builds the following query:

SELECT "fact_values".* FROM "fact_values" INNER JOIN "hosts" ON "hosts"."id" = "fact_values"."host_id" WHERE "fact_values"."host_id" IN (SELECT "hosts"."id" FROM "hosts" INNER JOIN "operatingsystems" ON "operatingsystems"."id" = "hosts"."operatingsystem_id" WHERE "hosts"."type" = 'Host::Managed' AND (("operatingsystems"."name" = 'Debian')))

This query has the following plan:

EXPLAIN for: SELECT "fact_values".* FROM "fact_values" INNER JOIN "hosts" ON "hosts"."id" = "fact_values"."host_id" WHERE "fact_values"."host_id" IN (SELECT "hosts"."id" FROM "hosts" INNER JOIN "operatingsystems" ON "operatingsystems"."id" = "hosts"."operatingsystem_id" WHERE "hosts"."type" = 'Host::Managed' AND (("operatingsystems"."name" = 'Debian')))
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=9.55..23.95 rows=440 width=64)
   ->  Nested Loop  (cost=9.40..18.89 rows=1 width=8)
         ->  HashAggregate  (cost=9.27..9.28 rows=1 width=4)
               Group Key: hosts_1.id
               ->  Nested Loop  (cost=0.14..9.26 rows=1 width=4)
                     Join Filter: (hosts_1.operatingsystem_id = operatingsystems.id)
                     ->  Index Scan using index_hosts_on_type_and_organization_id_and_location_id on hosts hosts_1  (cost=0.14..8.15 rows=1 width=8)
                           Index Cond: ((type)::text = 'Host::Managed'::text)
                     ->  Seq Scan on operatingsystems  (cost=0.00..1.10 rows=1 width=4)
                           Filter: ((name)::text = 'Debian'::text)
         ->  Index Only Scan using hosts_pkey on hosts  (cost=0.14..8.15 rows=1 width=4)
               Index Cond: (id = hosts_1.id)
   ->  Index Scan using index_fact_values_on_host_id on fact_values  (cost=0.15..5.02 rows=4 width=64)
         Index Cond: (host_id = hosts.id)

I'm not sure whether the subquery is efficient or not, although the data throughput should be higher as we don't pull the additional joined data. The rest should be more or less the same, but that depends on the optimizer 🤷
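
As a rough illustration of what the query above computes, here is a plain-Ruby sketch over in-memory rows. It does not use ActiveRecord or Foreman code; the row structs (HostRow, OsRow, FactRow), the helper names, and all sample data (including the Host::Discovered type string) are assumptions made up for this example. The outer INNER JOIN on hosts is left out because it does not change which fact_values rows pass the IN filter.

```ruby
# Plain-Ruby sketch of the query semantics; no ActiveRecord involved.
# Structs, helper names, and rows are hypothetical sample data.
HostRow = Struct.new(:id, :type, :operatingsystem_id)
OsRow   = Struct.new(:id, :name)
FactRow = Struct.new(:id, :host_id, :value)

operatingsystems = [
  OsRow.new(1, "Debian"),
  OsRow.new(2, "Fedora"),
]

hosts = [
  HostRow.new(10, "Host::Managed", 1),    # managed Debian host: matches
  HostRow.new(11, "Host::Managed", 2),    # Fedora: fails the name filter
  HostRow.new(12, "Host::Discovered", 1), # Debian, but fails the type filter
]

fact_values = [
  FactRow.new(100, 10, "a"),
  FactRow.new(101, 11, "b"),
  FactRow.new(102, 12, "c"),
  FactRow.new(103, 10, "d"),
]

# Inner SELECT: ids of hosts joined to operatingsystems, then filtered.
def matching_host_ids(hosts, operatingsystems)
  os_by_id = operatingsystems.map { |os| [os.id, os] }.to_h
  matching = hosts.select do |host|
    os = os_by_id[host.operatingsystem_id]
    host.type == "Host::Managed" && !os.nil? && os.name == "Debian"
  end
  matching.map(&:id)
end

# Outer query: fact_values whose host_id is IN the subselect result.
def authorized_fact_values(fact_values, host_ids)
  fact_values.select { |fact| host_ids.include?(fact.host_id) }
end

host_ids   = matching_host_ids(hosts, operatingsystems)
result_ids = authorized_fact_values(fact_values, host_ids).map(&:id)
```

With this sample data, only the facts of host 10 pass: host 11 fails the operating system filter and host 12 fails the type filter.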

@ezr-ondrej ezr-ondrej force-pushed the filters_assoc_mismatch branch 2 times, most recently from 81b7cee to 9eed565 Compare August 10, 2020 22:13
@ezr-ondrej
Member Author

@ShimShtein any other issues? :)
Or would you like to go over it together?

scope = options[:joined_on].joins(assoc.name)
if scope_components[:where].present?
  # Get a subselect based on the scope search criteria
  subselect = resource_class.left_outer_joins(scope_components[:includes])
Member Author

includes doesn't work here: when used in a subselect, only the id is selected, so Rails considers the join unnecessary and drops it, since the primary intention of includes is preloading. That's why I had to go with left_outer_joins here.
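
For readers less familiar with the difference between the two join types that appear in this review (joins in the earlier revision, left_outer_joins in this one), the following plain-Ruby sketch shows inner versus left outer join semantics on in-memory rows. It illustrates SQL join behavior only, not how ActiveRecord builds the query; the helper names and sample rows are hypothetical.

```ruby
# Plain-Ruby illustration of INNER JOIN vs LEFT OUTER JOIN row semantics.
# Sample rows are made up; a nil operatingsystem_id models a host without an OS.
hosts = [
  { id: 1, operatingsystem_id: 1 },
  { id: 2, operatingsystem_id: nil },
]
operatingsystems = [{ id: 1, name: "Debian" }]

# INNER JOIN: only hosts with a matching operatingsystems row survive.
def inner_join(hosts, operatingsystems)
  hosts.flat_map do |host|
    matches = operatingsystems.select { |os| os[:id] == host[:operatingsystem_id] }
    matches.map { |os| [host, os] }
  end
end

# LEFT OUTER JOIN: every host survives; unmatched hosts are paired with nil.
def left_outer_join(hosts, operatingsystems)
  hosts.flat_map do |host|
    matches = operatingsystems.select { |os| os[:id] == host[:operatingsystem_id] }
    matches.empty? ? [[host, nil]] : matches.map { |os| [host, os] }
  end
end

inner_ids = inner_join(hosts, operatingsystems).map { |host, _os| host[:id] }
outer_ids = left_outer_join(hosts, operatingsystems).map { |host, _os| host[:id] }

# A WHERE condition on the joined table removes the nil-padded rows again.
debian_rows = left_outer_join(hosts, operatingsystems).select do |_host, os|
  !os.nil? && os[:name] == "Debian"
end
debian_ids = debian_rows.map { |host, _os| host[:id] }
```

In this sketch the two joins differ only before filtering: once a condition on the joined table is applied, the unmatched host drops out in both cases.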

@ezr-ondrej
Member Author

@ShimShtein #7913 should fix the issue you hit in your setup. Could you try the two together to see if it really helps?

@ShimShtein ShimShtein merged commit f4d9295 into theforeman:develop Aug 25, 2020
@ShimShtein
Member

ShimShtein commented Aug 25, 2020

ACK, works as expected! And #7913 indeed helped with my setup.

Thanks @ezr-ondrej!

@ezr-ondrej ezr-ondrej deleted the filters_assoc_mismatch branch August 26, 2020 10:00