New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #30053 - Audit page render slow when organization is set #7728
Fixes #30053 - Audit page render slow when organization is set #7728
Conversation
|
Issues: #30053 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code wise looks good to me.
I haven't tested it but added small inline comment.
|
any progress on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @hao-yu I think this is really great change, I'd like to be able to do taxonomies through arel.
That being said, I think we are setting up an API for that not just for Audits, I've left comments what I think could be done bit differently, but I might be wrong and I am open to have a discussion about this.
Wouldn't you mind to have a brief meeting with me and @kgaikwad about this to get us understand it better and thus get it in faster?
app/models/concerns/taxonomix.rb
Outdated
| clauses << loc_join_arel[:taxonomy_id].in(user_locations) | ||
| end | ||
|
|
||
| nodes = clauses.any? ? clauses.reduce(:and) : nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's bit weird naming for it, could we come up with something more descriptive?
| nodes = clauses.any? ? clauses.reduce(:and) : nil | |
| statement = clauses.any? ? clauses.reduce(:and) : nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. naming is very hard sometimes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
app/models/concerns/taxonomix.rb
Outdated
| end | ||
|
|
||
| def join_taxonomies(taxonomies, join_name) | ||
| joins("LEFT JOIN #{TAXONOMY_JOIN_TABLE} AS #{join_name} "\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we do this through arel as well? I'd rather not mix arel and pure sql if possible and this joins should not be as hard, right? We could pass in the arel table (org_join_arel or loc_join_arel)
I think if we do it through arel, the Rails have it easier to figure wheter the join is already used in a scope or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup. can do. shouldn't be an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
| end | ||
|
|
||
| def taxed_and_untaxed | ||
| in_taxonomy_scope do |nodes| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very unclear method it sounds it's doing the same as with_taxonomy_scope, the purpose of the block is not very clear on the first sight (without actually reading the in_taxonomy_scope body).
If we could get rid of the unless branches from in_taxonomy_scope, the method would not be needed and we could just do:
statements = # code in the block body
taxonomy_join_scope.where(statements)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the purpose of in_taxonomy_scope is to replace the with_taxonomy_scope for more efficient query. I think it might be good to replace with_taxonomy_scope in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think block is still needed so that the arel query can be customized better and more flexible, such as the grouping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ezr-ondrej as per discussed, I removed the "in_taxonomy_scope" and will make it reusable for other places in a separate commit.
aae7760
to
76a23d5
Compare
|
@ezr-ondrej please hold the merge. I would like to double check the code. thanks |
76a23d5
to
22308e7
Compare
|
Done! |
app/models/concerns/taxonomix.rb
Outdated
| @@ -132,6 +132,44 @@ def scope_by_taxable_ids(scope) | |||
| scope.where("#{table_name}.id IN (#{cached_ids.join(',')})") | |||
| end | |||
| end | |||
|
|
|||
| def join_organizations | |||
| join_taxonomies(Organization.all, org_join_arel) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| join_taxonomies(Organization.all, org_join_arel) | |
| join_taxonomies(Organization.unscoped, org_join_arel) |
I think, unscoped is required here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inside join_taxonomies method it requires only taxonomy{location/organization}_ids. So, would it be good to pass directly ids?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or probably just taxonomy class ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think doing "unscoped" is optional or trivial since the only default scope is order(:title). Referring to https://github.com/theforeman/foreman/blob/develop/app/models/taxonomies/organization.rb#L22 unscoped is not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I will prefer passing the scope to the method as i don't have to call pluck on every call manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm for keeping the scope, as we could decide in the future not to use ids, but subselect, so this is more robust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kgaikwad @ezr-ondrej Done! make them use "unscoped" as suggested.
app/models/concerns/taxonomix.rb
Outdated
| end | ||
|
|
||
| def join_locations | ||
| join_taxonomies(Location.all, loc_join_arel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
22308e7
to
5ef5ce2
Compare
|
can we merge this if nothing is pending? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code-wise LGTM 👍
@kgaikwad are you ok with the code? Shall I test it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kgaikwad are you ok with the code? Shall I test it?
Sorry for the delay. Changes works well without any failure.
Tried page loading testing using Rack Mini Profiler but not sure why it isn't showing me sql queries logs. Found little improvement.
|
@kgaikwad: Thanks for the test. Just curious about the little performance improvement. Can you tell me what was your test result before and after the patch? Just test in a reproducer with 135K audit entries: Before: After: |
@hao-yu, thank you for sharing results.
I also noticed that new count query takes more time than previous.
[D|sql|6c3ea52b] (1.4ms) SELECT COUNT(*) FROM "audits" WHERE ((("audits"."auditable_type" IN ($1, $2, $3, $4, $5, $6) OR "audits"."auditable_type" IN ($7, $8, $9, $10, $11, $12, $13, $14, $15, $16) AND (audits.id IN (184,116,87,71,68,51,80,70,52,190,162,84,170,176,169,92,237,69,180,97,238,236,108,156,59,197,65,98,173,149,311,200,189,239,235,161,117,88,82,188,119,240,153,147,202,196,79,210,187,85,72,95,81,244,160,232,77,302,172,198,212,242,204,249,165,151,91,209,74,179,225,138,181,245,205,96,83,67,201,63,90,105,216,174,213,107,233,86,168,163,223,93,89,241,221,167,14,231,66,158,155,118,195,207,199,75,194,243,99,152,206,157,53,230,301,183,136,150,139,193,12,164,248,137,208,78,159,94,218,186,154,234,211,177,76,106,185,64,178,148,203,58))) OR 1=0 AND "audits"."id" IN (SELECT "taxable_taxonomies"."taxable_id" FROM "taxable_taxonomies" WHERE "taxable_taxonomies"."taxonomy_id" = $17 AND "taxable_taxonomies"."taxable_type" = $18)) OR "audits"."auditable_type" = $19 AND "audits"."id" IN (SELECT "taxable_taxonomies"."taxable_id" FROM "taxable_taxonomies" WHERE "taxable_taxonomies"."taxonomy_id" = $20 AND "taxable_taxonomies"."taxable_type" = $21)) [["auditable_type", "Image"], ["auditable_type", "Operatingsystem"], ["auditable_type", "ComputeAttribute"], ["auditable_type", "Setting"], ["auditable_type", "KeyPair"], ["auditable_type", "Nic::Managed"], ["auditable_type", "Host::Base"], ["auditable_type", "ProvisioningTemplate"], ["auditable_type", "SmartProxy"], ["auditable_type", "Hostgroup"], ["auditable_type", "Filter"], ["auditable_type", "ComputeResource"], ["auditable_type", "HttpProxy"], ["auditable_type", "User"], ["auditable_type", "Parameter"], ["auditable_type", "Domain"], ["taxonomy_id", 1], ["taxable_type", "Audited::Audit"], ["auditable_type", "Organization"], ["taxonomy_id", 2], ["taxable_type", "Audited::Audit"]]
[D|sql|86cf2281] (1.5ms) SELECT COUNT(DISTINCT "audits"."id") FROM "audits" LEFT JOIN taxable_taxonomies AS tto ON audits.id = tto.taxable_id AND tto.taxable_type = 'Audited::Audit' AND tto.taxonomy_id IN (1) LEFT JOIN taxable_taxonomies AS ttl ON audits.id = ttl.taxable_id AND ttl.taxable_type = 'Audited::Audit' AND ttl.taxonomy_id IN (2) WHERE ((((("tto"."taxonomy_id" IN (1) AND "ttl"."taxonomy_id" IN (2) AND "audits"."auditable_type" IN ('Host::Base', 'ProvisioningTemplate', 'SmartProxy', 'Hostgroup', 'Filter', 'ComputeResource', 'HttpProxy', 'User', 'Parameter', 'Domain')) OR "audits"."auditable_type" IN ('Image', 'Operatingsystem', 'ComputeAttribute', 'Setting', 'KeyPair', 'Nic::Managed')) OR (1=0 AND "tto"."taxonomy_id" IN (1))) OR ("audits"."auditable_type" IN ('Organization') AND "ttl"."taxonomy_id" IN (2)))) That's why, I requested @ezr-ondrej to test and confirm. |
|
@kgaikwad @ezr-ondrej Yeah. The new query is slightly slower than the old query when the records count is small (a few hundreds to a few thousands) but it is very insignificant, like < 50ms. so I think it is definitely worth. Test for 2.4k audit entries Before: After: |
@hao-yu - yes, definitely it is! 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen some weird times myself, so I've took time to test this properly.
I'm sorry it took a while I've got to it.
Measurements
I've used a following script, please do point out any issues you'd find in the script itself
https://gist.github.com/ezr-ondrej/81c8d2a12fda3df1b23f19d76eae9374
Results
My findings are, that the new query is MUCH slower, so we save time at preparation of the query, but the query itself is very slow in comparison to the old one.
Results: https://pastebin.com/f7aBpdkk
Notice the real time (real - total =(approx) db time)
|
@ezr-ondrej Thanks for the test script. I am not exactly sure why the patched version would perform so badly in the test but there is an issue to the script. It is so memory hunger. I am unable to create large records for testing. It will eat all my 8GB + 1.5G swap by just creating 5 - 6K of records. So I wrote a new rake task to run the benchmark test outside of the test framework. https://gist.github.com/hao-yu/68a4852e5541b6fb3f8d6ee5837e312b The tests indicate that the patched version performs faster as the record count grows, especially in pagination. See the results in: |
|
I don't see arel as something that would be harder to maintain, but quite the opposite. I think we should use it more especially for more complex SQL. It plays with Rails much better than plain SQL strings. To the results, depends witch are we taking about, I've had to make few adjustments to the testing script, but overall it works as expected. |
Before submitting patch, I was doing this with regular active record methods but I think it is very untidy, hard to read and also hard to maintain and reuse due to the complexity of the statement and decided to use arel instead.
If you are referring to the result of test #6 and test #7. It is because I modified the script to set the user.location to nil so that I can save some time to seed thousands of data for the tests. Also worth to note that, count of both Develop and Patched versions are matching in all tests.
I ran same query a few times during the test, 2nd and subsequent times are just a bit quicker but not significant. Feel free to test.
I did monitor the memory usage while the query is running and see no significant memory consumption. But happy to prove that with the gem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that arel is sometimes better than writing sql directly, but very few developers are familiar with it, while all developers are familiar with both active record and sql. I'm still worried that adding ~70 lines of arel code would make this area much less accessible to most developers and would rather minimize it to only where it significantly improves readability.
I'm also a bit concerned about the performance of a double outer join on the tax_tax table in a real db- this table can easily get to the millions of records, and we might be hitting a wall with the db once it gets to doing the joins on disk.
Also added some comments inline.
|
|
||
| def taxed_and_untaxed | ||
| clauses = [] | ||
| unless !Organization.current && User.current.admin? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think flipping around the conditional will make it easier to understand (also for location below):
| unless !Organization.current && User.current.admin? | |
| if Organization.current || !User.current.admin? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
app/models/concerns/taxonomix.rb
Outdated
| def user_organizations | ||
| Array.wrap(Organization.current || Organization.my_organizations.reorder(nil)) | ||
| end | ||
|
|
||
| def user_locations | ||
| Array.wrap(Location.current || Location.my_locations.reorder(nil)) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are very similar to Taxonomy.expand. I would suggest considering using the existing method instead of adding a new one. Also, what about the child taxonomies in the case of the current?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed the child taxonomies issue for the current context but I haven't use the "Taxonomy.expand" as it doesn't quite make sense to me. It will cause duplicate code run for non-admin user with any context. E.g. run Taxonomy.expand -> then run get_taxonomy_ids. This will call "Organization.my_organizations" to get all children and then call the "get_taxonomy_ids" to get all children again.
app/models/concerns/taxonomix.rb
Outdated
|
|
||
| def join_taxonomies(taxonomies, tax_arel) | ||
| class_arel = base_class.arel_table | ||
| statement = class_arel.join(tax_arel, Arel::Nodes::OuterJoin).on( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is specific to the audits case, where you need the outer join so you can also see the partially taxed and untaxed resources. in other cases the outer join doesn't make sense.
Also, not all models actually include taxonomix - specifically those that are not fully taxed.
Should these methods all live inside the audits extension and not included in all models using taxonomix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the outer joins make sense for all models that include taxonomix and even partially taxed but it is only efficient if the models records can grow large like Audit. It is because outer joins is a little bit slower for smaller tables comparing to the current way. For example, ForemanOpenscap::ArfReport might be the candidate that can apply the outer join..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Tomer is comparing OUTER vs INNER joins. And I'd agree that if we use INNER join, we can use the join like the only mean to filter the records, we wouldn't need further conditions on the joined table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OTOH, outer join is more flexible. BUT then it doesn't have much sense to use the taxonomy_ids in the join itself, because we need another condition using the ids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I am wrong. I had thought about INNER JOIN earlier but I didn't think it will work because I need to include the audit rows even it is only partially taxable, such as taxable only by organization.
Yes, I think taxanomy_ids in the OUTER JOIN is optional. It is just there for me to feel safer. I am happy to remove it if everyone can confirm it is good to remove.
I re-trigger the test script. It is now printing out the EXPLAIN for all tests Please check the links below. Based on the tests, I notice the following points:
If you are still concerning about this patch will give negative impact in the case of million records, I think we will have to ask for real data from the user who is hitting this issue. Benchmark for Develop: https://pastebin.com/cP02RyPh
According to the below test results, there is a huge memory improvement, especially in count, and pagination. The number of created objects are also significantly less than the Develop branch. Please read the link below for details: |
8667aba
to
894509a
Compare
|
Thank you for your contribution, @hao-yu! This PR has reached an impasse with no new activity for 1 months, closing for now. |
894509a
to
312d21e
Compare
| taxonomy = taxonomy_class.current | ||
| if any_context?(taxonomy) | ||
| taxonomy_relation = taxonomy_class.to_s.underscore.pluralize | ||
| taxonomy_class.send("my_#{taxonomy_relation}").reorder(nil).pluck(:id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use User.taxonomies_and_child_ids(taxonomy_class.to_s.pluralize) to get the ids directly (and in most cases from the cache) rather than trigger another query here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because I would like to return all taxonomies if user is an admin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then how about
| taxonomy_class.send("my_#{taxonomy_relation}").reorder(nil).pluck(:id) | |
| User.current.admin? ? taxonomy_class.unscoped.all : User.taxonomies_and_child_ids(taxonomy_class.to_s.pluralize) |
| taxonomy_relation = taxonomy_class.to_s.underscore.pluralize | ||
| taxonomy_class.send("my_#{taxonomy_relation}").reorder(nil).pluck(:id) | ||
| else | ||
| get_taxonomy_ids(taxonomy, :subtree_ids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will also get all ancestors if the current taxonomy is a nested one, is that on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should get all descendants of the current taxonomy including itself. This should be the same as inner_select method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit confusing but that isn't what get_taxonomy_ids does:
https://github.com/theforeman/foreman/blob/develop/app/models/concerns/taxonomix.rb#L60-L62
to get all descendants you should use
| get_taxonomy_ids(taxonomy, :subtree_ids) | |
| taxonomy.subtree_ids |
directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code here is based on the below logic. The below logic is getting all the ancestors too so I am a quite confuse how this taxonomy hierarchy should work. If this code is wrong then maybe the below logic is wrong too.
https://github.com/theforeman/foreman/blob/develop/app/models/concerns/taxonomix.rb#L87
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not causing a regression, so I'm all for merging it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi. How is this going? Not why this is taking so long.
app/models/concerns/taxonomix.rb
Outdated
|
|
||
| def join_taxonomies(taxonomies, tax_arel) | ||
| class_arel = base_class.arel_table | ||
| statement = class_arel.join(tax_arel, Arel::Nodes::OuterJoin).on( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I am wrong. I had thought about INNER JOIN earlier but I didn't think it will work because I need to include the audit rows even it is only partially taxable, such as taxable only by organization.
Yes, I think taxanomy_ids in the OUTER JOIN is optional. It is just there for me to feel safer. I am happy to remove it if everyone can confirm it is good to remove.
| taxonomy_relation = taxonomy_class.to_s.underscore.pluralize | ||
| taxonomy_class.send("my_#{taxonomy_relation}").reorder(nil).pluck(:id) | ||
| else | ||
| get_taxonomy_ids(taxonomy, :subtree_ids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should get all descendants of the current taxonomy including itself. This should be the same as inner_select method.
| taxonomy = taxonomy_class.current | ||
| if any_context?(taxonomy) | ||
| taxonomy_relation = taxonomy_class.to_s.underscore.pluralize | ||
| taxonomy_class.send("my_#{taxonomy_relation}").reorder(nil).pluck(:id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because I would like to return all taxonomies if user is an admin.
|
Thank you for your contribution, @hao-yu! This PR has reached an impasse with no new activity for 1 months, closing for now. |
312d21e
to
c809be8
Compare
|
I've rebased on top of #8370 to run the tests :) |
|
[test foreman] the integration tests were terribly slow, lets try again if that was an anomaly |
|
[test foreman] ok it is not terribly slow, but still seems to be slower than the others. |
|
The test with this patch seems faster? test with this patch spent 11secs and test without this patch spent 18 secs? |
It is not consistent though (take a look into history of tests) so it is hard to base any evidence on that, but it is working and I'd like to start using arel in taxonomies, because I belive it'll make them more readable. We have doubled the test cases for this so I believe this works as it did and it is faster. I'm merging this. |
Thanks for the review, I believe all blocking comments were resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hao-yu ! Let see it in production :)
The audit page taking long time to load when there are many audit logs and organization is set. This page fixes the issue