Fix MySQL loadTables performance #6820
Add where clause to improve performance
added github packages
Fix indices query while maintaining performance
Is it possible to split these two changes apart, the performance change and the FK query check? That's nicer from a changelog perspective, and if we have to roll things back we can isolate what we're rolling back.
@imnotjames Sure, I moved just the foreign key fix into a separate PR.
…nancy-performance
Is the subquery needed if we're already doing this in the where clause in the main query? Does the |
Is the test for this a duplicate of the one for 6168?
The subquery is needed for the indices because it's a left join. If it were an inner join, like the foreign keys query, we could use the main where clause, but since it's a left join the results also need to include the rows where the REFERENTIAL_CONSTRAINTS columns are null. I also tried adding the condition to the main where clause with an OR IS NULL, but that doesn't give the performance improvement. The explain shows the problem. For example, for the foreign key query it has 2 rows, one for kcu and one for rc.
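To make the left-join point concrete, here is a minimal sketch of the query shape described above. It is not the PR's actual code: `buildIndicesQuery`, `sqlLiteral`, the selected columns, and the join condition on `INDEX_NAME` are assumptions made for illustration. The schema and table filter for REFERENTIAL_CONSTRAINTS sits inside the joined subquery, so index rows without a matching constraint still come back with null `rc` columns.

```typescript
// Hypothetical helper: naive single-quote escaping, for illustration only.
// Real code should rely on the driver's escaping or bound parameters.
function sqlLiteral(value: string): string {
    return "'" + value.replace(/'/g, "''") + "'";
}

// Builds an index lookup for one table (hypothetical shape).
// The filter on REFERENTIAL_CONSTRAINTS lives in the LEFT JOIN subquery,
// not in the outer WHERE, so unmatched STATISTICS rows are preserved.
function buildIndicesQuery(schema: string, table: string): string {
    const s = sqlLiteral(schema);
    const t = sqlLiteral(table);
    return [
        "SELECT s.INDEX_NAME, s.COLUMN_NAME, rc.CONSTRAINT_NAME",
        "FROM INFORMATION_SCHEMA.STATISTICS s",
        "LEFT JOIN (",
        "    SELECT CONSTRAINT_SCHEMA, CONSTRAINT_NAME, TABLE_NAME",
        "    FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS",
        `    WHERE CONSTRAINT_SCHEMA = ${s} AND TABLE_NAME = ${t}`,
        ") rc ON rc.CONSTRAINT_NAME = s.INDEX_NAME",
        `WHERE s.TABLE_SCHEMA = ${s} AND s.TABLE_NAME = ${t}`,
    ].join("\n");
}

console.log(buildIndicesQuery("tenant_1", "user"));
```

Putting `rc` conditions in the outer where clause instead would drop the unmatched rows, which effectively turns the left join into an inner join.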
The test isn't exactly a duplicate: the 6168 PR only touched the foreign keys, so I only tested the foreign keys there. Here I also added index checks to confirm they're unaffected by the change.
@imnotjames @pleerock Hi, I noticed you merged this refactor yesterday. See explain only with the Explain with both However I did just notice that I used KEY_COLUMN_USAGE's columns TABLE_SCHEMA and CONSTRAINT_SCHEMA interchangeably, which is a possible bug. If TABLE_SCHEMA !== CONSTRAINT_SCHEMA then it's wrong to add the |
I'm finding that having an
Causes:
However,
Leaves me with
Hypothesis: it might be faster either to run multiple queries, one per database and schema, or to construct a nightmarish frankenstein-query that unions a bunch of these together with single lookups.
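The second half of that hypothesis could be sketched roughly as follows. This only illustrates the idea and is not code from the thread: `TableRef`, `literal`, `columnsLookup`, and `unionSingleLookups` are hypothetical names, and INFORMATION_SCHEMA.COLUMNS is used purely as an example of a single-table lookup. Whether this is actually faster is untested here.

```typescript
// Hypothetical type describing one table to look up.
interface TableRef {
    schema: string;
    table: string;
}

// Naive quoting for illustration only; prefer bound parameters in real code.
function literal(value: string): string {
    return "'" + value.replace(/'/g, "''") + "'";
}

// Example single lookup: exactly one schema and one table per SELECT.
function columnsLookup(ref: TableRef): string {
    return [
        "SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME",
        "FROM INFORMATION_SCHEMA.COLUMNS",
        `WHERE TABLE_SCHEMA = ${literal(ref.schema)} AND TABLE_NAME = ${literal(ref.table)}`,
    ].join("\n");
}

// Glues one single-lookup SELECT per table together with UNION ALL.
function unionSingleLookups(
    tables: TableRef[],
    buildOne: (ref: TableRef) => string,
): string {
    if (tables.length === 0) {
        throw new Error("at least one table is required");
    }
    return tables.map(buildOne).join("\nUNION ALL\n");
}

console.log(
    unionSingleLookups(
        [
            { schema: "tenant_1", table: "user" },
            { schema: "tenant_2", table: "user" },
        ],
        columnsLookup,
    ),
);
```

The alternative in the hypothesis, running one query per table, would reuse `columnsLookup` and execute each statement separately instead of joining them.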
This change fixes a performance issue in MySQL loadTables, which is especially bad in multi-tenant environments.
loadTables performs queries to INFORMATION_SCHEMA, which is not optimized.
MySQL recommends adding a where clause with the schema and table for better performance: https://dev.mysql.com/doc/refman/5.7/en/information-schema-optimization.html
Two of the queries use a JOIN and have a where clause on the schema and table, but that where clause only constrains one of the tables in the join.
For large databases with many schemas, this can be a crucial improvement.
In our database, for example, it improved each query from about 7.5 seconds to around 200 milliseconds.
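As a rough illustration of constraining both sides of the join, the sketch below builds a foreign key query whose where clause filters KEY_COLUMN_USAGE and REFERENTIAL_CONSTRAINTS by schema and table. It is not TypeORM's actual implementation: `buildForeignKeysQuery`, `FkTableRef`, and `quoteValue` are hypothetical names, and the selected columns are an example. It also assumes the constraint schema equals the table schema, which the discussion above flags as a possible bug when they differ.

```typescript
// Hypothetical type describing one table whose foreign keys are loaded.
interface FkTableRef {
    schema: string;
    table: string;
}

// Naive quoting for illustration only; prefer bound parameters in real code.
function quoteValue(value: string): string {
    return "'" + value.replace(/'/g, "''") + "'";
}

// Builds one query for all requested tables. Each condition filters kcu AND rc,
// so neither INFORMATION_SCHEMA table is left without a schema/table filter.
// Assumption: rc.CONSTRAINT_SCHEMA matches kcu.TABLE_SCHEMA for these tables.
function buildForeignKeysQuery(tables: FkTableRef[]): string {
    if (tables.length === 0) {
        throw new Error("at least one table is required");
    }
    const conditions = tables.map(({ schema, table }) => {
        const s = quoteValue(schema);
        const t = quoteValue(table);
        return (
            `(kcu.TABLE_SCHEMA = ${s} AND kcu.TABLE_NAME = ${t} ` +
            `AND rc.CONSTRAINT_SCHEMA = ${s} AND rc.TABLE_NAME = ${t})`
        );
    });
    return [
        "SELECT kcu.TABLE_SCHEMA, kcu.TABLE_NAME, kcu.CONSTRAINT_NAME, kcu.COLUMN_NAME,",
        "       rc.DELETE_RULE, rc.UPDATE_RULE",
        "FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu",
        "INNER JOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc",
        "    ON rc.CONSTRAINT_SCHEMA = kcu.CONSTRAINT_SCHEMA",
        "    AND rc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME",
        "WHERE " + conditions.join(" OR "),
    ].join("\n");
}

console.log(buildForeignKeysQuery([{ schema: "tenant_1", table: "user" }]));
```

Running EXPLAIN on the generated statement, with and without the `rc` conditions, is a reasonable way to check whether both tables benefit on a given server.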