-
Notifications
You must be signed in to change notification settings - Fork 441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Negative performance impact in the User Model #961
Comments
- This provides a more efficient method of checking if a User has published posts, which makes them non-private entities. This PR provides a filter for plugins/themes to bypass the check altogether, and also makes the check much more efficient by caching the check and making the actual query more efficient than the currently used `count_user_posts` query.
I opened a PR #962 to address this. After discussing, the PR has been closed and I will give it another go. The current PR attempts to make this more performant by making the query to count user posts more efficient, and caching the query. @CodeProKid brings up the point that WPGraphQL shouldn't really in the business of making cache decisions, but should instead provide proper hooks and filters so that cache can be implemented. I would agree. So instead of caching the query, we should refactor a bit to use DataLoader to batch fetch the published post counts of users, which should significantly reduce the number of MySQL connections per request. See similar DataLoader implementation here: https://github.com/wp-graphql/wp-graphql/blob/develop/src/Data/Loader/PostObjectLoader.php#L117-L123 Some pseudo SQL we can use for a new Loader might be:
Basically, what we'll need to do is accept a list of Author IDs, run a SQL Query to find out if they have at least 1 published post, then return an array in the same order as the Author IDs, but with values of the number of published posts they have (either 1 or 0). If 1 (or more), that means they're a public entity, if 0 they are unpublished author and considered private. |
… modify this functionality (relates to issue wp-graphql#961)
There has been a long-standing performance issue with the is_private method of the User model, because it attempts to count the number of published posts for each user, a potentially very expensive operation. And since the operation is uncached, this can very quickly overwhelm the database. More discussion of the issue: wp-graphql#961 At Quartz, this has prevented us from upgrading WPGraphQL, and it appears to have affected other users as well. A few different PRs exist to address this issue: wp-graphql#962 wp-graphql#972 wp-graphql#1677 This PR is my attempt to fix this issue the way that Jason and others have described as ideal: using DataLoader. Instead of creating a new loader (e.g., "UserVisibilityLoader"), I am including this in UserLoader. The reason is that they seem tightly coupled and making them separate would just introduce unnecessary complexity. You'd need to remember to use UserVisibilityLoader everywhere you used UserLoader or risk losing the performance benefit it provides. My fix simply plugs into the existing "loadKeys" implementation and produces a single query that determines visibility for each user. I'd love feedback on the efficiency of this query; I hope that this is good enough to sidestep the need for caching. If its not possible to generate a single efficient query, generating an extra query for each user along the lines of `SELECT id FROM wp_posts WHERE post_author = X LIMIT 1` seems like the next best approach. Finally, we need a way to provide the visibility to the User model. Luckily, WP_User is not a final class and provides a setter: https://developer.wordpress.org/reference/classes/wp_user/__set/ This allows us to set an "is_private" property on the WP_User instance, which is then available to the model via "$this->data". This seems to be done widely, but we could implement a wrapper around WP_User if we wanted to avoid mutating WP_User instances. From there, it's a very simple matter of having the User model check this property.
Hello, I have raised another pull request to hopefully help resolve this issue. This was based on the PR you originally opened @jasonbahl, simply adding the graphql_user_model_is_private filter with no caching / query changes etc. #1685 I see @chriszarate has also opened a pull request a short time ago. I think there is a place for both of our solutions. Being able to completely bypass the is_private() is ideal for our situation as we'll either have all users available or have it based on something much simpler, e.g. the user role. |
Sorry to bump this thread but can we get this one prioritised. This issue is generating huge load for us. |
…e-for-user-model Fix negative performance with Post / User model
When querying for Posts (or custom post types) or Users, we first check to see if the user is considered private or not.
To check this, we check to see if the user has published posts.
In WordPress, if a User has published a Post, they're considered a publicly accessible entity (not all fields of the User are considered public though).
The current approach for making this check is to use the core method
count_user_posts
On small sites, this is a quick check. On large sites, with large datasets, this can be slow.
To reduce the performance impact, we should provide an alternative method that queries for published posts by the post_author more efficiently. We can do this by first limiting the query to 1 item. We don't care how many posts the author has published, because a single published post makes them non-private. We can also cache that so that we don't have to query each time we check. WordPress.com VIP has a cached version of
count_user_posts
we can draw inspiration from: https://github.com/svn2github/wordpress-vip-plugins/blob/9c0a32757bdd78a0e2babf72a593c2c6a83287df/vip-do-not-include-on-wpcom/wpcom-caching.php#L403-L425We don't want to rely on the VIP function itself, as more often than not, a site will not be running the VIP code.
In addition to the more efficient, cached query, we should also provide a filter that allows for developers to bypass the check altogether and provide their own way of determining if a User should be considered public.
The text was updated successfully, but these errors were encountered: