Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Negative performance impact in the User Model #961

Closed
jasonbahl opened this issue Sep 20, 2019 · 4 comments · Fixed by #972
Closed

Negative performance impact in the User Model #961

jasonbahl opened this issue Sep 20, 2019 · 4 comments · Fixed by #972
Assignees
Labels
Component: Model Layer impact: high unblocks new usecases, substantial improvement to existing feature, fixes a major bug ObjectType: User Status: In Progress Status: 🚀 Actionable Issues that have been curated, have enough info to take action, and are ready to be worked on Type: Enhancement New feature or request

Comments

@jasonbahl
Copy link
Collaborator

When querying for Posts (or custom post types) or Users, we first check to see if the user is considered private or not.

To check this, we check to see if the user has published posts.

In WordPress, if a User has published a Post, they're considered a publicly accessible entity (not all fields of the User are considered public though).

The current approach for making this check is to use the core method count_user_posts

On small sites, this is a quick check. On large sites, with large datasets, this can be slow.

To reduce the performance impact, we should provide an alternative method that queries for published posts by the post_author more efficiently. We can do this by first limiting the query to 1 item. We don't care how many posts the author has published, because a single published post makes them non-private. We can also cache that so that we don't have to query each time we check. WordPress.com VIP has a cached version of count_user_posts we can draw inspiration from: https://github.com/svn2github/wordpress-vip-plugins/blob/9c0a32757bdd78a0e2babf72a593c2c6a83287df/vip-do-not-include-on-wpcom/wpcom-caching.php#L403-L425

We don't want to rely on the VIP function itself, as more often than not, a site will not be running the VIP code.

In addition to the more efficient, cached query, we should also provide a filter that allows for developers to bypass the check altogether and provide their own way of determining if a User should be considered public.

@jasonbahl jasonbahl added Type: Enhancement New feature or request impact: high unblocks new usecases, substantial improvement to existing feature, fixes a major bug labels Sep 20, 2019
jasonbahl added a commit to jasonbahl/wp-graphql that referenced this issue Sep 20, 2019
- This provides a more efficient method of checking if a User has published posts, which makes them non-private entities. This PR provides a filter for plugins/themes to bypass the check altogether, and also makes the check much more efficient by caching the check and making the actual query more efficient than the currently used `count_user_posts` query.
@jasonbahl jasonbahl self-assigned this Sep 20, 2019
@jasonbahl
Copy link
Collaborator Author

I opened a PR #962 to address this.

After discussing, the PR has been closed and I will give it another go.

The current PR attempts to make this more performant by making the query to count user posts more efficient, and caching the query.

@CodeProKid brings up the point that WPGraphQL shouldn't really in the business of making cache decisions, but should instead provide proper hooks and filters so that cache can be implemented.

I would agree.

So instead of caching the query, we should refactor a bit to use DataLoader to batch fetch the published post counts of users, which should significantly reduce the number of MySQL connections per request.

See similar DataLoader implementation here: https://github.com/wp-graphql/wp-graphql/blob/develop/src/Data/Loader/PostObjectLoader.php#L117-L123

Some pseudo SQL we can use for a new Loader might be:

SELECT 
    post_author, COUNT(*)
FROM
    wp_posts
WHERE ( ( post_type = 'post' AND ( post_status = 'publish' ) ) OR ( post_type = 'page' AND ( post_status = 'publish' ) ) )
AND post_author IN (17,1)
GROUP BY post_author;

Basically, what we'll need to do is accept a list of Author IDs, run a SQL Query to find out if they have at least 1 published post, then return an array in the same order as the Author IDs, but with values of the number of published posts they have (either 1 or 0).

If 1 (or more), that means they're a public entity, if 0 they are unpublished author and considered private.

@jasonbahl jasonbahl added this to the v0.7.0 milestone Jan 14, 2020
@CodeProKid CodeProKid added Status: 🚀 Actionable Issues that have been curated, have enough info to take action, and are ready to be worked on Component: Model Layer ObjectType: User Status: In Progress labels Jan 17, 2020
@jasonbahl jasonbahl removed this from the v0.7.0 milestone Jan 22, 2020
@jasonbahl jasonbahl added this to the 1.0 release milestone Apr 24, 2020
@jasonbahl jasonbahl removed this from the 1.0 release milestone Jun 24, 2020
@zacscott
Copy link

zacscott commented Jan 7, 2021

Hello there,

This is a significant problem on a project I am currently working on. This method is called dozens of times per query and it is running a COUNT(*) on hundreds of thousands of records.

Some hooks/filters for this would be greatly appreciated.

Here are some screenshot from our New Relic.

screenshot-1609993286
screenshot-1609993366

zacscott pushed a commit to zacscott/wp-graphql that referenced this issue Jan 15, 2021
chriszarate added a commit to Quartz/wp-graphql that referenced this issue Jan 17, 2021
There has been a long-standing performance issue with the is_private
method of the User model, because it attempts to count the number of
published posts for each user, a potentially very expensive operation.
And since the operation is uncached, this can very quickly overwhelm the
database.

More discussion of the issue:
wp-graphql#961

At Quartz, this has prevented us from upgrading WPGraphQL, and it
appears to have affected other users as well. A few different PRs exist
to address this issue:

wp-graphql#962
wp-graphql#972
wp-graphql#1677

This PR is my attempt to fix this issue the way that Jason and others
have described as ideal: using DataLoader.

Instead of creating a new loader (e.g., "UserVisibilityLoader"), I am
including this in UserLoader. The reason is that they seem tightly
coupled and making them separate would just introduce unnecessary
complexity. You'd need to remember to use UserVisibilityLoader
everywhere you used UserLoader or risk losing the performance benefit
it provides.

My fix simply plugs into the existing "loadKeys" implementation and
produces a single query that determines visibility for each user. I'd
love feedback on the efficiency of this query; I hope that this is
good enough to sidestep the need for caching. If its not possible to
generate a single efficient query, generating an extra query for each
user along the lines of `SELECT id FROM wp_posts WHERE post_author = X
LIMIT 1` seems like the next best approach.

Finally, we need a way to provide the visibility to the User model.
Luckily, WP_User is not a final class and provides a setter:

https://developer.wordpress.org/reference/classes/wp_user/__set/

This allows us to set an "is_private" property on the WP_User instance,
which is then available to the model via "$this->data". This seems to
be done widely, but we could implement a wrapper around WP_User if we
wanted to avoid mutating WP_User instances.

From there, it's a very simple matter of having the User model check
this property.
@zacscott
Copy link

Hello,

I have raised another pull request to hopefully help resolve this issue. This was based on the PR you originally opened @jasonbahl, simply adding the graphql_user_model_is_private filter with no caching / query changes etc. #1685

I see @chriszarate has also opened a pull request a short time ago. I think there is a place for both of our solutions.

Being able to completely bypass the is_private() is ideal for our situation as we'll either have all users available or have it based on something much simpler, e.g. the user role.

@zacscott
Copy link

Sorry to bump this thread but can we get this one prioritised. This issue is generating huge load for us.

jasonbahl added a commit that referenced this issue Jul 23, 2021
/negative-performance-for-user-model

# Conflicts:
#	phpstan.neon.dist
jasonbahl added a commit that referenced this issue Jul 26, 2021
…e-for-user-model

Fix negative performance with Post / User model
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Model Layer impact: high unblocks new usecases, substantial improvement to existing feature, fixes a major bug ObjectType: User Status: In Progress Status: 🚀 Actionable Issues that have been curated, have enough info to take action, and are ready to be worked on Type: Enhancement New feature or request
Projects
None yet
3 participants