-
Notifications
You must be signed in to change notification settings - Fork 440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#961 - Negative performance impact in the User model #962
#961 - Negative performance impact in the User model #962
Conversation
- This provides a more efficient method of checking if a User has published posts, which makes them non-private entities. This PR provides a filter for plugins/themes to bypass the check altogether, and also makes the check much more efficient by caching the check and making the actual query more efficient than the currently used `count_user_posts` query.
Codecov Report
@@ Coverage Diff @@
## develop #962 +/- ##
==========================================
+ Coverage 61.85% 61.9% +0.04%
==========================================
Files 135 135
Lines 8196 8213 +17
==========================================
+ Hits 5070 5084 +14
- Misses 3126 3129 +3
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got some questions here @jasonbahl
* to see if the user has published posts. | ||
* @param User $this Instance of the User Model | ||
*/ | ||
$is_private = apply_filters( 'graphql_user_model_is_private', null, $this ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of already handled here -> https://github.com/wp-graphql/wp-graphql/blob/develop/src/Model/Model.php#L222 though the query will still run, so maybe we just need to rework that lower level filter, but I think we should keep all of the Model Layer hooks & filters at the abstract level so they are consistent across all types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CodeProKid If we were to use the existing graphql_data_is_private
filter, it still executes this is_private
method, which will execute the query to check for published posts, even if a filter were applied to override the results.
Perhaps we could do what I'm doing here in the Model.php file though, to centralize it better.
Basically, I want to provide an "escape hatch" for folks to completely override the automated checks we're doing on their behalf.
For example, if you just have a hard rule that ALL USERS should be considered private, always, then you could avoid the lookup for published posts altogether and just return true with no overhead.
If you were to do that with the current filter, and just return__true
it still executes the query.
I didn't think it would, but upon testing and actually monitoring, it indeed does. 🤷♂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that's what I'm proposing. Move the existing filter in Model.php to before we call $this->is_private();
and only call $this->is_private();
if the filter is empty, otherwise use the results of the filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 we would use this filter and it would solve the entire perf problem for us
*/ | ||
$post_types = \WPGraphQL::get_allowed_post_types(); | ||
unset( $post_types['revision'] ); | ||
unset( $post_types['attachment'] ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can understand revisions, but why unset attachments? I think it might be better to do an array intersect on all public post types maybe?
This gets pretty complicated actually, as we technically shouldn't expose any authors that don't have published posts exposed in GraphQL. So if someone goes and locks down the Pages
type, and a user only has posts in that post type, we technically shouldn't expose them. In a headless context, there would be no other way to enumerate that user. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attachments don't get the publish
status, just inherit.
So, it's a waste of energy to check for publish
attachments, as there won't be any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I see. Would be good to add an inline comment about this for context, since this seems super arbitrary.
* | ||
* @param int @userid ID of the User | ||
* @param mixed|string|array $post_type The post type(s) to check for published posts in | ||
* @param boolean $public_only Whether to count only publicly published posts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spacing is messed up here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I used the composer run fix-cs
command here, so it should be formatted according to the WP code standards...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh! Spacing is because of @userid
should have been $userid
* | ||
* We only care if there's at least 1 published post, not the total. | ||
*/ | ||
if ( false === ( $count = wp_cache_get( $cache_key, $cache_group ) ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This falls apart if someone un-publishes the post. We would have to have some code hooked into transition post status to bust this cache.
Also, where does this cache get set? I only see the get call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya. . .that's what I was thinking too. 😬
Where's DFM Transients when you need them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, where does this cache get set? I only see the get call.
uhhh. . .looks like I caught a case of the "moving too fasts" 🤢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will also get stuck in a bad state if an author has 0 published posts, and then publishes a post, as it will still be cached as 0
. So you'll need to bust the cache in that instance as well.
We definitely want to cache 0
as a legitimate result since querying for published posts for a user that has none will do a full table scan to result in the 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would consider calling to wpcom_vip_count_user_posts
behind a function_exists
check. that function basically implements what you've done here but i believe they handle the cache busting:
i don't have a strong opinion on needing to bust the cache, i'd probably lean towards not busting it and just documenting the delay.
can i ask what the threat model is here? why exactly do users without published posts have an expectation of privacy? the whole reason that this method is unperformant is that it always runs, even when the user connected to a post (in which case, they obviously have pubbed a post).
is there a way to make this run conditional depending on the $source
object?
Closing this, per discussion here: #961 (comment) |
@chriszarate WordPress treats users as private entities by default until they publish posts. If you install WordPress and create a new user, that user is not discoverable (by a non authenticated user) in any way until they have published a post. We're trying to respect existing access control patterns that WordPress has set. |
There has been a long-standing performance issue with the is_private method of the User model, because it attempts to count the number of published posts for each user, a potentially very expensive operation. And since the operation is uncached, this can very quickly overwhelm the database. More discussion of the issue: wp-graphql#961 At Quartz, this has prevented us from upgrading WPGraphQL, and it appears to have affected other users as well. A few different PRs exist to address this issue: wp-graphql#962 wp-graphql#972 wp-graphql#1677 This PR is my attempt to fix this issue the way that Jason and others have described as ideal: using DataLoader. Instead of creating a new loader (e.g., "UserVisibilityLoader"), I am including this in UserLoader. The reason is that they seem tightly coupled and making them separate would just introduce unnecessary complexity. You'd need to remember to use UserVisibilityLoader everywhere you used UserLoader or risk losing the performance benefit it provides. My fix simply plugs into the existing "loadKeys" implementation and produces a single query that determines visibility for each user. I'd love feedback on the efficiency of this query; I hope that this is good enough to sidestep the need for caching. If its not possible to generate a single efficient query, generating an extra query for each user along the lines of `SELECT id FROM wp_posts WHERE post_author = X LIMIT 1` seems like the next best approach. Finally, we need a way to provide the visibility to the User model. Luckily, WP_User is not a final class and provides a setter: https://developer.wordpress.org/reference/classes/wp_user/__set/ This allows us to set an "is_private" property on the WP_User instance, which is then available to the model via "$this->data". This seems to be done widely, but we could implement a wrapper around WP_User if we wanted to avoid mutating WP_User instances. From there, it's a very simple matter of having the User model check this property.
@chriszarate reported a performance issue on sites with larger data sets within the User model.
This provides a more efficient method of checking if a User has published posts, which makes them non-private entities. This PR provides a filter for plugins/themes to bypass the check altogether, and also makes the check much more efficient by caching the check and making the actual query more efficient than the currently used
count_user_posts
query.closes #961