-
-
Notifications
You must be signed in to change notification settings - Fork 7.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests: Eliminate refresh_from_db on UserProfile objects #26533
base: main
Are you sure you want to change the base?
Conversation
I add a bunch of cute helper methods to make the test a bit more readable. And then I make sure to get clean objects, which precludes the need for our callback functions to refresh the user objects. And finally I make sure that our validation functions don't cause any round trips (assuming we have fetched objects using a standard Zulip helper, which example_user ensures.)
The get_user function is poorly named, but I don't want to sweep the entire codebase yet. It's also nice to have a test wrapper for little experiments like profiling tests or hunting down calls to refresh_from_db. It's possible that we would also just change the new wrapper to more directly call Django. The `get_user` function isn't used in a ton of real-world places, so we might want the test code to just bypass the cache.
The helper here was no longer a useful abstraction.
The `expected` flag was incredibly confusing, as you couldn't tell from the calling code what you were actually expecting to happen. I avoid the context manager idiom in order to force the callers to create simple helper functions, and I de-duplicate some code in some places. I also force the caller to explicitly soft-deactivate the user with one simple line of code, so that the person reading the test doesn't have to research the side effects of the helper. (And I make it very easy for new authors to follow the practice going forward.) This is also somewhat of a prep commit to avoid the obfuscated use of refresh_from_db.
We could actually eliminate all the calls here and still have the events test pass, but I am too timid to track down whether any subtle changes would happen under the surface and somehow escape line coverage. Fortunately, these calls are quite cheap, as I indicate in the comments. In general we want to phase out calls to refresh_from_db when it comes to users. It is better to just re-fetch them using Zulip APIs, especially if there are things like query counts being checked.
We want to avoid refresh_from_db.
The comments about circumventing the cache were misleading, since we DON'T want to circumvent the memcached cache. I think they referred to the Django cache, but even that was kind of strange. I think anybody who touches these tests will figure out why we re-fetched Hamlet.
We want to make sure that we fetch users similar to how real code will fetch users. By using refresh_user instead of Django's refresh_from_db, we ensure that caches are properly invalidated after the prior save operation. The actual motivation for this change is that we also want to use the django-seal project to hunt down places where we make unnecessary round trips to the database, and calling refresh_from_db leads to some false alarms.
We no longer opaquely refresh the user objects in these helpers. The callers don't break here, because none of them do any of their own assertions on the original user object (or they immediately re-assign the user object themselves). We have plenty of precedent in our main soft-deactivation tests that the caller should refresh the user objects from the database after any non-trivial operation.
The adjustment to query count reflects the fact that we have a truly refreshed object from Zulip's layer for getting new UserProfile objects, rather than a Django object who has cleared out some of its cache entries. This saves us a round trip for the Realm object.
We generally want to avoid refresh_from_db.
After changing the realm this way, there is no reason to re-load it. If anything, you would expect things like user objects to get stale, but re-loading the realm wouldn't help them anyway.
Right now this only works for most test users, but that covers probably most likely cases. We are trying to phase out refresh_from_db to allow us to use django-seal in the future.
user.refresh_from_db() | ||
|
||
# refresh user | ||
user = self.example_user("hamlet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One theory is that we could have this be calling self.example_user("hamlet", skip_cache=True)
or some similar abstraction for cases where we previously did refresh_from_db
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, again, we actually do want the memcached piece here. We want to fetch the data in the same way that the server would.
@@ -2768,12 +2768,10 @@ def test_no_topic_message(self) -> None: | |||
addressee = Addressee.for_stream(stream, topic_name) | |||
|
|||
do_set_realm_property(realm, "mandatory_topics", True, acting_user=None) | |||
realm.refresh_from_db() | |||
|
|||
with self.assertRaisesRegex(JsonableError, "Topics are required in this organization"): | |||
check_message(sender, client, addressee, message_content, realm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here, again, I do want us to do a refresh; calling check_message(sender, client, addressee, message_content, realm)
means that the test may now be relying on the fact that do_set_realm_property
mutates the realm
variable that was passed in.
The concern would not exist if we were instead doing an API call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole point is that do_set_realm_property
does indeed mutate the realm
variable, so there's no need to refresh it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're relying on standard Django here. When you're just testing business logic, you can trust that Django actually persisted stuff to the database without triple-checking that Django did indeed save it by asking it to refresh from the database (which is kind of silly if you don't trust Django in the first place).
You only want to re-fetch DB objects when you're doing stuff like client-server posts where the object clearly gets out of your control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've had test bugs in the past where a .save()
was never called in some code path and it resulted in the tests not at all checking the scenario they intended to, or some situation where your local variable was inconsistent with the database. I am sure that at least some of these refresh_from_db
calls were motivated by a real bug of that form.
Here are a few commits to look at: 685fbff, c78d728; haven't reviewed in full.
We can probably rely on do_set_realm_property
to not have that class of problem, but I do want to be careful about just blanket switching to cache fetches everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about all the cases, but at least for the two commits you mentioned above the object for which we call refresh_from_db
is not passed to do_...
function so we need to call refresh_from_db
for them and I expect this to be the true for most cases where we call refresh_from_db
.
@@ -1078,6 +1079,9 @@ def test_add_and_remove_stream_as_default(self) -> None: | |||
stream = self.make_stream("stream", realm=realm) | |||
stream_id = self.subscribe(user_profile, "stream").id | |||
|
|||
def fresh_stream() -> Stream: | |||
return get_stream_by_id_in_realm(stream_id, realm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer a test_classes
refresh_stream(stream)
method like what you have for the users -- I don't really like this sort of closure that has "hidden" parameters for what the streams are.
And "fresh_stream" really reads to me like we're going to get a totally new/fresh stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're quibbling about names. The "fresh_stream" name is both accurate and appropriate. You want a totally freshly fetched Stream object from the database to properly validate that the round-trip POST or whatever worked. Otherwise you're gonna get either a false negative or false positive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we made refresh_user
, I don't see why we wouldn't want this to be parallel.
Heads up @showell, we just merged some commits that conflict with the changes you made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the |
For all the reasons that we avoid refresh_from_db in our real code, we should also avoid it in tests. It's usually just as easy to re-fetch a new object using the same simple mechanism that allowed you to fetch the original object.
From a test perspective, this means subsequent assertions about query counts, and just behavior in general with respect to things like caches, will be more correct.
Last but not least, this removes a nuisance for us doing an audit of our codebase using django-seal. (See https://chat.zulip.org/#narrow/stream/3-backend/topic/django-seal.20audits/near/1625400 for more context.)