tests: Eliminate refresh_from_db on UserProfile objects #26533

showell · 2023-08-19T21:06:13Z

For all the reasons that we avoid refresh_from_db in our real code, we should also avoid it in tests. It's usually just as easy to re-fetch a new object using the same simple mechanism that allowed you to fetch the original object.

From a test perspective, this means subsequent assertions about query counts, and just behavior in general with respect to things like caches, will be more correct.

Last but not least, this removes a nuisance for us doing an audit of our codebase using django-seal. (See https://chat.zulip.org/#narrow/stream/3-backend/topic/django-seal.20audits/near/1625400 for more context.)

I add a bunch of cute helper methods to make the test a bit more readable. And then I make sure to get clean objects, which precludes the need for our callback functions to refresh the user objects. And finally I make sure that our validation functions don't cause any round trips (assuming we have fetched objects using a standard Zulip helper, which example_user ensures.)

The get_user function is poorly named, but I don't want to sweep the entire codebase yet. It's also nice to have a test wrapper for little experiments like profiling tests or hunting down calls to refresh_from_db. It's possible that we would also just change the new wrapper to more directly call Django. The `get_user` function isn't used in a ton of real-world places, so we might want the test code to just bypass the cache.

The helper here was no longer a useful abstraction.

The `expected` flag was incredibly confusing, as you couldn't tell from the calling code what you were actually expecting to happen. I avoid the context manager idiom in order to force the callers to create simple helper functions, and I de-duplicate some code in some places. I also force the caller to explicitly soft-deactivate the user with one simple line of code, so that the person reading the test doesn't have to research the side effects of the helper. (And I make it very easy for new authors to follow the practice going forward.) This is also somewhat of a prep commit to avoid the obfuscated use of refresh_from_db.

We could actually eliminate all the calls here and still have the events test pass, but I am too timid to track down whether any subtle changes would happen under the surface and somehow escape line coverage. Fortunately, these calls are quite cheap, as I indicate in the comments. In general we want to phase out calls to refresh_from_db when it comes to users. It is better to just re-fetch them using Zulip APIs, especially if there are things like query counts being checked.

We want to avoid refresh_from_db.

The comments about circumventing the cache were misleading, since we DON'T want to circumvent the memcached cache. I think they referred to the Django cache, but even that was kind of strange. I think anybody who touches these tests will figure out why we re-fetched Hamlet.

We want to make sure that we fetch users similar to how real code will fetch users. By using refresh_user instead of Django's refresh_from_db, we ensure that caches are properly invalidated after the prior save operation. The actual motivation for this change is that we also want to use the django-seal project to hunt down places where we make unnecessary round trips to the database, and calling refresh_from_db leads to some false alarms.

We no longer opaquely refresh the user objects in these helpers. The callers don't break here, because none of them do any of their own assertions on the original user object (or they immediately re-assign the user object themselves). We have plenty of precedent in our main soft-deactivation tests that the caller should refresh the user objects from the database after any non-trivial operation.

The adjustment to query count reflects the fact that we have a truly refreshed object from Zulip's layer for getting new UserProfile objects, rather than a Django object who has cleared out some of its cache entries. This saves us a round trip for the Realm object.

We generally want to avoid refresh_from_db.

After changing the realm this way, there is no reason to re-load it. If anything, you would expect things like user objects to get stale, but re-loading the realm wouldn't help them anyway.

Right now this only works for most test users, but that covers probably most likely cases. We are trying to phase out refresh_from_db to allow us to use django-seal in the future.

zerver/tests/test_events.py

timabbott · 2023-08-21T18:53:43Z

zerver/tests/test_bots.py

-        user.refresh_from_db()
+
+        # refresh user
+        user = self.example_user("hamlet")


One theory is that we could have this be calling self.example_user("hamlet", skip_cache=True) or some similar abstraction for cases where we previously did refresh_from_db.

No, again, we actually do want the memcached piece here. We want to fetch the data in the same way that the server would.

zerver/tests/test_signup.py

zerver/lib/test_classes.py

timabbott · 2023-08-21T18:59:46Z

zerver/tests/test_message_send.py

@@ -2768,12 +2768,10 @@ def test_no_topic_message(self) -> None:
        addressee = Addressee.for_stream(stream, topic_name)

        do_set_realm_property(realm, "mandatory_topics", True, acting_user=None)
-        realm.refresh_from_db()

        with self.assertRaisesRegex(JsonableError, "Topics are required in this organization"):
            check_message(sender, client, addressee, message_content, realm)


I think here, again, I do want us to do a refresh; calling check_message(sender, client, addressee, message_content, realm) means that the test may now be relying on the fact that do_set_realm_property mutates the realm variable that was passed in.

The concern would not exist if we were instead doing an API call.

The whole point is that do_set_realm_property does indeed mutate the realm variable, so there's no need to refresh it.

We're relying on standard Django here. When you're just testing business logic, you can trust that Django actually persisted stuff to the database without triple-checking that Django did indeed save it by asking it to refresh from the database (which is kind of silly if you don't trust Django in the first place).

You only want to re-fetch DB objects when you're doing stuff like client-server posts where the object clearly gets out of your control.

We've had test bugs in the past where a .save() was never called in some code path and it resulted in the tests not at all checking the scenario they intended to, or some situation where your local variable was inconsistent with the database. I am sure that at least some of these refresh_from_db calls were motivated by a real bug of that form.

Here are a few commits to look at: 685fbff, c78d728; haven't reviewed in full.

We can probably rely on do_set_realm_property to not have that class of problem, but I do want to be careful about just blanket switching to cache fetches everywhere.

I am not sure about all the cases, but at least for the two commits you mentioned above the object for which we call refresh_from_db is not passed to do_... function so we need to call refresh_from_db for them and I expect this to be the true for most cases where we call refresh_from_db.

timabbott · 2023-08-21T19:00:53Z

zerver/tests/test_subs.py

@@ -1078,6 +1079,9 @@ def test_add_and_remove_stream_as_default(self) -> None:
        stream = self.make_stream("stream", realm=realm)
        stream_id = self.subscribe(user_profile, "stream").id

+        def fresh_stream() -> Stream:
+            return get_stream_by_id_in_realm(stream_id, realm)


I would prefer a test_classes refresh_stream(stream) method like what you have for the users -- I don't really like this sort of closure that has "hidden" parameters for what the streams are.

And "fresh_stream" really reads to me like we're going to get a totally new/fresh stream.

I think we're quibbling about names. The "fresh_stream" name is both accurate and appropriate. You want a totally freshly fetched Stream object from the database to properly validate that the round-trip POST or whatever worked. Otherwise you're gonna get either a false negative or false positive.

If we made refresh_user, I don't see why we wouldn't want this to be parallel.

zerver/lib/test_classes.py

zulipbot · 2023-08-28T15:56:55Z

Heads up @showell, we just merged some commits that conflict with the changes you made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/main branch and resolve your pull request's merge conflicts accordingly.

timabbott · 2023-09-19T00:00:13Z

I merged the first 4 commits as the series ending with a8f5836 as part of reviewing #26567.

zulipbot added the size: XL label Aug 19, 2023

showell added 12 commits August 19, 2023 21:36

tests: Simplify policy-related tests.

d51b699

The helper here was no longer a useful abstraction.

tests: Refresh bot users differently.

2d49af1

We want to avoid refresh_from_db.

tests: Verify email change more rigorously.

5294756

auth tests: Explicitly re-fetch users.

2a30ff0

tests: Use refresh_users in various places.

4652981

showell force-pushed the user_refresh branch from 7901e73 to 3372810 Compare August 19, 2023 21:53

showell added 4 commits August 19, 2023 23:11

tests: Get fresh streams and attachments.

ddb9a74

We generally want to avoid refresh_from_db.

tests: Avoid unnecessary refresh_from_db for realms.

a205ce3

After changing the realm this way, there is no reason to re-load it. If anything, you would expect things like user objects to get stale, but re-loading the realm wouldn't help them anyway.

tests: Refresh stream using Zulip model method.

2021408

tests: Prevent refresh_from_db for user objects.

a0925db

Right now this only works for most test users, but that covers probably most likely cases. We are trying to phase out refresh_from_db to allow us to use django-seal in the future.

showell force-pushed the user_refresh branch from a8aeab7 to a0925db Compare August 19, 2023 23:11

showell requested a review from sahil839 August 20, 2023 11:05

showell added the maintainer review PR is ready for review by Zulip maintainers. label Aug 21, 2023

timabbott reviewed Aug 21, 2023

View reviewed changes

zerver/tests/test_events.py Show resolved Hide resolved

timabbott reviewed Aug 21, 2023

View reviewed changes

zerver/tests/test_signup.py Show resolved Hide resolved

timabbott reviewed Aug 21, 2023

View reviewed changes

zerver/lib/test_classes.py Show resolved Hide resolved

timabbott reviewed Aug 21, 2023

View reviewed changes

zerver/lib/test_classes.py Show resolved Hide resolved

sahil839 mentioned this pull request Aug 24, 2023

Seal UserProfile queries using django-seal. #26567

Open

12 tasks

zulipbot added the has conflicts label Aug 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tests: Eliminate refresh_from_db on UserProfile objects #26533

tests: Eliminate refresh_from_db on UserProfile objects #26533

showell commented Aug 19, 2023 •

edited

timabbott Aug 21, 2023

showell Aug 21, 2023

timabbott Aug 21, 2023

showell Aug 21, 2023

showell Aug 21, 2023

timabbott Aug 21, 2023

sahil839 Sep 8, 2023

timabbott Aug 21, 2023

showell Aug 21, 2023

timabbott Aug 21, 2023

zulipbot commented Aug 28, 2023

timabbott commented Sep 19, 2023

tests: Eliminate refresh_from_db on UserProfile objects #26533

Are you sure you want to change the base?

tests: Eliminate refresh_from_db on UserProfile objects #26533

Conversation

showell commented Aug 19, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zulipbot commented Aug 28, 2023

timabbott commented Sep 19, 2023

showell commented Aug 19, 2023 •

edited