Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: Eliminate refresh_from_db on UserProfile objects #26533

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

showell
Copy link
Contributor

@showell showell commented Aug 19, 2023

For all the reasons that we avoid refresh_from_db in our real code, we should also avoid it in tests. It's usually just as easy to re-fetch a new object using the same simple mechanism that allowed you to fetch the original object.

From a test perspective, this means subsequent assertions about query counts, and just behavior in general with respect to things like caches, will be more correct.

Last but not least, this removes a nuisance for us doing an audit of our codebase using django-seal. (See https://chat.zulip.org/#narrow/stream/3-backend/topic/django-seal.20audits/near/1625400 for more context.)

I add a bunch of cute helper methods to make
the test a bit more readable.

And then I make sure to get clean objects,
which precludes the need for our callback
functions to refresh the user objects.

And finally I make sure that our validation
functions don't cause any round trips (assuming
we have fetched objects using a standard
Zulip helper, which example_user ensures.)
The get_user function is poorly named, but I don't want to
sweep the entire codebase yet.

It's also nice to have a test wrapper for little experiments
like profiling tests or hunting down calls to refresh_from_db.

It's possible that we would also just change the new wrapper
to more directly call Django. The `get_user` function isn't
used in a ton of real-world places, so we might want the test
code to just bypass the cache.
The helper here was no longer a useful abstraction.
The `expected` flag was incredibly confusing, as you
couldn't tell from the calling code what you were
actually expecting to happen.

I avoid the context manager idiom in order to force
the callers to create simple helper functions, and
I de-duplicate some code in some places.

I also force the caller to explicitly soft-deactivate
the user with one simple line of code, so that the
person reading the test doesn't have to research
the side effects of the helper. (And I make it
very easy for new authors to follow the practice
going forward.)

This is also somewhat of a prep commit to avoid
the obfuscated use of refresh_from_db.
We could actually eliminate all the calls here and still
have the events test pass, but I am too timid to track
down whether any subtle changes would happen under the
surface and somehow escape line coverage. Fortunately,
these calls are quite cheap, as I indicate in the comments.

In general we want to phase out calls to refresh_from_db
when it comes to users. It is better to just re-fetch
them using Zulip APIs, especially if there are things
like query counts being checked.
We want to avoid refresh_from_db.
The comments about circumventing the cache were misleading,
since we DON'T want to circumvent the memcached cache. I
think they referred to the Django cache, but even that was
kind of strange. I think anybody who touches these tests
will figure out why we re-fetched Hamlet.
We want to make sure that we fetch users similar to how
real code will fetch users. By using refresh_user instead
of Django's refresh_from_db, we ensure that caches are
properly invalidated after the prior save operation.

The actual motivation for this change is that we also
want to use the django-seal project to hunt down places
where we make unnecessary round trips to the database,
and calling refresh_from_db leads to some false alarms.
We no longer opaquely refresh the user objects in these
helpers.

The callers don't break here, because none of them do any
of their own assertions on the original user object (or
they immediately re-assign the user object themselves).

We have plenty of precedent in our main soft-deactivation
tests that the caller should refresh the user objects from
the database after any non-trivial operation.
The adjustment to query count reflects the fact that we
have a truly refreshed object from Zulip's layer for
getting new UserProfile objects, rather than a Django
object who has cleared out some of its cache entries.
This saves us a round trip for the Realm object.
We generally want to avoid refresh_from_db.
After changing the realm this way, there is no reason
to re-load it. If anything, you would expect things
like user objects to get stale, but re-loading the
realm wouldn't help them anyway.
Right now this only works for most test users, but
that covers probably most likely cases.

We are trying to phase out refresh_from_db to
allow us to use django-seal in the future.
@showell showell requested a review from sahil839 August 20, 2023 11:05
@showell showell added the maintainer review PR is ready for review by Zulip maintainers. label Aug 21, 2023
user.refresh_from_db()

# refresh user
user = self.example_user("hamlet")
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One theory is that we could have this be calling self.example_user("hamlet", skip_cache=True) or some similar abstraction for cases where we previously did refresh_from_db.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, again, we actually do want the memcached piece here. We want to fetch the data in the same way that the server would.

@@ -2768,12 +2768,10 @@ def test_no_topic_message(self) -> None:
addressee = Addressee.for_stream(stream, topic_name)

do_set_realm_property(realm, "mandatory_topics", True, acting_user=None)
realm.refresh_from_db()

with self.assertRaisesRegex(JsonableError, "Topics are required in this organization"):
check_message(sender, client, addressee, message_content, realm)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here, again, I do want us to do a refresh; calling check_message(sender, client, addressee, message_content, realm) means that the test may now be relying on the fact that do_set_realm_property mutates the realm variable that was passed in.

The concern would not exist if we were instead doing an API call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole point is that do_set_realm_property does indeed mutate the realm variable, so there's no need to refresh it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're relying on standard Django here. When you're just testing business logic, you can trust that Django actually persisted stuff to the database without triple-checking that Django did indeed save it by asking it to refresh from the database (which is kind of silly if you don't trust Django in the first place).

You only want to re-fetch DB objects when you're doing stuff like client-server posts where the object clearly gets out of your control.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've had test bugs in the past where a .save() was never called in some code path and it resulted in the tests not at all checking the scenario they intended to, or some situation where your local variable was inconsistent with the database. I am sure that at least some of these refresh_from_db calls were motivated by a real bug of that form.

Here are a few commits to look at: 685fbff, c78d728; haven't reviewed in full.

We can probably rely on do_set_realm_property to not have that class of problem, but I do want to be careful about just blanket switching to cache fetches everywhere.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about all the cases, but at least for the two commits you mentioned above the object for which we call refresh_from_db is not passed to do_... function so we need to call refresh_from_db for them and I expect this to be the true for most cases where we call refresh_from_db.

@@ -1078,6 +1079,9 @@ def test_add_and_remove_stream_as_default(self) -> None:
stream = self.make_stream("stream", realm=realm)
stream_id = self.subscribe(user_profile, "stream").id

def fresh_stream() -> Stream:
return get_stream_by_id_in_realm(stream_id, realm)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a test_classes refresh_stream(stream) method like what you have for the users -- I don't really like this sort of closure that has "hidden" parameters for what the streams are.

And "fresh_stream" really reads to me like we're going to get a totally new/fresh stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're quibbling about names. The "fresh_stream" name is both accurate and appropriate. You want a totally freshly fetched Stream object from the database to properly validate that the round-trip POST or whatever worked. Otherwise you're gonna get either a false negative or false positive.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we made refresh_user, I don't see why we wouldn't want this to be parallel.

@zulipbot
Copy link
Member

Heads up @showell, we just merged some commits that conflict with the changes you made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/main branch and resolve your pull request's merge conflicts accordingly.

@timabbott
Copy link
Sponsor Member

I merged the first 4 commits as the series ending with a8f5836 as part of reviewing #26567.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has conflicts maintainer review PR is ready for review by Zulip maintainers. size: XL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants