Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Presence last_update_id API #29999

Merged
merged 3 commits into from
Jun 7, 2024
Merged

Conversation

mateuszmandera
Copy link
Contributor

@mateuszmandera mateuszmandera commented May 7, 2024

Implements a prototype for the API discussed in https://chat.zulip.org/#narrow/stream/378-api-design/topic/presence.20rewrite/near/1586819

Commit 1 Adds the necessary last_update_id column and PresenceSequence model, commit message elaborates on the migration plan
Commit 2 The backend+frontend parts of the system using the above data structure.

This is WIP, so no tests and only tested in the webapp.
What this has:

  • Adds the last_update_id param and plumbing in the API through the backend and frontend
  • Implements what is intended to be an efficient locking mechanism inside do_update_user_presence, updating PresenceSequence and the related UserPresence update in a single SQL query, at the very end of the transaction, ensuring the hot lock only gets held as briefly as possible.
  • Verified the core mechanism seems to work; the webapp only fetches the recently updated presence data and updates its buddy list with the incremental info

What this does NOT have:

zerver/models/presence.py Outdated Show resolved Hide resolved
zerver/views/presence.py Outdated Show resolved Hide resolved
@@ -47,6 +47,7 @@ def events_register_backend(
"client_gravatar", default=None, json_validator=check_bool
),
slim_presence: bool = REQ(default=False, json_validator=check_bool),
presence_last_update_id: Optional[int] = REQ(default=None, json_validator=check_int),
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we expect this to be used in GET /events? I guess a potential use case would be for mobile/terminal clients to just cache what they had the last presence state for the user, so it could be good, but maybe worth somet discussion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah the idea was in case the mobile app wanted to use this; and also in case we were changing presence events format based on whether the client is last_update_id-based. But we're not changing events right now, so no point perhaps. Removed this for now.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a feature to have mobile be able to just used cached data on this detail is probably a good idea, so worth bringing it up in API design.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zerver/lib/presence.py Outdated Show resolved Hide resolved
zerver/lib/presence.py Outdated Show resolved Hide resolved
@mateuszmandera mateuszmandera force-pushed the presence-new-api branch 6 times, most recently from b5d5adf to 6798bfa Compare May 12, 2024 19:07
@mateuszmandera
Copy link
Contributor Author

mateuszmandera commented May 12, 2024

Responded to comments and pushed a commit with a proposal for handling the main race:

the discussed case of getting presence info about a user the webapp doesn't know yet (https://chat.zulip.org/#narrow/stream/378-api-design/topic/presence.20rewrite/near/1586828)

Will start working on test coverage next.

web/src/presence.ts Outdated Show resolved Hide resolved
@mateuszmandera mateuszmandera force-pushed the presence-new-api branch 2 times, most recently from 4bba759 to 9a2ba23 Compare May 19, 2024 21:45
@mateuszmandera mateuszmandera force-pushed the presence-new-api branch 3 times, most recently from c6b26b6 to c1b08cf Compare May 24, 2024 01:29
@timabbott
Copy link
Sponsor Member

I opened #30283 to integrate the migrations part of this, with the following changes.

diff --git a/zerver/migrations/0526_user_presence_backfill_last_update_id_to_0.py b/zerver/migrations/0526_user_presence_backfill_last_update_id_to_0.py
index 7083776df6..babf436730 100644
--- a/zerver/migrations/0526_user_presence_backfill_last_update_id_to_0.py
+++ b/zerver/migrations/0526_user_presence_backfill_last_update_id_to_0.py
@@ -17,12 +17,14 @@ def backfill_user_presence_last_update_id(
     BATCH_SIZE = 10000
     lower_bound = 0
 
-    # Make sure we run past the end in case of new rows created while we run.
+    # Add a slop factor to make it likely we run past the end in case
+    # of new rows created while we run. The next step will fail to
+    # remove the null possibility if we race, so this is safe.
     max_id += BATCH_SIZE / 2
 
     while lower_bound < max_id:
         UserPresence.objects.filter(
-            id__gt=lower_bound, id__lte=lower_bound + BATCH_SIZE, last_update=None
+            id__gt=lower_bound, id__lte=lower_bound + BATCH_SIZE, last_update_id=None
         ).update(last_update_id=0)
         lower_bound += BATCH_SIZE
 
diff --git a/zerver/migrations/0527_presencesequence.py b/zerver/migrations/0527_presencesequence.py
index 222fd3b9c0..76a172c06c 100644
--- a/zerver/migrations/0527_presencesequence.py
+++ b/zerver/migrations/0527_presencesequence.py
@@ -20,7 +20,10 @@ def create_presence_sequence_for_old_realms(
     BATCH_SIZE = 2000
     lower_bound = 0
 
-    # Make sure we run past the end in case of new rows created while we run.
+    # Add a slop factor to make it likely we run past the end in case
+    # of new rows created while we run. Races with realm creation are
+    # pretty unlikely, and should throw an exception, so we should
+    # catch them.
     max_id += BATCH_SIZE / 2
 
     while lower_bound < max_id:

@alexmv
Copy link
Collaborator

alexmv commented Jun 5, 2024

The first commit says This builds on top of the previous commit which is no longer the case, since the commit in question was merged in #30283, and there have been other things merged since then.

@mateuszmandera mateuszmandera force-pushed the presence-new-api branch 3 times, most recently from 2eaf7a3 to 09e24ca Compare June 5, 2024 20:30
@mateuszmandera
Copy link
Contributor Author

Squashed and split thing into two deployable commits - first just backend API support, and then the frontend actually using it.

The first commit says This builds on top of the previous commit which is no longer the case, since the commit in question was merged in #30283, and there have been other things merged since then.

Thanks, fixed that message to refer that commit specifically by its hash.

@@ -22,15 +22,19 @@ format used by the Zulip server that they are interacting with.

**Feature level 263**:

* `POST /users/me/presence`: When `slim_presence` is used, a new
`presence_last_update_id` parameter can be given, instructing
* `POST /users/me/presence`: A new `last_update_id`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a small bug in this doc btw, the param is called last_update_id for this endpoint. (the prefix would be redundant here, since obviously it's in the presence domain so to speak; there isn't other non-presence data included here that would require such disambiguation via prefixes)

@mateuszmandera
Copy link
Contributor Author

Pushed a commit tweaking the details for last_update_id+slim_presence interactions a bit, to fit https://chat.zulip.org/#narrow/stream/378-api-design/topic/presence.20rewrite/near/1817372 It should be squashed into the backed implementation commit, just leaving it separate to easily show what's changed.

@timabbott
Copy link
Sponsor Member

Squashed the backend/API commits to merge as #30343.

@timabbott
Copy link
Sponsor Member

Probably we can actually merge the rest of this too, I'll come back to that tomorrow.

@mateuszmandera
Copy link
Contributor Author

There was a small bug where for a zephyr mirror realm, we'd return last_update_id=None. Pushed a fix to make it -1

Note: This involves adding presence info of unknown users to the
presence data.
With some small tweaks, we can just add the info to the presence data
structures, just making sure the buddy list correctly skips those
entries and that we redraw the user in the case where the user creation
event arrives after the presence polling loop.
Sending last_update_id implies using slim_presence=true, so sending the
extra param is redundant now.
This was a bug, where in the realm.presence_disabled (synonymous to
being a zephyr mirror realm) case we would return None. We have decided
on the convention of using only integers here, and -1 representing lack
of data.
@timabbott timabbott merged commit f0ec4f5 into zulip:main Jun 7, 2024
7 checks passed
@timabbott
Copy link
Sponsor Member

Merged, thanks @mateuszmandera!

mateuszmandera added a commit to mateuszmandera/zulip that referenced this pull request Jun 8, 2024
This was discussed in the review of zulip#29999:
zulip#29999 (comment)

The previous way of handling wasn't entirely correct, as unnecessary
events were omitted, with a bad guarantee of even being in the correct
order.

This is an improvement as now the function detects that it ended up
doing nothing and can skip sending an event.

The race condition is hard to make up in an automated test, so the test
this commit adds is pretty weak, but possibly the best we can reasonably
do here. Aside of that, the actual race was simulated in manual testing
to verify the expected behavior.
mateuszmandera added a commit to mateuszmandera/zulip that referenced this pull request Jun 8, 2024
This was discussed in the review of zulip#29999:
zulip#29999 (comment)

The previous way of handling wasn't entirely correct, as unnecessary
events were omitted, with a bad guarantee of even being in the correct
order.

This is an improvement as now the function detects that it ended up
doing nothing and can skip sending an event.

The race condition is hard to make up in an automated test, so the test
this commit adds is pretty weak, but possibly the best we can reasonably
do here. Aside of that, the actual race was simulated in manual testing
to verify the expected behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: api area: performance Performance and scalability issues integration review Added by maintainers when a PR may be ready for integration. size: XL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve presence system's scalability for large realms
5 participants