-
-
Notifications
You must be signed in to change notification settings - Fork 7.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce new dataclass for message send event user data. #18853
Conversation
I will tomorrow stack on top of this PR more commits to actually start using the |
zerver/tornado/event_queue.py
Outdated
@@ -949,14 +946,15 @@ def get_client_payload(apply_markdown: bool, client_gravatar: bool) -> Dict[str, | |||
# If the recipient was offline and the message was a single or group PM to them | |||
# or they were @-notified potentially notify more immediately | |||
private_message = message_type == "private" and user_profile_id != sender_id | |||
mentioned = "mentioned" in flags | |||
mentioned = user_data.get("mentioned", False) | |||
stream_push_notify = user_data.get("stream_push_notify", False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, this should be something like user_data.get("mentioned", "mentioned" in flags)
for backwards compatibility, but I'm not sure if that's worth doing because:
- It defeats the purpose of deduplicating this logic
- Tornado won't crash if we don't do that -- the worst that will happen is someone not receiving a notification during upgrade
I merged all but the last commit as the series ending with b6806fb. Thanks @abhijeetbodas2001! The one thing that I worry a bit about with extending the For It's potentially useful to distinguish "definitely notifiable" from "potentially notifiable", where the former takes into account the online situation. Maybe |
9192f36
to
73bbbb4
Compare
I did some calculations.
I think it would be nice to have some data once the commits merged today hit chat.zulip.org |
Another idea would be to still calculate these notifications related flags before send_event, but use tuples/arrays to store these them. That'd save the space used for the string keys, and bring down the per-object size to ~100B. {
"id": 10,
"flags": ["mentioned", "wildcard_mentioned"],
"notification_flags": [true, true, false, false, false, false]
} And we could then delegate all the array-indices related dirty work to dataclass methods. |
73bbbb4
to
1eaa70b
Compare
I made these changes suggested to the dataclass method, and updated the PR. |
The test failure is missing coverage on one line. |
5d5a7d8
to
1b2ba60
Compare
Oops! I had missed adding a test for the function which does the |
I feel like we could avoid duplicating that code by just calling a shared function in both places. One thing to highlight is that time spent in Tornado is much more expensive to us than time spent in a Django python process, because of the different horizontal scaling properties of those two systems. And in any case JSON dumping of an extra 100KB is not cheap. |
dd9349f
to
ec84d41
Compare
@timabbott I've attempted doing that migration and pushed here. The translation code turned out to be kinda messy, but I'm confident that it's correct, since I ran a bunch of relevant test suite files against just the first commit here. If all seems good, we can squash the first two commits. |
stream_email_user_ids: List[int], | ||
wildcard_mention_user_ids: List[int], | ||
muted_sender_user_ids: List[int], | ||
) -> "UserMessageNotificationsData": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function has a dangerous interface, in that calling it in a loop will take quadratic time. We really want to accept Set
objects as what is passed into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that once we fix this, we can just pass these parameters from the send_request
object directly without converting Sets to Lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to take in Set
s
This looks great; posted a few comments. I think it's also worth doing a bit of before/after profiling of "send a message to a stream with 5000 subscribers", with only 1 of them being active / not long-term-idle, just to make sure we're not regressing the overall performance there (this is certainly a high risk code path). One can probably measure this well enough by watching the development console maybe with a bit of print-debugging when sending 5 messages in quick succession. |
ec84d41
to
12e7fa7
Compare
I didn't understand the use case of "only 1 being active". Could you elaborate? I profiled for both cases -- all 5000 active (hence eligible to receive the message send event) and just one user active. 5000 active users
Only 1 active user
Hmm that seems weird. |
This is separate from the next commit for ease of testing. To verify that the compatibility code works correctly, all message send and event_queue tests from our test suite should pass on just this commit.
Before this commit, we used to pre-calculate flags for user data and send it to Tornado, like so: ``` { "id": 10, "flags": ["mentioned"], "mentioned": true, "online_push_enabled": false, "stream_push_notify": false, "stream_email_notify": false, "wildcard_mention_notify": false, "sender_is_muted": false, } ``` This has the benefit of simplifying the logic in the event_queue code a bit. However, because we sent such an object for each user receiving the event, the string keys (like "stream_email_notify") get duplicated in the JSON blob that is sent to Tornado. For 1000 users, this data may take up upto ~190KB of space, which can cause performance degradation in large organisations. Hence, as an alternative, we send just the list of user_ids fitting each notification criteria, and then calculate the flags in Tornado. This brings down the space to ~60KB for 1000 users. This commit reverts parts of following commits: - 2179275 - 40cd6b5 We will in the future, add helpers to create `UserMessageNotificationsData` objects from these lists, so as to avoid code duplication.
We will later consistently use these functions to check for notifiable messages in the message send and event_queue code. We have these functions accept the `sender_id` so that we can avoid the `private_message = message["type"] == "private" and user_id != sender_id` wizardy.
12e7fa7
to
6b7031a
Compare
Since `flags` here could be iterated through multiple times (to check for push/email notifiability), we use `Collection`. Inspired by 871e73a. The other change here in the `event_queue` code is prep for using the `UserMessageNotificationsData` class there.
* Have the `get_active_presence_idle_user_ids` function look at all the user data, not just `private_message` and `mentioned`. * Fix a couple of incorrect `missedmessage_hook` tests, which did not catch the earlier behaviour. * Add some comments to the tests for this function for clarity. * Add a helper to create `UserMessageNotificationsData` objects from the user ID lists. This will later help us deduplicate code in the event_queue logic. This fixes a bug which earlier existed, that if a user turned on stream notifications, and received a message in that stream which did not mention them, they wouldn't be in the `presence_idle_users` list, and hence would never get notifications for that message. Note that, after this commit, users might still not get notifications in the above scenarios in some cases, because the downstream logic in the notification queue consumers sometimes erroneously skips sending notifications for stream messages.
We use the `UserMessageNotificationsData` methods to check if the message is notifiable, and use a cleaner early continue pattern.
6b7031a
to
a18ae36
Compare
@timabbott This is ready for another review. I've:
|
Sorry for the lack of clarity; for only one user active, I guess what I had in mind is 95% having |
This is awesome, merged, thanks @abhijeetbodas2001! I agree with the direction of tweaking The other thought I had after reading this is potentially There's a related opportunity to pass |
This is a prep to fix the bug described in https://chat.zulip.org/#narrow/stream/3-backend/topic/get_active_presence_idle_user_ids.
The first two commits are follow ups to #18719