Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce new dataclass for message send event user data. #18853

Merged
merged 6 commits into from
Jun 21, 2021

Conversation

abhijeetbodas2001
Copy link
Member

This is a prep to fix the bug described in https://chat.zulip.org/#narrow/stream/3-backend/topic/get_active_presence_idle_user_ids.

The first two commits are follow ups to #18719

@abhijeetbodas2001
Copy link
Member Author

I will tomorrow stack on top of this PR more commits to actually start using the message_is_notifiable method which the last commit here adds, but the commits here should be independently mergeable, in case there aren't any review comments.

@@ -949,14 +946,15 @@ def get_client_payload(apply_markdown: bool, client_gravatar: bool) -> Dict[str,
# If the recipient was offline and the message was a single or group PM to them
# or they were @-notified potentially notify more immediately
private_message = message_type == "private" and user_profile_id != sender_id
mentioned = "mentioned" in flags
mentioned = user_data.get("mentioned", False)
stream_push_notify = user_data.get("stream_push_notify", False)
Copy link
Member Author

@abhijeetbodas2001 abhijeetbodas2001 Jun 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, this should be something like user_data.get("mentioned", "mentioned" in flags) for backwards compatibility, but I'm not sure if that's worth doing because:

  • It defeats the purpose of deduplicating this logic
  • Tornado won't crash if we don't do that -- the worst that will happen is someone not receiving a notification during upgrade

@timabbott
Copy link
Sponsor Member

I merged all but the last commit as the series ending with b6806fb. Thanks @abhijeetbodas2001! The one thing that I worry a bit about with extending the user_data classes is the size of the object that we send via orjson.dumps from Django to Tornado. I haven't profiled, but it seems very possible that to make sending to 1K users fast, we actually want an architecture where the thing we send is the lists of user IDs that got into creating a UserMessageNotificationsData object, and build the object only inside the Tornado code, especially since most of the lists like stream_email_user_ids are likely very short. (So we get to send 30 integers rather than 1K copies of the string "stream_email_user_ids" as a key in a JSON blob).

For message_is_notifiable, I kinda wonder if we instead want two functions, one for email and one for push, and then to have a message_is_potentially_notifiable that calls both and does an OR on them?

It's potentially useful to distinguish "definitely notifiable" from "potentially notifiable", where the former takes into account the online situation.

Maybe user_message_event.is_potentially_push_notifiable(...) would be a good call signature?

@abhijeetbodas2001
Copy link
Member Author

I did some calculations.

  1. If we pre-calculate and send all the flags to tornado, that's around 190KB for 1000 users. (190B per json object)
  2. If we calculate all those flags in tornado itself, we could bring that down to ~60KB for 1000 users, plus the lists of integers we'd need to send, so that's definitely a significant improvement.
  3. But the downside of 2. is that we'd have a bunch of duplicated code, because we'll still need to calculate these flags pre-send_event call to send to the get_active_presence_idle_user_ids function.

I think it would be nice to have some data once the commits merged today hit chat.zulip.org

@abhijeetbodas2001
Copy link
Member Author

Another idea would be to still calculate these notifications related flags before send_event, but use tuples/arrays to store these them. That'd save the space used for the string keys, and bring down the per-object size to ~100B.

{
    "id": 10,
    "flags": ["mentioned", "wildcard_mentioned"],
    "notification_flags": [true, true, false, false, false, false]
}

And we could then delegate all the array-indices related dirty work to dataclass methods.

@abhijeetbodas2001
Copy link
Member Author

I made these changes suggested to the dataclass method, and updated the PR.

@timabbott
Copy link
Sponsor Member

The test failure is missing coverage on one line.

@abhijeetbodas2001 abhijeetbodas2001 force-pushed the event_queue branch 6 times, most recently from 5d5a7d8 to 1b2ba60 Compare June 17, 2021 14:08
@abhijeetbodas2001
Copy link
Member Author

The test failure is missing coverage on one line.

Oops! I had missed adding a test for the function which does the OR of the functions for email and push. Fixed.

@timabbott
Copy link
Sponsor Member

is that we'd have a bunch of duplicated code

I feel like we could avoid duplicating that code by just calling a shared function in both places.

One thing to highlight is that time spent in Tornado is much more expensive to us than time spent in a Django python process, because of the different horizontal scaling properties of those two systems. And in any case JSON dumping of an extra 100KB is not cheap.

@abhijeetbodas2001 abhijeetbodas2001 force-pushed the event_queue branch 4 times, most recently from dd9349f to ec84d41 Compare June 18, 2021 18:02
@abhijeetbodas2001
Copy link
Member Author

@timabbott I've attempted doing that migration and pushed here. The translation code turned out to be kinda messy, but I'm confident that it's correct, since I ran a bunch of relevant test suite files against just the first commit here.

If all seems good, we can squash the first two commits.

zerver/lib/actions.py Outdated Show resolved Hide resolved
stream_email_user_ids: List[int],
wildcard_mention_user_ids: List[int],
muted_sender_user_ids: List[int],
) -> "UserMessageNotificationsData":
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has a dangerous interface, in that calling it in a loop will take quadratic time. We really want to accept Set objects as what is passed into this.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that once we fix this, we can just pass these parameters from the send_request object directly without converting Sets to Lists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to take in Sets

@timabbott
Copy link
Sponsor Member

This looks great; posted a few comments. I think it's also worth doing a bit of before/after profiling of "send a message to a stream with 5000 subscribers", with only 1 of them being active / not long-term-idle, just to make sure we're not regressing the overall performance there (this is certainly a high risk code path). One can probably measure this well enough by watching the development console maybe with a bit of print-debugging when sending 5 messages in quick succession.

@abhijeetbodas2001
Copy link
Member Author

I didn't understand the use case of "only 1 being active". Could you elaborate?

I profiled for both cases -- all 5000 active (hence eligible to receive the message send event) and just one user active.
Here the data, averaged over 5 runs. The time is for the interval between just before the send_event call and just before the process_message_event call. (Time in ms)

5000 active users

Average Max Min
master 14.3 15.1 13
This PR 6.58 8.44 5.12

Only 1 active user

Average Max Min
master 1.06 1.17 0.98
This PR 1.35 1.35 0.97

Hmm that seems weird.
Slight performance degrade when just one user is active?

This is separate from the next commit for ease of testing.
To verify that the compatibility code works correctly, all message send
and event_queue tests from our test suite should pass on just this commit.
Before this commit, we used to pre-calculate flags for user data and send
it to Tornado, like so:
```
{
    "id": 10,
    "flags": ["mentioned"],
    "mentioned": true,
    "online_push_enabled": false,
    "stream_push_notify": false,
    "stream_email_notify": false,
    "wildcard_mention_notify": false,
    "sender_is_muted": false,
}
```

This has the benefit of simplifying the logic in the event_queue code a bit.

However, because we sent such an object for each user receiving the event,
the string keys (like "stream_email_notify") get duplicated in the JSON
blob that is sent to Tornado.

For 1000 users, this data may take up upto ~190KB of space, which can
cause performance degradation in large organisations.

Hence, as an alternative, we send just the list of user_ids fitting
each notification criteria, and then calculate the flags in Tornado.
This brings down the space to ~60KB for 1000 users.

This commit reverts parts of following commits:
- 2179275
- 40cd6b5

We will in the future, add helpers to create `UserMessageNotificationsData`
objects from these lists, so as to avoid code duplication.
We will later consistently use these functions to check for notifiable
messages in the message send and event_queue code.

We have these functions accept the `sender_id` so that we can avoid the
`private_message = message["type"] == "private" and user_id != sender_id`
wizardy.
Since `flags` here could be iterated through multiple times
(to check for push/email notifiability), we use `Collection`.
Inspired by 871e73a.

The other change here in the `event_queue` code is prep for using
the `UserMessageNotificationsData` class there.
* Have the `get_active_presence_idle_user_ids` function look at all the
user data, not just `private_message` and `mentioned`.
* Fix a couple of incorrect `missedmessage_hook` tests, which did not
catch the earlier behaviour.
* Add some comments to the tests for this function for clarity.
* Add a helper to create `UserMessageNotificationsData` objects from the
user ID lists. This will later help us deduplicate code in the event_queue
logic.

This fixes a bug which earlier existed, that if a user turned on stream
notifications, and received a message in that stream which did not mention
them, they wouldn't be in the `presence_idle_users` list, and hence would
never get notifications for that message.

Note that, after this commit, users might still not get notifications in
the above scenarios in some cases, because the downstream logic in the
notification queue consumers sometimes erroneously skips sending
notifications for stream messages.
We use the `UserMessageNotificationsData` methods to check if the
message is notifiable, and use a cleaner early continue pattern.
@abhijeetbodas2001
Copy link
Member Author

@timabbott This is ready for another review. I've:

  • Extracted the data translation code into a function and written a test for it.
  • Change the from_user_ids function to take in Sets Introduce new dataclass for message send event user data. #18853 (comment)
  • Posted profiling data above
  • Added a new commit, which utilizes our new class and its methods in the event_queue code to simplify stuff. We'll also probably want to tweak maybe_enqueue_notifications to take in an instance of out new dataclass instead of individual fields; I'll soon add a commit to do that.

@timabbott
Copy link
Sponsor Member

Sorry for the lack of clarity; for only one user active, I guess what I had in mind is 95% having long_term_idle=True; basically simulating something more like chat.zulip.org. (Or at least essentially all of them presence_idle).

@timabbott timabbott merged commit c6d696c into zulip:master Jun 21, 2021
@timabbott
Copy link
Sponsor Member

This is awesome, merged, thanks @abhijeetbodas2001! I agree with the direction of tweaking maybe_enqueue_notifications to use the dataclass.

The other thought I had after reading this is potentially get_active_presence_idle_user_ids could accept send_request as a parameter, effectively moving the UserMessageNotificationsData.from_user_id_sets call into get_active_presence_idle_user_ids. The main benefit of that change is that it'd reduce the reading complexity of do_send_messages.

There's a related opportunity to pass send_message_request into create_user_messages rather than its individual fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants