Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

events: Add basic downgradeable event queue support. #12926

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

timabbott
Copy link
Sponsor Member

This PR is a prototype implementation of downgradeable event queues, for use by the mobile app.

The model is that if a client includes in its /register request downgradeable=true, we will generate a downgradeable event queue. After queue_lifespan_secs of inactivity (and a GC run), the queue will be automatically downgraded to stop tracking high traffic events, which is currently:

  • full message events
  • presence events (which only exist to hot-update for a user having just logged back in); I assume we'll just do a full presence refetch from the client.

It instead just keeps the data from those message events required to, should the client return, add a special event to the queue of type unread_data containing a JSON structure in the same format as the unread_msgs part of /register, reusing the functions originally written to manage those. The client is expected to:

  • Apply any normal events in the queue
  • Apply the special unread_msgs event to its data structures tracking unread messages. Right now this should mostly work (though the client is responsible for dealing with muted streams+topics, and I haven't implemented unread mention tracking yet).

@gnprice I'd love your feedback on this.

@zulipbot
Copy link
Member

zulipbot commented Aug 6, 2019

Heads up @timabbott, we just merged some commits that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/master branch and resolve your pull request's merge conflicts accordingly.

@gnprice
Copy link
Member

gnprice commented Aug 6, 2019

Thanks @timabbott !

  1. The use of the unread_msgs data structure sounds a lot like basically a way of organizing the "bare messages" we'd discussed before. Does that sound right to you? Or is there a semantic difference as you see it in what information they convey?

That re-organizing seems fine -- potentially helpful for the client, and certainly seems to enable some code reuse on the server. I'm curious if there's another difference there that potentially would be an interesting design point to discuss.

  1. I'm not sure if I understand this comment:

         # We ignore `update_message` events; the client is
         # responsible for updating its data structures for those.
    

It looks like what the code does is let those messages queue up just as usual. That certainly means that when they refer to a message the client already has, the client will handle them as usual once it sees them.

But it'll also be the case that some update_message events are about messages that themselves came into the downgraded queue, and so the proper place to apply them is to that unread_msgs data structure. In that case it seems cleanest to have the server go ahead and do that.

  1. On the downgrade lifecycle, I think we may want to at some point have the client occasionally poll for background updates while leaving the queue in a downgraded state. For example we might have the server send low-priority FCM messages, or APNs background update notifications, to let the app know it'd be a good time for it to poll; then when the user later goes and opens the app, the data can be much fresher.

That's likely best as an opt-in flag on the /events request, though, so we don't need that bit of API in v1.

@timabbott
Copy link
Sponsor Member Author

  1. That re-organizing seems fine -- potentially helpful for the client, and certainly seems to enable some code reuse on the server. I'm curious if there's another difference there that potentially would be an interesting design point to discuss.

I think the risk of this data structure approach is that it's only good for updating the unread_msgs data structures. In particular, because we're dropping read things from the cache, we don't have the data needed to tell the mobile app to invalidate data on what are the latest messages in narrows in the future.

To do that well, we'd likely need to improve the data structure we currently have (with message_id -> stream/topic data) to be a more robust structure with pointers in both directions so that we can inexpensively maintain for a stream/topic pair whether it's been invalidated. Probably wouldn't be hard to do as an extension to the format here by just adding additional keys in the output format, so maybe it doesn't impact the design. (I think it's OK if we invalidate cached narrows for views that had a message added and then moved by editing, or something). So I think you're right that we should just have this code handle update_message as well.

Do you think if we did that tweak, this would provide the app with all the information it'd need about messages it missed? To summarize, we'd have:

  • The data required to update the unread_msgs data structure, in the form of a block of things to add as new, unread messages, plus the virtual_events based message flags event for marking any pre-existing unread messages as read.
  • Full events for edited and deleted messages, which can be used to invalidate local caches. Could in the future be replaced with a message_cache_delete object containing just a list of message IDs to further optimize space, but probably not required. In our current proposed design, these would have already been applied to the unread_msgs data structures.
  • No direct notification of new-but-already-read messages, but potentially in the future something that lets the app know that it can't trust certain narrows to be up to date.

One thing that's on my mind is how to write tests for the client-side logic required, since I think it'd be really painful to be trying to track down bugs in the client-side handling of this manually. One model would be to do a reference implementation of the unread_msgs data structure update logic in Python to make sure it's possible to do that correctly, we could write some pretty slick unit tests of verifying that logic by doing it before-and-after a bunch of state transitions in the unit test here. And then we could maybe have a mode for that unit test to record in a file the data of initial_unread_msgs, events, final_unread_msgs tuples that could be used for a mobile unit test suite. Not sure whether that's overkill or a good idea.

  1. I'm not sure if I understand this comment:

For the update_message events, my thinking was that (1) we probably need to include those events unmodified for the app's use in updating its caches (though I suppose we could do a much more limited "delete this message ID from all caches" approach) and (2) the mobile app presumably already has code for updating its unread count data structures after topic edits.

That's likely best as an opt-in flag on the /events request, though, so we don't need that bit of API in v1.

Agreed.

In terms of a path forward for this PR, I think it's more or less mergable to make it easy to do mobile development against chat.zulip.org for working on this. (And to avoid merge conflicts, etc.).

The main risk with doing so is that this could end up in a release with buggy internals or a bad interface, making us do more annoying feature-detection for future app usage. So I'd want to tweak this to have the server allowing the queue to be setup this way sit behind a feature-flag that we only enable in development and on chat.zulip.org.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants