-
-
Notifications
You must be signed in to change notification settings - Fork 639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get and maintain a handy optimized data structure of recent PM conversations #4392
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I'm eager to get #3133 fixed, and I'm glad to see fetchPrivateMessages
marked for future deletion; it will make doInitialFetch
cleaner. Just a few comments below.
@@ -251,6 +263,7 @@ const fetchPrivateMessages = () => async (dispatch: Dispatch, getState: GetState | |||
numAfter: 0, | |||
foundNewest: found_newest, | |||
foundOldest: found_oldest, | |||
ownUserId: getOwnUserId(getState()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit (here and in fetchMessages
, above): From one perspective, this isn't NFC, right? getOwnUserId
will throw with "No server data found" if the user has logged out, which might have happened while the API request was in progress. (Before this commit, that error would not have been thrown.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, I don't think it's a very bad thing to throw here; the alternative is allowing race condition where the result of this message fetch is applied when the account it's meant for is no longer logged in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, I don't think it's a very bad thing to throw here; the alternative is allowing race condition where the result of this message fetch is applied when the account it's meant for is no longer logged in.
Er, well, of course it's bad in that it would (I think) cause a crash—but having corrupted data is also quite bad and possibly harder to debug. 🙂 This is the kind of thing that we hope to work out with logic for cancelling in-progress requests when one logs out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm true! Yeah, we should really fix this possible case in general by cancelling these requests (#4170), but we haven't yet.
I'll note this in the commit message.
}); | ||
|
||
describe('reducer', () => { | ||
const initialState = reducer(undefined, ({ type: eg.randString() }: $FlowFixMe)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I hadn't thought of this way of getting the initial/previous state, i.e., running a dummy action through the reducer starting from an undefined
state.
It looks like we've often just spelled out the initial/previous state literally, although in my current revision of #4390 I take advantage of our new eg.makeMessagesState
to set the initial/previous state in the tests for messagesReducer
. That approach lets me minimize the diff when we change over to Immutable.Map
for state.messages
, while making it easier to avoid the Immutable.Map({ 1: 'foo' })
(vs. Immutable.Map([[1, 'foo']])
) mistake. I think your approach here would work just as well for both of those things, so I'd be happy to give it a try over there if you'd like. (In that branch there are a few uses of eg.makeMessagesState
in other test files, like unreadHuddlesReducer-test, that should probably stay, to get those advantages.)
In 612ec04, I renamed some initialState
s to prevState
. At first, I might have suggested that we make the same rename here for consistency. But I see that the initialState
here really is kind of an initial state; it's never the variable that gets passed directly to the reducer; that variable is called state
or someState
, with state
sometimes getting reassigned as the test progresses. I think that makes sense as a new pattern to try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that we lose just a bit of coverage by not calling deepFreeze
on state objects we pass to the reducer; in theory, I suppose, the reducer could get away with improperly mutating the object that holds map
and sorted
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this way of getting the initial/previous state, i.e., running a dummy action through the reducer starting from an
undefined
state.
Yep! One nice thing about this is that it exercises the API that Redux already requires a reducer to support -- in fact it's mimicking what Redux itself does to initialize the state on creating the store.
In 612ec04, I renamed some
initialState
s toprevState
. At first, I might have suggested that we make the same rename here for consistency. But I see that theinitialState
here really is kind of an initial state
Yep. I do mean this as the initial state, in the same sense we use that name in our reducers: a given test case may or may not choose it as the "previous" state for e.g. applying the reducer with some action, but in any case initialState
is the one constant state that is the one the given reducer initializes its state to.
In that commit 612ec04, a few of the things called initialState
happened to actually equal the reducer's initial state, but most of them didn't; they were all local to a particular test, and what they had in common is that they were the starting state for their respective tests. So prevState
was a much better name, to avoid the collision with the reducer's own actual initial state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we've often just spelled out the initial/previous state literally, although in my current revision of #4390 I take advantage of our new
eg.makeMessagesState
to set the initial/previous state in the tests formessagesReducer
. That approach lets me minimize the diff when we change over toImmutable.Map
forstate.messages
, while making it easier to avoid theImmutable.Map({ 1: 'foo' })
(vs.Immutable.Map([[1, 'foo']])
) mistake.
Yeah. One thing I like about this is that it's a step toward making the tests not care about the internals of how we implement a particular part of the state, treating it more as a black box. That's good for being able to change the data structures; I think it also makes the tests more readable and informative, by having them act at a level that's more like how we want to think about what the app is doing: send events and other actions in, and select meaningful data out.
I think your approach here would work just as well for both of those things, so I'd be happy to give it a try over there if you'd like. (In that branch there are a few uses of
eg.makeMessagesState
in other test files, like unreadHuddlesReducer-test, that should probably stay, to get those advantages.)
I think in fact a good way to think about eg.makeMessagesState
may be as a shorthand for starting from the initial state and iterating through the given messages to add them. It's good to have such a shorthand, because it helps keep the tests short and focused on the interesting parts.
It should be exactly equivalent to that alternate version; for pure black-box testing, it could even be implemented that way. But I think it's fine to have it implemented directly, as it is -- it's simple, probably more efficient, and it's one central piece of code to update when we change the representation so it's not a burden there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that we lose just a bit of coverage by not calling
deepFreeze
on state objects we pass to the reducer; in theory, I suppose, the reducer could get away with improperly mutating the object that holdsmap
andsorted
.
Yeah. I'm not sure our use of deepFreeze
has ever actually caught a bug -- I think that kind of bug is largely pretty avoidable just by writing the code in the style we're accustomed to writing it in. So the approach I've been inclined to is to use it in central places like exampleData.js
, where one call can cover a large swath of tests (see also 45308e7); and to generally skip it in individual test files, to avoid the clutter.
allUsersById, | ||
ownUserId, | ||
); | ||
if (keyRecipients === null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (keyRecipients === null) {
I'm having a little trouble figuring out under which circumstances keyRecipients
is null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what pmKeyRecipientsFromIds
returns if there's a user ID it can't find in the given allUsersById
map. By returning null, it forces (via the type-checker) its caller to deal with it -- either by throwing, or as I've done here by ignoring the conversation and moving on.
I see its jsdoc doesn't mention that, though! I'll add that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what
pmKeyRecipientsFromIds
returns if there's a user ID it can't find in the givenallUsersById
map.
True, but I guess I was thinking, why would there be a user ID that can't be found in the given allUsersById
map? 😛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably shouldn't be possible 🙂 -- there aren't even the scenarios there are for emails, where someone may have changed their email. So probably we should emit a warning to Sentry from pmKeyRecipientsFromIds
in this case.
If it does happen, then at least at this callsite of pmKeyRecipientsFromIds
it seems pretty clear how best to proceed: ignore that one conversation and carry on. If there weren't a clear reasonable way to proceed, then we'd want to throw instead.
} else if (prev >= msgId) { | ||
// The conversation already has a newer message. Do nothing. | ||
return state; | ||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted a blank line here so that the three cases would be visually separated; but Prettier wants to push them back together again. This was a way of making space... though not a great way, as the //
really diminishes the visual separation.
I'll try just prettier-ignore
ing this section.
} else { | ||
// It wasn't the latest. Just handle the general case. | ||
sorted = sorted.delete(i); // linear time, ouch | ||
sorted = insertSorted(sorted, map, key, msgId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
insertSorted
here is linear in writes here, too, at least in the common case where it does an .insert
, I think. Does it equally deserve the "ouch" label at this callsite, then? (Or maybe it's enough that "O(n)" gets a mention in the comments above its definition.)
At least in cases where we do a .delete
followed by an .insert
, I wonder if it would be just a bit more efficient to do a single .update
instead; what do you think?
I suppose that would mean writing code that overlaps with some of what insertSorted
does. But then I wonder if insertSorted
might be pared back so that it's only responsible for finding the proper index. That would also give its interface the nice feature that it's at worst linear only in reads, not in reads and writes (if that matters).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, in general both of these lines can take linear time.
The common case, though, should be that the message that just came in is the newest message we've ever seen. In fact, for EVENT_NEW_MESSAGE, if it's ever not the newest then something odd has happened. (This could potentially happen if we go fetch the messages in some narrow, and right around the same time someone sends a message into that narrow, and the fetch sees that message and returns sooner than we get the reply from the event queue to tell us about the message. Or a similar scenario with two different messages sent right around the time we fetch: one of them in the fetch results, and a slightly older one that comes as an event shortly after. But I think that kind of race condition is basically the only way it can happen.) Meanwhile for messages we learn about through REALM_INIT or MESSAGE_FETCH_COMPLETE, if we want to optimize those they probably won't be going through this codepath anyway.
If it is the newest message we've ever seen, then the optimization described at the top of insertSorted
applies, and it'll take constant time.
At least in cases where we do a
.delete
followed by an.insert
, I wonder if it would be just a bit more efficient to do a single.update
instead; what do you think?
The trouble is that this bit of code's job (the stanza manipulating sorted
inside this "The conversation needs to be updated" case) is to rearrange the order of the list's elements. The conversation was at some position in the list, and we need to move it to the new position it belongs at. In general those are two different positions -- which also means we need to change which values appear at all the positions in between. So an .update
won't be enough.
It can happen that we end up doing a .delete
followed by .insert
at the same position. In that case an .update
would indeed be enough... but we'd be finding the place where this conversation's key appears, and updating it to write the conversation's key there, i.e. the same value that's already there. So better yet in that case is to do nothing.
That's what this commit toward the end of the branch is about:
16d2d7b recent-pms: Optimize the case of a new message in the latest thread.
It does it for the case where we'd be doing that at position 0, because that's expected to be common. To do it in general would take more logic -- and for this to happen at any other position would mean we're seeing a message that isn't the newest message we've ever seen, which is rare, see above, so it isn't worth adding complexity for.
I'll add some more comments trying to explain all this more than the code currently does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trouble is that this bit of code's job (the stanza manipulating
sorted
inside this "The conversation needs to be updated" case) is to rearrange the order of the list's elements.
Ohh, OK, right.
I'll add some more comments trying to explain all this more than the code currently does.
I think it's pretty well explained as-is; I just missed the mark here, oops. 🙂 Thanks for explaining. Looking back, I think I may have seen the i
in insert
and the i
in insertSorted
(totally reasonable to give them the same name), and short-circuited into thinking they'd end up being the same value, when of course they sometimes won't be, as you've said.
}); | ||
|
||
if (i === 0) { | ||
return sorted.unshift(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the case where you're saying it takes O(1) time, right?
I don't yet follow how it's O(1). I would expect .unshift
to be O(n): when one new item is inserted at index 0, each later item will have its index increased by one. By contrast, I would expect .push
(where the item gets added at the end, not the beginning) to be O(1) because no pre-existing item would have to have its index rewritten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's based on this bit from the Immutable.List docs:
Lists are immutable and fully persistent with O(log32 N) gets and sets, and O(1) push and pop.
Lists implement Deque, with efficient addition and removal from both the end (push, pop) and beginning (unshift, shift).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK, got it. That doesn't get a mention at the entry for .unshift
, and I guess I didn't look closely enough at that discussion at the top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, that'd be good to clarify in the Immutable docs 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't get a mention at the entry for
.unshift
, and I guess I didn't look closely enough at that discussion at the top.
Oh, one hint I just noticed there in the .unshift
entry -- this doesn't change that it should be made clearer, but it could be a helpful thing to notice -- is that it says this:
Note:
unshift
can be used inwithMutations
.
In general the things that can be used in withMutations
are the operations that efficiently act on just a small bit of the collection, with cost like O(1) or O(log N), and the things that can't are those that act on potentially the whole collection or a large swath of it. (I think the idea is that if you're doing one of the latter inside withMutations
, then you're getting no benefit from that and might as well do so outside of withMutations
, and this serves as a warning in case you thought you were doing something nice and efficient.) So when you see that note on an Immutable method, that's a signal that that's a cheap local operation and not an expensive whole-collection operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: unshift can be used in withMutations.
Huh, interesting! It makes sense, but I wouldn't have been looking for it if you hadn't mentioned it; I'll keep that in mind. 🙂
We'll start using these shortly, as part of the data structure for `recent_private_conversations`.
We'll use this in a new reducer shortly. There's a case where this can potentially change behavior: the thunk actions that dispatch these actions do so after an `await`, which means that by the time they get to calling `getOwnUserId` (after their HTTP requests have returned) the user may have logged out, and it might throw with "No server data found". This is a case where we already have a worse bug, in that if the user has switched to a *different* account and completed the initial fetch for it by the time this request returns, then we'll be adding these messages to our state to display alongside the messages that belong in that account. (Or try to, anyway: if the user IDs and stream IDs and so on don't exist in the other account's realm, then we'll likely just crash.) We're aware of that bug and intend to fix it, but haven't done the work for it yet. See discussion: zulip#4392 (comment)
…uest. The server added in zulip/zulip#11944 support for providing a summary of the most recent PM conversations that a user was involved in, as part of the `/register` response. This commit adds only the minimum machinery necessary to request this data from the server at `/register` time. [ray: Split original initial commit into two parts, this being the first. Minor changes to Flow types.] [greg: Squashed in a few bits of second commit; updated a comment; shortened commit message.]
This takes the data from `recent_private_conversations` (which we just started requesting in the previous commit) in the initial fetch, and then maintains it as new messages arrive. We make good use of our new support for Immutable.js data structures. Instead of maintaining the data in the exact same format as it came over the wire as JSON, we arrange it on receipt into a data structure that's optimized for efficiently maintaining and using. The organization of the code is slightly different from our usual existing practice: * The state type is defined right in the same file as the reducer which updates it. * Also in that file are some related stateless helper functions: in particular `usersOfKey`, which code elsewhere in the app can use to help interpret the data provided by this model. * The file is named in a general way like "thingModel", rather than our usual "thingReducer", and the reducer is a named rather than default export. I think this is helpful here for keeping together code that logically goes together in that it has shared internal knowledge and invariants, like the format of PmConversationKey strings. I think something similar would be helpful for a lot of our model code (including most of what's in reducers and selectors), but I'm in no rush to do a sweeping refactor; we'll want to experiment first and find the arrangement that works best. This version doesn't put any selector code in this file. The existing selector in the neighboring `pmConversationsSelectors.js` doesn't really belong here, because it mixes together a number of different areas of the app model, notably the unread counts. After some future refactoring to separate that out, there may be a more focused selector that naturally lives here... or it may be that no selector is needed, because it'd just be accessing a property on the state type, and we can simplify by just saying that directly.
In this version, the "modern" implementation is a stub that just invokes the "legacy" implementation. Next, we'll actually write a modern implementation.
We'll want to do the same thing in the upcoming "modern" implementation of this selector (because this is using another data structure unrelated to the one we're modernizing); so factor it out as a function we can call there.
This is only a little shorter in line count; but I think it has a good bit less complexity to take in. It's also doing quite a lot less work: instead of iterating through all the PMs (messages!) we know about, it iterates through just one item per PM *conversation*, in data that we've efficiently maintained as the messages came in. That's the big payoff of this server API feature. Or, relatedly: because this is so much more efficient, we're able to provide much more thorough data. The old way was fed primarily by `fetchPrivateMessages`, where we fetch the latest 100 PMs -- which for someone who uses PMs a lot could leave out conversations as recent as yesterday, and include just a handful of conversations. With current servers, the new way gets initialized from the user's last 1000 PMs. As a result, if you look at the PM-conversations screen with the app before this change and then after it, you'll probably see a lot more conversations. For me on czo, I see about 4x as many, extending far back into the past. Fixes: zulip#3133
This seems likely to be a common case -- probably the majority of the EVENT_NEW_MESSAGE events. We already had it taking constant time (aka O(1) time) thanks to the `i === 0` optimization in `insertSorted`... but we can do better by not mutating `sorted` at all. Also add a test that this optimization is present. In general we can't easily test the performance, or the asymptotic complexity, of our data structures and algorithms... but when the intended behavior is "the new thing is identical to the old thing", that we can.
With the spiffy new recent-pms data structure, this hack is no longer needed!
Thanks for the review! New revision pushed. |
Thanks for the revision and replies! Merged. |
This fixes #3133 -- making a lot more PM conversations visible on the "PM conversations" screen -- by using the nice
recent_private_conversations
data structure that the server introduced in version 2.1, precisely in order to support fixing #3133.This PR replaces #3535, and #4104 which was partly based on a rebase of #3535. There's one commit in here which derives from those two PRs; but upon rebasing #4104 and studying it, I decided it'd be better to take a different approach for most of what it needs to do.
Most fundamentally: instead of maintaining the data in the exact same format as it came over the wire as JSON, we arrange it on receipt into a data structure that's optimized for efficiently maintaining and using. The "store it like it came over the wire" approach is the one we use for a lot of the data in the app... but the needs of an in-memory data structure for efficiently updating data, and efficiently finding things in it, are very different from the needs of a simple, highly portable, wire format like JSON. Our #3339 and #3949 already describe how we should be using more appropriate data structures for a lot of our data.
This data structure is another example of that -- and as we're adding it now from scratch, it makes a perfect occasion to go straight for a data structure designed for the purpose, without any of the additional considerations or work required to migrate something we already have and use.
In doing so, we make good use of our new (since #4201) support for Immutable.js data structures in our Redux state: without Immutable (and short of basically reimplementing large parts of Immutable ourselves), it would have been quite difficult to make these data structures as efficient as they should be in a way compatible with Redux's requirement that the state not be mutated.
This also builds on #4385 yesterday, and on the long series of work sorting out the different ways we identify PM conversations (#4035).
Fixes: #3133