Clarify startEventPolling, fix a race condition #3830

chrisbobbe · 2020-01-22T02:05:39Z

This PR follows a long discussion at and around https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/Event.20polling.20optimization.3F/near/810784.

chrisbobbe · 2020-01-22T02:08:13Z

Note the code comment in a56996c raising some possible causes of the crash observed in #3706.

chrisbobbe · 2020-01-22T02:12:43Z

Also, for the authEquivalent function, note that it returns false if auth1 or auth2 is missing. However, it's up to the caller to assign meaning, if desired, to whether or not these auths are present. There is probably a more responsible way to handle this, but I'm not quite sure what it is. The function must still return something if one or both of the auths are missing, but sometimes an undefined auth carries meaning (usually, that the account is not logged in), and sometimes, I suppose, it might be undefined by mistake — but authEquivalent can't tell the difference.

rk-for-zulip · 2020-01-24T02:50:07Z

The function must still return something if one or both of the auths are missing, but sometimes an undefined auth carries meaning (usually, that the account is not logged in), and sometimes, I suppose, it might be undefined by mistake — but authEquivalent can't tell the difference.

Reflexive, ill-considered opinion: in this context, undefined === undefined. Structurally, in a more strongly-typed language, we'd probably be representing the output of tryGetAuth as Maybe Auth or Option<Auth>; semantically, I don't think there's any need for there to be different kinds of "not logged in" states.

chrisbobbe · 2020-01-24T09:32:08Z

semantically, I don't think there's any need for there to be different kinds of "not logged in" states.

Mmm, I think I chose to put it in too general terms; I don’t mean to propose a new state, to be represented by null or something.

Maybe the specific thing that puzzles me is that my design of authEquivalent works together with tryGetAuth’s specific behavior of returning undefined when the user is logged out, so that if a user has logged out, authEquivalent will return false and we can break the loop. So maybe my discomfort is that this is kind of fragile; not only does it depend on tryGetAuth’s current behavior, but that tryGetAuth will be used to produce the arguments in the first place. (And maybe this point is moot: I think false will always be a reasonable return value if exactly one of the auths is undefined, regardless of why it’s undefined. It’ll currently also return false if they’re both undefined, which, why would anyone do that?)

Or maybe (and each “maybe” should put emphasis on all the preceding ones), it’s that the naming of authEquivalent kind of suggests to me that two Auth objects will definitely be passed. Not only have I broken such a contract, but I’ve assigned meaning to the breach: an undefined means the user is not logged in, as noted above.

Another theory (and yes, this has apparently become an investigation into my own subconscious that you’re free to skim past, but you might spot habits of mind that can be improved) is this: The framing of this authEquivalent function as a symmetric, all-purpose equality check between any two Auth objects in the world, in support of which I’ve named the params auth1 and auth2, is disingenuous because the only values for auth1 and auth2 we’d ever need to pass are for “the active account” (obtained somehow) at two different points in time. All other uses are pointless. (I don’t need to confirm that the active account is identical to an inactive account, or that two inactive accounts are identical. Maybe check my reasoning here.) Since this function is destined only to check the login state of the active account across time, maybe we should write it that way — it could take previousAuth and currentAuth, or maybe just a single argument maybeStaleAuth, and do a tryGetAuth internally.

rk-for-zulip · 2020-01-25T01:26:21Z

Not only have I broken such a contract, but I’ve assigned meaning to the breach: an undefined means the user is not logged in, as noted above.

(To be fair, this is also exactly the meaning you've inherited from tryGetAuth.)

Or maybe [...], it’s that the naming of authEquivalent kind of suggests to me that two Auth objects will definitely be passed.

That suggests a different fix, involving bypassing the whole question: actually require two Auth objects (authEquivalent(auth1: Auth, auth2: Auth): boolean), and let the call site interpret what "not having an Auth" means.

Since this function is destined only to check the login state of the active account across time, maybe we should write it that way — it could take previousAuth and currentAuth, or maybe just a single argument maybeStaleAuth, and do a tryGetAuth internally.

isCurrentLoginAuth: (state: GlobalState, auth: Auth) => boolean (by whatever name) sounds reasonable.

(Conceptually, you could then generalize it to take an equivalence relation (Auth, Auth) => boolean, relative to which auth "is" the current login-Auth. But that would be silly.)

The rest of that last paragraph I choose to summarize and generalize as "If an abstraction raises questions whose answers are functionally irrelevant, it probably isn't the right abstraction." Which is, at the very least, a proposition I can't trivially come up with counterexamples for.

gnprice · 2020-01-28T02:32:59Z

we'd probably be representing the output of tryGetAuth as Maybe Auth or Option<Auth>

I think of it basically exactly this way. In the type system we have, it's spelled Auth | void. I think in our code almost anyplace something returns, or passes, a type Foo | void or Foo | null, it's playing semantically the role of Option<Foo>: the value is either a Foo, or a unique value meaning "no Foo here". The latter is equal to itself, but not equal to any Foo.

One reason I worry when I see ?Foo, aka Foo | void | null, is that generally that's also trying to play semantically the role of Option<Foo>... but now it's an Option where there are two different "no Foo here" values, which are unequal (under ===). Usually there isn't a clear semantic difference they express; when there is, it should probably be expressed in some more explicit way.

(In fact the best use case I can think of right now for something like Foo | void | null is as basically a way of expressing Option<Option<Foo>>: you have something that's already Option<Foo>, and then a context where you might have a value of that type, or a lack of a value. For example, Map#get on a Map<string, number | null>.)

I think this function's signature can treat its inputs as the moral equivalent of Option<Auth> in the same way: take two parameters of type Auth | void. One implication is that the "no Auth here" value, spelled undefined, is equal/equivalent to itself.

That also helps keep its description from having to get entangled with the details of other functions like tryGetAuth, or concepts like "not logged in".

gnprice · 2020-01-28T03:00:25Z

Maybe the specific thing that puzzles me is that my design of authEquivalent works together with tryGetAuth’s specific behavior of returning undefined when the user is logged out, so that if a user has logged out, authEquivalent will return false and we can break the loop. So maybe my discomfort is that this is kind of fragile; not only does it depend on tryGetAuth’s current behavior, but that tryGetAuth will be used to produce the arguments in the first place.

Yeah, so thinking of this type as a way of writing Option<Auth> lets us resolve this fragility.

It's no longer this function's job to worry about the specific concept of "the user is logged out", or the specific behavior of tryGetAuth -- it just has an Auth, or an absence-of-an-Auth, and handles those in the natural ways. (Including "absence === absence".) Similarly, tryGetAuth returns the absence-of-Auth value in the cases where that's the natural thing to do.

Then it's up to the callers, if they pass data from one to the other, to make sure the semantics line up appropriately. In this case I think each side doing the most natural thing does line up nicely -- which is what one hopes for from the natural thing. 😉

PS - for background, here's descriptions of Option and equivalents in a couple of languages that have good docs on it, and where the concept is a good match for what I and I think Ray have in mind:
https://docs.swift.org/swift-book/LanguageGuide/TheBasics.html#ID330
https://fsharpforfunandprofit.com/posts/the-option-type/
https://reasonml.github.io/docs/en/null-undefined-option

chrisbobbe · 2020-01-28T19:57:03Z

(To be fair, this is also exactly the meaning you've inherited from tryGetAuth.)

In this case I think each side doing the most natural thing does line up nicely -- which is what one hopes for from the natural thing. 😉

This makes so much sense! Thanks for the explanations, both of you, and I'll read those docs this afternoon! 🙂

chrisbobbe · 2020-01-30T19:51:22Z

it just has an Auth, or an absence-of-an-Auth, and handles those in the natural ways. (Including "absence === absence".)

Hmm, so if auth1 and auth2 are both undefined, it should return true, right, and if callers don't want that behavior, they can adjust locally. So, the current !!auth1 && !!auth2 check is not what we want, right:

export const authEquivalent = (auth1: Auth | void, auth2: Auth | void): boolean =>
  !!auth1 && !!auth2 && auth1.apiKey === auth2.apiKey && keyOfAuth(auth1) === keyOfAuth(auth2);

I wonder what's the best way to implement the natural way of "absence === absence" being true in JavaScript. Do you think there's a more clear and concise way than this?

   auth1 !== undefined && auth2 !== undefined
    ? auth1.apiKey === auth2.apiKey && keyOfAuth(auth1) === keyOfAuth(auth2)
    : auth1 === undefined && auth2 === undefined;

chrisbobbe · 2020-02-12T19:13:42Z

If it's important to have tests for authEquivalent (none of the other functions in accountMisc.js have them; they're all pretty small), I'd like to wait for #3839, which makes some changes to makeAccount in exampleData. Otherwise, this should be ready for another review.

If there's a more concise way to handle the case where auth1 or auth2 is undefined, but not both, in authEquivalent, I'd like to know. The optional chaining operator would work for the apiKey, as in auth1?.apiKey === auth2?.apiKey but there isn't such an operator for avoiding a function call, as in the (dangerous without a check) keyOfAuth(auth1) === keyOfAuth(auth2).

In startEventPolling, there's no use delaying the response to a loop-breaking BAD_EVENT_QUEUE_ID with backoffMachine.wait(). So, move that wait so it happens after DEAD_QUEUE is dispatched.

This is a central function to determine that two Auth objects refer to the same account, to be used in the next commit.

This follows a discussion at and around https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/Event.20polling.20optimization.3F/near/810784. The `queueId !== getState().session.eventQueueId` is not the most direct way to check that the same user has remained logged in. So, replace that check and the `auth` check above it with a call to the new helper function authEquivalent, to check the maybe-stale auth against the current auth. Also, the `queueId !== ...` check allows a race condition! We change `session.eventQueueId` at `REALM_INIT`, which only happens after a `/register` request *completes*. So a user could switch accounts, triggering a new `/register` request, but the old account's event poll could return before that new `/register` completes, and the `queueId !== ...` will pass, triggering a state update with the old account's events instead of the new account's. Checking auth information is what we want here. It's true that this auth check will not catch the case that a queue was killed following a period of inactivity (i.e., the BAD_EVENT_QUEUE_ID). However, this is not a flaw, because the loop already handles that case correctly without the `queueId !== ...` check. In particular, if a queue is killed, api.pollForEvents will reject, and that rejection is handled in the catch block. When modifying startEventPolling to accept an auth parameter, I noticed (and added a comment to explain) that some actions being dispatched in doInitialFetch will throw unhandled errors if a user logs out while the /register request is in progress. This will be of concern when we work on zulip#3706.

chrisbobbe · 2020-04-29T04:28:12Z

Just fixed some merge conflicts and shortened some topic lines.

rk-for-zulip · 2020-05-02T00:33:42Z

src/events/eventActions.js

+    if (!authEquivalent(auth, tryGetAuth(getState()))) {
+      // The user logged out or switched accounts during progressiveTimeout
+      // called from see catch block, below. (If tryGetAuth returns undefined,
+      // the user is not logged in.)


This would make more sense immediately after the relevant await, rather than up here as far removed from it as possible. That wouldn't quite be [nfc] on its own, though, since it's entirely possible that this function is itself async-delayed (and whether or not it is is a bit fragile, anyway). You'd have to duplicate it to also occur before loop entry.

I wouldn't ordinarily suggest that... but given the duplicated parenthetical in the comment, perhaps this test should be pulled into a mini-closure anyway?

... and also used on catch entry. If a user has logged in to another account, you probably don't want to fire off the deadQueue() logic – or, worse yet, logout().

... and also used on catch entry.

Good catch! (No pun intended.)

And I agree these checks are best done immediately after the await.

Then one other point which can help keep things reasonably simple when checking this at the top of the catch: everything else in that try block doesn't need to be there, and can be moved to after the try .. catch. That error-handling code (including the backoff) only makes sense for an exception thrown inside the API call.

it's entirely possible that this function is itself async-delayed (and whether or not it is is a bit fragile, anyway). You'd have to duplicate it to also occur before loop entry.

I think the best fix for this is to move the tryGetAuth(getState()) from its caller down into the top of this function.

given the duplicated parenthetical in the comment

I think the parenthetical is best just deleted. Its content is in the summary line of the mentioned function's jsdoc anyway.

chrisbobbe · 2021-04-22T19:10:41Z

Closing my own PR, as it's gone stale and I'd like to approach this afresh with #4170 in mind.

chrisbobbe requested a review from gnprice January 22, 2020 02:05

chrisbobbe changed the title ~~Event polling optimization~~ Clarify startEventPolling, fix a race condition Jan 22, 2020

chrisbobbe force-pushed the event-polling-optimization branch 2 times, most recently from 4ed6526 to 7bd8fe0 Compare February 12, 2020 19:09

Chris Bobbe added 3 commits April 28, 2020 21:22

eventActions: Don't wait before handling dead queue error.

710c9f7

In startEventPolling, there's no use delaying the response to a loop-breaking BAD_EVENT_QUEUE_ID with backoffMachine.wait(). So, move that wait so it happens after DEAD_QUEUE is dispatched.

accountMisc: Add authEquivalent function.

82fc0ab

This is a central function to determine that two Auth objects refer to the same account, to be used in the next commit.

chrisbobbe force-pushed the event-polling-optimization branch from 7bd8fe0 to 0abc2cb Compare April 29, 2020 04:25

rk-for-zulip reviewed May 2, 2020

View reviewed changes

chrisbobbe closed this Apr 22, 2021

chrisbobbe deleted the event-polling-optimization branch November 5, 2021 00:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify startEventPolling, fix a race condition #3830

Clarify startEventPolling, fix a race condition #3830

chrisbobbe commented Jan 22, 2020

chrisbobbe commented Jan 22, 2020 •

edited

Loading

chrisbobbe commented Jan 22, 2020 •

edited

Loading

rk-for-zulip commented Jan 24, 2020

chrisbobbe commented Jan 24, 2020 •

edited

Loading

rk-for-zulip commented Jan 25, 2020

gnprice commented Jan 28, 2020 •

edited

Loading

gnprice commented Jan 28, 2020

chrisbobbe commented Jan 28, 2020 •

edited

Loading

chrisbobbe commented Jan 30, 2020

chrisbobbe commented Feb 12, 2020 •

edited

Loading

chrisbobbe commented Apr 29, 2020

rk-for-zulip May 2, 2020

rk-for-zulip May 2, 2020 •

edited

Loading

gnprice May 4, 2020

chrisbobbe commented Apr 22, 2021

Clarify startEventPolling, fix a race condition #3830

Clarify startEventPolling, fix a race condition #3830

Conversation

chrisbobbe commented Jan 22, 2020

chrisbobbe commented Jan 22, 2020 • edited Loading

chrisbobbe commented Jan 22, 2020 • edited Loading

rk-for-zulip commented Jan 24, 2020

chrisbobbe commented Jan 24, 2020 • edited Loading

rk-for-zulip commented Jan 25, 2020

gnprice commented Jan 28, 2020 • edited Loading

gnprice commented Jan 28, 2020

chrisbobbe commented Jan 28, 2020 • edited Loading

chrisbobbe commented Jan 30, 2020

chrisbobbe commented Feb 12, 2020 • edited Loading

chrisbobbe commented Apr 29, 2020

rk-for-zulip May 2, 2020

Choose a reason for hiding this comment

rk-for-zulip May 2, 2020 • edited Loading

Choose a reason for hiding this comment

gnprice May 4, 2020

Choose a reason for hiding this comment

chrisbobbe commented Apr 22, 2021

chrisbobbe commented Jan 22, 2020 •

edited

Loading

chrisbobbe commented Jan 22, 2020 •

edited

Loading

chrisbobbe commented Jan 24, 2020 •

edited

Loading

gnprice commented Jan 28, 2020 •

edited

Loading

chrisbobbe commented Jan 28, 2020 •

edited

Loading

chrisbobbe commented Feb 12, 2020 •

edited

Loading

rk-for-zulip May 2, 2020 •

edited

Loading