
(Very Small) Chance of Out-of-Order IDs May Cause Problems #5228

Closed
aschmitz opened this issue Oct 5, 2017 · 7 comments

Comments

aschmitz (Contributor) commented Oct 5, 2017

I'm filing an issue to take this out of a Mastodon conversation and have a place to discuss the actual issue.

Brief background: status IDs (since #4801) contain a timestamp plus 16 bits of a counter with a random base. Because the counter is taken mod 2**16, if multiple statuses occur in the same millisecond there is a small chance of the counter wrapping, producing an "out of order" ID: a client that requests statuses during that millisecond may never see the newer status, because it has a lower ID than one the client has already seen.
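For illustration, here is a minimal sketch of that layout (the bit positions and names are illustrative assumptions, not the exact implementation): the millisecond timestamp in the high bits and the 16-bit counter in the low bits, so a wrap within one millisecond hands the newer status the smaller ID.

import time

SEQ_BITS = 16  # low bits holding the randomly based counter described above

def make_id(timestamp_ms, counter):
    # Pack a millisecond timestamp and a 16-bit counter into one ID (illustrative layout).
    return (timestamp_ms << SEQ_BITS) | (counter % 2**SEQ_BITS)

# Two statuses created in the same millisecond, with the counter sitting at 2**16 - 1:
ts = int(time.time() * 1000)
first = make_id(ts, 0xFFFF)       # counter at its maximum value
second = make_id(ts, 0xFFFF + 1)  # counter wraps around to 0
print(second < first)             # True: the newer status gets the lower ID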

This is extraordinarily unlikely in practice. Some quick math (throw it in https://sagecell.sagemath.org/ if you want to change numbers):

# Borrowed from https://gist.github.com/noskill/8640130
def poisson(lmbda, x):
    return e ** (-lmbda) * lmbda ** x / factorial(x)

statuses = 1000000
time_in_ms = 1000 * 60 * 60 * 24 * 30

rate = statuses / time_in_ms

at_most_one = poisson(rate, 0) + poisson(rate, 1)

two_in_same_ms_odds = 1 - at_most_one

two_in_same_ms_hours = (1 / two_in_same_ms_odds) / 1000 / 60 / 60

print "Hours between two statuses in the same millisecond: %f" % two_in_same_ms_hours

max_count = 100
covered = two_in_same_ms_odds
freq_problems = 0
for count in range(2, max_count):
    rate_at_count = poisson(rate, count)
    covered += rate_at_count
    # For our purposes, the "starting" low 16 bits are random,
    # regardless of method. If we have 2 statuses in the same ms
    # there is one "bad" starting value (2**16-1). If there are
    # 3 in the same ms, there are 2 bad starting values, etc.
    odds_of_problem = (count - 1) / 2**16
    freq_problems += rate_at_count * odds_of_problem

problems_ms = 1 / freq_problems
problems_days = problems_ms / 1000 / 60 / 60 / 24
problems_years = problems_days / 365.25

print "Years between wrapped status IDs in same millisecond: %f" % problems_years
Hours between two statuses in the same millisecond: 3.733440
Years between wrapped status IDs in same millisecond: 27.908198

Note that this relies on some wildly unrealistic assumptions: that you see a million statuses per 30 days in your timeline (which is absurdly high), and that they are uniformly distributed over the day (which is definitely not the case, but possibly offsets the high status count, and it makes the math much easier).

Under these assumptions, if operations take no time (the worst case), you never refresh the whole feed, and you only request statuses with IDs higher than the latest one you've seen, you'll miss a status roughly once every 27.9 years, which isn't too bad. (And if that status would notify you because you were tagged in it, you'll see it anyway.)

I see a few options, and would be open to more:

  1. Do nothing. Ignore the issue and hope it doesn't seriously bite anyone. This isn't an option I like, but I'll throw it out in case anyone really dislikes the others.
  2. Reset the counter to zero every millisecond. This is practically intractable, because it would add a pretty massive amount of overhead to something centralized (probably the database, but possibly a round-trip to Redis instead... which may well take more than a millisecond anyway, if Redis is on a different machine than the app server).
  3. Return all statuses in the milliseconds covered by the user's request (that is, if a user requests a max_id of 123456789, instead return all statuses with IDs up to 123456789 | 0xffff, to include any statuses that could have occurred in that millisecond; see the sketch after this list). This is easy enough to implement, but requires clients to deduplicate statuses they receive.
  4. Simply don't return statuses from the most recent millisecond when users request them. This artificially delays statuses, but only by a very tiny amount, with network latency almost certainly dominating any calculations here. This may be the least impactful option, although I'm not 100% sure how to do it off the top of my head.
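A rough sketch of option 3 (the function and parameter names here are hypothetical, not the actual pagination code):

SEQ_MASK = 0xFFFF  # low 16 bits of the ID, i.e. the counter portion

def widen_max_id(max_id):
    # Extend max_id to the very end of its millisecond, so every status generated
    # in that millisecond is included even if its counter wrapped (option 3 above).
    return max_id | SEQ_MASK

# Hypothetical usage in a timeline query; clients must deduplicate, since some of
# these statuses may already have been delivered on a previous request:
# statuses = fetch_statuses(up_to_id=widen_max_id(requested_max_id))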
nightpool (Member) commented Oct 5, 2017 via email

nightpool (Member) commented Oct 5, 2017 via email

clworld (Contributor) commented Oct 5, 2017

I've found that out-of-order arrival of status IDs already happens with non-snowflake sequential IDs.
That situation seems more common than multiple statuses arriving within the same millisecond.
test
Without fixing that, I prefer option 1, because some statuses are already lost via the REST API.

Gargron (Member) commented Oct 5, 2017

I have a few general comments about the snowflake IDs, and want to put them down here. Maybe I merged too soon - but I was getting impatient and didn't want to hold up a new contributor too long either.

  • The timestamp_id() function in Postgres uses the vanilla now() epoch, but we don't need to store any dates prior to when StatusNet was first created. I think that's 2010, but just to be safe we could take 2009. Going forward, that would give us a lot more storage capacity.
  • I think the md5 salt in the sequence in the timestamp_id() function is over-engineered and unnecessary. It solves a problem I do not consider to be a problem, and running md5 on every ID generation is surely not good for performance either, even if md5 is the cheapest of the hashing functions.
  • I think using the full sequence number rather than only the last two bytes is okay. Like that solution, we could even take the sequence modulo 1024 or some similarly limited number. Because timestamp_id() is only ever called for millisecond-precision timestamps, that would give a throughput of 1024 (or whatever) new IDs per millisecond, which I think is enough. And when the modulo spills over, Twitter's snowflake implementation, for example, simply waits for the next millisecond to tick over so the sequence can start from scratch (see the sketch after the links below).

Formspring's flake generator implementation: https://github.com/formspring/flake/blob/master/flake.py
Twitter's snowflake generator implementation: https://github.com/twitter/snowflake/blob/b3f6a3c6ca8e1b6847baa6ff42bf72201e2c2231/src/main/scala/com/twitter/service/snowflake/IdWorker.scala
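A minimal Python sketch of the approach in the last point above, under the stated assumptions (a per-millisecond sequence capped at 1024, spinning until the next millisecond when it overflows; the class and constant names are illustrative, and the real change would live in Postgres):

import time

SEQ_BITS = 10          # 1024 IDs per millisecond, as suggested above
SEQ_MAX = 1 << SEQ_BITS

class SnowflakeSketch:
    # Illustrative in-process generator; not the Mastodon implementation.
    def __init__(self):
        self.last_ms = -1
        self.seq = 0

    def next_id(self):
        now_ms = int(time.time() * 1000)
        if now_ms == self.last_ms:
            self.seq += 1
            if self.seq >= SEQ_MAX:
                # Sequence exhausted for this millisecond: wait for the clock to tick over.
                while now_ms <= self.last_ms:
                    now_ms = int(time.time() * 1000)
                self.seq = 0
        else:
            self.seq = 0
        self.last_ms = now_ms
        return (now_ms << SEQ_BITS) | self.seq

gen = SnowflakeSketch()
print(gen.next_id(), gen.next_id())  # strictly increasing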

aschmitz (Contributor, Author) commented Oct 5, 2017

@nightpool I think we'd only have the "rounded off milliseconds" problem with backdated statuses, in which case they'd be inserted "out of order" anyway, so I'm not immediately concerned about it. We could (and probably should?) pick a random millisecond if the milliseconds on the timestamp are 0, but I don't think it's a major concern in terms of ordering, because they'll already be behind the current IDs (at least after #5211 gets merged).

@clworld Have you seen this happening in practice? I certainly believe that it could happen, but I don't know how common it is. If it's already common enough, I'd just stick with option 1 and say "yeah, it could be unfortunate, but it's so rare that you'll drop statuses for other reasons anyway a lot more often".

@Gargron In order:

  1. Let's assume we can only go up to 63 bits, because we're using signed 64-bit ints (which I think we are, in Postgres). That gives (2**(63-16)/1000/60/60/24/365) ~= 4463 years of IDs without running out (the arithmetic is spelled out at the end of this comment). Sure, we could buy back roughly 40 years by using a nonstandard epoch, but I figured I'd just go with the Unix epoch for consistency.

  2. To an extent, it solves a problem I'd like to avoid, but I guess for some purposes it could feel over-engineered. I benchmarked it with Postgres on a single core of a midrange CPU and didn't find it to take significant amounts of time compared to other status creation actions. (In terms of MD5 being weak, I agree, but it's reasonably fast and included in all supported versions of Postgres and has no publicly known preimage attacks, making it secure enough for the intended purpose.)

  3. I'm not entirely sure what you're saying here. If you're suggesting a sequence number within the millisecond, Postgres provides no way to do this (without storing it in a temporary table at a significant level of overhead), and doing it in the app servers poses a different distribution problem (one that would be a lot more difficult to wrangle). Otherwise, you'll still have something that wraps at some point when you take it modulo some number. (I actually like the idea of a sequence number that restarts every millisecond on the millisecond, but I can't find a good way to do it.)
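For reference, the capacity arithmetic from point 1, spelled out:

# 63 usable bits in a signed 64-bit integer, minus 16 bits of counter,
# leaves 47 bits of millisecond timestamps.
ms_capacity = 2 ** (63 - 16)
years = ms_capacity / 1000 / 60 / 60 / 24 / 365
print("%.0f years of IDs from the chosen epoch" % years)  # ~4463 years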

Gargron (Member) commented Oct 5, 2017

@aschmitz

> Postgres provides no way to do this (without storing it in a temporary table at significant levels of overhead)

But isn't that what nextval() does? Take nextval(), like you already do, modulo 1024; if the result is 0, it means we wrapped around, so wait a millisecond and try again. Shouldn't that be possible?
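A rough sketch of that idea in Python (nextval() here is a stand-in for Postgres's sequence function; in reality this logic would sit inside the ID-generating SQL function):

import itertools, time

_counter = itertools.count(1)
def nextval():
    # Stand-in for nextval() on a Postgres sequence.
    return next(_counter)

def timestamp_id_sketch():
    # Combine the millisecond timestamp with nextval() modulo 1024; a result of 0
    # means the sequence wrapped, so wait out the current millisecond and retry.
    while True:
        now_ms = int(time.time() * 1000)
        seq = nextval() % 1024
        if seq != 0:
            return (now_ms << 10) | seq
        time.sleep(0.001)  # let the next millisecond tick over before retrying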

aschmitz (Contributor, Author) commented Oct 6, 2017

@Gargron Ah, I see what you're saying: you'd just wait when the sequence wraps, not reset it to 0 each millisecond. That's not compatible with what I'm looking for (it makes it easy to tell whether two statuses with nearby, but not identical, timestamps were posted in sequence), but I can stick with my generator on my instance and we can swap in a different one for general use if you feel it's necessary, so it doesn't bother me all that much. (I'd like everyone else to get the marginal benefit too, but it's better than nothing, particularly on many-user instances.)

I can try to look into something like that this weekend if necessary. (In particular, the waiting for fractional milliseconds part will be interesting.) I don't actually see any significant advantage to doing that rather than what we have now, but if you'd rather not live without it, I can look into what'd be required.

If there were an atomic check-and-set in Postgres, this would be trivial, but alas I'm not aware of any.

Gargron closed this as completed Apr 8, 2018