TentD calculates post IDs like this:
rand(36 ** 6).to_s(36)
The result is a base36-encoded number, 0 <= post id < 36**6, so there are only 36**6 = 2,176,782,336 possible IDs.
I was curious about the possibility of collision. I found some Python code for the generalized birthday problem here. Using it, we can confirm the original birthday paradox, namely that if you have 23 people in a room there's a ~50% chance that two of them share a birthday:
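A minimal sketch of that check, not the linked code itself: the function name collision_probability and its parameters are my own, and it assumes every value is equally likely.

```python
import math

def collision_probability(n, d):
    """Chance that at least two of n uniform random draws from d values match."""
    if n > d:
        return 1.0  # pigeonhole: more draws than values guarantees a repeat
    # log P(all distinct) = sum over i of log(1 - i/d); summing logs keeps
    # the calculation stable even when d is very large.
    log_all_distinct = sum(math.log1p(-i / d) for i in range(n))
    return -math.expm1(log_all_distinct)

# 23 people, 365 possible birthdays
print(f"{collision_probability(23, 365):.4f}")  # 0.5073
```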
The collision results for post IDs from 0 to (36**6)-1 are not good:
Even at 10,000 posts, there's a 2.3% chance of collision. At 500,000 posts a collision is so nearly certain that Python rounds the result to 100%!
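For anyone who wants to reproduce these figures, here is a rough sketch using the same birthday calculation with 36**6 possible IDs. The helper name collision_probability is my own, and it assumes rand is uniform over the whole range.

```python
import math

ID_SPACE = 36 ** 6  # number of distinct IDs rand(36 ** 6) can return

def collision_probability(n, d):
    """Chance that at least two of n uniform random draws from d values match."""
    if n > d:
        return 1.0
    log_all_distinct = sum(math.log1p(-i / d) for i in range(n))
    return -math.expm1(log_all_distinct)

for posts in (1_000, 10_000, 100_000, 500_000):
    p = collision_probability(posts, ID_SPACE)
    print(f"{posts:>7} posts: {p:.4%}")
```

At 500,000 posts the probability of no collision is far smaller than double-precision floats can distinguish from zero, which is why the result comes out as exactly 1.0.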
How feasible would it be to simply use a cryptographic hash as a post ID? Are you optimizing for something (human memorability? DB lookup speed?) by not doing that?
Otherwise, it seems that 14-character strings might be acceptable, or 10-character strings might work if the server explicitly checks for collisions and regenerates the ID until one is unique:
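The check-and-regenerate idea could look roughly like the sketch below. Everything here is illustrative: random_base36, new_post_id and the in-memory existing_ids set are hypothetical names, and a real server would more likely rely on a unique constraint in its database. Unlike rand(36 ** 6).to_s(36), this always produces exactly the requested number of characters.

```python
import secrets

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # base36 digits

def random_base36(length):
    """Return a fixed-length random base36 string."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def new_post_id(existing_ids, length=10, max_attempts=100):
    """Draw IDs until one is unused, then record and return it."""
    for _ in range(max_attempts):
        candidate = random_base36(length)
        if candidate not in existing_ids:
            existing_ids.add(candidate)
            return candidate
    # Only reachable if the ID space is nearly or completely exhausted.
    raise RuntimeError("could not find an unused ID")

ids = set()
print(new_post_id(ids))
```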
Thanks for pointing this out. The original decision for this was made quite a while ago, and the reasons for it are irrelevant now. Luckily, it's very easy to change.
What we need is a way to generate non-sequential unique identifiers. These should be as short as is reasonably possible. I think the easiest thing to do is to use a UUID/GUID; I don't have a strong preference as to which type, though.
Also, whatever UUID system we end up with should use base32 or urlsafe base64 instead of hex, as there is no reason to waste bytes.
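To put numbers on the byte savings, here is a small sketch comparing the three encodings for a 16-byte value. It uses a random version 4 UUID purely as an example input, and encoded_lengths is a made-up helper name; padding characters are stripped before counting.

```python
import base64
import uuid

def encoded_lengths(raw):
    """Return (hex, base32, urlsafe base64) lengths for raw bytes, padding removed."""
    hex_id = raw.hex()
    b32_id = base64.b32encode(raw).decode("ascii").rstrip("=")
    b64_id = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    return len(hex_id), len(b32_id), len(b64_id)

raw = uuid.uuid4().bytes  # 16 bytes
print(encoded_lengths(raw))  # (32, 26, 22)
```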
Generate longer ids
The IDs are now 16 bytes of random data Base64 encoded with url-safe characters (22 ASCII characters). They look like this:
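An ID of this shape can be produced along the following lines. This is a Python sketch for illustration, not TentD's actual Ruby code, and generate_post_id is a hypothetical name.

```python
import base64
import secrets

def generate_post_id():
    """16 random bytes, URL-safe Base64 encoded, with the '=' padding removed."""
    raw = secrets.token_bytes(16)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

print(generate_post_id())
```

Each call prints a different 22-character string drawn from A-Z, a-z, 0-9, "-" and "_".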