Tighten up definition of and explain reasoning behind IDs #231
Specification change: wamp-proto/wamp-proto#231
We really should nail this down in the spec text, as it creates confusion for implementors: crossbario/autobahn-js#377
Here is another one wondering about IDs: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/wampws/jQc-L1yrUU4/3GyqSqR4DAAJ. There are a bunch of reasons and arguments that lead to the specific grouping of IDs and the use of different policies (or implementation freedom) for those ID groups. However, very little of that is currently in the spec text :(
I was also wondering about the exact requirements on the randomness of global IDs. Assuming that there is a limited number of active session IDs, those can easily be generated and checked against the IDs of other currently open sessions to prevent duplicates. I posed the question to a good friend of mine and he came up with a mathematical construct for a pseudo-random sequence that, although ultimately predictable, would ensure uniqueness of the generated IDs until all possible values have been used once. Now I wonder whether it satisfies all the requirements behind the specification.
Hi there, thanks for the attention to detail! Let me answer with a couple of notes, maybe that is useful. So the spec says
This is mathematically well defined (a sequence of stochastically independent random variables) and could be approached by a true physical source of randomness, but practically all we need is "pseudo-randomness". Pseudo-randomness over integers in [1, 2**53] will lead to repeated use of any given number in "some time". Any use will lead to repeats because the ID set is finite. Pseudo-randomness maintained in the sequence of IDs being used is however important as well. An external observer should not be able to easily guess or even compute the next ID that will be used, even when the observer has the complete list of IDs that had been used already. (Maybe we should add that to the spec?) As I understand ^, the algorithm you propose gives up on that, in exchange for an additional property: all IDs from [1, 2**53] will be used exactly once before starting over again. This is nice, but the price is too high .. that is not what the spec was trying to hint at. It is not a PRNG in the sense that an external observer can easily compute the next ID that will be used just from looking at the last IDs that had been used. Now, the limited ID range [1, 2**53] for publication IDs is of course something to consider. FWIW, for e.g. event persistency, any situation where one wants to refer to events long-term should use a better ID. So what we often use is if that isn't enough, I would use this should give at least 150 bits ID space .. and that's enough ;) it is many times the number of atoms in the known/observable universe. Should be sufficient.
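A minimal sketch of drawing a global-scope ID along the lines discussed above. It assumes Python's standard `secrets` module as the randomness source; the thread does not name a generator, and `new_global_id` and `new_unused_global_id` are hypothetical names used only for illustration.

```python
import secrets
from typing import AbstractSet

ID_MAX = 2**53  # upper bound of the ID range, inclusive


def new_global_id() -> int:
    """Draw an ID uniformly from [1, 2**53] using the OS-backed CSPRNG."""
    # secrets.randbelow(n) returns an int in [0, n), so shift by one.
    return secrets.randbelow(ID_MAX) + 1


def new_unused_global_id(in_use: AbstractSet[int]) -> int:
    """Redraw until the ID is not among the currently active ones.

    Assumption: the caller tracks the set of IDs that are active right now.
    """
    while True:
        candidate = new_global_id()
        if candidate not in in_use:
            return candidate


active_sessions = {new_global_id() for _ in range(5)}
print(new_unused_global_id(active_sessions))
```

Each draw is independent of earlier ones, so repeats over time remain possible; the redraw loop only avoids clashing with IDs that are active at the moment of the draw.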
Thank you very much for the quick follow-up and explanation! I was not aware that predictability by an observer was actually an issue here and thought that avoidance of collisions might be worth something, too. If the goal is to prevent an attacker from guessing the next publication ID, then cryptographically secure randomness might be appropriate? And, more importantly, I'm still not sure whether repetitions in the sequence of generated random values might be an issue and how much time should ideally elapse before any publication ID repeats. The system I'm currently working on produces millions of messages a day which are published on a WAMP-based API, and it will only be a matter of time until publication IDs repeat - maybe not immediately after each other but in close succession at least - and I wonder whether that might cause any issues down the line.
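To put rough numbers on "a matter of time", here is a sketch using the standard birthday-problem approximation for uniform draws from [1, 2**53]. The rate of 5 million publications per day is an assumed figure chosen only for illustration, and both function names are hypothetical.

```python
import math

ID_SPACE = 2**53  # number of possible IDs in [1, 2**53]


def collision_probability(n: int, space: int = ID_SPACE) -> float:
    """Approximate P(at least one repeated ID) among n uniform draws."""
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * space)).
    # expm1 keeps precision when the exponent is very small.
    return -math.expm1(-n * (n - 1) / (2 * space))


def draws_for_probability(p: float, space: int = ID_SPACE) -> float:
    """Approximate number of draws at which a repeat has probability p."""
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


PER_DAY = 5_000_000  # assumed publication rate, not taken from the thread
for days in (1, 30, 365):
    n = PER_DAY * days
    print(f"{days:>3} day(s), {n:>13,} IDs: p ~= {collision_probability(n):.3g}")

print(f"draws for a 50% chance of a repeat: {draws_for_probability(0.5):.3g}")
```

Under these assumptions a repeat somewhere in the full sequence becomes likely after on the order of 10^8 draws. That counts any two equal IDs over the whole history, not two IDs that are in flight at the same moment.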
Ok, right, maybe background text like the following would make sense to add to the spec as well: For a regular WAMP message broker, when a new event is published, that event is always immediately dispatched. WAMP provides a way to filter event dispatching via "exclude_authid" / "eligible_auth" (already a WAMP AP feature). Those are session IDs the publisher might want to exclude or permit as the only eligible receivers in the first place. If session IDs repeat (a new session comes along reusing an old session ID of a session no longer joined - otherwise it wouldn't be possible anyway), then the publisher might exclude the wrong session. You will note there is no publication ID in that feature at all. Now, there are 2 more WAMP AP features: "retained events" and "event history". With these features, a client can access events beyond the "instant of time" (event publishing and dispatching always happen "now" .. an implicit reference point ... actually, the set of IDs in flight at that moment in time - the actively used IDs as of "now"). This means that, with these features, a long-term stable / unique event ID is desirable. However, these features also expose the session ID of the publisher of the (historic) event. And hence, the trick I hinted at above should be used by clients: using a synthetic event ID including session and publication ID: checkout
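One way a client might build such a synthetic event ID is to key stored events by the pair of publisher session ID and publication ID. This is only an illustrative sketch: `EventKey`, `record_event` and the in-memory `history` dict are hypothetical and are not the construction referred to in the comment above.

```python
from typing import Any, Dict, NamedTuple


class EventKey(NamedTuple):
    """Hypothetical composite key; not defined by the WAMP spec."""
    publisher_session_id: int
    publication_id: int


# Assumption: a client-side store of historic events, keyed by EventKey.
history: Dict[EventKey, Any] = {}


def record_event(session_id: int, publication_id: int, payload: Any) -> EventKey:
    """Store an event under (publisher session ID, publication ID)."""
    key = EventKey(session_id, publication_id)
    history[key] = payload
    return key


# Same publication ID from two different publisher sessions: no clash.
first = record_event(4429313566, 123456789, {"temperature": 21.5})
second = record_event(8122650370, 123456789, {"temperature": 19.0})
print(first != second, len(history))
```

With this keying, a repeated publication ID only collides with an earlier event if the publisher session ID repeats as well.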
Thinking further about the situation: if you are concerned that when using a single, long-lived publisher, the pair couple of millions per day is peanuts ;) In any case, practically, here is what I'd do if you're still concerned: just restart your publisher once a day. A reconnect is done in 100ms. The publisher will get a new session ID. Done.
See also wamp-proto/wamp-proto#231 (comment) for further insight.
Personally I think that it should be fine, too - but I'm always a bit wary about human intuition about these things because randomness and statistics are notoriously hard to grasp (especially the aforementioned birthday problem) 😬 Nonetheless, thank you again for the thorough explanation of the intended implementation! I adapted mine to use a regular random generator now 🙂
It's not clear if IDs above "Session Scope" only need to be unique under a given realm, or if they need to be unique across a router hosting multiple realms.
Also, what is the rationale for
Found this Google Groups thread and linking it here: why is Publication ID drawn randomly? I now understand that the rationale is for events to have a unique ID across multiple routers, and centrally keeping track of used sequential IDs would be prohibitively expensive. One can look at the probability table in the Wikipedia Birthday Problem article to get an idea of the probability of a publication ID collision, where the 48-bit row would be an under-estimate of 53 bits. There would need to be 1.1×10^23 "active" events for there to be a 1 in 1 trillion probability of collision, assuming 48-bits.
I now understand this wouldn't make sense in a multi-router setup where multiple routers can be involved in the same realm.
I also now understand that Router Scope is for subscription and registration IDs that are presumably not shared with other routers in a multi-router setup. As I have no need for router clusters, my understanding of WAMP is limited to a single-router scenario, thus my previous confusion regarding the different ID scopes.
Thing is, WAMP should support all of "single router (host)", "router library", "multi-node router", "federated router", .. From an app developer perspective, those are just infrastructure details that must not creep into app code. That's the original philosophy WAMP always aimed for. At least in my mind. So what is the bare minimum app developers should care about? Let me illustrate with a little story of 2 app developers, Alice and Bob. Say Alice is running a WAMP client with a session joined on realm "realm1" that registered a procedure "add2" to add two numbers and return the sum as a number. Alice tells Bob about her awesome new procedure "add2" available on realm "realm1". Alice also notes that the procedure expects two numbers (as positional arguments), and returns one positional result, the sum as a number, and that anyone on the realm is authorized to call it. Bob decides that he wants to give it a try and writes a WAMP client joining realm "realm1" to call Alice's procedure "add2" with two numbers, expecting the sum returned as a number. When Bob runs his client, it fails with one of these errors:
Why? Alice did tell Bob everything he should need to know as an application developer. What went wrong?
Also: tplgy/bonefish#30
Changed text in bold:
These are identified in WAMP using IDs that are integers between
(inclusive) 1 and 2^53 (9007199254740992):
o IDs in the global scope MUST be drawn randomly from a uniform
distribution over the complete range [1, 2^53]
o IDs in the router scope can be chosen freely by the specific
router implementation from the range [1, 2^53]
o IDs in the session scope MUST be incremented by 1 beginning
with 1 (for each direction - Client-to-Router and Router-to-
Client)
Plus there should also be an explanation of why "0" is forbidden (like there is a section on why 2^53 is the upper bound).
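As a sketch of the session-scope rule in the list above, the following counter hands out 1, 2, 3, ... with one instance per direction. The class and function names are hypothetical, and the wrap back to 1 after 2^53 is an assumption, since the quoted text does not say what happens at the upper bound. The last helper illustrates a well-known property of 2^53: integers up to that value survive conversion to an IEEE 754 double unchanged, while 2^53 + 1 does not.

```python
ID_MIN = 1
ID_MAX = 2**53  # 9007199254740992


class SessionScopeIdGenerator:
    """Yields 1, 2, 3, ... for one direction of one session."""

    def __init__(self) -> None:
        self._next_id = ID_MIN

    def next_id(self) -> int:
        current = self._next_id
        # Assumption: wrap back to 1 after 2**53; the text quoted above
        # does not say what happens at the upper bound.
        self._next_id = ID_MIN if current == ID_MAX else current + 1
        return current


def survives_double_round_trip(n: int) -> bool:
    """True if n is unchanged after conversion to an IEEE 754 double."""
    return int(float(n)) == n


# One independent counter per direction, as the session-scope rule states.
client_to_router = SessionScopeIdGenerator()
router_to_client = SessionScopeIdGenerator()
print([client_to_router.next_id() for _ in range(3)])  # [1, 2, 3]
print(router_to_client.next_id())                      # 1
print(survives_double_round_trip(ID_MAX))              # True
print(survives_double_round_trip(ID_MAX + 1))          # False
```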