Tighten up definition of and explain reasoning behind IDs #231

oberstet · 2016-01-06T20:03:18Z

Changed text in bold:

These are identified in WAMP using IDs that are integers between
(inclusive) 1 and 2^53 (9007199254740992):

o IDs in the global scope MUST be drawn randomly from a uniform
distribution over the complete range [1, 2^53]

o IDs in the router scope can be chosen freely by the specific
router implementation from the range [1, 2^53]

o IDs in the session scope MUST be incremented by 1 beginning
with 1 (for each direction - Client-to-Router and Router-to-
Client)
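
A minimal sketch of the two strictly specified policies above, assuming Python. The helper names `global_scope_id` and `session_scope_ids` are hypothetical, and using the `secrets` module is an assumption on my part: the proposed text only asks for a uniform distribution over [1, 2^53].

```python
import secrets
from itertools import count
from typing import Collection, Iterator

ID_MIN = 1
ID_MAX = 2**53  # 9007199254740992


def global_scope_id(in_use: Collection[int] = ()) -> int:
    """Draw an ID uniformly from [1, 2**53], retrying while it is already in use."""
    while True:
        # secrets.randbelow(n) returns an int in [0, n), so shift by ID_MIN.
        candidate = secrets.randbelow(ID_MAX) + ID_MIN
        if candidate not in in_use:
            return candidate


def session_scope_ids() -> Iterator[int]:
    """Yield 1, 2, 3, ... for one direction of one session.

    What happens once 2**53 is reached is not covered by the text above
    and is deliberately left out of this sketch.
    """
    return count(start=ID_MIN)
```

A peer would keep one `session_scope_ids()` iterator per session and per direction. Router-scope IDs are left to the router implementation, so they are not sketched here.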

In addition, the spec should explain why "0" is forbidden (just as there is a section explaining why 2^53 is the upper bound).


oberstet · 2018-09-14T06:10:52Z

We really should nail this down in the spec text, as it creates confusion for implementors - crossbario/autobahn-js#377

oberstet · 2019-03-17T09:25:40Z

Here is another one wondering about IDs: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/wampws/jQc-L1yrUU4/3GyqSqR4DAAJ

There are a bunch of reasons and arguments that lead to the specific grouping of IDs and the use of different policies (or implementation freedom) for those ID groups. However, very little of that is currently in the spec text :(

milgner · 2021-03-15T20:14:31Z

I was also wondering about the exact requirements for the randomness of global IDs. Assuming that there is a limited number of active session IDs, those can be easily generated and checked against the IDs of other currently open sessions to prevent duplicates.
However, with publication IDs, given only a relatively small space of 2^53, the likelihood of repetitions increases rapidly due to the birthday problem.
It would be good to know over what period of time the publication id is supposed to be non-repeating.
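
To put rough numbers on the birthday problem for a 2^53 space, here is a small sketch using the standard approximation P ≈ 1 - exp(-n(n-1)/(2N)). The function names are hypothetical, and the model assumes independent uniform draws, which is what the spec text describes.

```python
import math

ID_SPACE = 2**53


def collision_probability(n: int, space: int = ID_SPACE) -> float:
    """Approximate P(at least one repeat) among n uniform draws from `space` values."""
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


def draws_for_probability(p: float, space: int = ID_SPACE) -> float:
    """Approximate number of draws at which the collision probability reaches p."""
    # log1p(-p) equals log(1 - p) but stays accurate for very small p.
    return math.sqrt(2 * space * -math.log1p(-p))


if __name__ == "__main__":
    print(collision_probability(10**6))
    print(draws_for_probability(0.5))
```

Under this model, one million IDs carry a repeat probability of about 5.5 × 10^-5, and the probability reaches 50% at around 1.1 × 10^8 draws. Whether a repeat matters still depends on how long an ID has to stay unique, which is the open question here.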

I posed the question to a good friend of mine and he came up with a mathematical construct for a pseudo-random sequence that, although ultimately predictable, would ensure uniqueness of the generated ids until all possible values have been used once.
See my Java-based implementation here.

Now I wonder whether it satisfies all the requirements behind the specification.

oberstet · 2021-03-15T20:47:56Z

Hi there, thanks for the attention to detail! Let me answer with a couple of notes, maybe that is useful.

So the spec says

> drawn randomly from a uniform distribution

This is mathematically well defined (sequence of stochastically independent random variables), could be approached by a true physical source of randomness, but practically all we need is "pseudo-randomness".

pseudo-randomness over integers in [1, 2**53] will lead to repeated use of any given number after "some time". any scheme will eventually repeat, because the ID set is finite.

pseudo-randomness maintained in the sequence of IDs being used is however important as well.

an external observer should not be able to easily guess or even compute the next ID that will be used, even when the observer has the complete list of IDs that had been used already. (maybe we should add that to the spec?)

as I understand ^, the algo you propose gives up on that, in exchange for an additional property: all IDs from [1, 2**53] will be used exactly once before starting over again.

this is nice, but the price is too high .. that is not what the spec was trying to hint at. it is not a PRNG in the sense the spec needs, because an external observer can easily compute the next ID that will be used just from looking at the last IDs that had been used.
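
To illustrate the trade-off, here is a sketch of a full-cycle generator and of how an observer predicts it. This is a hypothetical stand-in (a plain linear congruential generator with full period), not the Java implementation mentioned above; the constants `a = 5` and `c = 3` are assumptions chosen only because they satisfy the Hull-Dobell conditions for a power-of-two modulus.

```python
from typing import Iterator


def full_cycle_ids(seed: int, modulus: int = 2**53, a: int = 5, c: int = 3) -> Iterator[int]:
    """Yield IDs in [1, modulus]; every value appears once per cycle.

    Full period holds for a power-of-two modulus when c is odd and
    a - 1 is divisible by 4 (Hull-Dobell theorem).
    """
    state = seed % modulus
    while True:
        state = (a * state + c) % modulus
        yield state + 1  # shift [0, modulus) to [1, modulus]


def predict_next(last_id: int, modulus: int = 2**53, a: int = 5, c: int = 3) -> int:
    """What an observer who knows the algorithm computes from one observed ID."""
    return (a * (last_id - 1) + c) % modulus + 1


if __name__ == "__main__":
    gen = full_cycle_ids(seed=12345)
    first, second = next(gen), next(gen)
    print(predict_next(first) == second)
```

The generator never repeats within a cycle, but a single observed ID is enough to compute its successor, which is the property the spec wording is trying to rule out.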

now, the limited ID range [1, 2**53] for publication IDs is of course sth to consider. fwiw, for eg event persistency, any situation where one wants to refer to events long-term should use a better ID.

so what we often use is (publisher_session_id, published_event_id) which gives you 2x53 bits ID space.

if that isn't enough, I would use (publisher_session_join_timestamp, publisher_session_id, published_event_id)

this should give at least 150 bits of ID space .. and that's enough ;) 2**150 is about 1.4 × 10^45 distinct IDs. should be sufficient.
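
A sketch of such a composite key for long-term event references, assuming Python. The class name `EventKey` and the choice of an integer timestamp are hypothetical; only the three components come from the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class EventKey:
    """Composite event identifier for persistence or de-duplication."""

    publisher_session_join_timestamp: int  # assumed unit: e.g. nanoseconds since epoch
    publisher_session_id: int              # global-scope ID in [1, 2**53]
    publication_id: int                    # global-scope ID in [1, 2**53]


if __name__ == "__main__":
    seen: set[EventKey] = set()
    key = EventKey(1_615_840_076_000_000_000, 4_711, 42)
    print(key in seen)
    seen.add(key)
    print(key in seen)
```

Because the dataclass is frozen, instances are hashable and can be used directly as dictionary keys or set members. Two events with the same publication ID but different publisher sessions compare as different keys.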

milgner · 2021-03-15T21:00:36Z

Thank you very much for the quick follow-up and explanation! I was not aware that predictability by an observer was actually an issue here and thought that avoidance of collisions might be worth something, too. If the goal is to prevent an attacker from guessing the next publication ID, then cryptographically secure randomness might be appropriate?
👍 for adding that to the spec.

And, more importantly, I'm still not sure whether repetitions in the sequence of generated random values might be an issue and what time should ideally elapse before any publication ID repeats.

The system I'm currently working on produces millions of messages a day which are published on a WAMP-based API and it will only be a matter of time until publication ids will repeat - maybe not immediately after each other but in close succession at least and I wonder whether that might cause any issues down the line.

oberstet · 2021-03-15T21:09:49Z

> And, more importantly, I'm still not sure whether repetitions in the sequence of generated random values might be an issue and what time should ideally elapse before any publication ID repeats.

Ok, right, maybe it would make sense to add background text like the following to the spec as well:

For a regular WAMP message broker, when a new event is published, that event is always immediately dispatched.

WAMP lets a publisher filter event dispatching via "exclude" / "eligible", alongside the authid-based variants "exclude_authid" / "eligible_authid" (already a WAMP AP feature).

"exclude" and "eligible" take session IDs that the publisher wants to exclude, or to permit as the only eligible receivers in the first place.

If session IDs repeat (a new session comes along reusing an old session ID of a session no longer joined - otherwise it wouldn't be possible anyways), then the publisher might exclude the wrong session.

You will note there is no publication ID in that feature at all.

Now, there are 2 more WAMP AP features: "retained events" and "event history"

With these features, a client can access events beyond the current "instant of time" (event publishing and dispatching always happens "now" .. an implicit reference point ... actually, the set of IDs in flight at that moment in time - the actively used IDs as of "now").

This means, with these features, a long-term stable / unique event ID is desirable.

However, these features also expose the session ID of the publisher of the (historic) event.

And hence, the trick I hinted at above should be used by clients: using a synthetic event ID including session and publication ID: check out EVENT.Details.publisher

oberstet · 2021-03-15T21:30:29Z

thinking further about the situation: if you are concerned that when using a single, long-lived publisher, the pair (publisher_session_id, publication_id) would eventually roll over, the relevant number to know is: events published per unit of time by that single publisher.

couple of millions per day is peanuts;) 2**53 isn't huge, but also not exactly small. obviously, 2**53/10**6 is still big.
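
The ratio mentioned above, worked out as a quick sketch. The rate of one million events per day is an assumption that rounds "a couple of millions" down to a single million.

```python
ID_SPACE = 2**53
EVENTS_PER_DAY = 10**6  # assumed rate for a single publisher

# Days needed to publish as many events as there are IDs in the space.
days = ID_SPACE // EVENTS_PER_DAY
years = days / 365.25

if __name__ == "__main__":
    print(days)
    print(years)
```

That is roughly 9 billion days, or about 24.7 million years. Note that this is the time needed to draw as many IDs as the space holds, not the expected time until the first random repeat, which arrives much earlier.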

in any case, practically, here is what I'd do if you're still concerned: just restart your publisher once a day. a reconnect is done in 100ms. The publisher will get a new session ID. Done.


milgner · 2021-03-15T21:51:55Z

Personally I think that it should be fine, too - but I'm always a bit wary about human intuition about these things because randomness and statistics are notoriously hard to grasp (especially the aforementioned birthday problem) 😬
Given the implementations I have seen so far, I don't even think that repetitions in the ID would cause any trouble. The only scenario where it could matter would involve a CALL / RESULT ID - but these are in the session scope and sequential anyway, so there's nothing to worry about.

Nonetheless, thank you again for the thorough explanation of the intended implementation! I adapted mine to use a regular random generator now 🙂

ecorm · 2023-01-03T01:06:55Z

It's not clear if IDs above "Session Scope" only need to be unique under a given realm, or if they need to be unique across a router hosting multiple realms.

~~Given "WAMP messages are only routed within a Realm" (https://wamp-proto.org/wamp_latest_ietf.html#section-1.3.1-1), shouldn't "Router Scope" be better named as "Realm Scope"?~~ Never mind, I realize now this doesn't make sense.

Also, what is the rationale for PUBLISHED.Publication and EVENT.Publication needing to be random? For session ID, there is this guide which suggests a cryptographically secure PRNG and 128 bits. But wouldn't sequential publication IDs be better for handling duplicates when used in combination with Event History?

ecorm · 2023-01-03T01:44:53Z

> Also, what is the rationale for PUBLISHED.Publication and EVENT.Publication needing to be random?

Found this Google Groups thread and linking it here: why is Publication ID drawn randomly?

I now understand that the rationale is for events to have a unique ID across multiple routers, and centrally keeping track of used sequential IDs would be prohibitively expensive. One can look at the probability table in the Wikipedia Birthday Problem article to get an idea of the probability of a publication ID collision, where the 48-bit row would be an under-estimate for 53 bits. Using the usual approximation n ≈ √(2·N·p) with N = 2^53, about 134 simultaneously "active" events already give a 1 in 1 trillion probability of collision, and about 134,000 give roughly 1 in 1 million.

> "Router Scope" be better named as "Realm Scope"

I now understand this wouldn't make sense in a multi-router setup where multiple routers can be involved in the same realm.

> It's not clear if IDs above "Session Scope" only need to be unique under a given realm, or if they need to be unique across a router hosting multiple realms.

I also now understand that Router Scope is for subscription and registration IDs that are presumably not shared with other routers in a multi-router setup.

As I have no need for router clusters, my understanding of WAMP is limited to a single-router scenario, thus my previous confusion regarding the different ID scopes.

oberstet · 2023-01-03T05:49:40Z

> As I have no need for router clusters, my understanding of WAMP is limited to a single-router scenario, thus my previous confusion regarding the different ID scopes.

Thing is, WAMP should support all of "single router (host)", "router library", "multi-node router", "federated router", ..

From an app developer perspective, those are just infrastructure details that must not creep into app code.

That's the original philosophy WAMP always aimed for. At least in my mind.

So what is the bare minimum app developers should care about?

Let me illustrate with a little story of 2 app developers, Alice and Bob.

Say Alice is running a WAMP client with a session joined on realm "realm1" that registered a procedure "add2" to add two numbers and return the sum as a number.

Alice tells Bob about her awesome new procedure "add2" available on realm "realm1". Alice also notes that the procedure expects two numbers (as positional arguments), and returns one positional result, the sum as a number, and that anyone on the realm is authorized to call it.

Bob decides that he wants to give it a try and writes a WAMP client joining realm "realm1" to call Alice's procedure "add2" with two numbers, expecting the sum returned as a number.

When Bob runs his client, it fails with one of these errors:

"no such realm 'realm1'"
"no such procedure 'add2'"
"not authorized to call procedure 'add2'"
"type error for call args"
"type error for call result"

Why? Alice did tell Bob everything he should need to know as an application developer. What went wrong?

oberstet added Basic Profile Bug RFC labels Jan 6, 2016

oberstet mentioned this issue Jan 6, 2016

Adjusting WAMP Sequential ID Range for Autobahn Example tplgy/bonefish#30

Closed

pangiole added a commit to pangiole/akka-wamp that referenced this issue Jul 17, 2016

Tighten up definition of IDs

dc878a4

See wamp-proto/wamp-proto#231

pangiole mentioned this issue Jun 22, 2017

Fix issue #231 #286

Merged

ralscha added a commit to ralscha/wamp2spring that referenced this issue Aug 22, 2017

ID must be greater than 0

e318700

Specification change: wamp-proto/wamp-proto#231

oberstet added a commit that referenced this issue Feb 12, 2018

Merge pull request #286 from angiolep/fix-270

4fe6708

Fix issue #231

oberstet added this to the spec-fixes-and-polish milestone Feb 21, 2018

oberstet mentioned this issue Feb 21, 2018

Inconsistent definition of IDs #270

Open

ecorm pushed a commit to ecorm/wamp-proto that referenced this issue Mar 23, 2018

Merge pull request wamp-proto#286 from angiolep/fix-270

eeaeabd

Fix issue wamp-proto#231

oberstet mentioned this issue Sep 14, 2018

Use incrementing IDs for session requests crossbario/autobahn-js#377

Merged

oberstet changed the title ~~Tighten up definition of IDs~~ Tighten up definition of and explain reasoning behind IDs Mar 17, 2019

milgner added a commit to i22-digitalagentur/vertx-wamp that referenced this issue Mar 15, 2021

refactor: use regular random numbers for publication ids

2527ffa

See also wamp-proto/wamp-proto#231 (comment) for further insight.

ecorm mentioned this issue Jan 3, 2023

Define and discuss "global scope" in spec #429

Closed

oberstet mentioned this issue Jan 5, 2023

Define "the router" #433

Open
