Anonymous TDs in a directory #149

Closed
farshidtz opened this issue Apr 9, 2021 · 13 comments

@farshidtz
Member

Anonymous TDs are those that don't have an id field.

Should the directory accept them? If yes, how should they be maintained and exposed anonymously?

@farshidtz
Member Author

For the HTTP API:

  • Submission: With a POST request. System-generated pseudorandom (e.g. UUIDv4 URN) ID given to producer in response.
  • Retrieval: With the system-generated ID.
  • Listing (retrieval of all TDs): Returning anonymous TDs along with others. How to sort by ID (default) when anonymous entries have no IDs?
  • Search: Performing search on the full list. Same issue as for listing.
  • Notifications: Create/Update/Delete events need to provide the TD's identifier. Exposing system-generated IDs there suffers from the same issue associated with adding the ID inside the TD.
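
The submission and retrieval steps above could be sketched roughly as follows. This is a minimal in-memory illustration, not the directory API: the class name InMemoryDirectory and its methods are hypothetical. It also shows one possible (assumed) answer to the sorting question, namely sorting by the key the directory stores each TD under, which every entry has.

```python
import uuid


class InMemoryDirectory:
    """Hypothetical in-memory stand-in for a Thing Description directory."""

    def __init__(self):
        self._tds = {}  # storage key -> TD as submitted (never modified)

    def create_anonymous(self, td):
        """Handle a POST of a TD without an id; return the generated ID."""
        if "id" in td:
            raise ValueError("TD already has an id; it is not anonymous")
        generated_id = uuid.uuid4().urn  # a UUIDv4 URN, 'urn:uuid:...'
        self._tds[generated_id] = dict(td)
        return generated_id

    def retrieve(self, key):
        """Return the TD stored under the given key."""
        return self._tds[key]

    def list_all(self):
        # Assumption: sort by storage key so anonymous entries, which
        # carry no id of their own, still have a defined position.
        return [self._tds[key] for key in sorted(self._tds)]
```

Note that the stored TD stays exactly as submitted here; the generated ID only exists as the storage key and in the response to the producer.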

@egekorkan

I think the comments from the privacy group (PING) about having a mandatory id in previous versions of the TD would apply here as well if an id is inserted by the system. Once a TD is in the directory and has an id, it will probably always be consumed in the version from the directory, no?

@benfrancis
Member

I nearly filed an issue about this a few weeks ago because I noticed the Directory Service API needs an ID to fetch a Thing Description, but id is not a mandatory member of a Thing Description. That could mean that when you fetch a list of Thing Descriptions from a directory, it isn't possible to determine the URL of an individual Thing Description in the directory if it doesn't provide an id member in its listing.

Then I noticed that the specification said the directory would provide a system-generated ID for "anonymous" Thing Descriptions, so that shouldn't be a problem (I assumed those IDs would also be provided as an id member in the directory listing).

Note that this does mean the Thing Description exposed by the directory wouldn't be exactly the same as the Thing Description provided by the client (as it would have an additional id member), which I personally think is fine, but some people have expressed a desire to add resource level signing to canonical Thing Descriptions which would be invalidated by such a modification.

FWIW, the way WebThings Gateway solves this problem is to add a href member to each Thing Description in the gateway's directory listing so that a client knows what URL to use to fetch an individual Thing Description (see https://webthings.io/api/#things-resource for an example). We later added an id member to all Thing Descriptions exposed by the gateway, which is always present (because at one point that was mandatory in the W3C spec) and is always set to the URL of the Thing Description itself. The href member therefore isn't strictly necessary any more, since the id member can be used instead. The unique part of these URIs is actually generated by adapter add-ons for different protocols, with a protocol-specific global identifier. E.g.

  1. For Zigbee devices it might look like https://foo.webthings.io/things/zb-d0cf5efdfe2cb1bb
  2. For native web things being proxied by the gateway, it includes the URL of the original Thing Description, e.g. https://foo.webthings.io/things/https---bar.example.com-things-lamp1

As an aside, the way I originally assumed directories would work was that web things would be added to the directory by URL (the URL at which their Thing Description is hosted), which could therefore always be used as a globally unique URI. In the current design things are registered with a directory using an entire Thing Description (with no reference to the URL at which that resource may have originally been served), which is then served as a new web resource by the directory at a new URL, effectively creating a new web thing. In this design it's necessary for the directory to generate IDs for Thing Descriptions which don't provide them, since they are needed for generating those new URLs.

@farshidtz
Member Author

farshidtz commented Apr 9, 2021

I think the comments from the privacy group (PING) about having a mandatory id in previous versions of the TD would apply here as well if an id is inserted by the system. Once a TD is in the directory and has an id, it will probably always be consumed in the version from the directory, no?

@egekorkan I couldn't follow this. Could you elaborate?


@benfrancis
The plan was to include the system-generated ID inside the registration object (wot-discovery/#example-td-registration-info); see #98 (comment). But this was later removed because using tdd:registration.id as TD's identifier wasn't semantically correct (@AndreaCimminoArriaga could explain better). It also has the issue mentioned above for default sorting.

I also agree that setting id to the system-generated value is the cleanest and most common approach. To solve signing issues, I think the signature verification algorithm should provide a mechanism to support such additions; see w3c/wot-thing-description#940 (comment).

@egekorkan

@farshidtz I am guessing you are asking about the bold part. So the thing is that the TDs are sent to the directory and after that, the directory would be mainly used to fetch and then consume the TDs. Thus, a TD Consumer would only see TDs with ids, which can easily bring the same comment from PING regarding TDs that always have ids. They did not like that and we had to cancel the Proposed Recommendation status and start from scratch with the publication process. I am pretty sure that nobody from the WG fondly remembers those months. Thus, to avoid such a setback once again, it would be wise to think of a way to avoid that. At least an informative note regarding privacy would be needed I think.

@AndreaCimminoArriaga
Contributor

Sorry for joining late, let me share my thoughts. The TDs are JSON-LD framed documents. If they have an attribute id, which is translated to @id by the context, then in RDF the subject of the triples is the URL containing that id. However, when no id (or @id) is provided, the subject of the triples is a blank node (as happens for nested JSON objects in the TD that do not specify an id or @id).

Therefore, adding an additional field that is not id or @id will only mess up the RDF: the generated triples will have a subject (the one that identifies this resource) that differs from the string stored in the new field, which in RDF might not even be a URL. The easiest solution is to store the anonymous identifier in the id or @id field; to signal that such an identifier is anonymous, we could work on the syntax of the identifier. To showcase this, consider the following examples:

{
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "id": "urn:dev:ops:32473-WoTLamp-1234", 
    "title": "MyLampThing",
    "securityDefinitions": {"nosec_sc": {"scheme": "nosec_sc"}},
    "security": ["nosec_sc"]
}

{
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "title": "MyLampThing",
    "securityDefinitions": {"nosec_sc": {"scheme": "nosec_sc"}},
    "security": ["nosec_sc"]
}

The former is translated into the following RDF:

<urn:dev:ops:32473-WoTLamp-1234> <http://purl.org/dc/terms/title> "MyLampThing" .
<urn:dev:ops:32473-WoTLamp-1234> <https://www.w3.org/2019/wot/td#hasSecurityConfiguration> <https://json-ld.org/playground/nosec_sc> .
<urn:dev:ops:32473-WoTLamp-1234> <https://www.w3.org/2019/wot/td#securityDefinitions> _:b0 .
_:b0 <https://www.w3.org/2019/wot/td#scheme> "nosec_sc" .

Note how the id is used as the subject, which in RDF is the URL that identifies this resource. Note also how the security definition (which is another resource) is translated into the blank node _:b0.

Instead, the latter TD is translated as follows:

_:b0 <http://purl.org/dc/terms/title> "MyLampThing" .
_:b0 <https://www.w3.org/2019/wot/td#hasSecurityConfiguration> <https://json-ld.org/playground/nosec_sc> .
_:b0 <https://www.w3.org/2019/wot/td#securityDefinitions> _:b1 .
_:b1 <https://www.w3.org/2019/wot/td#scheme> "nosec_sc" .

This time the subject of the TD, and therefore its identifier, is a blank node assigned by the translator itself. This mechanism is intrinsic to framed JSON-LD as well as to regular JSON-LD, and it is entirely correct.

Now, if we add a new attribute to the latter framed JSON-LD document, such as "new_identifier" with the value "urn:dev:ops:32473-WoTLamp-1234", the TD is translated into the following:

_:b0 <http://purl.org/dc/terms/title> "MyLampThing" .
_:b0 <https://www.w3.org/2019/wot/td#hasSecurityConfiguration> <https://json-ld.org/playground/nosec_sc> .
_:b0 <https://www.w3.org/2019/wot/td#new_identifier> "urn:dev:ops:32473-WoTLamp-1234" .
_:b0 <https://www.w3.org/2019/wot/td#securityDefinitions> _:b1 .
_:b1 <https://www.w3.org/2019/wot/td#scheme> "nosec_sc" .

As can be observed, the identifier is just another attribute. It does not identify this resource in RDF, since that task is done by the subject, which is still a blank node.
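
The rule behind these three translations can be mimicked with a toy function. This is not a JSON-LD processor and it ignores the context entirely; subject_of is a hypothetical helper, and extra_identifier is a made-up key standing in for any additional field. It only mirrors the observation that a top-level id (or @id) becomes the subject and that nothing else does.

```python
def subject_of(td, blank_label="_:b0"):
    """Return the N-Triples subject a top-level TD object would get.

    Toy rule only: 'id' or '@id' yields an IRI subject; otherwise the
    subject is a blank node, no matter which other keys hold an identifier.
    """
    identifier = td.get("id", td.get("@id"))
    if identifier is None:
        return blank_label
    return f"<{identifier}>"


named = {"id": "urn:dev:ops:32473-WoTLamp-1234", "title": "MyLampThing"}
anonymous = {"title": "MyLampThing"}
# Hypothetical extra key: it holds a URN but does not name the subject.
extra_field = {"title": "MyLampThing",
               "extra_identifier": "urn:dev:ops:32473-WoTLamp-1234"}

for td in (named, anonymous, extra_field):
    print(subject_of(td))
```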

This problem gets worse if this new field is located in a nested object under the key id or @id. I will put an example, assuming the following TD:

{
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "title": "MyLampThing",
    "tdd:registers": {"id": "urn:dev:ops:32473-WoTLamp-1234", "created": "2020-01-01 00:00:00"},
    "securityDefinitions": {"nosec_sc": {"scheme": "nosec_sc"}},
    "security": ["nosec_sc"]
}

This TD, which in theory is an anonymous TD identified by urn:dev:ops:32473-WoTLamp-1234 through tdd:registers.id, is translated into RDF as follows:

_:b0 <http://purl.org/dc/terms/title> "MyLampThing" .
_:b0 <https://www.w3.org/2019/wot/td#hasSecurityConfiguration> <https://json-ld.org/playground/nosec_sc> .
_:b0 <https://www.w3.org/2019/wot/td#securityDefinitions> _:b1 .
_:b1 <https://www.w3.org/2019/wot/td#scheme> "nosec_sc" .
_:b0 <https://www.w3.org/2021/wot/discovery#registers> <urn:dev:ops:32473-WoTLamp-1234> .
<urn:dev:ops:32473-WoTLamp-1234> <http://purl.org/dc/terms/created> "2020-01-01 00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .

As you can see, now the id that was supposed to identify the Thing resource is identifying another resource; which is the one used to store the registration information.

Therefore, to be consistent with the RDF translation, it seems more suitable to always use id or @id to identify TDs. If the TD is anonymous, the structure of the identifier should perhaps reflect that. Moreover, when an anonymous TD is registered, the generated identifier is returned to the requester, so it is known to them after registration. Finally, even without the id, the directory has enough discovery mechanisms to find the id again if it is lost.

@vcharpenay
Contributor

vcharpenay commented Apr 12, 2021

An important point regarding identification in RDF:

  • id is not an identifier for the Thing Description document but an identifier of the Thing itself (the physical object, in most cases)
  • the TD document itself should be identified with a dereferenceable URL.

There is a common distinction on the Semantic Web between 'semantic resources' (the Thing itself, here) and 'informational resources', which are Web pages describing semantic resources (the TD document). See e.g. 'Distinguishing between Representations and Descriptions' from some W3C note. Schema.org makes this distinction by declaring a property schema:mainEntityOfPage between a schema:Thing and a schema:CreativeWork (a generic class encompassing Web pages).

What this distinction implies is the fact that 'anonymous TDs' are different from 'anonymous Things'. A TD is never anonymous because it must be identified by a dereferenceable URL. A Thing, however, may be identified in many different ways and wherever there is ambiguity or privacy concerns, its id could be substituted with a blank node.

In a Thing Directory, every TD must have a URL, for management purposes. I suggest using this URL to unambiguously list TDs when searching on a Thing Directory. For instance, the output of search or listing could be:

[
  {
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "schema:mainEntityOfPage": "https://example.org/thing-directory/td/id01234",
    "properties" : {}
  },
  {
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "schema:mainEntityOfPage": "https://example.org/thing-directory/td/id56789",
    "properties" : {}
  }
]

The returned TDs are identified as https://example.org/thing-directory/td/id01234 and https://example.org/thing-directory/td/id56789. A shorter alias for schema:mainEntityOfPage could be defined in the WoT discovery spec.
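
A directory could produce such a listing along these lines. This is a sketch under assumptions: the function name list_with_page_links, the shape of the stored mapping and the base URL handling are all hypothetical, and the property name is passed in so that a shorter alias could be substituted.

```python
def list_with_page_links(stored, base_url, key="schema:mainEntityOfPage"):
    """Return copies of the stored TDs, each annotated with its directory URL.

    stored: dict mapping a directory-local ID to a TD (dict).
    base_url: URL prefix under which the directory serves TDs.
    """
    listing = []
    for local_id in sorted(stored):
        entry = dict(stored[local_id])  # shallow copy; stored TD is untouched
        entry[key] = f"{base_url}/{local_id}"
        listing.append(entry)
    return listing


stored = {
    "id56789": {"@context": "https://www.w3.org/2019/wot/td/v1", "properties": {}},
    "id01234": {"@context": "https://www.w3.org/2019/wot/td/v1", "properties": {}},
}
listing = list_with_page_links(stored, "https://example.org/thing-directory/td")
```

The link is added only to the copies returned in the listing, so the TD held by the directory stays as the producer submitted it.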

@farshidtz
Member Author

farshidtz commented Apr 12, 2021

schema:mainEntityOfPage looks like a Self Link to me. We currently don't have that inside the TD, but it is used to interact with TDs.

E.g. the following known TD:

{
  "title": "Known TD",
  "id": "urn:dev:ops:32473-WoTLamp-1234",
  ...
}

Can be accessed at https://tdd.example.com/td/urn:dev:ops:32473-WoTLamp-1234

And an anonymous TD:

{
  "title": "Anonymous TD",
  ...
}

Can be accessed at https://tdd.example.com/td/urn:uuid:61079229-83ae-444a-9199-e9a98cda62b0, where urn:uuid:61079229-83ae-444a-9199-e9a98cda62b0 is a system-generated ID given to the producer during submission.

If we add selfLinks to TDs, the list will look like:
(deliberately avoiding TD.links array for inserting selfLink)

[
  {
    "title": "Known TD",
    "id": "urn:dev:ops:32473-WoTLamp-1234",
    "selfLink": "https://tdd.example.com/td/urn:dev:ops:32473-WoTLamp-1234",
    ...
  },
  {
    "title": "Anonymous TD",
    "selfLink": "https://tdd.example.com/td/urn:uuid:61079229-83ae-444a-9199-e9a98cda62b0",
    ...
  }
]

This is similar to the use of href in Mozilla WebThings Gateway as explained above.

Adding selfLink:

@AndreaCimminoArriaga
Contributor

This looks good to me

@vcharpenay
Contributor

vcharpenay commented Apr 12, 2021

urn:dev:ops:32473-WoTLamp-1234 and https://tdd.example.com/td/urn:dev:ops:32473-WoTLamp-1234 are two different URIs. One is a URN, the other a dereferenceable URL.

The property schema:mainEntityOfPage is not the same as 'self'. It has been defined precisely to avoid confusion between a schema:Thing and a schema:WebPage. The city of New York is not the same as the Wikipedia article about New York. The former has geo-coordinates and a population size, the latter has an author and edit times. The same applies to Things and Thing Descriptions: the Thing is physically situated while a TD can be stored anywhere and, in particular, in a Thing Directory. A TD can also be edited, a Thing cannot.

In short, I find your example satisfactory up to the name of the property: selfLink. It doesn't capture the distinction between semantic and informational resources. That distinction is important on the Web. I hope you understand why.

I would be happier with something like describedBy (because what that JSON key points to is a Thing Description) or registeredAt.

@mmccool
Contributor

mmccool commented Apr 12, 2021

Observations:

  1. TDs without IDs are legal TDs.
  2. It should be possible to store all legal TDs in a directory.
  3. Therefore directories need to assign ids to at least some TDs

My preferred solution is (still) to simply assign a local ID (only for that directory) for ALL TDs.
Because we return arrays and want to treat TDs as resources, this needs to be embedded inside the TD as "enriched" data.
When we specify signing, we can include a "chaining" label to make sure this additional data does not break the signature.

Proposal:

  1. Directory assigns a local id to all TDs.
  2. This id can be (optionally) embedded in an enriched TD just like other metadata
  3. API needs to allow for looking up TDs by local ID (in a URL)
  4. Signatures need to support chaining mechanism that omits enriched metadata

Does that work for everyone?
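
The thread does not define the chaining mechanism. One way to picture item 4 is a digest computed over the TD with the directory-added keys removed, so that enrichment does not change it. The sketch below uses a plain SHA-256 over sorted-key JSON as a stand-in for a real signature and canonicalization scheme; ENRICHED_KEYS, local_id and digest_without_enrichment are hypothetical names, not part of any specification.

```python
import hashlib
import json

# Hypothetical set of top-level keys that a directory adds when enriching a TD.
ENRICHED_KEYS = frozenset({"local_id"})


def digest_without_enrichment(td, enriched_keys=ENRICHED_KEYS):
    """Hash the TD while ignoring top-level keys added by the directory."""
    original = {k: v for k, v in td.items() if k not in enriched_keys}
    # Sorted keys and fixed separators give a stable byte string. This is a
    # stand-in for a real canonicalization scheme, not a replacement for one.
    payload = json.dumps(original, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


submitted = {"title": "MyLampThing", "security": ["nosec_sc"]}
enriched = dict(submitted, local_id="td-0001")  # what the directory returns
```

Under this assumption, the digest of the enriched TD matches the digest of the submitted one, while any change to the producer's own fields still alters it.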

@farshidtz
Member Author

farshidtz commented Apr 12, 2021

  1. Directory assigns a local id to all TDs.

I think mandating this leads to redundant "unique" identifiers which makes deployments hard to manage.

IMO, enforcing local system-generated IDs has the following issues:

  • The producer needs to persist the ID to be able to update the resource. Searching to find the local ID before updating only works if there is another way to uniquely identify the TD. That can be the id field; as per JSON-LD specification: "@id: Used to uniquely identify things that are being described in the document with IRIs or blank node identifiers." So if there is an id in TD, it must be unique within the directory.
  • This may work on request/response exchanges, but I find it hard to imagine capturing an ID or searching before updating in pub/sub scenarios.
  • Storing an ID or finding it later means constrained IoT devices must also have persistence or be willing to perform two queries to update the TD or some values.

My proposal is to try to "continue" following REST principles:

  • Allow producers to submit an anonymous TD using a POST request (POST /td). Respond with a system-generated ID and insert it inside TD responses. This should not be the id field as @egekorkan and @vcharpenay mentioned. I like registeredAt as alias for schema:mainEntityOfPage.
  • Allow producers to submit known TDs using a PUT request with a local registrationID (PUT /td/{registrationID}) that is equal to TD.id. The same relation link as above shall be inserted inside the TD in responses.

I think the following addition to the spec is important:

  • Do not allow producers to submit known TDs using a PUT request with registrationID that is different from TD.id. Allowing that means we need to enforce uniqueness of two values within one or federated directories.
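
The rules proposed above could be checked roughly like this. It is a sketch, not the directory implementation: resolve_registration is a hypothetical helper, and rejecting a POST that carries an id is my assumption, since the proposal only says what POST and PUT are for.

```python
import uuid


def resolve_registration(method, td, registration_id=None):
    """Return the ID a TD would be stored under, or raise ValueError.

    POST /td: anonymous TDs; the directory generates the ID.
    PUT /td/{registrationID}: known TDs; the path ID must equal TD.id.
    """
    if method == "POST":
        # Assumption: a TD that already has an id should go through PUT.
        if "id" in td:
            raise ValueError("POST is for anonymous TDs; use PUT /td/{id}")
        return uuid.uuid4().urn
    if method == "PUT":
        if "id" not in td:
            raise ValueError("PUT requires a TD with an id")
        if registration_id != td["id"]:
            raise ValueError("registrationID must equal TD.id")
        return registration_id
    raise ValueError(f"unsupported method: {method}")
```

With the equality check in place, the directory only ever has one identifier per known TD to keep unique, which is the point of the last rule.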

@mmccool
Contributor

mmccool commented May 17, 2021

Consider this resolved now.

@mmccool mmccool closed this as completed May 17, 2021