Selective disclosure of identified nodes #107

Closed
dwaite opened this issue Dec 16, 2023 · 10 comments

@dwaite

dwaite commented Dec 16, 2023

I notice that the examples within the document do selective disclosure on subjects without identifiers, i.e. subjects that canonicalize to blank node identifiers.

I did not see a section that explains the limitations on identified RDF subjects, e.g. a credentialSubject with a DID. The transformation step appears to affect only positions, not node names, which would mean that the identifier for an RDF subject must be revealed if the subject is revealed within the graph.

Am I missing a technical solution to this? Is there an expectation that credentials which use VC-DI-BBS would balance their use of graphs with the correlation and disclosure which might come from releasing identified RDF subjects?

@Wind4Greg
Collaborator

Hi David (@dwaite), the new Privacy Considerations section covers the two main issues: "selective disclosure and data leakage" and "selective disclosure and unlinkability". I'm assuming you are concerned with the unlinkability aspect here?

The short answer is that any id fields are preserved if present in the document. The long answer is given below.

In the unlinkability analysis I broke the discussion into "layers" and looked at how "artifacts" from each layer affect the size of the anonymity set. A globally unique id on the credentialSubject field, if revealed by the holder, would reduce the anonymity set to one.

Both issuer and holder have responsibilities in promoting unlinkability, hence the sections "Linkage via Proof Options and Mandatory Reveal" and "Linkage via Holder Selective Reveal".

Is this getting at your question?

@dwaite
Author

dwaite commented Dec 16, 2023

I think the question is more whether there is a need someplace for considerations about having an id on credentialSubject.

If, say, your identity as a DID (or as a member of a windsurfing association) were in the base credential, for example by adding

"id": "did:example:12345",

the subject of the RDF N-Quads is now named by that identifier rather than by a blank node label:

<did:example:12345> <https://windsurf.grotto-networking.com/selective#sails> _:b1 .
<did:example:12345> <https://windsurf.grotto-networking.com/selective#sails> _:b2 .
<did:example:12345> <https://windsurf.grotto-networking.com/selective#sails> _:b3 .
<did:example:12345> <https://windsurf.grotto-networking.com/selective#sails> _:b4 .

I believe this means that the only way to show how this information relates to the rest of the credential (namely, that it is about the credential subject) is to disclose your identity. The credential can't just say that some holder surfed Lahaina in 2023; it now also discloses your correlatable DID. There didn't seem to be a step that, e.g., 'blankifies' RDF subjects by adding supplemental entries to a transformed N-Quad list.
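
Editorial illustration of the point above: a minimal Python sketch, assuming the disclosed statements are available as plain N-Quad strings. The helper names (`subject_of`, `leaked_subject_iris`) and the `_:b0` blank node variant are hypothetical and not part of the specification; the sketch only shows that any disclosed quad whose subject is an IRI carries that IRI with it.

```python
# Two hypothetical modelings of the same statements: one with an identified
# subject (as in the N-Quads above), one with a blank node subject.
identified = [
    "<did:example:12345> <https://windsurf.grotto-networking.com/selective#sails> _:b1 .",
    "<did:example:12345> <https://windsurf.grotto-networking.com/selective#sails> _:b2 .",
]
blank = [
    "_:b0 <https://windsurf.grotto-networking.com/selective#sails> _:b1 .",
    "_:b0 <https://windsurf.grotto-networking.com/selective#sails> _:b2 .",
]


def subject_of(quad: str) -> str:
    """Return the subject term, i.e. the first whitespace-delimited token."""
    return quad.split(None, 1)[0]


def leaked_subject_iris(disclosed: list[str]) -> set[str]:
    """Collect the IRI subjects (terms written as <...>) a verifier would see."""
    return {s[1:-1] for s in map(subject_of, disclosed) if s.startswith("<")}


# Disclosing even a single statement about the identified subject reveals the DID.
print(leaked_subject_iris(identified[:1]))  # {'did:example:12345'}
# With a blank node subject, no subject IRI is revealed.
print(leaked_subject_iris(blank))  # set()
```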

So I think this becomes a factor in schema design; you would want to consider limiting your semantic modeling. In particular, you wouldn't want to indicate, e.g., participation in a particular event by making the event's IRI an RDF subject, unless the intention was to disclose that id as part of any disclosure of that subset of the credential graph.

The most significant impact though is likely credentialSubject, where an issuer's unique identifier or subject-provided DID should be referenced by a predicate rather than be the id of the subject, assuming you do not want all significant information disclosures to also identify the subject.
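
Editorial illustration of the modeling choice described above: a toy Python sketch, not the JSON-LD to RDF algorithm. It assumes a flat credential subject and uses hypothetical property names (`holderDid`, `surfed`, `memberOf`) and a fixed blank label `_:b0`. It compares how many statements mention the DID when the DID is the subject's `id` versus when it is referenced by a predicate.

```python
DID = "did:example:12345"

# Modeling 1: the DID is the id of the credential subject.
subject_as_id = {
    "id": DID,
    "surfed": "Lahaina 2023",
    "memberOf": "windsurfing association",
}

# Modeling 2: the subject is unidentified; the DID is just another attribute.
subject_by_predicate = {
    "holderDid": DID,
    "surfed": "Lahaina 2023",
    "memberOf": "windsurfing association",
}


def toy_statements(node: dict, blank_label: str = "_:b0") -> list[tuple[str, str, str]]:
    """Flatten a flat dict into (subject, predicate, object) triples.

    Toy stand-in for RDF conversion: the subject is the node's id if present,
    otherwise a blank node label.
    """
    subject = node.get("id", blank_label)
    return [(subject, key, str(value)) for key, value in node.items() if key != "id"]


def statements_mentioning(statements, term: str):
    """Return the statements in which `term` appears as subject, predicate, or object."""
    return [s for s in statements if term in s]


# Every statement carries the DID when it is the subject id ...
print(len(statements_mentioning(toy_statements(subject_as_id), DID)))  # 2
# ... but only one statement does when it is referenced by a predicate,
# and the holder can choose not to disclose that statement.
print(len(statements_mentioning(toy_statements(subject_by_predicate), DID)))  # 1
```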

@Wind4Greg
Collaborator

Hmm, it seems to me that any id used in a sub-object also has this issue. I.e., the id will show up in the statements produced and hence, to avoid tracking, the holder could not disclose any portion of that sub-object. In the case of the credentialSubject having a unique id field, this breaks unlinkability for the entire credential and should not be used?

@dlongley am I interpreting this correctly? In my write up on privacy considerations I basically assumed that to support unlinkability the issuer shouldn't put any type of unique id anywhere in the credential. Is there something more subtle that should be said?

@dlongley
Contributor

@Wind4Greg,

> ...am I interpreting this correctly? In my write up on privacy considerations I basically assumed that to support unlinkability the issuer shouldn't put any type of unique id anywhere in the credential. Is there something more subtle that should be said?

Unique (globally unambiguous) IDs can't be unlinkably revealed, full stop. The whole point of them is unambiguous identification (for correlation). Perhaps we do need a sentence or two about this so it's as clear as possible to issuers and holders -- even though it's obvious from the "globally unambiguous" or "unique" property of those IDs.

But I understand that this issue is more specifically about the fact that if such an ID is used to identify an object in the graph, then that ID will be reused to express all signed statements about it. The ID will not be "automatically blankified". Notably, making IDs "blank" can also decrease unlinkability -- if you have sufficiently many of them (or if other patterns arise from various artifacts) -- as we have already discussed in the spec.

Enabling useful unlinkability is hard. In fact, "automatically blankifying" all IDs in an effort to help the VC designer avoid thinking about unlinkability might actually result in less unlinkability in practice.

I think the core consideration here is around encouraging VC designers and issuers to think about whether users (with possible assistance from their digital wallet) will understand if they can reveal certain information in a VC in an unlinkable way. I think it's better to tell people to design VCs such that unique IDs either aren't present at all or are only present in special areas where users could be expected to understand what will happen if they are revealed.

For example, revealing a particular area of the VC, such as a confidence method to facilitate proving control over a DID, is probably not expected to result in an unlinkable presentation. This fits in nicely with the concept that a holder can reveal an attribute or two about "someone" to a verifier without providing any confidence about who they specifically are -- unless, of course, they want to. The VC can be deliberately modeled that way from the start by having it say "these are the attributes of 'someone'" (using a blank node ID). But if the issuer talks about too many "someones" in the same VC, well, then it will start creating a unique fingerprint, just like if a holder reveals too many attributes to the same verifier.

As another example for VC designers, it's ok (for revealing holders) if unique IDs are present in statements that will be common in VCs held by a sufficiently large group of people (e.g., a unique ID for "The United States" as a country of residence).
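
Editorial illustration of the two examples above: a minimal Python sketch of the anonymity set idea mentioned earlier in the thread. The population, the statement strings, and the `example.org` IRIs are made up for illustration, and each credential is modeled simply as a set of statements; real linkability depends on every layer discussed in the spec, not only on statement content.

```python
def anonymity_set_size(population: list[set[str]], disclosed: set[str]) -> int:
    """Count the credentials in the population that contain every disclosed statement."""
    return sum(1 for credential in population if disclosed <= credential)


# Hypothetical population of four issued credentials.
population = [
    {"residence=<https://example.org/country/US>", "subject=<did:example:1>"},
    {"residence=<https://example.org/country/US>", "subject=<did:example:2>"},
    {"residence=<https://example.org/country/US>", "subject=<did:example:3>"},
    {"residence=<https://example.org/country/CA>", "subject=<did:example:4>"},
]

# A globally unambiguous ID for a country is shared by many holders.
print(anonymity_set_size(population, {"residence=<https://example.org/country/US>"}))  # 3
# A globally unambiguous ID for an individual narrows the set to one.
print(anonymity_set_size(population, {"subject=<did:example:2>"}))  # 1
```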

In short, given how challenging it is to ensure unlinkability (at all layers), I don't think we can eliminate having the VC designer and issuer be mindful in what they are doing. We don't want to eliminate their ability to use idiomatic JSON structures, but I also don't think we can completely abstract away the unlinkability considerations. The VC designer must understand some basic principles and things to avoid. We want them (and issuers) to, fundamentally, be thinking about expressing attributes about "someone" -- not a specific person.

Perhaps some more words about the above "enabling unlinkability requires VC designers to think about user / holder expectations" could be added in some way to the spec.

Separately, and I don't know if we need to say more about this or not, but ensuring that IDs are consistent and present in each statement is important to avoid statement recombination attacks. This is automatically done by the approach here -- but getting that semantic certainty (avoiding the confusion) has the above trade off around ensuring unlinkable presentation (for common use cases) is possible. I think that's the best trade off (it is helpful to both VC designers and verifiers) since enabling unlinkability already requires care at every layer.

@Wind4Greg
Collaborator

Hmm, I'm thinking about some additional explanatory text in the section on unlinkability. I'm not a JSON-LD expert but here is how I might try to explain things.

Unlinkability and JSON-LD

JSON-LD is a JSON-based format to serialize Linked Data. As such it supports assigning each object (node in JSON-LD terminology) within a document a globally unique @id attribute (node identifier). This allows for "the linking of linked data". When using BBS for its unlinkability property such unique node identifiers cannot be used since they are intended to provide strong linkage which is just the thing we are trying to avoid. Note that such @id values are different from other attributes that may appear in a document, which a holder may choose not to disclose. Also, JSON-LD's use of @context, which maps terms to IRIs, does not, in general, affect unlinkability.

@dwaite would this be good to add to clear things up? @dlongley is this accurate? Wording/explanation improvements?

@dlongley
Contributor

@Wind4Greg,

> When using BBS for its unlinkability property such unique node identifiers cannot be used since they are intended to provide strong linkage which is just the thing we are trying to avoid.

Hmm, well, this isn't true. We just don't want to use them to identify objects with PII. It's totally fine to use a globally unambiguous ID for say a country, state, act of congress, general type information, this sort of thing. In fact, it's important to be able to do so -- that kind of linkage is usually vital in many use cases. We just don't want individuals to be globally unambiguous in that way. So I think we need to be more nuanced here -- perhaps that "you cannot use globally unambiguous identifiers for personal information" (and also get unlinkability).

@Wind4Greg
Collaborator

@dlongley when I look at the JSON-LD spec I see at least two different uses of IRIs: for "terms" and for "node ids". From the spec:

  1. IRIs (Internationalized Resource Identifiers [RFC3987]) are fundamental to Linked Data as that is how most nodes and properties are identified.
  2. "Simply speaking, a context is used to map terms to IRIs." I take it that properties are "identified" via "terms" here.
  3. To be able to externally reference nodes in an RDF graph, it is important that nodes have an identifier. IRIs are a fundamental concept of Linked Data, for nodes to be truly linked, dereferencing the identifier should result in a representation of that node.

Should we be cautioning about IRIs in general? But that would include references to "terms" which don't seem to hurt unlinkability (too much). Can we restrict the discussion to node ids since those can't be easily omitted via selective disclosure, or just a particular set of node ids? @dlongley can you supply some text for those of us who aren't JSON-LD experts ;-)

@dlongley
Contributor

dlongley commented Dec 19, 2023

@Wind4Greg,

Here are some tweaks on the language you had above, what do you think?:

JSON-LD is a JSON-based format to serialize Linked Data. As such it supports assigning each object (node in JSON-LD terminology) within a document a globally unambiguous @id attribute (node identifier). This allows for "the linking of linked data", enabling information about the same entity to be correlated. This correlation can be desirable or undesirable, depending on the use case. When using BBS for its unlinkability property, globally unambiguous node identifiers cannot be used for individuals or for their personally identifiable information since the strong linkage they provide is undesirable. Note that the usage of such identifiers is acceptable to express statements about non-personal information (e.g., using a globally unambiguous identifier to identify a large country or a concert event). Also, JSON-LD's use of @context which maps terms to IRIs does not, in general, affect unlinkability.
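
Editorial illustration of how a VC designer might review a credential against the guidance above: a minimal Python sketch, assuming the credential is plain parsed JSON in which `id` (or `@id`) keys carry node identifiers. The function name `find_node_ids`, the sample credential, and the `example.org` IRI are hypothetical. The sketch only lists candidate identifiers; deciding whether each one refers to an individual or to non-personal information remains a human judgment, and identifiers expressed as bare string values are not detected.

```python
def find_node_ids(node, path="$"):
    """Yield (path, identifier) for every non-blank `id` / `@id` in nested dicts and lists."""
    if isinstance(node, dict):
        for key in ("id", "@id"):
            value = node.get(key)
            # Blank node labels (prefix "_:") are document-local and skipped.
            if isinstance(value, str) and not value.startswith("_:"):
                yield path, value
        for key, value in node.items():
            yield from find_node_ids(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_node_ids(item, f"{path}[{index}]")


# Hypothetical credential mixing a personal identifier with a non-personal one.
credential = {
    "type": ["VerifiableCredential"],
    "credentialSubject": {
        "id": "did:example:12345",
        "residence": {"id": "https://example.org/country/US"},
        "sails": [{"size": 5.5}, {"size": 6.1}],
    },
}

for location, identifier in find_node_ids(credential):
    print(f"{location}: {identifier}")
```

In this sample, the identifier on `credentialSubject` is the kind the proposed text warns about, while the one on `residence` is the kind it describes as acceptable.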

@Wind4Greg
Collaborator

@dwaite I'm thinking about a PR with the text above from @dlongley in a new section between the current sections on privacy/unlinkability, 5.2.2 Linkage via VC Processing and 5.2.3 Linkage via Proof Options and Mandatory Reveal, with a tentative title of Linkage via JSON-LD Node Identifiers.

Would that clarify things for the issuer without being as draconian as I was inclined to be or folks might be inclined to be? Other ideas for section title or placement?

@msporny
Member

msporny commented Feb 9, 2024

PR #109 was raised to address this issue and has since been merged. Closing.

@msporny msporny closed this as completed Feb 9, 2024