Multiple significant security vulnerabilities in the design of data integrity #272
I believe that the core of the issue highlighted above is a lack of validation on the information that is to be verified. Any protected information or data must be validated and understood prior to consumption, no matter the protection mechanism. However, when a protection mechanism allows multiple expressions of the same information (a powerful tool), it may be important to better highlight this need. This is especially true in the three-party model, where there is no simple two-party agreement and known context between issuers and verifiers; the scale and scope of the VC ecosystem are much larger when parties totally unknown to the issuer can consume their VCs.

Not understanding the context in which a message is expressed (or meant to be consumed) can certainly lead to mistakes, even when that message is authentic. For example, a message that expresses "I authorize you to act on item 1", even if verified to be authentically from a particular source, can be misapplied in the wrong context (e.g., "item 1" was supposed to mean X but was misinterpreted as Y). In short, the context under which data is consumed must be well known and trusted by the consumer, no matter the protection mechanism.

We might want to add some examples to the specification showing that the information in documents can be expressed in one context and transformed into another. This could include showing an incoming document that is expressed using one or more contexts that the consumer does not understand, which can then be transformed using the JSON-LD API to a context that is trusted and understood. This would also help highlight the power of protection mechanisms that enable this kind of transformation. For example, consider a VC that includes terms that are commonly consumed across many countries and some that are region-specific.
By using the JSON-LD API, a consumer that only understands the global-only terms can apply such a context to ensure that the terms they understood will appear as desired and other region-specific terms are expressed as full URLs, even when they do not understand or trust the regional context. All of this can happen without losing the ability to check the authenticity of the document. We can also highlight that simpler consumers continue to be free to outright reject documents that are not already presented in the context that they trust and understand, no matter their authenticity. |
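The behavior described above can be sketched with a toy model. This is not a real JSON-LD processor (a real consumer would use a JSON-LD library's expansion and compaction APIs); the context URLs and term mappings below are invented for illustration.

```python
# Toy model of JSON-LD expansion/compaction: each "context" maps short
# terms to full URLs. All URLs and terms here are invented.

GLOBAL_CONTEXT = {"name": "https://example.org/vocab#name"}
REGIONAL_CONTEXT = {"regionId": "https://foo.example/vocab#regionId"}

def expand(doc, contexts):
    """Replace every known short term with its full URL."""
    mapping = {}
    for ctx in contexts:
        mapping.update(ctx)
    return {mapping.get(k, k): v for k, v in doc.items()}

def compact(expanded, trusted_context):
    """Shorten only the terms defined by the trusted context;
    everything else stays an unambiguous full URL."""
    reverse = {url: term for term, url in trusted_context.items()}
    return {reverse.get(k, k): v for k, v in expanded.items()}

# The issuer expressed the credential using both contexts.
issued = {"name": "Akari", "regionId": "foo-123"}
expanded = expand(issued, [GLOBAL_CONTEXT, REGIONAL_CONTEXT])

# A consumer that only trusts the global context still sees the
# regional term, but as a full URL rather than a guessable short name.
seen = compact(expanded, GLOBAL_CONTEXT)
```

The consumer's trusted terms (`name`) come back in the shape it expects, while the regional term survives only as a full URL it can safely ignore or reject.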
The fundamental point of digital signatures is to reduce the information that needs to be trusted prior to verification. Most modern technologies, e.g., SD-JWT, mDocs, JWT, and COSE and JOSE at large, do this successfully, meaning a relying party only needs to trust a public key before attempting to verify the signature of an otherwise untrusted payload. If the signature check fails, the payload can be safely discarded without undue expense. The problem with Data Integrity is that this assumption does not hold. In essence, the relying party doesn't just need the public key of the issuer/signer, but also all possible JSON-LD context entries that the issuer may or may not use. If any of these are corrupted or manipulated, or untrusted ones are injected, the attacks highlighted in this issue become possible. Whether it is even possible to share these contexts appropriately at scale is another question, but these attacks demonstrate, at a minimum, that an entirely unique class of vulnerabilities exists because of this design choice.
The point I'm making is not about whether one should understand the context of a message one has received, it's about when one should attempt to establish this context. Doing this prior to validating the signature is dangerous and leads to these vulnerabilities. For instance, a JSON-LD document can be signed with a plain old JWS signature (as in JOSE/COSE); once the signature is validated, one can then process it as JSON-LD to understand the full context, if one so wishes. The benefit of this approach is that if the JSON-LD contexts have been manipulated (e.g., the context of the message), the relying party will have safely discarded the message before even reaching this point, because the signature check will have failed. Data Integrity, on the other hand, requires this context validation to happen as part of signature verification, thus leading to these issues. |
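The ordering argument above can be made concrete with a minimal sketch. An HMAC stands in for a real JWS signature (a real system would use asymmetric keys and a JOSE library); the point is only the order of operations: authenticate the exact bytes first, interpret them second.

```python
# Sketch of "verify the bytes, then interpret them". An HMAC stands in
# for a real JWS signature; the key and payload are invented.
import hashlib
import hmac
import json

KEY = b"shared-secret"  # stands in for the issuer's signing key

def sign(payload_bytes):
    return hmac.new(KEY, payload_bytes, hashlib.sha256).hexdigest()

def verify_then_process(payload_bytes, signature):
    # Step 1: authenticate the exact bytes. Any tampering with the
    # document, contexts included, fails here, before any parsing.
    if not hmac.compare_digest(sign(payload_bytes), signature):
        return None  # discard cheaply; no JSON-LD processing happened
    # Step 2: only now parse (and optionally process as JSON-LD).
    return json.loads(payload_bytes)

doc = json.dumps({"@context": "https://www.w3.org/ns/credentials/v2",
                  "name": "Pat"}).encode()
sig = sign(doc)

ok = verify_then_process(doc, sig)
tampered = verify_then_process(
    doc.replace(b"credentials/v2", b"credentials/evil"), sig)
```

With this ordering, a manipulated context string changes the signed bytes, so the message is rejected before any context resolution takes place.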
Another take on this is that Data Integrity signing methods that sign the canonicalized RDF derived from JSON-LD, rather than the JSON-LD itself, enable multiple different JSON-LD inputs to canonicalize to the same RDF. The JSON-LD itself isn't secured - only RDF values derived from it. If only the derived RDF values were used by code, it might not be a problem, but in practice, code uses the unsecured JSON-LD values - hence the vulnerabilities. |
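The "many inputs, one canonical form" point can be illustrated with a toy stand-in for RDF canonicalization. The term-to-URL mappings are invented; real Data Integrity suites use the RDF Dataset Canonicalization algorithm, not this simplification.

```python
# Toy illustration: two syntactically different JSON documents, each
# paired with its own (invented) context, map to the same canonical
# statements. A signature over the canonical form therefore cannot
# distinguish the two JSON expressions.

def to_statements(doc, context):
    """Map short terms to full URLs and emit sorted (predicate, value)
    pairs; a crude stand-in for RDF canonicalization."""
    return sorted((context[k], v) for k, v in doc.items())

ctx_a = {"firstName": "https://example.org/vocab#givenName"}
ctx_b = {"givenName": "https://example.org/vocab#givenName"}

doc_a = {"firstName": "John"}
doc_b = {"givenName": "John"}

canon_a = to_statements(doc_a, ctx_a)
canon_b = to_statements(doc_b, ctx_b)
```

Code that consumes the JSON keys (`firstName` vs. `givenName`) sees different data, while the signature, computed over the canonical statements, treats them as identical; that gap is the vulnerability being described.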
In the example given, if a verifier were trying to depend on a credential of a certain type that expressed a holder's first name and middle name, it would not be a good idea to miss a check like this. Don't accept properties that aren't well-defined. Approaches that work:
Communities developing and using new credential type specifications benefit from defining a good |
schema.org and the Google Knowledge Graph both use `@vocab`: https://developers.google.com/knowledge-graph. The problem is not JSON-LD keywords in contexts; the problem is insecure processing of attacker-controlled data. If you want to secure RDF or JSON-LD, it is better to sign bytes and use media types. You can sign and verify application/n-quads and application/ld+json in ways that are faster and safer. W3C is responsible for making the web safer, more accessible, and more sustainable. Data Integrity proofs are less safe, harder to understand, and require more CPU cycles and memory to produce and consume. They also create a culture problem for RDF and JSON-LD by coupling a valuable property which many people care deeply about (semantic precision and shared global vocabularies) with a security approach that is known to be problematic and difficult to execute safely. These flaws cannot be corrected, and they don't need to be, because better alternatives already exist. W3C, please consider not publishing this document as a technical recommendation. |
2024-05-08 MATTR Responsible Disclosure Analysis

On May 8th, 2024, MATTR provided a responsible security disclosure to the Editors of the W3C Data Integrity specifications. A private discussion ensued, with this analysis of the disclosure provided shortly after the disclosure and a public release date agreed to (after everyone was done with the conferences they were attending through May and June). The original response is included below without modification (so language that speaks to the "VC Data Model" could be interpreted as "VC Data Integrity", as the original intent was to file this issue against the VC Data Model specification). The disclosure suggested two separate flaws in the Data Integrity specification:
The Editors of the W3C Data Integrity specification have performed an analysis of the responsible security disclosure and provide the following preliminary finding: Both attacks are fundamentally the same attack, and the attack only appears successful because the attack model provided by MATTR presumes that verifiers will allow fields to be read from documents that use unrecognized `@context` values. That said, given that a credential technology company such as MATTR has gone so far as to report this as a vulnerability, further explanatory text could be added to the VC Data Model specification to normatively state that all processors should limit processing to known and trusted `@context` identifiers and values, such that developers do not make the same mistake of treating documents with differing `@context` values as equivalent.

The rest of this document contains a more detailed preliminary analysis of the responsible disclosure. We thank MATTR for the time and attention put into describing their concerns via a responsible security disclosure. The thorough explanation made analysis of the concerns a fairly straightforward process. If we have made a mistake in our analysis, we invite MATTR and others to identify the flaws in our analysis such that we may revise our findings.

Detailed Analysis

A JSON-LD consumer cannot presume to understand the meaning of fields in a JSON-LD document that uses a context that the consumer does not understand. The cases presented suggest the consumer is determining the meaning of fields based on their natural-language names, but this is not how JSON-LD works; rather, each field is mapped to an unambiguous URL using the JSON-LD context. This context MUST be understood by the consumer; it cannot be ignored. A verifier of a Verifiable Credential MUST ensure that the context used matches an exact, well-known `@context` value.
The former can be done by using JSON Schema to require a specific JSON-LD shape and specific context values. This can be done prior to passing a document to a data integrity implementation. If contexts are provided by reference, a document loader can be used that resolves each one as "already dereferenced" by returning the content based on installed context values instead of retrieving them from the Web. Alternatively, well-known cryptographic hashes for each context can be used and compared against documents retrieved by the document loader over the Web. For this approach, all other JSON-LD documents MUST be rejected if they do not abide by these rules. See Type-Specific Credential Processing for more details on this: https://www.w3.org/TR/vc-data-model-2.0/#type-specific-credential-processing. This former approach is less powerful than using the JSON-LD Compaction API because it requires more domain-specific knowledge to profile down. However, it is still in support of decentralized extensibility through use of the JSON-LD `@context` mechanism.

Applying these rules to each case presented, for case 1:

- A verifier that does not use the JSON-LD API and does not recognize the context URL,
- A verifier that does not use the JSON-LD API and does recognize the context URL,
- A verifier that does use the JSON-LD API will compact the document to a well-known context, for example, the base VC v2 context, and the values in the JSON will be restored to what they were at signing time, resulting in the semantics that the issuer intended.
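The document-loader approaches described above can be sketched as follows. The context URLs, static content, and hash entries are placeholders; a real loader would serve the actual vetted context documents.

```python
# Sketch of a document loader that only serves vetted contexts: either
# a static installed copy, or fetched content checked against a pinned
# SHA-256 hash. All URLs, bodies, and hashes are placeholders.
import hashlib

STATIC_CONTEXTS = {
    "https://www.w3.org/ns/credentials/v2": '{"@context": {...}}',
}
PINNED_HASHES = {
    # "https://example.org/my-context/v1": "<sha256 hex>",  # placeholder
}

def load_context(url, fetch=None):
    if url in STATIC_CONTEXTS:
        # "Already dereferenced": never touches the network.
        return STATIC_CONTEXTS[url]
    if url in PINNED_HASHES and fetch is not None:
        body = fetch(url)
        if hashlib.sha256(body.encode()).hexdigest() == PINNED_HASHES[url]:
            return body
        raise ValueError(f"context {url} failed hash check")
    # Everything else MUST be rejected.
    raise ValueError(f"unknown context {url}: rejected")
```

Either branch guarantees the verifier never consumes context content it has not vetted, regardless of what an attacker serves at the URL.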
For case 2:

- A verifier that does not use the JSON-LD API and does not recognize the attacker-provided context URL,
- A verifier that does not use the JSON-LD API and does recognize the attacker-provided context URL,
- A verifier that does use the JSON-LD API will compact the document to a well-known context, for example, the base VC v2 context (and optionally,

Note: While the disclosure suggests that the JSON-LD

Comparison to JSON Schema

The scenarios described are identical in processing systems such as JSON Schema, where document identifiers are used to express that two documents are different. A JSON document with differing `$schema` values is a different document:

Original document

```json
{
  "$schema": "https://example.com/original-meaning",
  "firstName": "John"
}
```

New or modified document

```json
{
  "$schema": "https://example.com/new-meaning",
  "firstName": "John"
}
```

Any document processor, whether utilizing JSON Schema processing or not, would rightly treat these two documents as distinct values and would seek to understand their equivalence (or lack of it) prior to processing their contents. Even consuming a document that is recognized as authentic would be problematic if the meaning behind its identifier had changed, as the two schemas below illustrate.

Original meaning/schema

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/original-meaning",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "description": "The name by which a person is generally called: 'given name'",
      "type": "string"
    }
  }
}
```

New meaning/schema

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/new-meaning",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "description": "The name spoken first in Japan: typically a surname",
      "type": "string"
    }
  }
}
```

Demonstration of Proper Implementation

The attack demonstration code provided adds the unknown modified/malicious contexts to the application code's trusted document loader. A valid application should not do this, and removing these lines will cause the attack demonstrations to no longer pass:

```js
documentLoader.addStatic("https://my-example-context.com/", modifiedContext)
```

To see "Proof failed" when this line is commented out and the failure result is logged, see: https://gist.github.com/dlongley/93c0ba17b25e500d72c1ad131fe7e869

```js
documentLoader.addStatic("https://my-malicious-modified-context.com/", modifiedContext)
```

To see "Proof failed" when this line is commented out and the failure result is logged, see: https://gist.github.com/dlongley/4fb032c422b77085ba550708b3615efe

Conclusion

While the mitigation for the misimplementation identified above is fairly straightforward, the more concerning thing, given that MATTR is knowledgeable in this area, is that they put together software that resulted in this sort of implementation failure. It demonstrates a gap between the text in the specification and the care that needs to be taken when building software to verify Verifiable Credentials. Additional text in the specification is needed, but may not by itself prevent this sort of misimplementation in the future. As a result, the VCWG should probably add normative implementation text and test for this form of misimplementation via the test suite, such as injecting malicious contexts into certain VCs to ensure that verifiers detect and reject malicious context usage in general.
|
If you consider the contexts part of the source code, then this sort of attack requires source-code access or misconfiguration. Validation of the attacker-controlled content prior to running the data integrity suite might provide mitigation, but at a further implementation-complexity cost, which increases the probability of misconfiguration. A better solution is to verify the content before performing any JSON-LD (or other application-specific) processing. After verifying, schema checks or additional business validation can be performed as needed, with assurance that the information the issuer intended to secure has been authenticated. At a high level, this is what you want:
Most data integrity suites I have seen do this instead:
The proposed mitigations highlight that these security issues are the result of a fundamental disagreement regarding authentication and integrity of data. Adding additional application processing prior to verification gives the attacker even more attack surface to exploit, including regular-expression attacks, denial of service, schema reference tampering, schema version mismatching, etc. Any application processing that occurs prior to verification is a design flaw; doubling down on a design flaw is not an effective mitigation strategy. |
We are speaking about this pseudo-code, which is a loop and a simple string comparison. I don't see a reason for any of the exploits you have listed here other than implementer incompetence. Please can you elaborate on how those exploits could be performed, and provide a calculation or estimation of how much this adds to complexity? Thank you! |
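The pseudo-code referred to above is not preserved in this thread; a minimal sketch of the kind of check being described, with example-only URLs, might look like this:

```python
# Sketch of a context allow-list check: compare each entry of a
# document's @context against a set of accepted values before doing
# anything else with the document. The URLs are examples only.
ACCEPTED_CONTEXTS = {
    "https://www.w3.org/ns/credentials/v2",
    "https://w3id.org/citizenship/v4",
}

def context_is_trusted(document):
    contexts = document.get("@context", [])
    if isinstance(contexts, str):
        contexts = [contexts]
    # Inline context objects are rejected too: only known URL strings
    # pass, so there is nothing for an attacker to inject.
    return all(isinstance(c, str) and c in ACCEPTED_CONTEXTS
               for c in contexts)

good = {"@context": ["https://www.w3.org/ns/credentials/v2"]}
bad = {"@context": ["https://my-malicious-modified-context.com/"]}
inline = {"@context": [{"@vocab": "https://attacker.example/vocab#"}]}
```

This is indeed a loop and string comparisons; the disagreement in the thread is about when this check runs relative to signature verification, not about its cost.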
@filip26, setting aside your apparent labelling of multiple community members, who have participated in this community for several years, as "incompetent": your specific pseudo-code is insufficient for at least the following reasons:
|
@tplooker, setting aside that you are putting words in my mouth that I have not said, which is quite rude and disrespectful... ad 1: you are wrong. By ensuring data is processed with a context you accept (the URLs), you know what is behind those URLs and how much you trust them, and perhaps you have a static copy of the contexts. If you follow untrusted URLs, then it's the implementer's fault. Use a browser analogy. |
I was browsing through past issues related to this. This specific issue was raised to suggest adding `@vocab`. @tplooker, given these new findings, would you revise your support, since this was a bad recommendation introducing a security concern according to your disclosure? |
The URL for a context doesn't actually matter... In fact, some document loaders will follow redirects when resolving contexts over a network (technically another misconfiguration). Depending on the claims you sign, you may only detect a mismatch in the signature when you attempt to sign a document that actually uses the differing part of the context. Contexts are just like any other part of source code... Every single line of source code is a potential problem. You often don't control what third parties will consider the bytes of a context to be... It's a feature that's been turned into a defect by where it was placed.

"It verified for me, must be a problem in your document loader."
"I thought I would be able to fix it in only a few hours, but it took me 2 days and delayed our release."
"I finally figured out how data integrity proofs work, thanks for letting me spend all week on them."

I've paired with devs and shown them how to step through data integrity proofs, dumping intermediate hex values and comparing against a "known good implementation", only later to learn the implementation had a bug... Misconfiguration is common in complex systems. I'm arguing that security experts who have evaluated data integrity proofs against alternatives should never recommend them, because every problem they exist to solve is already solved better by other technologies used in the correct order. Authentication of JSON -> JSON Web Signatures. The essence of a recommendation is that you believe there isn't a better alternative. |
@OR13 I'm sorry, but I don't see it. You mention two issues: misconfiguration and bugs. Well, we have tests, certification, etc. Those issues are endemic to all software, but we don't call software vulnerable merely because there might be a bug; we do so after we find a bug.
I would really like to see the complexity estimated. I guess we are seeing very different pictures.
Please let's be factual: what experts, what was recommended, etc. In the EU, when a press article starts with the title "American scientists have...", everyone stops reading it (they add the "American" to make it credible ;) |
@PatStLouis, you raise an excellent point regarding default vocabularies. It's never too late to change what's in a context (joke). This working group cannot prevent anyone else from adding a context that includes a vocab. You are reporting an architectural flaw that was "solved for" by making it explicit in the base context, but it's not fixed by removing it from that context. If JSON compatibility isn't a requirement, the working group can drop the vc-jose-cose spec and remove the vocab from the default context... This might even improve adoption of Data Integrity while clarifying that RDF is the claims format that W3C secures. I've argued this point previously. |
@PatStLouis I agree this issue is relevant to the conversation, however the opinions I shared in that issue have not changed. |
Just to add some additional colour here @PatStLouis, I don't believe the recommendation of putting Furthermore, if others in the WG knew about this issue, specifically that |
@tplooker wrote:
@tplooker wrote:
Please stop insinuating that people are acting in bad faith. Now might be a good time to remind everyone in this thread that W3C operates under a Code of Ethics and Professional Conduct that outlines unacceptable behaviour. Everyone engaging in this thread is expected to heed that advice in order to have a productive discussion that can bring this issue to a close. |
From an implementer perspective, maybe adding an example that "should fail" could be a good thing. Something like at #272 (comment). As an implementation "case experience", I implemented in .NET something that produces a proof like at https://www.w3.org/community/reports/credentials/CG-FINAL-di-eddsa-2020-20220724/#example-6, the university credential, and then also verifies it. It felt a bit tedious to find out what to canonicalize, hash, and sign to get a similar result. The code is more or less private still, but now that https://github.com/dotnetrdf/dotnetrdf/releases/tag/v3.2.0 and the canonicalization are publicly released, I might make something more public too. I still feel I need to go through this thread with more thought so I completely understand the issue at hand. |
@veikkoeeva wrote:
Yes, that is already the plan for the test suite, in order to make sure that no conformant implementations can get through without ensuring that they refuse to generate a proof for something that drops terms and/or, depending on the outcome of this thread, uses `@vocab`. That's a fairly easy thing that this WG could do to ensure that this sort of implementation mistake isn't made by implementers. Again, we'll need to see how this thread resolves to see what actions we can take with spec language and test suites to further clarify the protections that we expect implementations to perform by default. |
I'm not suggesting a change, my goal is to understand why this recommendation was suggested in the first place and removing it is now listed as a remediation step to a security concern raised from the very same parties who suggested it.
Correct. Many protocols have features that can be unsecured depending on how you use them. This doesn't make the protocol inherently flawed.
Apologies if you misunderstood my statement, but my intention was not to report an architectural flaw.
Yes. The Data Integrity spec provides hashes for their entries that verifiers can leverage while caching the content. AFAIK this is already a thing.
Thank you for pointing out these issues; I enjoy looking back at historical data from before my time in the space. As pointed out earlier, some of the parties that made that recommendation are now recommending removing it as a remediation for a security concern that they raised. The use cases listed for this recommendation were for development purposes, as described in #953. Furthermore, the private claims section of the JWT RFC reads as follows:
Enabling this by default does not sound like a good recommendation to me. It's easy to set up a context file; it takes 5 minutes and a GitHub account. If you are doing development, you can just include an `@vocab` yourself. Regardless, this was already discussed by the group and the decision has been made.

OWASP defines a class of vulnerabilities called Security Misconfigurations. This is where I would see this landing. While valid, it's ultimately the implementer's responsibility to properly configure their system, and sufficient information is provided in order for them to do so. If I expose an unsecured SSH service to the internet, then claim that SSH is insecure because I can gain unauthorized access to my server, that doesn't hold up, since the security flaw is not in the protocol itself but in my security configuration. Yes, it's a vulnerability; no, it shouldn't be addressed by the underlying protocol. In conclusion, I find this disclosure valuable, as I got to learn a bit more about JSON-LD, and it gives a great resource to show implementers how to properly conduct verification of credentials, and issuers how to properly design a VC. |
I would actually classify those attacks as "Data Integrity Signature Wrapping" (DISW) attacks. They share many similarities with XML Signature Wrapping Attacks (XSW) that occurred in the past. Also, note that it is possible to use XML Signatures securely if appropriate mitigations are implemented correctly. The same holds true for DI. The question is where we would add requirements for those additional mitigations for Data Integrity Proofs (DI). The VCDM uses |
It's not that simple if the goal is to retain the open-world data model and extensibility model that the W3C VCDM promises. There might be instances where a verifier does not recognize all values in the `@context` array. Example: VC using a base data model for all driving licenses
Example: VC issued by DMV of Foo
Example: VC issued by DMV of Bar
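The example credentials themselves are not preserved in this thread. To make the scenario concrete, a hypothetical sketch of the DMV-of-Foo credential (all context URLs, types, and properties are invented; Bar's credential would be analogous, with its own extension context):

```json
{
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://license.example/base-driving-license/v1",
    "https://dmv.foo.example/extensions/v1"
  ],
  "type": ["VerifiableCredential", "DrivingLicenseCredential"],
  "issuer": "did:example:dmv-foo",
  "credentialSubject": {
    "licenseClass": "B",
    "fooNightDrivingEndorsement": true
  }
}
```

A verifier in the realm of Bar would recognize the first two context entries but not the third, which is exactly the cross-realm situation described below.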
When crossing realms, verifiers in the realms of Foo and Bar may have agreed on using the base data model, but not on the specific properties unique to Foo and Bar. Verifiers in the realm of Foo are primarily interested in the base properties of the driving license credential. Adopting the |
Great examples! Thanks! Some context on why I think a test of what "should not happen" is valuable, plus a less-mentioned issue of having good examples. Related to #272 (comment): I'm not completely alien to this sort of work, and indeed, when I implemented the "first pass sketch" of the code, I struggled a bit with implications of this sort, since I'm not so familiar with JSON-LD. So I thought to "get back to it with better time" and just not release anything before things are clearer (plus the library change not being public, though there's something already in the tests about this). Part of that was: if I have a document like https://www.w3.org/community/reports/credentials/CG-FINAL-di-eddsa-2020-20220724/#example-6, how do I pick apart the pieces for canonicalization, turning into bytes, hashing, signing, and so on? For this "sketch" I was quite happy to have the same results with the keys as the example document, but I know I paid only passing thought to these sorts of things. I mention this example piece since I think good examples are perhaps more important than has been implied here. I naturally also think that tests of what should not happen are important, and maybe we should add some notes of the sort to an example or two as well. They're already something I've (we've) been codifying into some tests. It's also a great way to document things. |
@awoie, a verifier should not guess what's inside a context, nor try to anticipate whether there is some agreement between context providers.
If a verifier recognizes both An ability to understand well-known terms, e.g., those defined by schema.org, is a great feature, but not in the VC ecosystem, where we don't want to guess but want to be sure.
It scales the same way as the Web does. No one prevents you from using other contexts, well-known terms, etc., and including them all in your context. If there is a need, a good reason, to share parts between some parties, then the easiest, most transparent, and scalable solution is this:
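One way to read the suggestion (the concrete snippet is not preserved in this thread): instead of relying on `@vocab` or guessing at other providers' contexts, publish your own context that explicitly maps every term you use, including shared, well-known ones, to full URLs. A hypothetical sketch, with the Foo extension URL invented:

```json
{
  "@context": {
    "name": "https://schema.org/name",
    "birthDate": "https://schema.org/birthDate",
    "fooNightDrivingEndorsement": "https://dmv.foo.example/vocab#nightDrivingEndorsement"
  }
}
```

Because every term is pinned to an explicit URL, a verifier that trusts this one context knows exactly what each property means, with no agreement between context providers required.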
|
@filip26 wrote:
I didn't say it is not a solution. My point was that it is a solution which does not scale. A verifier from Foo might have never seen a @filip26 wrote:
No, it doesn't because the assumption of the |
It's up to an implementer how to allow a verifier to be configured. A static configuration has nothing to do with scalability. But I guess you meant that a verifier would not be able to accept a context which is not known; that's exactly what we want, and it does not mean that VCs do not scale, nor that there cannot be an infinite number of different VC types, issuers, verifiers, etc. |
@filip26 wrote:
My point on scalability refers to an increase in operational costs, not necessarily performance. Performance might be another point but I cannot comment on that. @filip26 wrote:
If this is what we want, this sacrifices the open-world data model that the VCDM promises, as mentioned here. |
@awoie I'm sorry, I don't think we are on the same page, and I'll let others explain that it does not affect the scalability of the VC ecosystem nor the open-world data model. |
@awoie My question is: if a verifier has no prior knowledge of Foo or Bar, why would they consider the extended data provided by those entities, and how would this data lead to an exploit in their system? Verifiers know what information they want to verify; they are not blindly verifying abstract data. As for the classification of this disclosure, while I can't really argue with your labeling, this is not a formal classification. If we take two examples of vulnerabilities disclosed around XML Signature Wrapping Attacks, both of them affect a specific piece of software and lead to two distinct CWEs:
They are not addressed by a change to XML, but by a security mitigation in the affected software. This is an important distinction to make, and it loops back to a Security Misconfiguration. It's hard for me to understand what exactly this disclosure tries to underline as the vulnerability.
It seems the target of the vulnerability is being shifted around depending on the questions asked/comments made. |
@msporny this means that |
First and foremost, I wanted to thank both @tplooker for bringing this to the VCWG, and the Data Integrity editors for their analysis. Given the global interest in W3C VCDM, I am glad that this discussion is happening so that the right guidance can end up in the specifications going forward. Some input into both the discussion and the proposed changes to make the specifications stronger: From @peacekeeper:
+1 The "traditional narrative", as Markus notes, was grounded in a desire to have a "big tent". The ecosystem has moved on from when this narrative was articulated to the reality that post-VCDM 1.1, the data model is and remains JSON-LD compact form, which has been a global standard. So there is fully an expectation by anyone using VCDM 2.0, they need to understand that data model. What that in particular means is that, if you are NOT using a JSON-LD aware mechanism to process a VCDM 2.0 payload (Data Integrity being a JSON-LD aware option), you have an obligation to build in the "processing logic" to check for the things that are expected when using JSON-LD compact form (similar to how you need to be aware of when checking the particulars of a CSV, JSON or XML). I think this needs to be emphasized. There are other options for those who do not wish to leverage JSON-LD (and its power and flexibility) but if you are using VCDM 2.0, you can't pretend it is not JSON-LD. From @kimdhamilton:
+1 I personally think that this earlier choice was a mistake, that makes many other mistakes possible. At the same time, I fully see the value of @vocab when it comes to development and refinement of attribute bundles. So, I would recommend that in addition to removing @vocab from the base context, @vocab is provided as an optional secondary context that developers can manually insert into the payload during development time and, as such, becomes explicitly visible when it is in use. From @msporny:
+1 Very much so. Particularly when it comes to defining all terms concretely for production use, and a MUST NOT (instead of a SHOULD NOT as it currently stands) for using @vocab in production use. From @tplooker:
This feels right, but I don't know enough about the down-stream impacts of this, so would like to learn more. |
This creates a direct link between an issuer and a context set. It locks a holder into asking for a new credential every time a different context is needed for some reason, even in cases where it could be translated one-to-one. Please note that many verification use cases require only a few claims, especially in a context of selective disclosure. There might be a risk of clustering holders based on a context set (requested at issuance/presented at verification), which would be hardwired. At this very early stage of VC adoption, we can expect many custom contexts to be around. Making this change without a deep analysis could potentially end in a similar discussion to this one a few months later... some use cases: |
Thank you @filip26. If I understand correctly, this is about how best to securely distribute @context files. If so, I agree that a deeper analysis would be helpful to understand both the options on the table and the associated trade-offs an implementer needs to consider before making a particular choice. |
Checking context is something that needs to happen at the application level and, if it is not checked properly, adding content-integrity checks will not help solve that problem, but it will harm use cases and decentralization. In sticking with the "human name swapping" scenarios we've been using, take for example an application that will accept either a "Name VC" from an issuer from Japan or from an issuer from the US. In fact, these VCs are protected by some JWT-based mechanism that will ensure that the context cannot be changed without losing protection over the documents. Now, suppose that the issuer from Japan issues their VC using a "Japanese context" that expresses the first and last name term names in the exact reverse way from the "US context". The issuer from the US issues their VC using the "US context". The application sees this and is written using pseudo code like this to consume the VCs after verification checks are performed (that would weed out any unacceptable issuers and ensure no changes to the expression of the documents):
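The pseudocode referenced here appears to have been lost from this page. The following is a hedged Python sketch of what such brittle consumption logic might look like, based on the surrounding description: it dispatches on the issuer and merely *assumes* which context that issuer uses. The issuer identifiers and term names are illustrative placeholders, not values from the original comment.

```python
# Hypothetical sketch of the brittle application logic described above.
# Issuer IDs and term names are illustrative placeholders.

JP_ISSUER = "did:example:issuer-jp"
US_ISSUER = "did:example:issuer-us"

def consume_fragile(vc):
    """Interpret name terms based on who issued the VC, ignoring @context.

    This only works while each issuer sticks to one context: the
    "Japanese context" is assumed to use the firstName/lastName term
    names with meanings reversed relative to the "US context".
    """
    subject = vc["credentialSubject"]
    if vc["issuer"] == JP_ISSUER:
        return {"given": subject["lastName"], "family": subject["firstName"]}
    if vc["issuer"] == US_ISSUER:
        return {"given": subject["firstName"], "family": subject["lastName"]}
    raise ValueError("unacceptable issuer")
```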
All is well here, for the time being, but it is actually only by chance that this is true in an open world setting. Because then, asynchronously, the issuer from Japan sees that a number of customers in Japan want to be able to use their "Name VC" at US-context-only consuming applications. So, seeing as they weren't using Data-Integrity-protected VCs, they decide they have to also start issuing duplicate "Name VCs" to every customer that wants one, using the "US context". But now our application has a problem. You see, the application will happily accept these new "US context"-based VCs signed by the issuer in Japan, but the wrong code block will run! Depending on the scenario, this could crash the application or actually swap the data and perhaps produce a worse problem, like the concern here in this thread. Remember, this is true even though JWT-based protections are used that force a particular context to be used by the holder. The problem is, fundamentally, that checking the context is an application-level protection that must be performed by the consumer of the information. No basic JWT-verifier is going to check your custom claims or acceptable context combinations, just like no basic data integrity middleware would either. This is a validation responsibility of the application. We can see that if the application had used this code instead:
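The second code block referenced here also appears to have been lost. A hedged Python sketch of what it might look like, consistent with the surrounding description: interpretation follows the @context the document actually declares, and anything not understood is refused. The context URLs and term names are illustrative placeholders.

```python
# Sketch of the "second block": dispatch on the declared @context rather
# than on the issuer, and refuse anything not understood.
# Context URLs and term names are illustrative placeholders.

VCDM_V2 = "https://www.w3.org/ns/credentials/v2"
JP_CTX = "https://example.jp/name/v1"  # term meanings reversed vs. US usage
US_CTX = "https://example.us/name/v1"

def consume_robust(vc):
    """Interpret terms according to the @context the document declares."""
    contexts = tuple(vc["@context"])
    subject = vc["credentialSubject"]
    if contexts == (VCDM_V2, JP_CTX):
        return {"given": subject["lastName"], "family": subject["firstName"]}
    if contexts == (VCDM_V2, US_CTX):
        return {"given": subject["firstName"], "family": subject["lastName"]}
    raise ValueError("@context not understood -- refusing to consume")
```

With this version, a "US context" VC signed by the issuer from Japan is interpreted correctly, because interpretation keys off the context and not the issuer.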
Now, the application would have continued to function just fine after the issuer from Japan made their asynchronous and decentralized decision to enable some of their customers to use the "US context". But, we can take this a step further. If, instead, the issuer from Japan uses Data Integrity to protect their VCs, they don't even need to issue new VCs to allow their customers to use the "US context". Any party can change the context of the VC without losing the protection. And note that if the application continues to use the second block, which they need to use anyway to properly consume JSON-LD, everything will work properly, no matter whether the context was set the way it was by the issuer or by the holder (or by the verifier themselves). This enhances decentralization, scalability, and open world participation.
@tplooker wrote:
The software you provided specifically allowed the problematic contexts to be used, explicitly bypassing the protections you are criticizing other software in the ecosystem for not supporting. I know we (@awoie, @tplooker, and @msporny) keep talking past each other on this, so I'll keep asserting this in different ways until one of us sees the other's point. :P The VC Playground software highlighted is playground software that specifically does not implement validation rules. That is, we specifically do not enforce semantics in the VC Playground because one of its features is to allow developers to add arbitrary VCs and move them through the entire issue/hold/verify process. We did consider adding a "validation" feature to some of the examples, but even if we did that, your complaint would remain. That is, if a developer came along and used their own VC to do a full issue/hold/verify process, there is no way we could know what the validation rules are for their VC... should we reject all 3rd party VCs used in the VC Playground (limiting its use greatly)? Or should we require developers to provide validation rules for each VC (creating a higher burden to add arbitrary VCs to the playground)? In the end, we decided to focus on enabling the issue/hold/verify cycle and to come back to validation later. IOW, validation is out of scope for the playground, but we might add it in the future. The digital wallet software highlighted does not attempt to validate VCs because that is (arguably) not its primary purpose in the ecosystem; that's the verifier's job. We could build validation into the digital wallet, but we're hesitant to do so because of the broad range of VCs people can put into a wallet and the likelihood of us getting validation wrong for arbitrary VCs is high. What do we display if we don't know of a particular VC type? A warning? An error?
Both seem wrong, and the UX would make issuers annoyed at the wallet software for marking their VC as "questionable" when it's not. Enforcing application-specific
Hmm, disagree, but I see that this particular point hasn't been responded to yet (or I missed it). Will try to specifically respond to this point when I get some cycles later this week. In the meantime, I suggest we open new issues for each of the 9 proposals above and focus on each proposal separately. I know that is asking A LOT of those participating, but I'm also concerned that trying to evaluate 9 proposals in a single thread is going to result in a conversational flow that's going to be hard for everyone to follow. Would anyone object to translating this issue into 9 different issue/proposals and focusing on each proposal in a separate issue?
Some closed-ecosystem wallets might have specific validation rules, others might not. Regardless, a verifier should always have validation rules (unless it's a public utility tool made available for experimenting/discovering, such as the vc playground, uniresolver, etc.; having validation in these environments would simply ruin their purpose). If I set up an agent that will simply verify the proof on a VC, I still need to have some controller to apply business logic. I don't want my barebones library to come with rigid validations; this is the developer's job to implement. If I want to check VDLs, I will cache the VDL context and verify its integrity. @tplooker If this isn't a misconfiguration error, how come proper software configuration will prevent this from being exploited? The myth that one single unconfigured verifier software must be able to verify and process every imaginable VC issued is a fallacy. The COVID passport verifier will verify covid passports; the age verification software will verify age according to its jurisdiction's regulations. And these verifications will not happen with some arbitrary unknown/unheard-of context/VC as input. If it does, then you can claim a vulnerability in the software since it was poorly implemented. There have been many vulnerabilities in software, even some leveraging JWT, believe it or not. Here's a list of known attacks. This being said, I enjoyed these demonstrations, and they should be documented in a lab somewhere, maybe even classified in the specification. They highlight risks associated with not properly reading/implementing the specification. Kudos to the MATTR team for putting these together. My suggestion as action items:
Hello all, as an organization who will be supporting DI signatures in our product as we look to engage with a wide audience in the credential landscape, I would support the following recommendations (with some suggestions for consideration, given what I have grok'd from the above...)
Although I fully support the fully-qualified names approach for ensuring there is no ambiguity in a secured document, I am concerned about the development overhead and lack of flexibility if this is required in all scenarios - but I am happy to learn more about the cost/benefit. In general I focused on the above because they seem to properly address the described vulnerability when securing DI protected documents, and not focus on alternatives. Business and engineering teams are free to examine alternative methods for securing data and their cost/benefit analysis. But if a choice is made and a solution calls for DI -- how do we protect it as best we can? No solution is perfect, but clearly acknowledging the risks and providing clear guidance to mitigate these risks will help organizations make the right decisions for their needs. (If the mitigations are still insufficient for the use case, consider an alternate solution/technology).
As explained in the example in my comment above, locking down the context does not solve the problem, but it does create new ones. The fundamental problem is that an application is not performing validation on
Your application must only run against the context(s) it has been coded against. So if there is some context that uses German terms (or Japanese terms, or Frank McRandom's terms) and your application code wasn't natively written against that context, then your application MUST NOT try to consume the document. When you see the property "foo" in a JSON-LD document, it should be understood as a localized name -- and its real name is the combination of "the context + foo". If you ignore "the context", that is not ok. That is the source of the problem here. Notably, this actually isn't different from reading so-called "plain JSON" either, it's just that JSON-LD documents are self-describing, so "the context" is announced via So, what are your options when your application, written entirely in let's say, English, gets in a document that uses a context with German terms? You can either:
Note that this is very similar to "content negotiation". Some servers will accept Using the JSON-LD API, anyone can translate from context A to context B. Using Data Integrity, this can be done without losing protection on the information in the document.
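To make the translation idea concrete, here is an illustrative stand-in in Python. A real implementation would use a JSON-LD processor (expand the document with its source context, then compact with the trusted target context); here, the term-to-IRI tables are hypothetical and hand-written purely for illustration.

```python
# Illustrative stand-in for translating a document from context A to
# context B. The term-to-IRI tables below are hypothetical; a real
# implementation would delegate to a JSON-LD processor's
# expand/compact operations.

GERMAN_CTX = {"vorname": "https://example.org/ns#givenName",
              "nachname": "https://example.org/ns#familyName"}
ENGLISH_CTX = {"givenName": "https://example.org/ns#givenName",
               "familyName": "https://example.org/ns#familyName"}

def translate(doc, src_ctx, dst_ctx):
    """Expand each term to its IRI via src_ctx, then rename via dst_ctx."""
    iri_to_term = {iri: term for term, iri in dst_ctx.items()}
    return {iri_to_term[src_ctx[key]]: value for key, value in doc.items()}

# translate({"vorname": "Ada", "nachname": "Lovelace"}, GERMAN_CTX, ENGLISH_CTX)
# → {"givenName": "Ada", "familyName": "Lovelace"}
```

The point mirrors the comment above: the underlying information (the IRIs and values) is unchanged; only the local names used to express it differ.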
@dlongley In your mental model, what It sounds to me, that in your mental model, the issuer/holder provided Perhaps I'm not following correctly, but in your mental model, who determines what It would also really help if we could always keep an eye on a holistic solution when evaluating the proposals made in this thread, i.e.,
Is there any combination that is not valid, e.g., DI verifier + JSON processor seems to be odd although this is probably what most people are doing, i.e., using the compact form. I guess JSON-LD processors can rely on the expanded terms (IRIs) but I haven't seen many implementations that do. It was probably not helpful to have a polyglot approach to VCDM with all the different combinations of JSON-LD/JSON across the data model and securing mechanism layer which is why we ended up here. Irrespective of the solution we land on, I'd hope to be as explicit as possible in the spec and explain how this relates to options 1-4 above, and probably also 5-6.
If this were just a misconfiguration issue, then why are the vcplayground, the three connected wallet applications, and the ~12 VC API backends connected to the vcplayground all "misconfigured"? Surely if this is an obvious misconfiguration issue with no tradeoff, as you suggest, then these software packages should have no issue being configured correctly? Of course, in reality, it's not because these aren't valid "applications", as has been previously argued by @dlongley (they are); it's because adding in this configuration means they can't easily scale to new credential types without painful, careful reconfiguration. That is why the VC playground and all connected software today don't follow this advice, and why it isn't a practical solution to this problem.
Understood, assert away :P and I will continue to make my point, which I don't believe is being understood. As I've said before, the evidence in this community speaks for itself: we have plenty of examples of software "misconfigured", to use your terminology, and little evidence of software that actually even follows this recommendation, and that's because this isn't a configuration issue.
This approach as a mitigation, which is to perform hard validation against every context in a presented credential (effectively whitelisting every context), simply doesn't scale, and below I outline a use case which demonstrates exactly why. Note, this isn't a theoretical use case either; we have lived this through real deployments of LDP and DI. At MATTR several years ago we decided to extend the VC 1.0 data model to include our own early attempt at credential branding. This involved us defining our own company based So in short @dlongley @filip26 @msporny and others, we have lived experience with your proposed solution here and it just does not work. It assumes all context values in an issued credential are critical to process when there are many cases (like above) where some
I had to update the permutations in my previous post because I figured there is also DI + JCS + JSON but it contains JSON-LD, so there might be JSON-LD and JSON processors. So, here are the updated permutations a solution should cater for:
There is no such thing as "less secure apps" (perhaps you meant a profile or something like that?). Now it looks like a euphemism for making it mandatory.
I'm sorry, I don't believe the "lived experience with the solution" (e.g. sketched here: Regarding unused
If you are willing to die on this hill that the vcplayground is representative of production software deployed to verify sensitive information and should be configured the same, so be it. I can't take an exploit demonstrated in a public demo environment as empirical evidence that all deployed software is vulnerable in the same way.
I guess another way to put it, @tplooker, is: If we implemented strict checking of To be clear, Digital Bazaar's production deployments do strict checking of
@tplooker wrote:
but then you say:
Those two statements seem logically contradictory, please help me understand them. In order to accomplish "including a hash of the context entries in the document", you have to have a hash of each context entry when you issue AND the verifier needs to be able to independently verify the hashes of each context entry when they verify. IOW, the issuer needs to understand the contents of each context used in the VC and the verifier needs to understand the contents of each context used in the VC (or, at least, be provided with a list of trusted hashes for each context they are verifying). You then go on to say that allow-listing contexts in that way is not scalable. The specification insists that a verifier needs to check to make sure that they recognize every context in a VC before they take any significant action. What is the difference between the verifier knowing the hashes of every context and the verifier checking the URLs of every context (which are vetted "by contents" or "by hash")? What am I missing?
@msporny why does the verifier need to know the hashes? Wouldn't it be possible to sign over the hashes and include the hashes in the
All this would do is prove the document still expresses "something" in the same way it did when it was issued. But, as a verifier, you still don't know what that "something" is. You have to understand the context to actually consume the information. You don't have to understand that to confirm that the underlying information hasn't changed or to transform it from one expression to another (that you might understand). So, the verifier will have to know the contexts (they can know them by hash or by content, as these are equivalent), such that they have written their applications against them, if they are to consume any terms that are defined by those contexts. This is why it does not matter whether the context is different from what the issuer used -- it doesn't help. Adding signed hashes doesn't help. In fact, if you lock the context down to a context that the verifier does not understand, it hurts. If there's a context that a non-compacting verifier could use to consume the document, but the holder isn't free to compact to that context, then the verifier will not be able to accept the document. The holder would be forced to go back to the issuer and leak to them that they'd like to present to a verifier that only accepts documents in another context, asking for them to please issue a duplicate VC expressed in that other context. If you have some special auxiliary terms that you want to consume in your own application, that you think many verifiers might reject based on a context they don't recognize:
Ok, let's presume that's what we do... let's say we do something like this in the
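The inline snippet from this comment appears to have been lost in extraction. One plausible shape for such an issuer commitment, loosely modeled on the VCDM 2.0 relatedResource/digestSRI mechanism, is sketched below; the second context URL and all digest values are illustrative placeholders, not real digests.

```json
{
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    "https://example.org/contexts/my-app/v1"
  ],
  "relatedResource": [
    {
      "id": "https://www.w3.org/ns/credentials/v2",
      "digestSRI": "sha384-..."
    },
    {
      "id": "https://example.org/contexts/my-app/v1",
      "digestSRI": "sha384-..."
    }
  ]
}
```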
When DI generates the proof, that content is signed over (both in RDFC and JCS). Alright, now the issuer has explicitly committed to cryptographic hashes for all context URLs and wallets and verifiers can check against those context hashes.
Yes, and for the verifier to compute those hashes, they need to fetch and digest each context URL listed above (which means they now have the entire content for each context)... or they need to have a list that they, or someone they trust, has previously vetted that contains the context URL to hash mappings. Having that information, however, is only part of what they need to safely process that document (and I'm going to avoid going into the use cases that we make impossible if we take that approach just for the sake of brevity for now --EDIT: Nevermind, turns out Dave and I were answering in parallel, see his post right above this one for some downsides of locking down the context hashes at the issuer). IF (for example) we continue to allow The point is that the contents of each context need to be known by the issuer (in order to hash them and generate the proof) and by the verifier (in order to verify that the contexts have not changed from when the issuer used them)... and if each party knows that information, then they have to know about each context and its contents (either by value or by cryptographic hash)... and if you know that information, you can verify the signature (and it'll either work if nothing the verifier is depending on has changed, or it'll fail if the contexts don't line up for the information that has been protected, which is what matters). Did that answer your question, @awoie? PS: As a related aside, I'm pretty sure we're using the words "known (context)", "understands (the context)", "trusts (the context)" in different ways that are leading to some of the miscommunication in this thread. I don't know what to do about it yet (other than keep talking), but just noting that we probably don't mean the same things when we use those words.
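The "known by hash or by content" equivalence above can be sketched in a few lines of Python. This is a hypothetical illustration: the "vetted" mapping and the placeholder context bytes stand in for a copy of a real context document that the verifier (or someone it trusts) has previously reviewed.

```python
import hashlib

# Hypothetical vetted mapping from context URL to the SHA-256 digest of
# its bytes; CONTEXT_BYTES is a placeholder standing in for a previously
# reviewed copy of the real context document.
CONTEXT_BYTES = b'{"@context": {"name": "https://schema.org/name"}}'
VETTED = {
    "https://www.w3.org/ns/credentials/v2":
        hashlib.sha256(CONTEXT_BYTES).hexdigest(),
}

def context_is_vetted(url, fetched_bytes):
    """Checking a context "by hash" needs the same prior vetting as "by content"."""
    expected = VETTED.get(url)
    if expected is None:
        return False  # context never vetted: refuse to consume
    return hashlib.sha256(fetched_bytes).hexdigest() == expected
```

Either way, someone had to examine the context's contents up front to produce the vetted entry, which is the point being made about hashes not removing that obligation.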
They aren't contradictory, but happy to explain. Fundamentally, including a hash of all the @context entries as a part of the signed payload accomplishes the following: It provides assurance to the issuer that in order for a relying party to be able to successfully verify their signature, they MUST have the same exact context as the issuer who produced the credential. This universally ensures context manipulation cannot happen after issuance without detection. And I might add there are more ways to mess with the context outside of the vulnerabilities I described at the start of this issue, so this just solves all of that outright. Because these @context values are integrity protected, it actually means that a relying party could download them in certain situations over a network safely if they don't already have them, because if they get corrupted or tampered with in any way, they are going to then fail the signature validation, and this is the key to solving the scalability challenge. The use-case I described above gets somewhat more bearable: if I as a verifier encounter a VC with a context I don't understand and that isn't actually critical for me to understand, I can safely resolve it over a network, cache it, and be confident it hasn't been messed with when I validate the signature. This isn't a perfect solution, but it is much better than the current state of play and likely the best we can do with data integrity without simply just signing the whole document with JWS instead, which of course would be much easier. The important difference between your proposal and mine @msporny et al, is your solution
Only if
@tplooker wrote:
@dlongley explains in this comment why what you are requesting is a logical impossibility. To summarize:
That might, understandably, seem counter-intuitive to some, but it does make logical sense once you think about it. So, let's walk through an example: In year 1, I use just the VCDM v2 context, which I'm going to ship to production (the reason doesn't matter, I'm just going to do that). In that VC, I use In year 2, I decide that I want to define those more formally, so I create a new context that I'll append after the VCDM v2 context and in that context I define It throws an error because @vocab was protected in year 1, which catches ALL undefined properties in the VCDM v2 context. Again, I know it sounds like a "nice to have" when said out loud, but when you think through the logical implementation of it, it doesn't work. I hope it's clear at this point that proposal 2 is unworkable. If there was some other way you were expecting it to be implemented, please let us know; perhaps we don't see what you see.
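A concrete sketch of the year-1/year-2 walkthrough may help; the term name `favoriteColor` and the example.org IRI below are hypothetical. In year 1, with only the base context, `favoriteColor` is an undefined term caught by the base context's `@vocab` fallback. The year-2 document then appends a context that defines it properly:

```json
{
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    {
      "favoriteColor": "https://example.org/ns#favoriteColor"
    }
  ],
  "type": ["VerifiableCredential"],
  "credentialSubject": { "favoriteColor": "blue" }
}
```

If `@protected` also covered `@vocab`-assigned mappings, as proposal 2 requests, the second context above would be rejected as a redefinition of an already-protected term, so no meaningful context could ever follow the base context.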
@tplooker wrote:
Just "simply signing the whole document with JWS" does not:
It is a red herring; it is not a solution to the concerns that you have raised. A verifier still has to ensure that a VC secured with any enveloping signature contains the semantics that they expect. They cannot just blindly accept any list of contexts and start executing business rules, even if they trust the issuer.
As mentioned above, if I understand your ask properly, I think it is a logical impossibility. The purpose of Now, the way that It would not be possible to ever have the core VC v2 context be followed by any other meaningful contexts. Clearly this is not desirable and would prevent every other common use of VCs. If a consumer desires the definition of any other terms after a "catch all"
@tplooker wrote:
You didn't address the point of contention. The point of contention was that you (and @awoie, I presume) assert two things in your solution:
But then both of you state that distributing contexts in this way doesn't scale. It sounds like you're saying that "even if we do 1 and 2, the solution won't work anyway, because there is no scalable way to distribute contexts". It may be that you and @awoie think that the /only/ way for the verifier instance to operate is by having a completely fixed and static list of contexts they accept (and that that doesn't scale). It might be that you think that @filip26's example, which was just the simplest example that could be provided to demonstrate how easy it is to protect against the attack you describe (which is a minimum bar that the specification suggests), is the "one and only way" we're proposing? If that's the misunderstanding, then I can understand why you and @awoie are saying what you're saying. If it isn't, then I'm still seeing a contradiction. Please clarify what you mean by "does not scale", because it's a misunderstanding we could focus on and clean up before continuing with analyzing solutions.
In an attempt to address the assertions you made above, which are beside one of the points of contention above: @tplooker wrote:
I already covered this point above. Developers can ignore any guidance in the specification. We call that "doing a bad job" or, at worst, a non-conforming implementation. We can write algorithm language and tests that make it far less likely for a conforming implementation to misimplement in the way that you are concerned about. I think we will get consensus to "do something" here, we're just debating what that "something" needs to be. At present, there is contention over at least two approaches:
The playground does not pin to context hashes because many of the contexts used are changing regularly. Data Integrity (using RDFC) gets its security from the signed statements, which cryptographically hash only the values in the context that are used. Verifiers must check context values that are used in messages sent to them in production. Developer software and playgrounds are NOT to be confused with production software.
IF an issuer and a verifier follow the rules and guidance in the specification today, they are guaranteed (in an enforceable way) that the number of statements, the protected terms, and the information they expressed will not change when the verifier checks them. If the issuer is sloppy in production and uses
I cover this point in a previous comment.
I have raised w3c/vc-data-model#1514 to evaluate what to do about /cc @dlongley @kimdhamilton @aniltj @ottonomy @PatStLouis @mavarley @peacekeeper
The following issue outlines two significant security vulnerabilities in data integrity.
For convenience in reviewing the content below, here is a Google Slides version outlining the same information.
As a high-level summary, both vulnerabilities exploit the "Transform Data" phase of data integrity in different ways, a phase that is unique to cryptographic representation formats involving processes such as canonicalisation/normalisation.
In effect, both vulnerabilities allow a malicious party to swap the key and value of arbitrary attributes in a credential without invalidating the signature. For example, as the attached presentation shows with the worked examples, an attacker could swap their first and middle name, or their employment and over18 status, without invalidating the issuer's signature.
The first vulnerability is called the unprotected term redefinition vulnerability. In general, this vulnerability exploits a design issue with JSON-LD where the term protection feature offered by the @protected keyword doesn't cover terms that are defined using the @vocab and @base keywords. This means any terms defined using @vocab and @base are vulnerable to term redefinition.
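A hedged sketch of what such a redefinition might look like (the term names mirror the first/middle name example from the presentation; the context layout is illustrative, not taken from the slides): because `firstName` and `middleName` are only given meaning by the base context's `@vocab` fallback, an attacker can append a context that remaps each term to the other's IRI without tripping `@protected`:

```json
{
  "@context": [
    "https://www.w3.org/ns/credentials/v2",
    {
      "firstName": "https://www.w3.org/ns/credentials/issuer-dependent#middleName",
      "middleName": "https://www.w3.org/ns/credentials/issuer-dependent#firstName"
    }
  ]
}
```

If the attacker also swaps the two JSON values, the expanded (and thus canonicalized and signed) statements are unchanged, so the signature still verifies, while a consumer reading the compact form sees the two values transposed.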
The second vulnerability exploits the fact that a document signed with data integrity has critical portions that are unsigned, namely the @context element of the JSON-LD document. The fact that the @context element is unsigned in data integrity, combined with the fact that it plays a critical part in the proof generation and proof verification procedures, is a critical flaw, leaving data integrity documents open to many forms of manipulation that are not detectable by validating the issuer's signature.
Please see the attached presentation for resolutions to this issue we have explored.
In my opinion, the only solution I see that will provide adequate protection against these forms of attack is to fundamentally change the design of data integrity to integrity-protect the @context element. I recognise this would be a significant change in design, however I do not see an alternative that would prevent variants of this attack continuing to appear over time.
I'm also happy to present this analysis to the WG if required.