Semantics and naming of types #7

Closed

xtofian opened this issue Sep 24, 2017 · 8 comments

xtofian commented Sep 24, 2017

Naming

"Trusted" doesn't quite seem the right term to use. In many cases, the value of those types will be entirely or partially derived from untrusted data, however the values will be known to be safe to use in the destination (sink) context due to appropriate validation/sanitization/escaping that has been applied to the original value. For instance, in

var u = TrustedURL.sanitize(untrustedInput)

the string value of u will equal the string value of untrustedInput (i.e., consist entirely of a value from an untrusted source) if untrustedInput passes validation (e.g., is a well-formed http URL).
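
For concreteness, such a sanitize-style factory might look something like the following sketch (assuming the draft's unsafelyCreate escape hatch; the scheme allowlist, the about:invalid fallback, and the helper name are illustrative assumptions, not spec'd behavior):

// Hypothetical sketch of a sanitize-style factory for TrustedURL.
const SAFE_SCHEMES = ['http:', 'https:', 'mailto:'];

function sanitizeUrl(untrustedInput) {
  let parsed;
  try {
    parsed = new URL(untrustedInput, window.location.href);
  } catch (e) {
    return TrustedURL.unsafelyCreate('about:invalid');  // unparseable
  }
  if (!SAFE_SCHEMES.includes(parsed.protocol)) {
    return TrustedURL.unsafelyCreate('about:invalid');  // e.g. javascript:
  }
  // Validation passed: the resulting value consists entirely of
  // untrusted input, yet is safe to use in a URL context.
  return TrustedURL.unsafelyCreate(untrustedInput);
}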

Of course, in some cases a value can be established to be safe due to its trustworthy provenance (e.g. a string consisting of a javascript:... URL can be treated as a SafeUrl if it is known to come from a trusted source), but that's usually just one way of establishing the type contract.

Semantics and type contracts

The type contracts used in Closure have a number of unresolved issues, which stem from the inherent complexity of the web platform.

In the setting of a particular application domain, we have largely been able to ignore or gloss over such issues; however, for the purposes of a standard they should presumably be kept in mind.

For example:

  • Types are implicitly indexed by the origin their values will be interpreted in: A value being "trusted" or "safe" to use as, say, HTML markup is relative to the origin it will be evaluated in. In practice, this usually doesn't matter much, since we typically deal with the code base of a single application that will be served in a single origin. However, some applications involve documents or frames that are served in separate origins and, for instance, communicate via postMessage. At this point, the trust relationship between components becomes relevant.

  • The TrustedResourceUrl (TrustedScriptURL) contract is somewhat subtle: Nominally, this type refers to the set of URLs that refer to resources that are trusted to be evaluated / executed in the application's origin. However, it is in practice difficult to firmly establish this property, since it is difficult for a runtime-library or a build-time toolchain to reason about the trust relationship between the application in question and the resource served by a given URL.

    In Closure's TrustedResourceUrl, we essentially use the property that the URL was constructed under "reasonably rigorous" application control (e.g., constructed from a compile-time-constant prefix with optional properly escaped path fragments and query parameters; see TrustedResourceUrl.format, sketched after this list) as a proxy for the property that the URL refers to a trustworthy resource. This largely prevents bugs due to code that permits unintended injection into path fragments that might cause the URL to point to an unintended resource (e.g. via a redirector). But the TrustedResourceUrl type's constructors do not truly ensure that its values are in fact URLs pointing to trustworthy resources.

    Whether or not this approach is reasonable depends on the application domain and organization maintaining the code; other implementations might be preferable in certain settings (e.g. a centrally maintained registry of URLs that are indeed known to point to trustworthy resources).

    Similarly, in some settings it may be reasonable to assume that all URLs within the application's origin are serving trustworthy resources (i.e. any URL in the same origin, including any path-absolute URL, can be considered a TrustedScriptURL). This is convenient, since this property can be checked by a simple runtime predicate. However, this assumption is unsound if there's an open redirector anywhere in the origin.

  • It is unclear what properties should be included in the SafeHtml/TrustedHTML contract: Clearly, this contract should imply that the value does not result in execution of untrustworthy script when evaluated as HTML (e.g. by assigning to the innerHTML property). It's less clear if the contract should also require that the rendered HTML remain visually contained (i.e. not make use of absolute-positioning styles). This property is necessary to prevent social-engineering attacks when sanitized, but originally untrustworthy, HTML is rendered embedded within a web UI (for instance, an HTML email message rendered in an email UI must not be allowed to overlay/alter the UI of the application itself). However, it is not necessary if such attacks are not relevant or are mitigated through other means in a particular application.
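
To illustrate the "compile-time-constant prefix plus properly escaped parameters" approach from the TrustedResourceUrl bullet above, here is a minimal sketch; the helper name and the simplified parameter handling are assumptions (Closure's actual TrustedResourceUrl.format supports a richer placeholder syntax):

// Hypothetical format-style builder: the prefix is assumed to be a
// compile-time constant under application control.
function formatResourceUrl(constantPrefix, params) {
  const query = Object.entries(params)
      .map(([k, v]) => encodeURIComponent(k) + '=' + encodeURIComponent(v))
      .join('&');
  return TrustedScriptURL.unsafelyCreate(
      constantPrefix + (query ? '?' + query : ''));
}

// Injection into `locale` cannot change which resource is loaded:
const url = formatResourceUrl('https://static.example.com/loader.js',
                              {locale: userSuppliedLocale});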

Also worth considering is the granularity of the types: In Closure, safe-content types are relatively coarse-grained. For instance, the same type (TrustedResourceUrl) is used to represent URLs that are interpreted as script (<script src>), style (<link rel=stylesheet href>), and embedded content (<iframe src>). This choice was made to reduce the number of types to a manageable vocabulary, and has generally worked out reasonably well in practice. However, it is not clear if this tradeoff between the complexity of the type vocabulary and its expressiveness is appropriate in the general case.

In particular, using an untrusted resource as the src of a (non-sandboxed) <iframe> obviously has much less damage potential (largely limited to social engineering / phishing attacks) than using it as a script source.

Proposal: "value neutral" type contracts

Thus, it seems rather difficult, if not impossible, to come up with precise specifications of type contracts that prescribe what values are "safe" or "trusted" for a given sink context, such that these specifications generalize well and don't make unwarranted assumptions about the particular application domain or development frameworks.

With that in mind, it may make sense to essentially dodge all these issues and define types and their contracts not in terms of safety/trustedness, but rather simply to characterize the semantics of the specific sink that will interpret the value, and do so at a fairly granular level. (Granularity seems to already be the intent; the proposal states "Introduce a number of types that correspond to the XSS sinks we wish to protect.") IOW, "value neutral" type contracts that focus on how the value will be used, rather than whether or not it is safe.

I.e., types such as

  • HyperlinkURL -- URL that is interpreted as a hyperlink (<a href>, as well as possibly <link rel=author href>)
  • MediaURL -- URL that refers to media (<img src>, etc)
  • ScriptSrcURL -- (<script src>)
  • StyleSrcURL -- (<link rel=stylesheet href>)
  • HTMLImportSrcURL -- (<link rel=import href>)
  • SameOriginHTML -- (HTML sink that will interpret and render the markup in the current origin, i.e. .innerHTML assignment, except if the element is a <template> element)
  • ...

One downside of this "value neutral" type naming is that the security implications of creating values of these types are less prominent. However, instances of the types can only be created via the .unsafelyCreate factory function, which conveys the security aspect (and in larger projects will typically be usage-restricted to ensure security review).
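
For illustration, such a value-neutral type with a single unsafelyCreate factory might be polyfilled along the following lines (the token-guard pattern and all implementation details here are assumptions, not part of the proposal):

// Hypothetical polyfill of one of the proposed types.
const creationToken = Symbol('HyperlinkURL');

class HyperlinkURL {
  constructor(value, token) {
    // Guard the constructor so unsafelyCreate is the only way in.
    if (token !== creationToken) throw new TypeError('use unsafelyCreate');
    this.value_ = value;
  }
  toString() { return this.value_; }
  // The single audit-worthy entry point: call sites of this function
  // (and their fan-in) are what a security review needs to examine.
  static unsafelyCreate(value) {
    return new HyperlinkURL(String(value), creationToken);
  }
}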

koto commented Sep 24, 2017

Thanks for the thoughtful feedback, I really appreciate it. My thoughts on that are below:

Re: Naming

This name was chosen deliberately. Values assigned to the proposed types are later on trusted by the application. This doesn't imply that they are either trustworthy or safe; it only means that the application author effectively trusts those values (that trust was implicit when strings were used instead; now we are only making it explicit). We do not aim to enforce a stronger contract, i.e. trusted types may still cause an XSS vulnerability, even if the URL/HTML is sanitised. One example cause of such a vulnerability might be open redirects; another might be the triggering of script gadgets in the included TrustedHTML.

This makes the trusted types different from, e.g., Closure's types - they are looser, in the sense that the contract enforced is a weaker one. We only aim to allow for creating applications whose security evaluation focuses on the parts producing the values that often cause DOM XSS, not on the sinks. Enforcing a stronger contract would be problematic - e.g. in the platform we do not (yet?) have a way of recognising a constant, and the sanitisation support is also quite shaky. Similarly, as you noted, unless applications are packaged, a document itself (using the trusted types) cannot assure what is hosted under a URL (even if no open redirect is present). We can only do some approximations where we feel they make sense. Precisely for the reasons outlined, I believe the values are effectively trusted, but we don't know if they are safe, though both properties can be met if combined with e.g. a build system that validates additional properties at compile time. Using trusted types alone does not make your application DOM XSS-safe, but it makes it easier to verify (in an automated or manual fashion) that this is effectively the case.

Re: Type contracts

We share a lot of the same challenges as Closure's safe type system, but some are unique to each. One important challenge for trusted types is that they must work even when there is no compiler, and in the context of a single document. This is what caused the "downgrade" of the contract we're effectively able to enforce. Keeping that in mind:

  • Re: types vs origins. Yes, for now we're handwaving that issue. A trusted type is not blocked, as long as the already-running code created it. There might be additional (second-level) vulnerabilities that cause the usage of an 'unsafe' value wrapped in a trusted type from outside the execution context, e.g. via postMessage. These should not be common, but it's worth thinking now about whether trusted types themselves should be Transferable.

  • Re: TrustedResourceURL limitations. Agreed, we're aware of the limitations, hence the contract carried by the types is weaker. We can only go so far in adding a couple of convenient builders, or preventing open-redirects on TrustedScriptURLs (@slekies has an idea), but we cannot assure safety here.

  • Re: TrustedHTML contract. We start minimalistic: no script execution, and no goal to thwart social engineering or CSS style pollution. There are other means to achieve that, and we probably wouldn't be safe without a sound sanitisation engine anyway. Achieving general safety of HTML (even without JS) is a can of worms I don't think we can tackle with trusted types just yet.

  • Re: Type granularity. Gut feeling tells me there should be just a couple of types, to facilitate adoption. We're aware this exposes us to possible polyglot vectors (JS code that happens to be a Flash file etc.); this is the tradeoff we're making.

Re: value neutral contract

This seems similar to what we are actually doing - the types are value neutral in the sense that they don't imply safety. We deviate sometimes (e.g. a few fewer types, and sometimes a blunt sanitizing / neutralizing builder is exposed), but this is more-or-less how I think about the trusted types contract, and what I think the name implies (the author trusts the value to be, e.g., a script URL that will not cause havoc in the application).

koto added the spec label Sep 26, 2017

xtofian commented Dec 1, 2017

Well, the whole point of introducing these types is to have a means of distinguishing values that are potentially unsafe to assign to a given sink from those that are indeed safe.

The problem that's solved by these types is that it is essentially impractical to audit all uses of XSS sinks in a non-trivial application, and flows of data into each of these uses. By changing sinks to only accept values of specific types (and/or interposing run-time predicates that validate or sanitize values that aren't already of the sink-specific "safe" type), we reduce this intractable problem to the much simpler problem of auditing only code that produces values of such types.
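
As a sketch of what such sink-side enforcement might look like in a polyfill (the property-descriptor interposition here is an illustrative assumption, not the mechanism the spec prescribes):

// Interpose a runtime type check on an XSS sink.
const desc = Object.getOwnPropertyDescriptor(Element.prototype, 'innerHTML');

Object.defineProperty(Element.prototype, 'innerHTML', {
  set(value) {
    if (!(value instanceof TrustedHTML)) {
      // Reject plain strings: the audit burden now falls only on the
      // code that produces TrustedHTML values.
      throw new TypeError('innerHTML requires a TrustedHTML value');
    }
    desc.set.call(this, value.toString());
  },
  get() { return desc.get.call(this); },
  configurable: true,
});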

The purpose of these types is to reduce the "trusted code base" (i.e. code that needs to be assumed and manually verified to be correct) with respect to the security property "application has no XSS bugs" from a large fraction of the application source code (all call-sites of DOM XSS sinks, and all code that flows data into those), to the much smaller set of code consisting of the call-sites and fan-in of unsafelyCreate functions.

I.e., to fully realize the benefits of this approach, developers and security reviewers absolutely have to make the assumption that all instances of the various types have values that are in fact safe to use in their corresponding contexts. (Which is why it's so important that an application code base contain only relatively few call sites of unsafelyCreate, and that those call sites are structured to be "readily reviewable"; see #31.)

I still think the spec should not prescribe a specific type contract, since it's not clear what that contract should be, what security considerations it should address (just XSS? layout-based social-engineering attacks?), or that there even is a single contract that is appropriate for all conceivable web apps.

I think we agree on this; where we differ is that I do believe that in any given use of these types in a particular application, they do have to have a security contract; it's just that that contract might to some extent differ across applications and development organizations; hence the spec shouldn't try to define (let alone enforce) one.

Another way to look at it: In any given application, there'll be an implicit effective contract for each of these types, which is defined as the union of the sets of values produced at the application's reachable call sites of the type's unsafelyCreate function, for all possible, reachable program states. It's the developer's (or security reviewer's) responsibility to ensure that each of these sets only contains values that are safe to assign to the sinks that accept the type.

I.e., there isn't a universal definition of "safe" attached to the types as per the spec, but there certainly is a notion of "safe" that developers/reviewers would attach to these types in the context of their specific code base. The definition of "safe" is whatever makes sense in the context of their app, and is embodied in the implementation of builder/producer APIs that create values of TrustedTypes through use of unsafelyCreate.

(I realize I'm contradicting myself wrt what I said in the original issue under "value neutral" contracts.)

One implication that falls out of this is that any factory functions that are defined in the spec (e.g. TrustedURL.sanitize) must be designed such that the values they produce are within any conceivable notion of "safe" for their corresponding sinks, across applications. This is not necessarily trivial. For instance, should TrustedURL.sanitize accept tel: URLs? On one hand, they're certainly benign with respect to XSS. However, there are contexts where they are problematic (or at least used to be -- e.g., within webviews in iOS; see discussion at golang/go#20586 (comment)). And it's not even clear that the standard should permit arbitrary http/https URLs. For instance, an application might want to ensure that it only renders hyperlinks that refer within the scope of the application itself, and that links to external sites go through a redirector that performs real-time phishing detection or some such.

This indicates that perhaps the spec shouldn't provide for factory methods for these types at all. This might be a quite reasonable direction: I'd expect these types in the currently spec'd form (with very basic factory functions) to not be all that useful on their own.

To be practically useful, these types will need factories/builders for many common scenarios (two of which are sketched after this list), e.g.:

  • composability of SafeHtml/TrustedHTML (i.e. the property that for all s, t : SafeHtml, s + t is also in SafeHtml), and a corresponding function that concatenates TrustedHTML (see e.g. SafeHtml.create).
  • factory functions that create HTML tags with tricky semantics, accounting for browser-specific quirks (example); of course this won't be an issue in spec-compliant browsers that implement TrustedTypes, but will have to be accounted for in libraries that support other browsers via polyfill of those types.
  • special case functions for safe-URL types, e.g. to create a blob: URL for a Blob, but only if it's of a MIME type that is considered to not result in content-sniffing issues (e.g., SafeUrl.fromBlob).
  • factory functions for TrustedScriptURL that choose a particular tradeoff between strictness and expressiveness (e.g. TrustedResourceUrl.format).
  • etc.
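
For concreteness, hypothetical sketches of the first and third items above (the helper names and the MIME-type allowlist are illustrative assumptions):

// Composability: the concatenation of values satisfying the contract
// is assumed to also satisfy the contract, so this builder is safe by
// construction and needs no per-call-site review.
function concatTrustedHtml(...pieces) {
  return TrustedHTML.unsafelyCreate(pieces.map(String).join(''));
}

// Blob URLs, but only for MIME types assumed not to be content-sniffed
// into something executable.
function safeUrlFromBlob(blob) {
  const SAFE_MIME_TYPES = ['image/png', 'image/jpeg', 'image/gif'];
  if (!SAFE_MIME_TYPES.includes(blob.type)) {
    throw new TypeError('unsupported MIME type: ' + blob.type);
  }
  return SafeUrl.unsafelyCreate(URL.createObjectURL(blob));
}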

Of course, all of these factory functions can be replaced with ad-hoc code that relies on unsafelyCreate. However, that's very undesirable since it'll result in a relatively large number of such calls throughout the application source code, which will significantly erode the reviewability benefits.

I.e., I'd expect these types to be used as a core building block of frameworks and libraries that endow them with richer semantics (and corresponding builders and factories), such as the Closure SafeHtml types and their builders, or the SafeHtml types in Angular.

(Aside: the one factory function in the current draft spec that's reasonably uncontroversial is TrustedHTML.escape -- escaped text should be within the notion of "safe for HTML" in the context of any conceivable application; however, it's also not particularly useful in practice -- applications should just assign to textContent instead of using innerHTML with TrustedHTML.escape.)
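
For reference, a stand-in for such an escape function might look like the following sketch (an illustrative assumption, not the spec'd implementation):

// Escape the HTML metacharacters; the result is inert markup.
function escapeHtml(text) {
  return TrustedHTML.unsafelyCreate(
      String(text)
          .replace(/&/g, '&amp;')
          .replace(/</g, '&lt;')
          .replace(/>/g, '&gt;')
          .replace(/"/g, '&quot;')
          .replace(/'/g, '&#39;'));
}

// ...whereas assigning to textContent achieves the same rendering with
// no HTML parsing at all (element and someText are placeholders):
element.textContent = someText;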

xtofian commented Dec 1, 2017

Re: Naming. One concern with the use of "trusted" in type names is that among security engineers, "trusted" refers to a thing that we need to trust (i.e. depend on) in order for a security property of our system to hold (as opposed to something trustworthy, which we can simply trust, so to speak).

With the introduction of the types, we've exactly removed the need to trust each and every value that flows into a sink (and the code that's involved in accomplishing that); we now only need to trust the code that's involved in producing new instances of such types (i.e. call sites of unsafelyCreate and their fan-in).

xtofian commented Dec 1, 2017

Another naming-related comment: unsafelyCreate is somewhat inaccurate as well: In a correct and non-vulnerable program, all call sites of unsafelyCreate are in fact safe, in the sense that for any reachable program state, the string passed to T.unsafelyCreate must be a string that is safe to use in the sink context corresponding to type T. If that were not the case, the program would have a vulnerability.

I.e. these conversion functions are potentially unsafe, but must only be used in scenarios where they are actually safe (as established through security review of each call-site).
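
To illustrate, one shape such a reviewably-safe call site can take is a guarded template-tag wrapper (the helper name and the pattern itself are illustrative assumptions, not part of the spec):

// When used as a template tag, this only ever wraps string literals
// with no interpolation, so every value it produces is a compile-time
// constant under application control.
function constantTrustedHtml(literalParts) {
  // SAFETY (reviewed): literalParts.raw is only present when invoked
  // as a template tag; rejecting interpolation keeps attacker data out.
  if (!literalParts.raw || literalParts.length !== 1) {
    throw new TypeError('only literal, interpolation-free use allowed');
  }
  return TrustedHTML.unsafelyCreate(literalParts[0]);
}

// Usage: constantTrustedHtml`<b>Loading...</b>`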

koto commented Dec 1, 2017

I extracted the sanitizers discussion to #32, let's leave this bug for the naming concerns.

... to which I so far have an unsatisfactory answer.

We chose trusted, as for us that meant not that we need to trust the values (not to introduce XSS), but that we are trusting them. If the application using TT has an XSS vulnerability, it's because it trusted a value that was not trustworthy. This signifies that creating values of such types should be audited, as later on the application trusts the types to be safe. Briefly looking at Trusted_systems_in_information_theory, it seems to match that definition too.

I still think the TT name captures our intention correctly, but I don't have any strong opinion. If there's a compelling alternative name, I'm all for it. It looks like with the introduction of sanitizers (#32) and the ability to lock down unsafe creations (#31), we might be able to make Trusted Types safe (wink wink nudge nudge).

xtofian commented Dec 6, 2017

I don't have a terribly strong opinion on naming either. And I agree with you -- in a code base where unchecked conversions (unsafelyCreate) are locked down, it really does not matter all that much: Application developers don't really need to understand the nuanced semantics of the types. All they need to know is that they have to provide a value of the right type for the sink in order for it to be accepted. With calls to unsafelyCreate being constrained, the only way of doing that is via a (safe) builder API. Only developers / security engineers who write/review code that uses unsafelyCreate do have to understand the purpose and semantics of these types, and the implications of using them incorrectly.

In other words, neither developer usability nor the resulting security posture should be significantly affected even if we called the types FrobHTML and so forth: If a developer writes foo.innerHTML = someString, they'd get an exception that .innerHTML wants a FrobHTML. They'd then go and read the documentation on how to make values of type FrobHTML. APIs to create FrobHTMLs (except for unsafelyCreate, but that's usage-restricted) are designed to be inherently safe, in the sense that no matter how they are called, they produce FrobHTML values that will not result in security vulnerabilities when assigned to .innerHTML.

FWIW, I'm not totally happy with the naming convention of SafeHtml either. In particular, there are scenarios where a value is a priori unsafe in the current context, but is used in a different scenario where it's actually safe. For instance,

let safeHtml = SafeHtml.unsafelyCreate(someAttackerControlledString);
let iframe = document.createElement('iframe');
iframe.sandbox = 'allow-scripts';
iframe.srcdoc = safeHtml;

This code is safe (wrt XSS) because the untrusted HTML is populated into a sandbox iframe. However, the safeHtml instance is absolutely not safe in the current context, and it's not all that appropriate that the type is called SafeHtml here (i.e., it'd be a security bug to assign it to .innerHTML even though SafeHtml is normally supposed to be safe for that).

This dissonance arises because the type contract is implicitly dependent on the origin the value is used in, but the type name doesn't capture that.

OTOH, the above example doesn't sound any better with TrustedHTML. This led me to the suggestion to not include Trusted or Safe or similar notions in the type names, and rather have them represent "values that are accepted by the corresponding sink" (e.g., HyperlinkURL is the type that's accepted and required by <a href="...">).

At the same time, this discussion has shown that for these types to be useful, their contracts (as implicitly defined by the union of their builders in scope of the app) do have to have a notion of "safety" or "trustedness" -- the whole point is that assigning a value of a given type to a corresponding sink that accepts that type should never result in a security vulnerability.

Anyway, naming is hard...

xtofian commented Dec 6, 2017

Another way to look at it: With TrustedTypes enabled/enforced, the application's TCB wrt the security property "there are no XSS vulnerabilities" is the types, or more specifically, the code involved in constructing instances of the types (i.e. call sites of unsafelyCreate and surrounding code). IOW, we want the property that if the application were to have an XSS, the root cause has to be in that TCB.

In that sense, using the term "trusted" in type names is kind of appropriate. I'm still a little bothered, however, by the fact that in the meaning of "trusted" as in "TCB", it's not the types or their values that are being trusted, but rather the code surrounding call sites of unsafelyCreate.

(of course, the browser itself, its implementation of the SOP and so on, is part of the XSS-relevant TCB too; but in this context we're interested in bugs in application code, not the platform itself)

koto commented Jan 17, 2019

Let's just go with Trusted, it seems the name stuck. Future visitors, please reopen if you feel strongly against it.

koto closed this as completed Jan 17, 2019