Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative Options for Default Policy. #248

Closed
otherdaniel opened this issue Dec 9, 2019 · 16 comments
Closed

Alternative Options for Default Policy. #248

otherdaniel opened this issue Dec 9, 2019 · 16 comments
Labels

Comments

@otherdaniel
Copy link
Member

The current definition of the 'default policy' (even after the changes in #236) has the problem that it's not clearly specced what happens when the default policy callback would modify the DOM.

For example, the default policy might delete the element whose attribute is being modified. This doesn't seem like a super sensible thing to do, but the spec certainly allows it. As this occurs during the DOM implementation, the DOM would be modified during another modification to the DOM. At the very least, this requires exact specification of the point in time in which this occurs. Moreover, similar issues apparently have occured with MutationObserver, which has apparently led to a painful string of interoperability & implementation problems in browsers, so that feedback was strongly negative on this approach in general.

On the other hand, the "default policy" is an essential mechanism to allow a reasonable migration. Without it, only "big bang" migrations including all dependencies and 3rd-party libraries would be possible, which would likely prevent a large number of developers from utilizing the feature at all.

This issue is to discuss alternative choices for the "default policy".

@otherdaniel
Copy link
Member Author

Default policy alternatives

Background: Trusted Types' Default Policies

Trusted Types default policy is a mechanism that lets the web authors define a fallback behavior if a DOM XSS sink is called with a string (instead of a matching typed object). This happens when existing applications start migrating to Trusted Types (as the code does not use the types at all), and when using 3rd party code - for example, jQuery upon loading will write to innerHTML a few times at least.

Example:

trustedTypes.createPolicy('default', { createHTML: s => DOMPurify.sanitize(s) })
foo.innerHTML = '<img src=x onerror=alert(1)>';
foo.innerHTML // '<img src="x">';

Further example of a typical Angular application default policy here.

The fallback behavior is defined as a set of JS functions that accept a string and return a string, and perform various (synchronous) tasks, the most complex of which is likely to be HTML sanitization (e.g. running DOMPurify to remove JS payloads). Some sort of custom fallback behavior is crucial for successfully adopting TT in existing applications, as all existing code nowadays uses strings with DOM XSS sinks. A lack of a default policy (or an equivalent mechanism) would force sites into a "big bang" migration, where enforcement could only be enabled after all dependencies and third-party libraries have been converted. Experience suggests this oftentimes prevents adoption of a feature entirely.

Problem Statement

The default policy - as presently specced and implemented in Chromium - runs a user-supplied JavaScript callback function. Since it runs user code during the operation, it might cause unexpected modifications to the DOM state. For example, a default policy might delete the DOM node that the ongoing operation was meant to operate on. The intended use of the policy is to apply a string-to-string mapping (or reject a value), but there is no way to safely restrict a user-supplied callback to being side-effect free.

This is difficult to spec and implement in a sane fashion, and might cause interoperability issues, as different browsers, or different versions of the same browser, might have subtly different behaviour in some edge cases, security issues, as this allows to run a user-script on the DOM in an unexpected (and likely untested) intermediate state, or performance issues. Similar issues have occured with Mutation Events, which has led to a substantial string of problems with their implementation(s).

This document explores the alternatives to the current shape of the default policy in the Trusted Types API.

Alternatives

Alternatives fall into three major categories:

  • Eliminate the callbacks (possibly in favor of another mechanism)
  • Tighten down the specification (so that callbacks are run at a clearly defined & sensible point in time, and any potential side effects have a consistent and predictable effect on the DOM)
  • Restrict execution (so that callbacks cannot have undesired side effects).

Eliminate callbacks: No default policy

No default policy mechanism exists in TT. Assigning a string value to DOM XSS sink throws a TypeError. The fallback behavior can only be defined in JS code, by hijacking setters (similar to how TT polyfill does it).

Advantages

  • Minimal shape of the API.
  • Fails closed.
  • Might provide a stronger signal/nudge to library/framework authors to make their code work without a default policy.

Disadvantages

  • Puts the burden on the authors. Most existing web applications will have to hijack the setters.
  • Not everything can be polyfiled (location setter, eval), blocking the adoption of Trusted Types until after using strings in those places is eliminated from the application.
  • Problematic polyfilling for separate realms (e.g. iframes, workers)
  • Potential performance impact ('Trusted' values would likely also go through the polyfill, even if the polyfill then decides to just pass them through.)

Eliminate callbacks: No default policy, but ship with a supplied polyfill

This is basically a variant of "No default policy", except that we ship a polyfill for the "default" policy from the start.

Advantages

  • Same as "No default policy"
  • This might be a good way to ship a "v0.9", while experimenting with the API in a "safe" way. (As it's only a polyfill, it can be evolved independently from the browser(s), and anyone can do their own. Eventually, we could provide a native implementation for the final API that closes the remaining holes of non-polyfill-ability.)

Disadvantages

  • Also, mostly the same as "No default policy", but it somewhat lightens the burden on authors.

Restrict execution: Running the policy out-of-band

For each sink (e.g. IDL attribute setter), record whether it was set with a 'trusted' value or not. If a default policy is not configured, throw immediately for the untrusted case. Otherwise, apply the modifications (as without Trusted Types), but record the sink value as untrusted. Later, when the associated script is about to execute (for TrustedScript sinks) or loaded (for TrustedScriptURL sinks), run all 'un-trusted' values through the default policy.

Advantages

  • Runs the script only when script is about to run anyhow.

Disadvantage:

  • Debugging: Weird error handling. Also, if the default policy modifies the script, there's a weird phenomenon that what executes is different from what the inspector shows, because now there's a processing step in the middle.
  • Unclear (to me) whether this would generalize to TrustedHTML setters, like innerHTML.

Restrict execution: Worklets

The default policy might be implemented in a Worklet (similar to AudioWorklets). This gives the default policy an isolated global to run in.

Example:

// index.js
await trustedTypes.defaultPolicyWorklet.addModule('default-policy.js');

// default-policy.js
class MyDefaultPolicy extends TrustedTypesDefaultPolicy {
  constructor(domFragment) {
    super(domFragment);
  }
  createHTML(input, sink) {
    try {
      sanitize(domFragment, input);
    } catch (e) {
      return null;
    }
  }
}

registerDefaultPolicy('default', MyDefaultPolicy);

Advantages

  • Incentivizes writing idempotent, fast default policies
  • Polyfillable (though a bit ugly :/)
  • The policy executes on a separate global object, unable to access the original DOM state.

Disadvantages

  • API verboseness
  • TT implementation requires worklets (so far only Chrome and partially Firefox).
    Loading the worklet is asynchronous. This can be fixed though with (async () => {await worklet.addModule(url)})().
  • The default policy needs to be delivered separately from the application (in a separate module). (This could arguably be considered a reviewability advantage, though.)
  • Potentially, some policies might want to share the objects from the main execution. Perhaps the API needs to allow for passing configuration objects.
  • Likely substantial implementation complexity.

Tighten down specification: Running the default policy prior to all internal operations

This requires distinguishing between two classes of DOM mutations: Attributes (and attribute nodes), and all others. For attributes, run the Trusted Types checks and the default policy before any modifications are applied, thus making it strictly equivalent to running the default policy and attribute setting in sequence.

Advantages

Disadvantages

  • Unclear how to spec this
  • Unclear if this addresses the problem in the implementations (the core issue might be that the state changes unexpectedly in the middle of some other, larger operation - e.g. adopting a node subtree).
  • It might address the performance / security problem only partially. Perhaps it would need to be combined with isolating the default policy execution environment e.g. in Worklets.

Tighten down specification: WebIDL type mapping

This is a variant of "Running the default policy [... early …]" above, but based on different spec mechanics: The idea is to specify all applicable sinks in WebIDL, and handling default policy execution in the mapping layer.

Example:

Currently (in Chromium) trusted types are defined as a union type:

  interface TrustedScript { [....] }
  typedef (DOMString or TrustedScript) ScriptString;

The setter definion then differs from the canonical IDL. In HTMLScriptElement, we have:

  [..., RaisesException=Setter] attribute ScriptString text;

An alternative would be to drop the union type and use an extended attribute:

  [..., TrustedType=Script] attribute DOMString text;

Ideally, these could be upstreamed, where browsers not supporting trusted types could easily ignore the attribute. This would bring our IDLs closer to upstream; makes it easier to spot where trusted types are used; and makes it easier to test compatibility between browsers, both with and without trusted type support (e.g. in WPT tests).

For the purpose of our "default policy" problem, the idea is that (in both spec and implementation) the WebIDL deals with trusted types, analogous to how it deals with any other non-String type: The value is either accepted as-is, or is converted (e.g. stringified), or is rejected. This is exactly what we need for trusted types: We either want to accept the value, or convert it to something acceptable (via default policy), or reject it.

Conveniently, these checks always occur in a context that is already running user script, and before any actual DOM mutations (or somesuch) take place. So the idea is to strictly serialize user script -> trusted types -> DOM. (And thus guarantee that DOM mutation within another DOM mutation can happen.)

For most intents and purposes, the downstream (DOM, HTML) would only "see" the converted values. If it's necessary to make decisions based on the trust-status, this could be spec'ed (and implemented) so that the type mapping for [TrustedType=...] setters/attributes doesn't produce a string, but a tuple with a string value and an effective type enum (with a value set [string | TrustedScript | TrustedHTML | TrustedScriptURL ]). Any trust-related checks then boil down to an enum comparison.

Advantages

  • Preserves good developer ergonomics.
  • IMHO easier to reason about, since now trusted types is essentially sandwiched between the user script and HTML & DOM, and (mostly?) stays out of those layers.

Disadvantages

  • This imports trusted types & associated complexity into WebIDL (& the bindings layer)
  • Substantial implementation effort.
  • It's not clear if this can be solved only at WebIDL level (e.g. what about content attributes)?
  • Unclear how to spec the report-only behavior - if done naively, WebIDL would need to callout to CSP.

Eliminate callback: Mutation observers

Sanitize the values via MutationObservers (i.e. in a microtask after the DOM processing happened).

Disadvantages

  • The shape of the API is unclear.
  • Mutation Observers are only covering DOM sinks.
  • Microtask might be too late to introduce security checks. For example, in the following code data:text/html,<script>(async ()=>{location.href='javascript:alert(1)';await Promise.resolve(1);location.href='javascript:void(0)'})()</script> the first location assignment happens before the promise resolution allows the second assignment to neuter it.

@mikewest
Copy link
Member

/cc @annevk, @domenic, @foolip, and @zcorpan for opinions about what would most cleanly fit into HTML.

@annevk
Copy link
Member

annevk commented Dec 10, 2019

As I understand it, there are different enforcement points and for some a callback (default policy) can be invoked can for some this creates issues, as outlined above. I think it would be useful to distinguish the various callers of the callback more clearly to see what problems need solving where.

E.g., we can probably run a callback before executing script. We can probably run a callback before running an event handler.

The hard part is running a callback while mutating the tree. But is mutating the tree itself XSS? Or is it executing a script or running an event handler introduced as part of the mutation? Or both? Are there other callback callers that are potentially problematic in terms of timing?

@koto
Copy link
Member

koto commented Dec 10, 2019

There's also a policy called upon a navigation to javascript:.

The reason why we're running the default policy now (as part of Get Trusted Type compliant string) when setting attributes (IDL or content), is to be able for the JS statement to either:

  • not throw if the default policy allows the string value (potentially modifying it)

or

  • throw early if the default policy exists and rejects the string value.

IOW, we're not trying to catch the XSS at the latest possible moment, but error out early (with the default policy giving a way to silently recover from some error conditins).

@annevk, can you clarify what tree mutations are problematic? I think we only have that for a script text nodes. I think it's OK for the default policy to delay the execution if e.g. a text node is appended to a script that later on gets executed. I was under the impression that running when the attributes change (say iframe.srcdoc and script.src) would also create a problem. As specced now, the default policy could set the same attribute (causing an infinite loop), or delete the element node.

@annevk
Copy link
Member

annevk commented Dec 10, 2019

I think I understand the ergonomics issue, but I also want to ensure that if we decide to ignore it, we don't break other important functionality.

@otherdaniel
Copy link
Member Author

otherdaniel commented Dec 10, 2019

The hard part is running a callback while mutating the tree.

Yes, and I don't think we should do that at all. We only did that inadvertently, and didn't realize the implications. Our implementation largely skirted around the issue, since it always runs the "default policy" callback before passing it on to the DOM code.

As I understand it, there are different enforcement points and for some a callback (default policy) can be invoked can for some this creates issues, as outlined above. I think it would be useful to distinguish the various callers of the callback more clearly to see what problems need solving where.

Currently (implemented):

  1. <script> and < svg:script> will - if their content was modified by inserting/manipulating text nodes - run the default policy just before the script executes. (If their content was modified by calling properties like .innerText, then that'll be treated like any any other property access; see below.)
  2. eval will run the default policy before the actual eval happens.
  3. all other properties & attributes will check the Trusted Type on assignment, and will run the 'default policy' if necessary, before running the actual property (or attribute) setter.
  4. Changing an attribute node will be treated like directly setting the string value on the attribute. (I.e., really the same case as 3)

None of those mutate the DOM while the DOM mutation is ongoing. (At least not intentionally; but I might be overlooking something.)

We've tried to spec that, but I'm not sure we've been 100% successful in that. (I'm having a hard time parsing all the subtleties out of the spec text, to be honest.)

==============

edit: There's a fifth enforcement point, that koto briefly mentioned above: In case of navigation to javascript:-URLs, the default policy is run at the time of navigation, just before executing the actual URL script.

@domenic
Copy link

domenic commented Dec 10, 2019

Restrict execution: Worklets

Oooh, this is a cool approach that very directly attacks the problem, and probably encourages good policies. I like it, if you think we could get away with it.

Loading the worklet is asynchronous. This can be fixed though with (async () => {await worklet.addModule(url)})().

I'm not sure this fixes anything? await doesn't make things sync; it just allows you to write sync-looking code. In particular since nothing happens after the await in that code snippet, it is equivalent to just worklet.addModule(url). You need to put any parts of your application that depend on the default policy lexically after the await.

Tighten down specification: WebIDL type mapping

This seems similar in spirit to [CEReactions] which has worked pretty well. I think this is probably your best bet, especially if you review the [CEReactions] spec and see how to adapt it to work very similarly.

In particular I think the [CEReactions] spec sidesteps or answers many of the "Disadvantages" you list.

I believe you won't need the complex stack of queue of queues type system that custom element reactions use, since you only want to run one default policy callback? Or, hm, you might need something like it, because the side effects that that callback causes could themselves trigger another default policy callback. See the paragraphs following this definition for more on that.

And yeah, you probably even need the backup element queue (or an analogue), since you want to run this default policy in cases like the user editing a <script contenteditable>, I think.

@koto
Copy link
Member

koto commented Dec 11, 2019

@annevk, re: "describe the various callers of the callback" - the IDL index shows all the affected APIs - all the ones with a TrustedTypes extended attribute would, to the letter of the current spec, call a default policy.

Apart from the script text/src property that we now have slot-based approach to guard, that's iframe.srcdoc, object/embed attributes (we could punt it for later), document.write, and a few other functions like insertAdjacentHTML. IIUC, we can put guards on the function arguments, as this is happening before the function executes.

@domenic, re: Worklets and async. I'm not sure if (async () => {await worklet.addModule(url)})() fixes anything indeed. We need something that would complete execution before other code we have no control over does innerHTML. We can require installing a default policy in a separate script element, but I'm not sure how to pause the execution of other scripts that are, say, just loaded from a CDN and do innerHTML synchronously, or at DOMContentLoaded. Unfortunately, I'm a little lost here if that is even possible.

@koto
Copy link
Member

koto commented Dec 12, 2019

I took a look at the Worklets route, and I believe it's a hard sell, but only for one reason - having to fetch a module with the default policy before the rest of the scripts run is problematic (also from a performance perspective). If we could create a separate realm synchronously, without having to fetch(), that would solve the issue - but that seems very odd from the author's perspecive.

Regarding the [CEReactions] route - we actually now have a similar setup via our extended attribute.

[CEReactions] in HTML apply at IDL attributes level - and in DOM there's an explicit callback to them when content attributes change. Chrome's implementation of TT follows that model, but the TT reactions are not put on a queue, but executed immediately.

The optimal behavior we are after is exactly what polyfill does, which is hijacking the setters, such that default policy happens before DOM gets to process the attribute mutation. Calling it when the browser algos themselves mutate the DOM (e.g. from within the parser) is a nice-to-have, but not crucial for XSS guarding. After whatwg/dom#805 there's a single place where the default policy would be called for DOM. That's probably very naive, but maybe asserting that the crucial objects still exist (or were not changed) after validation steps would let us sidestep the risk?

@koto
Copy link
Member

koto commented Dec 12, 2019

There's also prior work in throw-on-dynamic-markup-insertion-counter that prevents specific JS code (custom elements) from calling dangerous API when called in a specific context. @annevk, do you think this approach might help here? I'm not sure what's the practical experience with implementing that counter, and whether this worked and prevented bugs in implementations.

Is there an exhaustive list of "bad things" that a default policy should not do when run during content attribute mutation?

@otherdaniel
Copy link
Member Author

Thanks all for the feedback! What I'm reading out of the replies so far is that the two top candidates are "Worklets" and "WebIDL type mapping".

I think modelling the "type mapping" after [CEReactions] is probably a very good idea: That should simplify both spec + implementation work. I'm also fairly keen on leveraging WPT's WebIDL tests for this, and for bringing our IDLs closer to the official ones.

(Personally, as an implementor, I'm in favour of "type mapping", although I find "Worklets" more appealing in theory.)

@koto
Copy link
Member

koto commented Jan 2, 2020

+ @annevk, resurfacing this question:

Is there an exhaustive list of "bad things" that a default policy should not do when run during content attribute mutation?

I'm trying to see whether the approach taken when speccing custom element constructor behavior is feasible. For CE there are safeguards prohibiting constructors from document.write etc.

@annevk
Copy link
Member

annevk commented Jan 3, 2020

I don't think so, the problem is mutating the tree and the mutating the tree has no existing guards and I don't think we should add them. Or at least not as a side effect of adding this feature.

@otherdaniel
Copy link
Member Author

So.. I think we've figured this out. We'd mainly like to follow the "WebIDL" solution outlined above: WebIDL gets a TrustedType=... extended attribute. Attributes and method parameters will be declared with the (unmodified) string types and the extended attribute.

In the common case, the TT check (meaning: unwrapping the trusted type wrapper in case of a trusted type, or running the string through a default policy in case of a plain string (or string-ified argument)) happens when the arguments are converted, just before an element's actual property or attribute setter are called. Thus, the JavaScript callback happens when the arguments are converted, and before handing off the string result to the DOM.

This won't cover everything, though. The other cases are:

  1. <script> and <svg:script>
    These get a 'shadow slot'. The script elements' attributes are handled just as in the common case above, except they also update the shadow slot. When the script is executed, the script text is compared against the shadow slot, and a TT check is performed on the string in case of a mismatch.

  2. Attr Node
    Manipulation or insertion of attribute nodes is treated like setting a text string on the same element. This requires changes to that part of the DOM handling.
    (The alternative would be a "shadow slot" for every single affected attribute, which seems wasteful.)

  3. Setting attribute by name (setAttribute)
    This requires a lookup to determine whether the selected attribute is a TrustedType= attribute. If so, it does the same check (and at the same time) as the common case.

  4. eval.
    Eval is as super-magic as always, and the TT check happens when eval is executed.

In terms of point in time, the "default policy" is executed mostly when the arguments are converted. The exceptions are script elements (prepareScript) and eval (before execution).

In terms of implementation, the IDL compiler will end up doing the bulk of the work, and the TT check is mostly run in the WebIDL bindings. The exceptions are script elements (shadow slot), attribute node functions and setAttribute/setAttributeNS (which need to lookup whether a check needs to be made), and eval.

koto added a commit to koto/webidl that referenced this issue Feb 10, 2020
(DOM|USV)String.

This is to hook up the Trusted Types validation during the ES->IDL type
conversion to avoid funky issues with its default policy.

See w3c/trusted-types#248,
w3c/trusted-types#176
@annevk
Copy link
Member

annevk commented Feb 11, 2020

Please don't call it a shadow slot, that's rather confusing with shadow trees and their slots, especially for a DOM feature. Otherwise I think the solutions for those cases make sense.

@koto
Copy link
Member

koto commented Mar 2, 2020

Please don't call it a shadow slot, that's rather confusing with shadow trees and their slots, especially for a DOM feature. Otherwise I think the solutions for those cases make sense.

In the spec they are simply called [[ScriptText]] and [[ScriptURL]] internal slots. I think we can actually remove the [[ScriptURL]] and treat it like any other attribute (after all, iframe.srcdoc is equally problematic).

I'll followup on the DOM integration in whatwg/dom#789 (I think the 'Attr Node' case and 'setAttribute' changes still need speccing) and the specific mechanics of WebIDL in WebIDL PR, but it seems like we've resolved this issue. Please shout if not. Thanks all!

@koto koto closed this as completed Mar 2, 2020
koto added a commit to koto/webidl that referenced this issue Jan 26, 2024
(DOM|USV)String.

This is to hook up the Trusted Types validation during the ES->IDL type
conversion to avoid funky issues with its default policy.

See w3c/trusted-types#248,
w3c/trusted-types#176
lukewarlow pushed a commit to lukewarlow/webidl that referenced this issue Mar 11, 2024
(DOM|USV)String.

This is to hook up the Trusted Types validation during the ES->IDL type
conversion to avoid funky issues with its default policy.

See w3c/trusted-types#248,
w3c/trusted-types#176
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants