
How does this help existing code? #1

Open · bakkot opened this issue Jun 5, 2019 · 14 comments · 4 participants

@bakkot commented Jun 5, 2019

I raised this at TC39 today; copying it out here.

I don't think we should be adding features to aid writing eval in any new code. I understand that isn't the goal: that the broader goal of this proposal is to allow more fine-grained control over eval so that people stuck using legacy codebases aren't forced to just use unsafe-eval. But since legacy codebases are, by definition, not using trusted types or whatever other new feature allowed producing evalable things, how does this help meet that goal?

@koto commented Jun 5, 2019

The problem we're trying to solve is exactly with the legacy codebases that won't realistically change (e.g. they are outside of direct author control, are 3rd-party scripts, are too costly to refactor, etc.; see the example of such a dependency). Such code can use Trusted Types implicitly, without code changes, via the default policy.

Evalable is targeted at legacy code like that, code that forms part of an application whose authors wish to harden it so that no new eval calls happen. With current settings and CSP, it's all-or-nothing: if at least one dependency uses eval (and a lot do, as authors who have tried to adopt CSP can attest at length), everything can. Evalable is a pragmatic choice that lets us lock this down more precisely.

The intention is not to encourage eval, it's to give some lockdown possibility even if you have to use it.
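To make the all-or-nothing point concrete, these are the only two options CSP offers today (header values illustrative):

```http
Content-Security-Policy: script-src 'self'                # every eval() call throws
Content-Security-Policy: script-src 'self' 'unsafe-eval'  # every eval() call succeeds
```

There is no per-call-site middle ground; evalable plus policies is meant to supply one.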

@bakkot (Author) commented Jun 5, 2019

Such code can use Trusted Types implicitly without code changes via default policy.

I don't understand how this proposal helps with that case: if you have legacy code using the default policy, it is presumably passing a string to eval, not a new kind of evalable. So why does there need to be a new kind of evalable at all?

@koto commented Jun 5, 2019

Oh, I see. I believe this is clearer when viewed together with the second proposal from @mikesamuel. The type check (via the evalable symbol here) happens in PerformEval, which runs after the default policy has executed at HostBeforeCompileValue (where it may return an evalable value, if it decides the code should be allowed to run).

@bakkot (Author) commented Jun 5, 2019

Why would HostBeforeCompileValue not just return a string in that case? The value it returns is never exposed to user code, is it?

@mikesamuel mikesamuel transferred this issue from mikesamuel/evalable Jun 6, 2019

@mikesamuel (Collaborator) commented Jun 6, 2019

FYI: per @annevk's request, I combined the host-callout and evalable proposals, and transferred this issue to the merged proposal.

@mikesamuel (Collaborator) commented Jun 6, 2019

@bakkot,

The value it returns is never exposed to user code, is it?

Correct. The args list and its return value are both internal lists created by other spec abstractions or host callouts, not JS arrays.


I don't think we should be adding features to aid writing eval in any new code.

I tend to use the term "eval" loosely to mean any of the code-from-string loaders, including eval(x), Function(x), import('data:text/javascript,' + x), and even setTimeout(String(x), ms).
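For concreteness, a sketch of these loaders (the browser-flavoured ones are left as comments):

```javascript
// Three of the "eval" flavours above, each compiling code from a string.
const expr = '6 * 7';

const a = eval(expr);                       // direct eval: caller's scope
const b = (0, eval)(expr);                  // indirect eval: global scope
const c = new Function('return ' + expr)(); // Function constructor

// Browser-flavoured equivalents (not exercised here):
//   setTimeout('render(' + expr + ')', 0);                  // string timeout
//   import('data:text/javascript,export default ' + expr);  // data: URL import

console.log(a, b, c); // 42 42 42
```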

Are you objecting specifically to eval (direct and/or indirect), or to all of the above?


I don't know of any feature that is straightforward to adopt and might reduce the attack surface for pre-existing uses of eval without also aiding the writing of eval in new code.

If no such feature exists, it seems that this standard would prevent all features that might reduce the attack surface for pre-existing uses of eval.

So I have to question whether this is an appropriate standard to apply.


That said, I also would rather developers not use any of those flavours of "eval" in new code.

But if I had to choose between

  • aiding developers in using "eval" in new code in a way that enables blue-team oversight, and
  • providing an alternative to turning off code-loading guards when a carefully reviewed use of eval seems the most straightforward path to a new feature,

I prefer the latter.

It took quite a while to come to this preference, because dynamic code loading seems like using dynamite to catch fish.

My change in view was based on several realizations:

  1. eval currently prevents strict CSP adoption for many large apps.
  2. Replacing a use of eval(myString) with eval(myTrustedScript) does not expand the attack surface.
  3. Even if an app replaces every use of eval(myString) with eval(myTrustedScript) and whitelists every policy, it still reduces the attack surface by shutting off implicit Function: ({})['constructor']['constructor'](code).
  4. Applications do not migrate in one fell swoop. It took more than a year to migrate Gmail to TT for server-side code. We started by just approving a lot of dodgy, unreviewed stuff, and then repeatedly polished the stone until we got to a better place.
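Point 3's implicit-Function escape, sketched:

```javascript
// Every object reaches the Function constructor through its constructor
// chain, so a string can become code even with no direct reference to
// eval or Function in scope.
const implicitFunction = ({})['constructor']['constructor'];
console.log(implicitFunction === Function); // true

const fn = implicitFunction('return 6 * 7');
console.log(fn()); // 42
```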

But since legacy codebases are, by definition, not using trusted types or whatever other new feature allowed producing evalable things, how does this help meet that goal?

I think it's important to distinguish between 2 different kinds of legacy:

  • Legacy applications where the migrators have direct control over the source code.
  • Legacy third-party dependencies which are harder to adjust

Both new and legacy applications developed alongside TT may have legacy dependencies.

Legacy applications may use trusted types, just not for all trust decisions.
A legacy application is simply one that was not developed alongside TT and so is in the process of migrating.

There are a few use cases where I expect eval-reliant patterns would persist in legacy dependencies even in new applications. Here are some common ones:

Constant patterns

<meta http-equiv="Content-Security-Policy" content="trusted-types default">
<script>
// Application code defines a default policy
trustedTypes.createPolicy(
  'default',
  {
    createScript(x) {
      if (/^return this;?$/.test(x)) {
        // Make sure there's a semicolon at the end so that
        // the result can be safely concatenated with other TrustedScripts.
        return 'return this;';
      }
      throw new Error();
    },
  });
</script>
<script>
// Legacy library code uses `Function` as a strict-mode escape to access the
// global scope. This pattern appears in lodash and other libraries that have
// large installed bases, code-size constraints, and a need to support old
// interpreters.
(new Function('return this')())
</script>

Formulaic metaprogramming

I'm mostly familiar with this in Node.js code.
Widely used libraries like depd use new Function to create function wrappers that have the same .length so that they work with reflective code like promisify.

Note that fairly recent versions of depd used eval instead of new Function.

Since this is formulaic, it's easy to recognize and handle using a default policy.
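A sketch of that formulaic pattern (not depd's actual source; the helper name is illustrative):

```javascript
// Build a wrapper with the same .length as the wrapped function, so
// reflective callers like promisify still see the right arity. The
// generated source is formulaic, so a default policy could recognize
// and approve it.
function wrapWithNotice(fn, onCall) {
  const params = [];
  for (let i = 0; i < fn.length; i++) params.push('a' + i);
  const body =
      'return function (' + params.join(', ') + ') {\n' +
      '  onCall();\n' +
      '  return fn.apply(this, arguments);\n' +
      '};';
  return new Function('fn', 'onCall', body)(fn, onCall);
}

const add = (x, y) => x + y;
let notices = 0;
const wrapped = wrapWithNotice(add, () => { notices += 1; });
console.log(wrapped.length); // 2
console.log(wrapped(2, 3)); // 5, and notices is now 1
```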

Code that hasn't been migrated yet

One common use case for large, legacy applications is ad-hoc reporting: letting the user specify arithmetic equations which are used to generate a column in an HTML table that shows data.

In "What about eval":

Mathjs is often called with untrusted inputs. If a developer wanted to let a user generate an ad-hoc report without having to download data into a spreadsheet, they might use Mathjs to parse user-supplied arithmetic expressions (docs) instead of trying to check that an input is safe to eval via RegExps. It is not without risk (advisory), though [1].

  [1] Since this writing, Mathjs got rid of all uses of eval.

Starting the migration process and being able to turn on some protections even if the attack surface is still large is a necessary first step.

There's a problem akin to penguins clustering at the water's edge reluctant to enter.

There's little incentive for libraries like depd or Mathjs to increase code size to avoid eval until a significant number of their clients could reduce their attack surface by dropping eval. Smaller libs like Mathjs (josdejong was unusually proactive) have no incentive as long as they can say "depd blocks almost all my clients anyway." Large libs like lodash and depd likewise have little incentive as long as they believe there's a long tail of small libs preventing adoption anyway.

I see eval TT guards as a way to break that deadlock.


Anyway, apologies for the length, and thanks for agreeing to take this offline.

@mikesamuel mikesamuel self-assigned this Jun 6, 2019

@bakkot (Author) commented Jun 6, 2019

Thanks for the reply!

Are you objecting specifically to eval (direct and/or indirect), or to all of the above?

All of the above.

Legacy applications where the migrators have direct control over the source code.

I would strongly prefer that these migrators be forced to get rid of eval (and equivalents) entirely in the code they control. I realize this means some people will leave unsafe-eval on instead of getting rid of it, but I think it also means some people will get rid of eval instead of using trusted types to prolong their usage of it. I think taking a hard line on this is the only way we'll end up in a world where people no longer ship code with eval.

Legacy third-party dependencies which are harder to adjust

I agree that it's a huge improvement to have a default policy which can restrict which strings are allowed to be passed through to eval, but it seems to me that such a policy is enforceable purely in terms of the proposed HostBeforeCompileValue, without needing evalable (or any other changes to the language). I'm still failing to understand how the addition of a language-level evalable helps these cases at all.

Can you give a concrete example of how the addition of evalable would be necessary for making legacy third-party dependencies work?

@mikesamuel (Collaborator) commented Jun 10, 2019

I would strongly prefer that these migrators be forced to get rid of eval (and equivalents) entirely in the code they control. I realize this means some people will leave unsafe-eval on instead of getting rid of it, but I think it also means some people will get rid of eval instead of using trusted types to prolong their usage of it.

I would also prefer this. There's no reason it could not have happened over the last three years, so if it were going to happen we would already have seen pressure on widely used libraries like lodash, jquery, and others to get rid of eval (and equivalents); I have not seen that pressure.

Likewise, we would have seen pressure on webpack and other widely used tools not to produce output that uses eval, but if there is such pressure, I've not seen it.

@arturjanc Do you know of any stats on CSP policies migrating from 'unsafe-eval' to no 'unsafe-eval' on the web at large that might shed light on whether this is a viable goal?

it seems to me that such a policy is enforceable purely in terms of the proposed HostBeforeCompileValue, without needing evalable ... Can you give a concrete example ... ?

This is an excellent question.

The HostBeforeCompileValue changes are sufficient for new Function(...) and eval(myString) which includes the depd 1.x.x case.

That leaves uses of eval that are not translatable to Function and which take a TrustedScript value instead of relying on an implicit string->TrustedScript conversion via a default policy.


I can't point you at code, but it actually took us an inordinately long time to migrate a large Google application off unsafe-eval, because the application predated JSON and so used eval to unpack data received from the server.

It used* eval to unpack updates from the server that were in an almost-JSON format and then dispatched those to lots of other pieces of code that made many assumptions about the structure of those decoded values.

  1. It had hard performance requirements on older browsers, so custom parsing was out.
  2. It used array holes like [0,,2] extensively to minimize wire size.
  3. It had many clients that expected undefined instead of null because of these holes.
  4. Its payloads included NaN and Infinity.

And those were the quirks left over after we took care of quotes, \x escaping issues, and IIRC the use of f and t instead of false and true.

These inputs were large, and because of the performance requirements, trying to validate them using the default policy would not work. So we did something like the following (but not exactly, because our client-side trusted-types relied on static analysis instead of policy objects):

const unpackResponse = (() => {
  const uncheckedConversionPolicy = trustedTypes.createPolicy(
      '...',
      // Approved for use in a limited scope.  See below.
      { createScript(s) { return s; } });

  return ({ status, responseText }) => {    
    if (status === 200 && ...) {
      // backend.example.com is a known producer of TrustedScript responses
      // since it uses TrustedTypes for server side discipline.
      return (0, eval)(uncheckedConversionPolicy.createScript(responseText));
    }
    throw new Error(...);
  };
})();

You can imagine that this was benchmarked nine ways from Sunday so the actual code probably differs from this in important ways, but these are the kind of legacy-ish requirements that we hope to make it possible to disentangle.

Marking a value trusted manually has overhead that is independent of the size of the result, and can be made based on indicators apparent to a scoped policy like:

  • provenance (a client trusts responses from a server that is known to use trusted types discipline internally), or
  • safety by construction (e.g. form inputs are used to assemble arithmetic equations in JS that populate a tabular view)

which are not apparent to the unscoped default policy.
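A sketch of the safety-by-construction case (function and variable names are illustrative):

```javascript
// The code handed to Function is assembled only from tokens that have
// already been validated, so every generated string matches a known-safe
// grammar; a scoped policy can rely on that invariant.
function compileColumnFormula(lhs, op, rhs) {
  const NUMBER = /^[0-9]+(\.[0-9]+)?$/;
  if (!NUMBER.test(lhs) || !NUMBER.test(rhs) ||
      !['+', '-', '*', '/'].includes(op)) {
    throw new TypeError('rejected formula input');
  }
  return new Function('return (' + lhs + ' ' + op + ' ' + rhs + ');');
}

const cell = compileColumnFormula('6', '*', '7');
console.log(cell()); // 42
```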

* there's some chance it still does

@arturjanc commented Jun 12, 2019

@arturjanc Do you know of any stats on CSP policies migrating from 'unsafe-eval' to no 'unsafe-eval' on the web at large that might shed light on whether this is a viable goal?

I don't have current data, other than the observation that ~82% of sites with CSP used 'unsafe-eval' a couple of years ago (Table 3, page 8 of the linked PDF). I doubt this has significantly changed since then.

Based on a more anecdotal data set, it seems like many popular sites are still actively relying on eval(), not just allowing it via CSP -- it's difficult to find major applications that have eliminated it completely. 

My intuition matches @mikesamuel's here: it seems difficult to require developers to completely remove eval() from their apps. One horror story from a few years ago is Chrome apps/extensions enforcing a default CSP which disallowed eval(), which resulted in libraries like AngularJS writing custom eval()-like interpreters in JS, completely undermining enforcement at the platform level and leading to new classes of vulnerabilities. It might be nice to avoid giving developers an incentive to do this again :)

@mikesamuel (Collaborator) commented Jun 12, 2019

which resulted in libraries like AngularJS writing custom eval()-like intepreters in JS, completely undermining enforcement at the platform level

Thanks. Better not to have to amend Greenspun's Tenth Rule to include JS:

Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

@bakkot (Author) commented Jun 13, 2019

Thanks for the example. My understanding then is that evalable is targeted specifically at cases where all of the following hold:

  1. the use of eval is under the author's control (i.e. not in a dependency)
  2. the code being passed to eval is not sufficiently regular that it can be safely checked by a default policy
  3. it is not practical to refactor this code to avoid eval
  4. it is practical to refactor this code to create instances of a trusted type and pass them through to eval in a way which is obviously safe.

It is not intended to help with any other cases. Is that right?

If that's so, I would like to see data indicating that this case in particular is common enough to warrant a change to JS before we went ahead with it. I know eval is very common, and adding unsafe-eval is consequently also common, but I would expect almost all uses to fail one of the above four tests (especially the first) and hence not be relevant to the usefulness of evalable.


Sidebar: with the caveat that I have not fully read the trusted types proposal, why is it not possible to implement an equivalent of evalable using just the default policy mechanism, along the lines of

<meta http-equiv="Content-Security-Policy" content="trusted-types default">
<script>
// Application code defines a default policy
{
  let trusted = new Set;
  trusted.has = trusted.has.bind(trusted);
  trusted.add = trusted.add.bind(trusted);
  trusted.delete = trusted.delete.bind(trusted);
  window.trusted = trusted;
  trustedTypes.createPolicy(
    'default',
    {
      createScript(x) {
        if (!trusted.has(x)) {
          throw new Error();
        }
        trusted.delete(x);
        return x;
      },
    });
}
</script>
<script>
{
  let trusted = window.trusted;
  delete window.trusted; // there are other ways of doing this, like import maps; it's just an example.
  const unpackResponse = (() => {
    return ({ status, responseText }) => {    
      if (status === 200 && ...) {
        // backend.example.com is a known producer of TrustedScript responses
        // since it uses TrustedTypes for server side discipline.
        trusted.add(responseText);
        return (0, eval)(responseText);
      }
      throw new Error(...);
    };
  })();
}
</script>

@koto commented Jun 13, 2019

Thanks for the example. My understanding then is that evalable is targeted specifically at cases where all of the following hold:

  1. the use of eval is under the author's control (i.e. not in a dependency)
  2. the code being passed to eval is not sufficiently regular that it can be safely checked by a default policy
  3. it is not practical to refactor this code to avoid eval
  4. it is practical to refactor this code to create instances of a trusted type and pass them through to eval in a way which is obviously safe.

It is not intended to help with any other cases. Is that right?

I don't think this is the case w.r.t. point 1. evalable is needed so that TrustedScript objects can be passed to eval-like sinks with no backwards-compatibility issues. These objects can be created by 1st-party code or 3rd-party code, but in any case the 1st party (or whoever controls the response header) has control over their creation via policy-name whitelists. We expect the majority of the code producing them to be 1st party, but there is an irreducible amount of code out there in popular dependencies that relies on eval. That code might migrate to using TrustedScripts as well, which I believe would be an improvement over the current situation, and which is impossible with current solutions.

Why is evalable not useful for 3rd-party code? The default policy is an escape hatch that we can use if the code continues to use eval(string), but if that code (be it 3rd party) decides to create TrustedScript objects, why would that be discouraged more than for 1st-party code? After all, the 1st party gets more control over the code in that regard, not less (even taking into account the existing capabilities of CSP), allowing it to lock down eval even more tightly than is practically possible today.

If that's so, I would like to see data indicating that this case in particular is common enough to warrant a change to JS before we went ahead with it. I know eval is very common, and adding unsafe-eval is consequently also common, but I would expect almost all uses to fail one of the above four tests (especially the first) and hence not be relevant to the usefulness of evalable.

One indication I can provide: we did a review of popular JS libraries a while ago (https://ai.google/research/pubs/pub46450). 10 out of 16 used eval() in a way that's essential to them, mostly to implement expression-language support for their templating systems.

Sidebar: with the caveat that I have not fully read the trusted types proposal, why is it not possible to implement an equivalent of evalable using just the default policy mechanism, along the lines of

<meta http-equiv="Content-Security-Policy" content="trusted-types default">
<script>
// Application code defines a default policy
{
  let trusted = new Set;
  trusted.has = trusted.has.bind(trusted);
  trusted.add = trusted.add.bind(trusted);
  trusted.delete = trusted.delete.bind(trusted);
  window.trusted = trusted;
  trustedTypes.createPolicy(
    'default',
    {
      createScript(x) {
        if (!trusted.has(x)) {
          throw new Error();
        }
        trusted.delete(x);
        return x;
      },
    });
}
</script>
<script>
{
  let trusted = window.trusted;
  delete window.trusted; // there are other ways of doing this, like import maps; it's just an example.
  const unpackResponse = (() => {
    return ({ status, responseText }) => {    
      if (status === 200 && ...) {
        // backend.example.com is a known producer of TrustedScript responses
        // since it uses TrustedTypes for server side discipline.
        trusted.add(responseText);
        return (0, eval)(responseText);
      }
      throw new Error(...);
    };
  })();
}
</script>

The default policy is an escape hatch; we'd rather promote a design that uses regular, named policies, which allow you to decouple trusted-value production from consumption. Using types to represent a "trusted" value lets us write static-analysis rules that guide developers and warn them about errors that would break at runtime (e.g. eval(string) when no default policy exists, or when the policy doesn't convert the value). Additionally, types allow us to group values that might be used in various similar contexts. For example, TrustedScript is used not only for eval, but for some DOM sinks (like HTMLScriptElement.text) as well.
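A minimal sketch of that decoupling (the policy name is illustrative, and the shim exists only so the snippet runs outside a TT-enforcing browser):

```javascript
// Stand-in for environments without window.trustedTypes; in a real page,
// the platform provides createPolicy and returns TrustedScript objects.
const trustedTypes = globalThis.trustedTypes ??
    { createPolicy: (_name, rules) => rules };

// Producer: owns the only reference to the named policy and decides what
// source text is allowed to become a TrustedScript.
const exprPolicy = trustedTypes.createPolicy('report-exprs', {
  createScript(s) {
    // Only simple arithmetic expressions pass; everything else is rejected.
    if (!/^[0-9+\-*/(). ]+$/.test(s)) throw new TypeError('rejected: ' + s);
    return s;
  },
});

// Consumer: only ever evals values that went through the policy.
const total = (0, eval)(exprPolicy.createScript('6 * 7'));
console.log(total); // 42
```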

@bakkot (Author) commented Jun 14, 2019

Why is evalable not useful for 3rd party code?

I was following @mikesamuel's taxonomy above:

  • Legacy applications where the migrators have direct control over the source code.
  • Legacy third-party dependencies which are harder to adjust

evalable is useful for third party code only if that code has been updated to use trusted types and then updated in the first party application. It is not useful for existing third party code, over which an application author has no direct control.

We expect the majority of the code producing that to be 1st party, but there is an irreducible amount of a code that relies on eval out there in popular dependencies. That code might migrate to using TrustedScripts as well

Is there reason to expect these dependencies to migrate to TrustedScripts, and not to migrate off of eval? For example, have you talked to any such popular dependencies (ideally non-Google-controlled ones) about this proposal?

10 out of 16 used eval() in a way that's essential to them

Interesting. I read the paper, but don't see this specific claim in it; can you point me to where I should be looking?

Using types to represent a "trusted" value allows us to write static analysis rules that guide developers and warn them on errors that would break at runtime

Are you planning to build this static analysis into Chrome's devtools? If not, is there reason to expect adoption of such a static analysis outside of Google?


Would it at least be accurate to say that evalable is not useful for any existing code, but the hope is that convincing people to update their code to use trusted types will be easier than convincing them to get rid of eval?

@koto commented Jun 14, 2019

10 out of 16 used eval() in a way that's essential to them

Interesting. I read the paper, but don't see this specific claim in it; can you point me to where I should be looking?

Small correction here. It would be fewer than 10; I believe we also ticked the box for frameworks that have non-eval-based parsers. I see, however, that this is still common: Vue.js uses eval in the full version, so do Ractive, AngularJS in non-CSP mode, Underscore, and all the jQuery-family libraries, etc.
