Allowing model choice tradeoffs, e.g., speed vs. quality vs. context length #81

@domenic

Description

(As with many issues, this also applies to https://github.com/webmachinelearning/prompt-api, and maybe even https://github.com/webmachinelearning/translation-api, but I'll open it here.)

In Chrome we're exploring some situations where two different foundation models might be available:

  • Model 1: worse quality responses, smaller context window, faster to produce responses
  • Model 2: better quality responses, larger context window, slower to produce responses

Should we allow web developers to choose between these? And if so, how?

This is particularly hard because we'd want to design an API that is flexible enough to work with more than just the specific two-model setup above. E.g., some browsers might always have one model. Or some browsers might have models with different tradeoffs, e.g. maybe context length is the same but it's a matter of quality vs. speed. Or maybe it's something like coding ability vs. summarization ability vs. general knowledge of facts!

We've had a few possible ideas, which I'll list here, but none of them are fully satisfactory:

  • Make this entirely the browser's responsibility, hiding the choice from the web developer. E.g., on low-end hardware, Chrome might choose to use model 1, and on high-end hardware, model 2.
  • Allow some way of specifying preferred models by name (cf. Exposing a model ID or similar prompt-api#3, Choose model prompt-api#8). This is problematic for many reasons, notably hurting interoperability, so it's not a very good option.
  • Allow some way of specifying expected or hoped-for context length, in characters or tokens. E.g. { expectedInputQuota: 10_000 } might give model 2, whereas { expectedInputQuota: 1_000 } might give model 1. (This is similar to the requiredTokensPerSecond idea from API to restrict devices for which the API would be too slow? #77.)
  • Provide a series of vague enum hints that can be used to guide model selection, e.g. { selectionCriteria: ["prefer-speed"] }. Other values could be things like "prefer-large-input-quota", "prefer-world-knowledge", etc.
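To make the last two ideas concrete, here is a rough sketch of what the option bags might look like. The names `expectedInputQuota`, `selectionCriteria`, and the `"prefer-*"` values come from the discussion above and are not part of any shipped API; the `LanguageModel.create()` entry point is the Prompt API's proposed shape and is shown only for illustration.

```javascript
// Hypothetical option bags for the two "let the developer hint" ideas above.
// None of these names are finalized or shipped.
const quotaHint = { expectedInputQuota: 10_000 };             // might select model 2
const criteriaHint = { selectionCriteria: ["prefer-speed"] }; // might select model 1

// In a browser implementing the Prompt API shape, a session might then be
// created along these lines:
//   const session = await LanguageModel.create(criteriaHint);
console.log(quotaHint.expectedInputQuota, criteriaHint.selectionCriteria[0]);
// → 10000 prefer-speed
```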

The final idea seems closest to workable to me, as it still leaves a lot in the hands of the browser, but gives a maybe-good-enough signal for the browser to work with.

You could even use the order of the selection criteria to break ties, so e.g. if a web developer lists ["prefer-speed", "prefer-large-input-quota"], Chrome would pick model 1, whereas if they list ["prefer-large-input-quota", "prefer-speed"], Chrome would pick model 2. (In both cases, Chrome could log a console warning explaining that the combination cannot be fully satisfied, so it prioritized the first criterion.)
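The ordered-tiebreak behavior described above can be sketched as a small selection function. The model descriptors, their attributes, and the criterion names are all hypothetical; this just illustrates that earlier criteria dominate and later ones only break ties.

```javascript
// Hypothetical two-model inventory, mirroring the model 1 / model 2 setup
// from the top of this issue. Attribute names are made up for this sketch.
const models = [
  { name: "model-1", speed: 2, inputQuota: 4_000 },   // faster, smaller window
  { name: "model-2", speed: 1, inputQuota: 32_000 },  // slower, larger window
];

// Map each criterion to a score where higher is better.
const scorers = {
  "prefer-speed": (m) => m.speed,
  "prefer-large-input-quota": (m) => m.inputQuota,
};

// Compare on the first listed criterion; fall through to later criteria
// only when earlier ones tie.
function selectModel(models, criteria) {
  return [...models].sort((a, b) => {
    for (const c of criteria) {
      const diff = scorers[c](b) - scorers[c](a);
      if (diff !== 0) return diff;
    }
    return 0;
  })[0];
}

console.log(selectModel(models, ["prefer-speed", "prefer-large-input-quota"]).name);
// → model-1
console.log(selectModel(models, ["prefer-large-input-quota", "prefer-speed"]).name);
// → model-2
```

With only two models the later criteria never actually matter, which is exactly the ambiguity the next paragraph raises: once a third model enters the inventory, the meaning of each coarse hint gets murkier.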

However, it gets a bit messy if an even larger inventory of models becomes available, e.g. if Chrome were to introduce a model 3 with even-better quality, even-larger context window, and even-slower responses. Would "prefer-large-input-quota" suddenly switch to model 3? Or should there be a "prefer-even-larger-input-quota" to capture the difference? Nothing is great here.

For now we don't have any immediate plans in this direction, but we wanted to open this issue to reflect what we've discussed publicly and gather feedback from developers and other implementers on the problem space.
