Allowing model choice tradeoffs, e.g., speed vs. quality vs. context length #81

@domenic

Description

(As with many issues, this also applies to https://github.com/webmachinelearning/prompt-api, and maybe even https://github.com/webmachinelearning/translation-api, but I'll open it here.)

In Chrome we're exploring some situations where two different foundation models might be available:

  • Model 1: worse quality responses, smaller context window, faster to produce responses
  • Model 2: better quality responses, larger context window, slower to produce responses

Should we allow web developers to choose between these? And if so, how?

This is particularly hard because we'd want to design an API that is flexible enough to work with more than just the specific two-model setup above. E.g., some browsers might always have one model. Or some browsers might have models with different tradeoffs, e.g. maybe context length is the same but it's a matter of quality vs. speed. Or maybe it's something like coding ability vs. summarization ability vs. general knowledge of facts!

We've had a few possible ideas, which I'll list here, but none of them are fully satisfactory:

  • Make this entirely the browser's responsibility, hiding the choice from the web developer. E.g., on low-end hardware, Chrome might choose to use model 1, and on high-end hardware, model 2.
  • Allow some way of specifying preferred models by name (cf. Exposing a model ID or similar prompt-api#3, Choose model prompt-api#8). This is problematic for many reasons, notably hurting interoperability, so it's not a very good option.
  • Allow some way of specifying expected or hoped-for context length, in characters or tokens. E.g. { expectedInputQuota: 10_000 } might give model 2, whereas { expectedInputQuota: 1_000 } might give model 1. (This is similar to the requiredTokensPerSecond idea from API to restrict devices for which the API would be too slow? #77.)
  • Provide a series of vague enum hints that can be used to guide model selection, e.g. { selectionCriteria: ["prefer-speed"] }. Other values could be things like "prefer-large-input-quota", "prefer-world-knowledge", etc.
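To make the last two ideas concrete, here is a rough sketch of what the option bags might look like. The names `expectedInputQuota`, `selectionCriteria`, and the `"prefer-*"` values come from the discussion above and are not part of any shipped API; the `LanguageModel.create()` entry point is the Prompt API's proposed shape and is shown only for illustration.

```javascript
// Hypothetical option bags for the two "let the developer hint" ideas above.
// None of these names are finalized or shipped.
const quotaHint = { expectedInputQuota: 10_000 };             // might select model 2
const criteriaHint = { selectionCriteria: ["prefer-speed"] }; // might select model 1

// In a browser implementing the Prompt API shape, a session might then be
// created along these lines:
//   const session = await LanguageModel.create(criteriaHint);
console.log(quotaHint.expectedInputQuota, criteriaHint.selectionCriteria[0]);
// → 10000 prefer-speed
```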

The final idea seems closest to workable to me, as it still leaves a lot in the hands of the browser, but gives a maybe-good-enough signal for the browser to work with.

You could even use the order of the selection criteria to break ties, so e.g. if a web developer lists ["prefer-speed", "prefer-large-input-quota"], Chrome would pick model 1, whereas if they list ["prefer-large-input-quota", "prefer-speed"], Chrome would pick model 2. (In both cases, Chrome could log a console warning explaining that the combination cannot be fully satisfied, so it prioritized the first criterion.)
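The ordered-tiebreak behavior described above can be sketched as a small selection function. The model descriptors, their attributes, and the criterion names are all hypothetical; this just illustrates that earlier criteria dominate and later ones only break ties.

```javascript
// Hypothetical two-model inventory, mirroring the model 1 / model 2 setup
// from the top of this issue. Attribute names are made up for this sketch.
const models = [
  { name: "model-1", speed: 2, inputQuota: 4_000 },   // faster, smaller window
  { name: "model-2", speed: 1, inputQuota: 32_000 },  // slower, larger window
];

// Map each criterion to a score where higher is better.
const scorers = {
  "prefer-speed": (m) => m.speed,
  "prefer-large-input-quota": (m) => m.inputQuota,
};

// Compare on the first listed criterion; fall through to later criteria
// only when earlier ones tie.
function selectModel(models, criteria) {
  return [...models].sort((a, b) => {
    for (const c of criteria) {
      const diff = scorers[c](b) - scorers[c](a);
      if (diff !== 0) return diff;
    }
    return 0;
  })[0];
}

console.log(selectModel(models, ["prefer-speed", "prefer-large-input-quota"]).name);
// → model-1
console.log(selectModel(models, ["prefer-large-input-quota", "prefer-speed"]).name);
// → model-2
```

With only two models the later criteria never actually matter, which is exactly the ambiguity the next paragraph raises: once a third model enters the inventory, the meaning of each coarse hint gets murkier.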

However, it gets a bit messy if an even larger inventory of models becomes available, e.g. if Chrome were to introduce a model 3 with even-better quality, even-larger context window, and even-slower responses. Would "prefer-large-input-quota" suddenly switch to model 3? Or should there be a "prefer-even-larger-input-quota" to capture the difference? Nothing is great here.

For now we don't have any immediate plans in this direction, but we wanted to open this issue to reflect what we've discussed publicly and gather feedback from developers and other implementers on the problem space.
