Description
While embedding performance is OK (about 2,000 tokens/second), text completions are crazy slow: 5 seconds for roughly 250 prompt tokens and 20 completion tokens — about 54 tokens/second end to end.
I don't think my use case is too crazy -- it's a subset of a much larger AI problem (measured in tokens & desired intelligence) we are effectively solving with GPT 4o mini today.
The model's output is borderline workable and will need lots of prompt engineering. But even if it nailed the quality, for this to be usable in a UI — and for us to choose Phi Silica over cloud models for this specific use case — I need the latency under 1 second, ideally under 500 ms.
Are these chat completion latencies in the ballpark of what this feature is designed for? Am I missing some key configuration, a model selector, or something else? What performance can I expect out of Phi Silica?
For reference, my code looks roughly like:
```csharp
using Microsoft.Windows.AI.Generative;

if (!LanguageModel.IsAvailable()) await LanguageModel.MakeAvailableAsync();
using LanguageModel languageModel = await LanguageModel.CreateAsync();

var prompt = "...";
var options = new LanguageModelOptions();
options.Temp = ...;
options.Top_p = ...;

// I run the next line over 10 to 100 iterations, and average the iteration time:
var result = await languageModel.GenerateResponseAsync(options, prompt);
```
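For completeness, the averaging loop referenced in the comment above looks roughly like this (a minimal sketch; the iteration count and the warm-up call are just how I happen to benchmark, not anything prescribed by the SDK — the warm-up is there so first-run model load doesn't skew the average):

```csharp
// Hypothetical benchmarking loop around GenerateResponseAsync.
// Assumes languageModel, options, and prompt are set up as above.
const int iterations = 20;

// Warm-up call so one-time model initialization isn't counted.
await languageModel.GenerateResponseAsync(options, prompt);

var stopwatch = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
    var result = await languageModel.GenerateResponseAsync(options, prompt);
}
stopwatch.Stop();

// Average wall-clock time per completion.
Console.WriteLine($"Average latency: {stopwatch.ElapsedMilliseconds / (double)iterations:F0} ms");
```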
My setup is:
- Dell Latitude 7455 with Snapdragon X Elite (X1E80100)
- Windows 26120.3863
- Microsoft.WindowsAppSDK v1.7.250127003-experimental3
- .NET 8.0