Optional Document Storage? #1023

martinossx · 2025-03-06T14:32:19Z

I have a similar use case: I want to recognize a user's intent. For every intent I have some utterances. Basically, I want to replace LUIS with KernelMemory.

Planned setup:

Reuse existing C# API (Microsoft BotFramework), using MemoryServerless
A WebJob will perform the Ingestion using KernelMemory (Azure Storage BlobTrigger), MemoryServerless
Use Azure AI Search as Vector store

For my use case, I only want to find the name of the intent based on a user prompt. Utterances, intent names and vector data are all stored in Azure AI Search after ingestion.

So I asked myself: how does a document store help me here? Everything I need can be pulled from the vector store.

I started with SimpleDocumentStorage, but had to enable AllowMixingVolatileAndPersistentData.
Then I configured KM with WithAzureBlobsDocumentStorage but I noticed that this makes the ingestion super slow, even when I import the same set of data again.
I expected KM to at least skip updating intents that have not changed, but that didn't happen.

I guess I'm getting something wrong.
Can someone explain why I need a DocumentStorage at all? Can I opt out of it, or is there an option on DocumentStorage that avoids re-importing unchanged data?

dluc · 2025-03-23T07:47:32Z

Maintainer

The main reason for document storage is to allow updates. When a document is uploaded, the ID is persisted in document storage, using a folder name. The folder contains information used during the update process. This information would be hard to store in other places.

If documents are never updated and KM runs in a single node, you can use SimpleDocumentStorage and keep data in memory (with the risk of losses in case of reboots though). Considering multiple nodes and reliability, data needs to be centralized somewhere and Azure Blobs is one of the options. It should not be that slow though.

1 reply

martinossx Mar 31, 2025
Author

For testing purposes, I created a simple WebApi project and configured it like this:

memoryBuilder.WithAzureAISearchMemoryDb(new AzureAISearchConfig
{
    Endpoint = "<my-endpoint>",
    Auth = AzureAISearchConfig.AuthTypes.APIKey,
    APIKey = "<my-api-key>"
});

// Test with persistent storage
memoryBuilder.WithAzureBlobsDocumentStorage(new AzureBlobsConfig()
{
    Auth = AzureBlobsConfig.AuthTypes.AzureIdentity,
    Account = "<my-storage-account-name>",
    Container = "kernelmemory"
});
var memory = memoryBuilder.Build<MemoryServerless>(new KernelMemoryBuilderBuildOptions());

builder.Services.AddSingleton<MemoryServerless>(memory);

In a controller method, I'm adding entries to the memory:

[HttpPost("memoryadd")]
public async Task TestMemory()
{  
    await _memory.ImportTextAsync(text: "Where can i buy a smartphone?", documentId: "i_buy_electronics");
    await _memory.ImportTextAsync(text: "Who created bitcoin?", documentId: "i_satoshi");
}

In the output window, I can observe that embeddings are created for each entry. So far so good. The storage account is also getting populated.

But when I call the "memoryadd" endpoint again (without changing the data), it creates the embeddings again and updates the entries in the search DB.

Shouldn't the blobs document storage help avoid unnecessary model calls and updates? When I add the same intent/text with the same document ID again, it still creates the embeddings and updates the entry.
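Independently of what KM does internally, the caller could skip unchanged entries before calling ImportTextAsync. Below is a minimal sketch of that idea, assuming a content hash per document ID is enough to detect changes. UnchangedImportFilter and ShouldImport are hypothetical names, not part of KernelMemory, and the hashes are kept in memory only, so a real setup would need to persist them somewhere.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, NOT part of KernelMemory: remembers a SHA-256 hash
// of the last text seen for each document ID.
public sealed class UnchangedImportFilter
{
    // Assumption: in-memory only; the hashes are lost on restart.
    private readonly Dictionary<string, string> _hashes = new();

    // Returns true when the text is new or differs from the last text seen
    // for this document ID, and records the new hash. Returns false when
    // the text is unchanged, so the caller can skip the import.
    public bool ShouldImport(string documentId, string text)
    {
        string hash = Convert.ToHexString(
            SHA256.HashData(Encoding.UTF8.GetBytes(text)));

        if (_hashes.TryGetValue(documentId, out string? previous) && previous == hash)
        {
            return false;
        }

        // Note: the hash is recorded before the import actually runs;
        // a failed import would need to be retried explicitly.
        _hashes[documentId] = hash;
        return true;
    }
}
```

In the controller above, each call would then be guarded, for example: if (filter.ShouldImport("i_satoshi", text)) await _memory.ImportTextAsync(text: text, documentId: "i_satoshi"); so a second call with identical text triggers no embedding request.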

By "allowing updates", do you mean the benefit of avoiding the downtime that wiping and re-creating the index would cause?

I plan to sporadically process a JSON file that contains the latest version of the intents and utterances. I thought I could keep a few versions (indexes) and maybe use an alias to select the active one.

Do you think KernelMemory is not the right choice in my scenario?

I thought about creating a custom "NullDocumentStorage" as a last resort if opting out of document storage is not possible.
