.Net: [MEVD] Refactor embedding type resolution to ProviderServices (to avoid duplication in each connector) #12508

Open
@roji

Description

Throughout all providers, we have code such as the following, which successively tries all the vector types supported by the provider:

if (vectorProperty.TryGenerateEmbedding<TRecord, Embedding<float>>(record, cancellationToken, out var floatTask))
{
	generatedEmbeddings ??= new Dictionary<VectorPropertyModel, IReadOnlyList<Embedding>>(vectorPropertyCount);
	generatedEmbeddings[vectorProperty] = [await floatTask.ConfigureAwait(false)];
}
#if NET8_0_OR_GREATER
else if (vectorProperty.TryGenerateEmbedding<TRecord, Embedding<Half>>(record, cancellationToken, out var halfTask))
{
	generatedEmbeddings ??= new Dictionary<VectorPropertyModel, IReadOnlyList<Embedding>>(vectorPropertyCount);
	generatedEmbeddings[vectorProperty] = [await halfTask.ConfigureAwait(false)];
}
#endif
else
{
	throw new InvalidOperationException(
		$"The embedding generator configured on property '{vectorProperty.ModelName}' cannot produce an embedding of type '{typeof(Embedding<float>).Name}' for the given input type.");
}

This should be factored out into shared, generic logic, ideally in the model, so that the providers become simpler. Doing so may involve merging the current CollectionModel and CollectionModelBuilder types.
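
As a rough illustration of the direction, here is a minimal, self-contained sketch in which a provider declares its supported embedding types once, in priority order, and a shared helper does the successive probing. Every type and member name below (EmbeddingTypeHandler, EmbeddingResolution, the delegate-based generator) is a hypothetical stand-in invented for this sketch, not the actual Microsoft.Extensions.VectorData API, and the real design may look quite different.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the real embedding types.
public abstract class Embedding { }

public sealed class Embedding<T> : Embedding
{
    public Embedding(T[] vector) => Vector = vector;
    public T[] Vector { get; }
}

// One handler per embedding type a provider supports (hypothetical).
public abstract class EmbeddingTypeHandler
{
    public abstract Type EmbeddingType { get; }

    // Returns null when the generator cannot produce this embedding type.
    public abstract Task<Embedding>? TryGenerate(object generator, object input, CancellationToken ct);
}

public sealed class EmbeddingTypeHandler<TEmbedding> : EmbeddingTypeHandler
    where TEmbedding : Embedding
{
    public override Type EmbeddingType => typeof(TEmbedding);

    // In this sketch a "generator" is just a delegate; the real code would
    // check for an embedding generator interface instead.
    public override Task<Embedding>? TryGenerate(object generator, object input, CancellationToken ct)
        => generator is Func<object, CancellationToken, Task<TEmbedding>> g
            ? Upcast(g(input, ct))
            : null;

    private static async Task<Embedding> Upcast(Task<TEmbedding> task)
        => await task.ConfigureAwait(false);
}

// Shared logic that replaces the per-provider if/else chain.
public static class EmbeddingResolution
{
    public static async Task<Embedding> GenerateAsync(
        string propertyName,
        object generator,
        object input,
        IReadOnlyList<EmbeddingTypeHandler> supportedTypes,
        CancellationToken ct = default)
    {
        // Try each supported embedding type in the provider's declared order.
        foreach (var handler in supportedTypes)
        {
            if (handler.TryGenerate(generator, input, ct) is { } task)
            {
                return await task.ConfigureAwait(false);
            }
        }

        throw new InvalidOperationException(
            $"The embedding generator configured on property '{propertyName}' cannot produce any of the supported embedding types for the given input type.");
    }
}

public static class Program
{
    public static async Task Main()
    {
        // The provider lists its supported types once, instead of repeating the chain.
        IReadOnlyList<EmbeddingTypeHandler> supported = new EmbeddingTypeHandler[]
        {
            new EmbeddingTypeHandler<Embedding<float>>(),
            new EmbeddingTypeHandler<Embedding<Half>>(),
        };

        // Hypothetical generator that only produces Half embeddings.
        Func<object, CancellationToken, Task<Embedding<Half>>> generator =
            (input, ct) => Task.FromResult(new Embedding<Half>(new[] { (Half)1.0f }));

        Embedding result = await EmbeddingResolution.GenerateAsync("Vector", generator, "some text", supported);
        Console.WriteLine(result.GetType());
    }
}
```

With something along these lines, the #if NET8_0_OR_GREATER condition would only need to appear where a provider builds its list of supported types, and the error message could name every type that was tried rather than only Embedding<float>.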

Metadata

Assignees

Labels

.NET (Issue or Pull requests regarding .NET code)
msft.ext.vectordata (Related to Microsoft.Extensions.VectorData)

Type

Projects

Status

Sprint: Planned

Milestone

No milestone
