-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
.Net: IVectorStore implementation for Azure SQL #10623
.Net: IVectorStore implementation for Azure SQL #10623
Conversation
This reverts commit 186fda2.
…o sqlServerTests # Conflicts: # dotnet/SK-dotnet.sln
…t throws NotImplementedException
…ngs that need to be addressed
The ToListAsync change is just a wokaround dotnet/runtime#79782 (comment)
dotnet/src/Connectors/Connectors.Memory.Common/SqlFilterTranslator.cs
Outdated
Show resolved
Hide resolved
dotnet/src/InternalUtilities/src/Data/VectorStoreRecordPropertyReader.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/Connectors.Memory.SqlServer/SqlServerCommandBuilder.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/Connectors.Memory.SqlServer/SqlServerCommandBuilder.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/Connectors.Memory.SqlServer/SqlServerCommandBuilder.cs
Outdated
Show resolved
Hide resolved
|
||
namespace SqlServerIntegrationTests.VectorSearch; | ||
|
||
public class SqlServerBasicVectorSearchTests_Hnsw(SqlServerFixture fixture) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this now with Hsnw being unsupported after all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wanted to make sure it's actually covered with tests (so we keep throwing for non-supported filter names)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't they be covered with the other SqlServerBasicVectorSearchTests implementation anyway, i.e. the one using FLAT. Although I assume both are using FLAT since HNSW isn't supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SqlServerVectorSearchDistanceFunctionComplianceTests
file does not set the index name, so it's using the default.
The _Hnsw
does provide the index name and the tests ensure that we throw NotSupportedException
on purpose.
dotnet/src/Connectors/VectorData.Abstractions/RecordDefinition/VectorStoreRecordKeyProperty.cs
Outdated
Show resolved
Hide resolved
{ | ||
await this.ExecuteAsync(async collection => | ||
{ | ||
Assert.Empty(await collection.UpsertBatchAsync([]).ToArrayAsync()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, during our discussion yesterday I misunderstood when you asked whether we should throw if we are upsetting with an empty collection. I thought you were referring to the storage collection, not the collection being passed into the method. Hence my confusion as to why this would be a problem 😄
That said, I don't have a strong opinion about whether to throw in this case. I tend to lean towards throwing, since it could help the user find bugs, but it's not a strong opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@roji we have agreed offline to keep it as is (don't throw) and wait until you are back since you most likely have designed a similar API in the past and have opinion about it
dotnet/src/VectorDataIntegrationTests/VectorDataIntegrationTests/CRUD/BasicConformanceTests.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/Connectors.Memory.SqlServer/SqlServerVectorStoreRecordCollection.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/Connectors.Memory.SqlServer/SqlServerVectorStoreRecordCollection.cs
Show resolved
Hide resolved
- add support for VectorStoreGenericDataModel - respect IncludeVectors in Get* methods - rename BasicVectorSearchTests to VectorSearchDistanceFunctionComplianceTests
…/VectorStoreRecordKeyProperty.cs Co-authored-by: westey <164392973+westey-m@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamsitnik great work, it's really nice to see this. Take a look at all the comments.
There's a lot that I think we could improve, but we also shouldn't get blocked too much on this. I do have a bit of a problem with introducing AutoGenerated as a public API into the abstraction (or at least want us to discuss this first). A lot of the rest is easy/simple, and I'm OK with deferring most of it to a future PR so we can release soon.
dotnet/src/Connectors/Connectors.Memory.Postgres/PostgresFilterTranslator.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/Connectors.Memory.Postgres/PostgresFilterTranslator.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/Connectors.Memory.Common/SqlFilterTranslator.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/Connectors.Memory.Common/SqlFilterTranslator.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/Connectors.Memory.Common/SqlFilterTranslator.cs
Outdated
Show resolved
Hide resolved
dotnet/src/VectorDataIntegrationTests/SqlServerIntegrationTests/SqlServerCommandBuilderTests.cs
Show resolved
Hide resolved
...src/VectorDataIntegrationTests/SqlServerIntegrationTests/Filter/SqlServerBasicFilterTests.cs
Outdated
Show resolved
Hide resolved
...src/VectorDataIntegrationTests/SqlServerIntegrationTests/Filter/SqlServerBasicFilterTests.cs
Outdated
Show resolved
Hide resolved
{ | ||
protected VectorStoreFixture Fixture { get; } = fixture; | ||
|
||
protected virtual string GetUniqueCollectionName() => Guid.NewGuid().ToString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some databases, creating collections is a heavy operation, so we should generally try to avoid creating a collection for each test unless it's necessary. This is why the filter tests have a single collection, and all tests reuse it. I think that should be possible for the new tests being introduced here.
var collection = this.Fixture.TestStore.DefaultVectorStore.GetCollection<TKey, TRecord>(collectionName, | ||
this.GetRecordDefinition()); | ||
|
||
await collection.CreateCollectionAsync(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, creating a collection is a bit more complicated - in some databases this happens asynchronously, so CreateCollectionAsync returns before the collection is actually ready, and the test will fail.
See how this is done in BasicfilterTests - I've tried to design this in a way which is reusable across test suites (e.g. WaitForDataAsync).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I was not aware of that. Is it OK to fix it later, when we actually add some tests for such DBs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course. I really hope we'll be able to invest a bit in the integration test suite/infrastructure before the release to ensure we cover behavior properly across connectors.
Co-authored-by: Shay Rojansky <roji@roji.org>
…o sqlServerNew # Conflicts: # dotnet/SK-dotnet.sln # dotnet/src/Connectors/Connectors.Memory.Postgres/PostgresVectorStoreCollectionSqlBuilder.cs # dotnet/src/Connectors/Connectors.Memory.Sqlite/SqliteVectorStoreRecordCollection.cs
- remove NONCLUSTERED used PRIMARY KEY - remove COLLATE Latin1_General_100_BIN2 for VARCHAR columns - remove TimeSpan => TIME mapping - add support for TimeOnly (TIME) - make Schema optional - handle Where(x => x.Bool) - use square brackets for escaping column names in WHERE clause
…o sqlServerNew # Conflicts: # dotnet/src/Connectors/Connectors.Memory.Postgres/PostgresVectorStoreRecordCollection.cs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once the comments below have been addressed, thanks @adamsitnik!
this._sql.Append('\'').Append(s.Replace("'", "''")).Append('\''); | ||
return; | ||
case bool b: | ||
this._sql.Append(b ? "TRUE" : "FALSE"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the representation of bool varies considerably across databases, this probably shouldn't be here (only in the overrides of this method in connector subtypes).
Type t when t == typeof(bool) => ("BIT", null), | ||
Type t when t == typeof(DateTime) => ("DATETIME2", null), | ||
#if NET | ||
Type t when t == typeof(TimeOnly) => ("TIME", null), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need DateOnly here, mapped to DATE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at implementing DateOnly
and I don't see a way to easily do that as of now. The reason is that the SqlDataReader
maps such an output as DateTime
and reader.GetProviderSpecificFieldType(this._sqlDataReader.GetOrdinal(storageName))
also returns DateTime
and we also support DATETIME2
which does the same.
For TIME
it was similar, as it also returns TimeSpan
rather than TimeOnly
. But we don't support TimeSpan
, so we can just map all TimeSpan
instances into TimeOnly
.
I know I could give the reader the property infos or the definitions to deal with that, I just don't believe it's worth it as of now, especially if we want to get rid of this layer (the need to represent SqlDataReader
as IDictionary<string, object?>
).
dotnet/src/Connectors/Connectors.Memory.SqlServer/SqlServerConstants.cs
Outdated
Show resolved
Hide resolved
#if NET | ||
// We don't support mapping TimeSpan to TIME on purpose | ||
// See https://github.com/microsoft/semantic-kernel/pull/10623#discussion_r1980350721 | ||
typeof(TimeOnly), // TIME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also DateOnly
dotnet/src/Connectors/Connectors.Memory.SqlServer/SqlServerFilterTranslator.cs
Outdated
Show resolved
Hide resolved
/// <summary> | ||
/// Gets or sets the database schema. | ||
/// </summary> | ||
public string? Schema { get; init; } = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am noting that in PG and Sqlite this isn't nullable, but rather initialized to the default; as discussed before, I think this means that all SQLs always contain the schema (e.g. public
for PG), whereas we should just omit it like we do here.
@adamsitnik do you mind applying this change (turn Schema to nullable) for the other relation providers as well? Definitely doesn't have to be in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I've logged an issue to make sure I don't forget to do that: #10853
dotnet/src/InternalUtilities/src/Data/VectorStoreRecordPropertyReader.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: Shay Rojansky <roji@roji.org>
7aa7d29
into
microsoft:feature-vector-data-preb1
The PR is ready for review.
fixes #10416