Vector datatype support #2634
Codecov Report
@@ Coverage Diff @@
## main #2634 +/- ##
============================================
- Coverage 51.59% 51.40% -0.20%
+ Complexity 3999 3994 -5
============================================
Files 147 148 +1
Lines 33706 33841 +135
Branches 5631 5648 +17
============================================
+ Hits 17391 17396 +5
- Misses 13866 13991 +125
- Partials 2449 2454 +5
Posting first pass of review comments
Force-pushed from 110d79e to 8de3aa5.
Partial review (about 1/3 of files).
Another partial review. 17/35 files.
Partial review. 24/35 files.
Force-pushed from eddf8ff to 1167b6b.
try (Statement statement = connection.createStatement();
        ResultSet resultSet = statement.executeQuery(query)) {

    ResultSetMetaData metaData = resultSet.getMetaData();
Let us also add a test for the DatabaseMetaData obtained from Connection.getMetaData() (its getColumns call internally queries sp_columns_100).
Currently, sp_columns_100 reports the column type as varbinary for a VECTOR column.
My approval isn't needed to merge this PR at this point. I think the surface looks good. None of the small issues I identified are blockers. Just make sure the rest of the team is good with it and that CI is green. Thanks! David
…o decode/encode data, along with additional features like dimensionType, and dimensionCount.
…roved constructor parameter positioning for clarity, removed VectorInputStream
…nsionCount for vector, fixed datatype and tdsType mappings along with setting correct typeDefinition in Parameter class.
…peration is performed for mismatched source and destination tables
…de, added functions to getScale, bytesperDimension and updated the logic to use functions instead of handling hardcoded values
…nd scale value also
…dated toString() for Vector to return json formatted string
…erialization instead of float[] and updated the test cases.
…isterOutParameter, getObject, and setObject APIs.
…uced in SQL Server 2025 onwards
Force-pushed from f413160 to 10e0f8e.
Description
With the rise of AI and machine learning, the ability to handle vector data is crucial for applications like semantic search, recommendation systems, and more. By incorporating vector support directly into Microsoft SQL Server and Azure SQL Database, we eliminate the need for separate vector databases, streamlining data architecture and improving performance.
The vector data type is designed to store vector data optimized for operations such as similarity search and machine learning applications.
Vector support in the Microsoft SQL Server JDBC driver is enabled through a new feature extension (FE identifier 0x0E) with a version handshake to negotiate vector capabilities with the server.
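The handshake described above can be pictured with a small sketch. It assumes the general TDS FeatureExt layout (a one-byte feature id, a four-byte little-endian data length, then the data) and a one-byte version payload. The class name, the version constants, and the negotiation rule are illustrative assumptions, not the driver's actual code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of a vector feature-extension request; not the driver's implementation. */
final class VectorFeatureExtSketch {
    // Feature identifier quoted in the PR description.
    static final byte FEATURE_ID_VECTOR = 0x0E;
    // Assumed version values, used here only for illustration.
    static final byte VERSION_NOT_SUPPORTED = 0x00;
    static final byte MAX_CLIENT_VERSION = 0x01;

    /** Builds: feature id (1 byte), data length (4 bytes, little-endian), version (1 byte). */
    static byte[] buildRequest(byte clientVersion) {
        ByteBuffer buf = ByteBuffer.allocate(6).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(FEATURE_ID_VECTOR);
        buf.putInt(1); // the payload is a single version byte
        buf.put(clientVersion);
        return buf.array();
    }

    /**
     * Assumed rule: accept the server's acknowledged version only when it is
     * nonzero and no greater than the client's maximum; otherwise treat
     * vector support as unavailable.
     */
    static byte negotiate(byte clientMax, byte serverAck) {
        boolean unsupported = serverAck == VERSION_NOT_SUPPORTED
                || (serverAck & 0xFF) > (clientMax & 0xFF);
        return unsupported ? VERSION_NOT_SUPPORTED : serverAck;
    }
}
```

A real driver may well raise an error instead of silently falling back when the acknowledged version is out of range; the fallback here only keeps the sketch short.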
Since the standard JDBC specification does not include native support for the VECTOR type, the driver provides a custom implementation using the microsoft.sql.Vector class. The vector object includes the number of elements, a type indicator, and the actual data.
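As a rough illustration of such an object, the sketch below holds a dimension count, a type indicator, and float32 data. `VectorSketch` and `DimensionType` are hypothetical stand-ins and do not reproduce the microsoft.sql.Vector API. The little-endian payload encoding is an assumption, and the JSON-style `toString()` follows the behavior mentioned in the commit list above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.StringJoiner;

/** Hypothetical stand-in for a vector value; not the microsoft.sql.Vector class. */
final class VectorSketch {
    enum DimensionType { FLOAT32 }

    private final int dimensionCount;
    private final DimensionType type;
    private final float[] data;

    VectorSketch(int dimensionCount, DimensionType type, float[] data) {
        if (data.length != dimensionCount) {
            throw new IllegalArgumentException("data length must equal dimensionCount");
        }
        this.dimensionCount = dimensionCount;
        this.type = type;
        this.data = data.clone();
    }

    int getDimensionCount() { return dimensionCount; }
    DimensionType getType() { return type; }
    float[] getData() { return data.clone(); }

    /** Encodes the elements as consecutive 4-byte little-endian floats (assumed layout, no header). */
    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(data.length * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (float f : data) {
            buf.putFloat(f);
        }
        return buf.array();
    }

    /** Decodes a payload produced by toBytes(). */
    static VectorSketch fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] values = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getFloat();
        }
        return new VectorSketch(values.length, DimensionType.FLOAT32, values);
    }

    /** Renders the elements as a JSON-style array, e.g. [0.1, 0.2, 0.3]. */
    @Override
    public String toString() {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (float f : data) {
            joiner.add(Float.toString(f));
        }
        return joiner.toString();
    }
}
```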
This enhancement introduces support for a number of VECTOR data type operations, including inserts, selects, stored procedures, bulk inserts with useBulkCopyForBatchInsert, table-to-table bulk copies, CSV-to-table bulk copies, and table-valued parameters (TVP).
Backward Compatibility
If an application hasn't been updated to handle the VECTOR data type, the driver provides backward compatibility by allowing vector data types to be read using backward-compatible types. This is controlled using the vectorTypeSupport connection string property.
Currently, the supported values are "off" (the server sends vector types as JSON string data) and "v1" (the server sends float32 vector types as vector data). The default is "v1". Future values may be added as needed to support additional vector types; for example, "v2" may indicate additional support for float16 and/or int32 vectors.
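For an application that stays on the "off" setting, a minimal sketch of the client side might look like this. Only the vectorTypeSupport property name comes from the description above. The helper class, the example URL, and the assumption that the JSON string is a flat array of numbers are illustrative.

```java
/** Hypothetical helpers for reading vectors as JSON strings when vectorTypeSupport=off. */
final class VectorJsonFallbackSketch {

    /** Appends the vectorTypeSupport property to a JDBC URL of the usual key=value;key=value form. */
    static String withVectorTypeSupport(String baseUrl, String value) {
        String separator = baseUrl.endsWith(";") ? "" : ";";
        return baseUrl + separator + "vectorTypeSupport=" + value;
    }

    /** Parses a flat JSON array of numbers, such as "[0.1, 0.2, 0.3]", into a float array. */
    static float[] parseJsonArray(String json) {
        String body = json.trim();
        if (!body.startsWith("[") || !body.endsWith("]")) {
            throw new IllegalArgumentException("expected a JSON array: " + json);
        }
        body = body.substring(1, body.length() - 1).trim();
        if (body.isEmpty()) {
            return new float[0];
        }
        String[] parts = body.split(",");
        float[] values = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Float.parseFloat(parts[i].trim());
        }
        return values;
    }
}
```

With such helpers, a caller could read the column with `ResultSet.getString` and pass the result to `parseJsonArray`. The host and database in a URL like `jdbc:sqlserver://localhost:1433;databaseName=test` are placeholders.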
Testing
Currently, tests tagged with vectorTest are excluded from ADO runs. These will be enabled after the official build rollout.
Performance
Performance benchmarking was conducted for vector data insertion and retrieval across different data volumes: 100, 1K, 10K, 100K, and 1M records.
Comparative analysis was performed locally to evaluate performance across various scenarios.
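For a sense of scale, the raw element payload at each data volume can be estimated with simple arithmetic. The sketch assumes float32 elements (4 bytes each) and the dimensionCount of 1998 used in the metrics that follow. It ignores any per-vector header and protocol overhead, so real transfer sizes will differ.

```java
/** Back-of-the-envelope payload estimate; an illustration only, not part of the benchmark. */
final class VectorPayloadSizeSketch {

    /** Raw element bytes for the given number of float32 vectors. */
    static long payloadBytes(int dimensionCount, long records) {
        return (long) dimensionCount * Float.BYTES * records;
    }

    public static void main(String[] args) {
        long[] volumes = {100, 1_000, 10_000, 100_000, 1_000_000};
        for (long records : volumes) {
            System.out.println(records + " records -> " + payloadBytes(1998, records) + " bytes");
        }
    }
}
```

Under these assumptions each vector carries 7,992 bytes of element data, so the 1M-record run moves roughly 8 GB of raw vector payload in each direction.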
📊 Performance Metrics
(dimensionCount = 1998; includes serialization and deserialization time)
useBulkCopyForBatchInsert
(ms)