Vector datatype support #2634

Open
wants to merge 51 commits into
base: main
from

Conversation

muskan124947
Contributor

@muskan124947 muskan124947 commented Mar 18, 2025

Description
With the rise of AI and machine learning, the ability to handle vector data is crucial for applications like semantic search, recommendation systems, and more. By incorporating vector support directly into Microsoft SQL Server and Azure SQL Database, we eliminate the need for separate vector databases, streamlining data architecture and improving performance.
The vector data type is designed to store vector data optimized for operations such as similarity search and machine learning applications.

Vector support in the Microsoft SQL Server JDBC driver is enabled through a new feature extension (FE identifier 0x0E), with a version handshake that negotiates vector capabilities with the server.

Since the standard JDBC specification does not include native support for the VECTOR type, the driver provides a custom implementation using the microsoft.sql.Vector class.

The vector object holds the number of elements, a type indicator, and the actual data, as defined below:

public enum VectorDimensionType { 
    float32 // 32-bit (single precision) float 
}
private VectorDimensionType vectorType; 
private int dimensionCount; 
private Object[] data;
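
To make the relationship between the three fields concrete, here is a minimal sketch of a container with the same shape. It is a hypothetical stand-in for illustration, not the driver's microsoft.sql.Vector class: the class name `VectorSketch`, the validation rules, and the `payloadBytes` helper are all assumptions, and the byte count covers only the raw float32 elements, not any wire-format header.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is NOT microsoft.sql.Vector.
final class VectorSketch {
    enum DimensionType { FLOAT32 } // 32-bit (single precision) float

    private final DimensionType type;
    private final int dimensionCount;
    private final Object[] data;

    VectorSketch(int dimensionCount, DimensionType type, Object[] data) {
        if (dimensionCount <= 0) {
            throw new IllegalArgumentException("dimensionCount must be positive");
        }
        // Assumption: a non-null payload must match the declared dimension count.
        if (data != null && data.length != dimensionCount) {
            throw new IllegalArgumentException("data length does not match dimensionCount");
        }
        this.type = type;
        this.dimensionCount = dimensionCount;
        this.data = (data == null) ? null : data.clone();
    }

    int getDimensionCount() {
        return dimensionCount;
    }

    DimensionType getType() {
        return type;
    }

    // Size of the raw element data only: 4 bytes per float32 element.
    int payloadBytes() {
        return dimensionCount * Float.BYTES;
    }

    @Override
    public String toString() {
        // Prints the elements as a bracketed, comma-separated list.
        return Arrays.toString(data);
    }

    public static void main(String[] args) {
        VectorSketch v = new VectorSketch(3, DimensionType.FLOAT32,
                new Float[] { 0.45f, 7.9f, 63.0f });
        System.out.println(v + " uses " + v.payloadBytes() + " payload bytes");
    }
}
```

Keeping the dimension count separate from the data array allows a null vector to still carry its declared size, which matches the nullVector case in the insert example below.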

This enhancement introduces support for a number of VECTOR data type operations, including inserts, selections, stored procedures, bulk inserts with useBulkCopyForBatchInsert, table-to-table bulk copies, CSV-to-table bulk copies, and table-valued parameters (TVP).

Below are some of the example scenarios:

  1. Insertion of vector data into table
@Test
void insertVectorData() throws SQLException {
    // One placeholder for the single vector column v
    String insertSql = "INSERT INTO " + AbstractSQLGenerator.escapeIdentifier(tableName) + " (v) VALUES (?)";
    Object[] data = new Float[] { 0.45f, 7.9f, 63.0f };
    Vector vector = new Vector(3, VectorDimensionType.float32, data);
    // Vector vector = new Vector(3, 4, data);
    // Vector nullVector = new Vector(3, 4, null);
    try (PreparedStatement pstmt = connection.prepareStatement(insertSql)) {
        pstmt.setObject(1, vector, microsoft.sql.Types.VECTOR);
        pstmt.executeUpdate();
    }
}
  2. Select vector data from table
@Test
public void getVectorData() throws SQLException {
    String query = "SELECT v FROM " + AbstractSQLGenerator.escapeIdentifier(tableName);
    try (PreparedStatement stmt = connection.prepareStatement(query)) {
        try (ResultSet rs = stmt.executeQuery()) {
            assertTrue(rs.next(), "No result found for inserted vector.");
            ResultSetMetaData meta = rs.getMetaData();
            int columnCount = meta.getColumnCount();

            // The assertion above already moved the cursor to the first row,
            // so use do-while to avoid skipping it.
            do {
                for (int i = 1; i <= columnCount; i++) {
                    String columnName = meta.getColumnName(i);
                    int columnType = meta.getColumnType(i); // java.sql.Types or microsoft.sql.Types constant

                    Object value = null;
                    switch (columnType) {
                        case Types.VARCHAR:
                        case Types.NVARCHAR:
                            value = rs.getString(i);
                            break;
                        case microsoft.sql.Types.VECTOR:
                            value = rs.getObject(i, microsoft.sql.Vector.class);
                            break;
                        default:
                            break;
                    }

                    System.out.println(columnName + " = " + value + " (type: " + columnType + ")");
                }
                System.out.println("---");
            } while (rs.next());
        }
    }
}
  3. Stored Procedure Call
@Test
public void testVectorStoredProcedureInputOutput() throws SQLException {
    createProcedure();
    String call = "{call " + AbstractSQLGenerator.escapeIdentifier(procedureName) + "(?, ?)}";
    try (SQLServerCallableStatement cstmt = (SQLServerCallableStatement) connection.prepareCall(call)) {
        Vector vector = new Vector(3, VectorDimensionType.float32, new Float[] { 0.5f, 1.0f, 1.5f });
        cstmt.setObject(1, vector, microsoft.sql.Types.VECTOR);
        cstmt.registerOutParameter(2, microsoft.sql.Types.VECTOR, 3, 4);
        cstmt.execute();
    }
}
  4. Bulk Copy from CSV to table
@Test
public void testBulkCopyVectorFromCSV() throws SQLException {
    String dstTable = RandomUtil.getIdentifier("dstTable");
    String fileName = filePath + vectorInputCsvFile;
    try (Connection con = getConnection();
            Statement stmt = con.createStatement();
            SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(con);
            SQLServerBulkCSVFileRecord fileRecord = new SQLServerBulkCSVFileRecord(fileName, null, ",", true)) {
        // Create the destination table
        stmt.executeUpdate(
                "CREATE TABLE " + dstTable + " (id INT, vectorCol VECTOR(3));");
        // Column name matches the destination column declared above
        fileRecord.addColumnMetadata(1, "vectorCol", microsoft.sql.Types.VECTOR, 3, 4);
        fileRecord.setEscapeColumnDelimitersCSV(true);
        bulkCopy.setDestinationTableName(dstTable);
        bulkCopy.writeToServer(fileRecord);
    }
}

Backward Compatibility
If an application hasn't been updated to handle the VECTOR data type, the driver provides backward compatibility by allowing vector columns to be read using backward-compatible types. This is controlled by the vectorTypeSupport connection string property.
Currently, the supported values are "off" (the server sends vector types as JSON string data) and "v1" (the server sends float32 vector types as vector data). The default is "v1". Future values may be added as needed to support additional vector types; for example, "v2" might add support for float16 and/or int32 vectors.
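
As a rough illustration of the "off" mode from the application side, the sketch below appends the property to a connection URL and parses a JSON-style string back into a float array. The class name `VectorCompatSketch`, both helper methods, the host, and the database name are hypothetical, and the exact JSON text the server emits is an assumption; only the property name and its two values come from this PR.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the driver.
final class VectorCompatSketch {

    // Appends vectorTypeSupport to a semicolon-delimited SQL Server JDBC URL.
    static String withVectorTypeSupport(String baseUrl, String value) {
        if (!value.equals("off") && !value.equals("v1")) {
            throw new IllegalArgumentException("Unsupported vectorTypeSupport value: " + value);
        }
        String separator = baseUrl.endsWith(";") ? "" : ";";
        return baseUrl + separator + "vectorTypeSupport=" + value + ";";
    }

    // Parses a flat JSON array of numbers such as "[0.45, 7.9, 63.0]".
    // Assumption: with vectorTypeSupport=off the column is read via getString()
    // and contains a single flat array of numeric literals.
    static float[] parseJsonVector(String json) {
        if (json == null) {
            return null; // SQL NULL
        }
        String text = json.trim();
        if (text.length() < 2 || text.charAt(0) != '[' || text.charAt(text.length() - 1) != ']') {
            throw new IllegalArgumentException("Not a JSON array: " + json);
        }
        String body = text.substring(1, text.length() - 1).trim();
        if (body.isEmpty()) {
            return new float[0];
        }
        String[] parts = body.split(",");
        float[] values = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Float.parseFloat(parts[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Host and database name are placeholders.
        String url = withVectorTypeSupport("jdbc:sqlserver://localhost:1433;databaseName=test", "off");
        System.out.println(url);
        System.out.println(Arrays.toString(parseJsonVector("[0.45, 7.9, 63.0]")));
    }
}
```

An application that has not adopted microsoft.sql.Vector could use a helper like this to keep working with string results until it is updated.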

Testing

  1. Added test scenarios for all operations involving the VECTOR data type. Tests are implemented across the following classes: VectorTest, BulkCopyCSVTest, BulkCopyISQLServerBulkRecordTest, DatabaseMetaDataTest and BatchExecutionWithBulkCopyTest.
  2. Vector-specific tests are tagged using the vectorTest exclusion tag to allow controlled execution.
  3. All tests have been verified locally using a SQL Server build with vector support.
    Currently, tests tagged with vectorTest are excluded from ADO runs. They will be enabled after the official build rollout.

Performance
Performance of vector data insertion and retrieval was benchmarked across different data volumes: 100, 1K, 10K, 100K, and 1M records.
A comparative analysis was performed locally to evaluate performance across various scenarios.

📊 Performance Metrics

(dimensionCount = 1998; includes serialization and deserialization time)

| Number of Records | Insert Operation (ms) | Insert with useBulkCopyForBatchInsert (ms) | Bulk Copy Operation (ms) |
|---|---|---|---|
| 1 | 11 | 202 | 249 |
| 100 | 215 | 243 | 275 |
| 1,000 | 991 | 592 | 598 |
| 10,000 | 6,089 | 1,881 | 2,026 |
| 100,000 | 52,126 | 14,226 | 14,502 |
| 1,000,000 | 492,570 | 143,871 | 153,397 |
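
To compare the three paths more directly, the timings above can be converted into throughput and relative speedup. The following is a small arithmetic sketch over the reported numbers only; the class name `VectorPerfSketch` and its helpers are hypothetical and no new measurements are introduced.

```java
// Hypothetical helper that derives throughput and speedup from the reported timings.
final class VectorPerfSketch {

    // Records processed per second for a run of the given duration.
    static double recordsPerSecond(long records, long millis) {
        return records * 1000.0 / millis;
    }

    // How many times faster the candidate is than the baseline (values below 1 mean slower).
    static double speedup(long baselineMillis, long candidateMillis) {
        return (double) baselineMillis / candidateMillis;
    }

    public static void main(String[] args) {
        // Columns: records, insert (ms), insert with useBulkCopyForBatchInsert (ms), bulk copy (ms)
        long[][] rows = {
            { 1, 11, 202, 249 },
            { 100, 215, 243, 275 },
            { 1_000, 991, 592, 598 },
            { 10_000, 6_089, 1_881, 2_026 },
            { 100_000, 52_126, 14_226, 14_502 },
            { 1_000_000, 492_570, 143_871, 153_397 },
        };
        for (long[] row : rows) {
            System.out.println(String.format(
                    "%,d records: insert %.0f rec/s, batch-as-bulk %.0f rec/s (x%.2f), bulk copy %.0f rec/s (x%.2f)",
                    row[0],
                    recordsPerSecond(row[0], row[1]),
                    recordsPerSecond(row[0], row[2]), speedup(row[1], row[2]),
                    recordsPerSecond(row[0], row[3]), speedup(row[1], row[3])));
        }
    }
}
```

On these figures the bulk paths are slower than a plain insert for 1 and 100 records, and roughly 3.4 times faster than plain inserts at 1,000,000 records.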


codecov bot commented Mar 18, 2025

Codecov Report

Attention: Patch coverage is 22.53521% with 110 lines in your changes missing coverage. Please review.

Project coverage is 51.40%. Comparing base (a0ea23a) to head (7ede77d).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/main/java/microsoft/sql/Vector.java | 0.00% | 43 Missing ⚠️ |
| ...oft/sqlserver/jdbc/SQLServerPreparedStatement.java | 4.54% | 19 Missing and 2 partials ⚠️ |
| ...rc/main/java/com/microsoft/sqlserver/jdbc/dtv.java | 5.55% | 16 Missing and 1 partial ⚠️ |
| ...m/microsoft/sqlserver/jdbc/SQLServerResultSet.java | 0.00% | 11 Missing and 1 partial ⚠️ |
| ...in/java/com/microsoft/sqlserver/jdbc/IOBuffer.java | 0.00% | 7 Missing ⚠️ |
| .../microsoft/sqlserver/jdbc/SQLServerConnection.java | 75.00% | 5 Missing ⚠️ |
| ...n/java/com/microsoft/sqlserver/jdbc/DataTypes.java | 83.33% | 3 Missing ⚠️ |
| ...n/java/com/microsoft/sqlserver/jdbc/Parameter.java | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2634      +/-   ##
============================================
- Coverage     51.59%   51.40%   -0.20%     
+ Complexity     3999     3994       -5     
============================================
  Files           147      148       +1     
  Lines         33706    33841     +135     
  Branches       5631     5648      +17     
============================================
+ Hits          17391    17396       +5     
- Misses        13866    13991     +125     
- Partials       2449     2454       +5     

@muskan124947 muskan124947 requested a review from David-Engel May 5, 2025 03:52
@machavan machavan added this to the 12.11.0 milestone May 12, 2025
Contributor

@machavan machavan left a comment

Posting first pass of review comments

@Ananya2 Ananya2 force-pushed the user/muskan/vector-datatype-support branch from 110d79e to 8de3aa5 Compare May 28, 2025 16:49
Collaborator

@David-Engel David-Engel left a comment

Partial review (about 1/3 of files).

Collaborator

@David-Engel David-Engel left a comment

Another partial review. 17/35 files.

Collaborator

@David-Engel David-Engel left a comment

Partial review. 24/35 files.

@muskan124947 muskan124947 force-pushed the user/muskan/vector-datatype-support branch from eddf8ff to 1167b6b Compare June 11, 2025 05:44
try (Statement statement = connection.createStatement();
ResultSet resultSet = statement.executeQuery(query)) {

ResultSetMetaData metaData = resultSet.getMetaData();
Contributor

Let us also add a test for DatabaseMetaData, obtained via Connection.getMetaData() (which internally queries sp_columns_100).

Contributor Author

Currently, sp_columns_100 reports the column type as varbinary for a VECTOR column.

@David-Engel
Collaborator

My approval isn't needed to merge this PR, at this point. I think the surface looks good. None of the small issues I identified are blockers. Just make sure the rest of the team is good with it and that CI is green.

Thanks!

David

muskan124947 and others added 28 commits June 16, 2025 15:47
…peration is performed for mismatched source and destination tables
…de, added functions to getScale, bytesperDimension and updated the logic to use functions instead of handling hardcoded values
…dated toString() for Vector to return json formatted string
…erialization instead of float[] and updated the test cases.
…isterOutParameter, getObject, and setObject APIs.
@muskan124947 muskan124947 force-pushed the user/muskan/vector-datatype-support branch from f413160 to 10e0f8e Compare June 16, 2025 10:20
Labels
None yet
Projects
Status: In progress
5 participants