Vector datatype support #2634

Merged (55 commits) on Jun 30, 2025

muskan124947
Contributor

@muskan124947 muskan124947 commented Mar 18, 2025

Description
With the rise of AI and machine learning, the ability to handle vector data is crucial for applications like semantic search, recommendation systems, and more. By incorporating vector support directly into Microsoft SQL Server and Azure SQL Database, we eliminate the need for separate vector databases, streamlining data architecture and improving performance.
The vector data type is designed to store vector data optimized for operations such as similarity search and machine learning applications.

Vector support in the Microsoft SQL Server JDBC driver is enabled through a new feature extension (FE identifier 0x0E), which uses a version handshake to negotiate vector capabilities with the server.

Since the standard JDBC specification does not include native support for the VECTOR type, the driver provides a custom implementation using the microsoft.sql.Vector class.

The vector object holds the number of elements, a type indicator, and the actual data, as defined below:

public enum VectorDimensionType { 
    FLOAT32 // 32-bit (single precision) float 
}
private VectorDimensionType vectorType; 
private int dimensionCount; 
private Object[] data;
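
The three fields above can be pictured as a small value class. The following is a minimal sketch only, not the driver's microsoft.sql.Vector: the class name VectorSketch, its accessors, and the length check against dimensionCount are assumptions made for illustration. A null data array is allowed here because the examples in this description construct a null vector.

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical stand-in for illustration; NOT the driver's microsoft.sql.Vector. */
final class VectorSketch {
    enum DimensionType {
        FLOAT32 // 32-bit (single precision) float
    }

    private final DimensionType vectorType;
    private final int dimensionCount;
    private final Object[] data; // null models a SQL NULL vector

    VectorSketch(int dimensionCount, DimensionType vectorType, Object[] data) {
        if (dimensionCount <= 0) {
            throw new IllegalArgumentException("dimensionCount must be positive");
        }
        // Assumed validation: non-null data must match the declared dimension count.
        if (data != null && data.length != dimensionCount) {
            throw new IllegalArgumentException(
                    "expected " + dimensionCount + " elements but got " + data.length);
        }
        this.vectorType = Objects.requireNonNull(vectorType, "vectorType");
        this.dimensionCount = dimensionCount;
        this.data = (data == null) ? null : data.clone(); // defensive copy
    }

    int getDimensionCount() {
        return dimensionCount;
    }

    DimensionType getVectorType() {
        return vectorType;
    }

    Object[] getData() {
        return (data == null) ? null : data.clone();
    }

    @Override
    public String toString() {
        return "VECTOR(" + dimensionCount + ", " + vectorType + ") " + Arrays.toString(data);
    }
}
```

Copying the array on the way in and out keeps the declared dimension count and the stored data from drifting apart after construction.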

This enhancement introduces support for a number of VECTOR data type operations, including inserts, selections, stored procedures, bulk inserts with useBulkCopyForBatchInsert, table-to-table bulk copies, CSV-to-table bulk copies, and table-valued parameters (TVP).

Below are some of the example scenarios:

  1. Insertion of vector data into table
@Test
void insertVectorData() throws SQLException {
    // One placeholder for the single vector column v.
    String insertSql = "INSERT INTO " + AbstractSQLGenerator.escapeIdentifier(tableName) + " (v) VALUES (?)";
    Object[] data = new Float[] { 0.45f, 7.9f, 63.0f };
    // Arguments: dimension count, element type, values.
    Vector vector = new Vector(3, VectorDimensionType.FLOAT32, data);
    // Alternative form with numeric arguments:
    // Vector vector = new Vector(3, 4, data);
    // A null vector:
    // Vector nullVector = new Vector(3, 4, null);
    try (PreparedStatement pstmt = connection.prepareStatement(insertSql)) {
        pstmt.setObject(1, vector, microsoft.sql.Types.VECTOR);
        pstmt.executeUpdate();
    }
}
  1. Select vector data from table
@Test
public void getVectorData() throws SQLException {
    String query = "SELECT v FROM " + AbstractSQLGenerator.escapeIdentifier(tableName);
    try (PreparedStatement stmt = connection.prepareStatement(query);
            ResultSet rs = stmt.executeQuery()) {
        assertTrue(rs.next(), "No result found for inserted vector.");
        ResultSetMetaData meta = rs.getMetaData();
        int columnCount = meta.getColumnCount();

        // rs.next() above already moved to the first row, so use do/while to avoid skipping it.
        do {
            for (int i = 1; i <= columnCount; i++) {
                String columnName = meta.getColumnName(i);
                int columnType = meta.getColumnType(i); // java.sql.Types or microsoft.sql.Types

                Object value;
                switch (columnType) {
                    case Types.VARCHAR:
                    case Types.NVARCHAR:
                        value = rs.getString(i);
                        break;
                    case microsoft.sql.Types.VECTOR:
                        value = rs.getObject(i, microsoft.sql.Vector.class);
                        break;
                    default:
                        value = rs.getObject(i);
                        break;
                }

                System.out.println(columnName + " = " + value + " (type: " + columnType + ")");
            }
            System.out.println("---");
        } while (rs.next());
    }
}
  1. Stored Procedure Call
@Test
public void testVectorStoredProcedureInputOutput() throws SQLException {
    createProcedure();
    String call = "{call " + AbstractSQLGenerator.escapeIdentifier(procedureName) + "(?, ?)}";
    try (SQLServerCallableStatement cstmt = (SQLServerCallableStatement) connection.prepareCall(call)) {
        // Parameter 1: input vector with three FLOAT32 elements.
        Vector vector = new Vector(3, VectorDimensionType.FLOAT32, new Float[] { 0.5f, 1.0f, 1.5f });
        cstmt.setObject(1, vector, microsoft.sql.Types.VECTOR);
        // Parameter 2: output vector.
        cstmt.registerOutParameter(2, microsoft.sql.Types.VECTOR, 3, 4);
        cstmt.execute();
    }
}
  1. Bulk Copy from CSV to table
@Test
public void testBulkCopyVectorFromCSV() throws SQLException {
    String dstTable = RandomUtil.getIdentifier("dstTable");
    String fileName = filePath + vectorInputCsvFile;
    try (Connection con = getConnection();
            Statement stmt = con.createStatement();
            SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(con);
            SQLServerBulkCSVFileRecord fileRecord = new SQLServerBulkCSVFileRecord(fileName, null, ",", true)) {
        // Create the destination table
        stmt.executeUpdate(
                "CREATE TABLE " + dstTable + " (id INT, vectorCol VECTOR(3));");
        // Describe the vector column in the CSV source
        fileRecord.addColumnMetadata(1, "vectorCol", microsoft.sql.Types.VECTOR, 3, 4);
        // Allow quoted fields to contain the column delimiter
        fileRecord.setEscapeColumnDelimitersCSV(true);
        bulkCopy.setDestinationTableName(dstTable);
        bulkCopy.writeToServer(fileRecord);
    }
}
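
A vector written as text contains commas, so in a comma-delimited file the vector field has to be quoted; the CSV example calls setEscapeColumnDelimitersCSV(true) for that reason. The sketch below shows one way such a field could be produced. It assumes the file represents each vector as a JSON-style array wrapped in double quotes, which this description does not confirm, and VectorCsvSketch and toQuotedCsvField are hypothetical names.

```java
import java.util.StringJoiner;

/** Hypothetical helper for illustration; the CSV layout is an assumption. */
final class VectorCsvSketch {

    /**
     * Formats the values as a JSON-style array wrapped in double quotes,
     * so the commas inside the array are not read as column delimiters.
     */
    static String toQuotedCsvField(float[] values) {
        StringJoiner joiner = new StringJoiner(",", "\"[", "]\"");
        for (float value : values) {
            joiner.add(Float.toString(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toQuotedCsvField(new float[] { 1.0f, 0.5f, 2.0f }));
    }
}
```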

Backward Compatibility
Applications that have not been updated to handle the VECTOR data type can still read vector columns through an older, compatible type. This behavior is controlled by the vectorTypeSupport connection string property.
The supported values are currently "off" (the server sends vector values as JSON string data) and "v1" (the server sends FLOAT32 vectors as vector data). The default is "v1". Further values may be added to support additional vector types; for example, "v2" might add support for FLOAT16 and/or INT32 vectors.
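
With vectorTypeSupport set to "off", an application receives the vector as a JSON string and has to convert it itself. The sketch below shows one way to do that, assuming the string is a flat JSON array of numbers such as [0.45,7.9,63.0]; the exact text the server produces is not shown in this description. VectorJsonFallbackSketch and parseJsonArray are hypothetical names.

```java
/** Hypothetical helper for illustration; the JSON shape is an assumption. */
final class VectorJsonFallbackSketch {

    /** Parses a flat JSON array of numbers, e.g. "[0.45, 7.9, 63.0]", into a float[]. */
    static float[] parseJsonArray(String json) {
        String trimmed = json.trim();
        if (trimmed.length() < 2
                || trimmed.charAt(0) != '['
                || trimmed.charAt(trimmed.length() - 1) != ']') {
            throw new IllegalArgumentException("not a JSON array: " + json);
        }
        String body = trimmed.substring(1, trimmed.length() - 1).trim();
        if (body.isEmpty()) {
            return new float[0];
        }
        String[] parts = body.split(",");
        float[] values = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Float.parseFloat also accepts exponent notation such as 6.3e1.
            values[i] = Float.parseFloat(parts[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // In an application the string would typically come from something like
        // rs.getString("v") on a connection opened with vectorTypeSupport=off (assumed usage).
        float[] values = parseJsonArray("[0.45, 7.9, 63.0]");
        System.out.println(values.length + " elements");
    }
}
```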

Testing

  1. Added test scenarios for all operations involving the VECTOR data type. The tests are implemented in the following classes: VectorTest, BulkCopyCSVTest, BulkCopyISQLServerBulkRecordTest, DatabaseMetaDataTest, and BatchExecutionWithBulkCopyTest.
  2. Vector-specific tests are tagged with the vectorTest exclusion tag so they can be run selectively.
  3. All tests have been verified locally against a SQL Server build with vector support.
  4. Code coverage will increase once the vector-related test cases are enabled.
    Note: Tests tagged with vectorTest are currently excluded from ADO runs and GitHub checks. They will be enabled after the official build rollout.

Performance
Vector data insertion and retrieval were benchmarked across data volumes from a single record up to 1M records (1, 100, 1K, 10K, 100K, and 1M).
The comparison was run locally and covers a regular insert, an insert with useBulkCopyForBatchInsert, and a bulk copy operation.

📊 Performance Metrics

(dimensionCount = 1998; times include serialization and deserialization)

| Number of Records | Insert Operation (ms) | Insert with useBulkCopyForBatchInsert (ms) | Bulk Copy Operation (ms) |
|---|---|---|---|
| 1 | 11 | 202 | 249 |
| 100 | 215 | 243 | 275 |
| 1,000 | 991 | 592 | 598 |
| 10,000 | 6,089 | 1,881 | 2,026 |
| 100,000 | 52,126 | 14,226 | 14,502 |
| 1,000,000 | 492,570 | 143,871 | 153,397 |
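
The figures above can be turned into speedup ratios to see where the bulk-copy paths start to pay off. The sketch below does that arithmetic on the reported numbers only; the class and method names are hypothetical, and nothing here was measured independently.

```java
/** Arithmetic on the timings reported above; names are hypothetical. */
final class VectorBenchmarkRatios {
    static final int[] RECORDS = { 1, 100, 1_000, 10_000, 100_000, 1_000_000 };
    static final long[] INSERT_MS = { 11, 215, 991, 6_089, 52_126, 492_570 };
    static final long[] BATCH_BULK_MS = { 202, 243, 592, 1_881, 14_226, 143_871 };
    static final long[] BULK_COPY_MS = { 249, 275, 598, 2_026, 14_502, 153_397 };

    /** Ratio above 1.0 means the candidate is faster than the baseline. */
    static double speedup(long baselineMs, long candidateMs) {
        return (double) baselineMs / candidateMs;
    }

    /** Smallest record count at which the candidate beats the baseline, or -1 if it never does. */
    static int firstCrossover(long[] baselineMs, long[] candidateMs) {
        for (int i = 0; i < RECORDS.length; i++) {
            if (candidateMs[i] < baselineMs[i]) {
                return RECORDS[i];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (int i = 0; i < RECORDS.length; i++) {
            System.out.printf("%,d records: batch-bulk x%.2f, bulk copy x%.2f%n",
                    RECORDS[i],
                    speedup(INSERT_MS[i], BATCH_BULK_MS[i]),
                    speedup(INSERT_MS[i], BULK_COPY_MS[i]));
        }
        System.out.println("Crossover at " + firstCrossover(INSERT_MS, BATCH_BULK_MS) + " records");
    }
}
```

On these numbers the plain insert is faster for 1 and 100 records, and both bulk paths are faster from 1,000 records upward.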


codecov bot commented Mar 18, 2025

Codecov Report

Attention: Patch coverage is 14.50617% with 277 lines in your changes missing coverage. Please review.

Project coverage is 51.45%. Comparing base (6c85e2c) to head (5aa8b62).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...java/com/microsoft/sqlserver/jdbc/VectorUtils.java | 0.00% | 62 Missing ⚠️ |
| src/main/java/microsoft/sql/Vector.java | 0.00% | 33 Missing ⚠️ |
| ...in/java/com/microsoft/sqlserver/jdbc/IOBuffer.java | 0.00% | 26 Missing ⚠️ |
| ...om/microsoft/sqlserver/jdbc/SQLServerBulkCopy.java | 0.00% | 23 Missing ⚠️ |
| .../microsoft/sqlserver/jdbc/SQLServerConnection.java | 47.61% | 18 Missing and 4 partials ⚠️ |
| ...oft/sqlserver/jdbc/SQLServerBulkCSVFileRecord.java | 0.00% | 19 Missing ⚠️ |
| ...oft/sqlserver/jdbc/SQLServerCallableStatement.java | 13.63% | 17 Missing and 2 partials ⚠️ |
| ...rc/main/java/com/microsoft/sqlserver/jdbc/dtv.java | 6.25% | 15 Missing ⚠️ |
| ...m/microsoft/sqlserver/jdbc/SQLServerDataTable.java | 0.00% | 13 Missing ⚠️ |
| ...oft/sqlserver/jdbc/SQLServerPreparedStatement.java | 13.33% | 12 Missing and 1 partial ⚠️ |

... and 7 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2634      +/-   ##
============================================
- Coverage     51.98%   51.45%   -0.53%     
+ Complexity     4084     4046      -38     
============================================
  Files           147      149       +2     
  Lines         33828    34136     +308     
  Branches       5655     5700      +45     
============================================
- Hits          17584    17566      -18     
- Misses        13817    14089     +272     
- Partials       2427     2481      +54     

@muskan124947 muskan124947 requested a review from David-Engel May 5, 2025 03:52
@machavan machavan added this to the 12.11.0 milestone May 12, 2025
Contributor

@machavan machavan left a comment

Posting first pass of review comments

@Ananya2 Ananya2 force-pushed the user/muskan/vector-datatype-support branch from 110d79e to 8de3aa5 Compare May 28, 2025 16:49
Collaborator

@David-Engel David-Engel left a comment

Partial review (about 1/3 of files).

Collaborator

@David-Engel David-Engel left a comment

Another partial review. 17/35 files.

Collaborator

@David-Engel David-Engel left a comment

Partial review. 24/35 files.

@muskan124947 muskan124947 force-pushed the user/muskan/vector-datatype-support branch from eddf8ff to 1167b6b Compare June 11, 2025 05:44
@David-Engel
Collaborator

My approval isn't needed to merge this PR, at this point. I think the surface looks good. None of the small issues I identified are blockers. Just make sure the rest of the team is good with it and that CI is green.

Thanks!

David

@muskan124947 muskan124947 force-pushed the user/muskan/vector-datatype-support branch from f413160 to 10e0f8e Compare June 16, 2025 10:20
David-Engel previously approved these changes Jun 23, 2025
divang previously approved these changes Jun 27, 2025
@muskan124947 muskan124947 merged commit e6096d8 into main Jun 30, 2025
17 of 19 checks passed
@github-project-automation github-project-automation bot moved this from In progress to Closed/Merged PRs in MSSQL JDBC Jun 30, 2025