Fixed session recovery for Azure SQL DB in redirect mode connected with AAD auth #2668

Merged
divang merged 11 commits into main from user/divang/connection_resiliency_fed_auth_issue on Jun 13, 2025

divang (Contributor) commented May 21, 2025

Description:
This PR addresses a connectivity issue in the JDBC driver (version 10, latest main branch) when authenticating with Azure Active Directory Interactive. During reconnection attempts, the driver logs a warning that the server did not send the Session Recovery feature extension acknowledgment.

Root Cause
When Azure Active Directory (AAD) authentication is used and the SQL session (SPID) is killed, the driver's session recovery mechanism fails and the connection cannot be re-established. The problem does not occur with SQL Authentication, where tokens are reusable. With AAD, the token cannot be reused after disconnection, which complicates recovery.

The root cause lies in how the driver handles routing information during reconnect attempts. When the initial connection is made, the server provides routingInfo indicating that the connection should be redirected (a two-hop routing scenario):

  • First hop: Connects to the gateway (e.g., *.database.windows.net).
  • Second hop: The gateway responds with routing info for the actual target (redirected) server.

If routingInfo is present, the driver must follow this redirection path to complete session recovery. However, the driver currently terminates the connection without attempting the redirect. As a result, it repeatedly retries the same initial connection, which never receives the required SESSIONRECOVERY extension.
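
The two-hop path described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the class, the `LoginResponse` type, and the host names `gateway.example` and `worker.example` are hypothetical and are not taken from the driver's code.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of two-hop login. All names and hosts are hypothetical. */
final class TwoHopLoginSketch {

    /** What a simulated server returns from a login attempt. */
    static final class LoginResponse {
        final boolean sessionRecoveryAcked; // stands in for TDS_FEATURE_EXT_SESSIONRECOVERY
        final String routedHost;            // null when no routing info was sent

        LoginResponse(boolean sessionRecoveryAcked, String routedHost) {
            this.sessionRecoveryAcked = sessionRecoveryAcked;
            this.routedHost = routedHost;
        }
    }

    /** Simulated servers keyed by host name. */
    static final Map<String, LoginResponse> SERVERS = new HashMap<>();
    static {
        // First hop: the gateway sends routing info but no session recovery ack.
        SERVERS.put("gateway.example", new LoginResponse(false, "worker.example"));
        // Second hop: the redirected server acknowledges session recovery.
        SERVERS.put("worker.example", new LoginResponse(true, null));
    }

    /** Logs in to host, following at most maxHops redirects; returns the final host. */
    static String connect(String host, int maxHops) {
        String current = host;
        for (int hop = 0; hop <= maxHops; hop++) {
            LoginResponse response = SERVERS.get(current);
            if (response == null) {
                throw new IllegalStateException("unknown host: " + current);
            }
            if (response.routedHost == null) {
                if (!response.sessionRecoveryAcked) {
                    throw new IllegalStateException("no session recovery ack from " + current);
                }
                return current;
            }
            current = response.routedHost; // follow the redirect instead of giving up
        }
        throw new IllegalStateException("too many redirects");
    }

    public static void main(String[] args) {
        System.out.println(connect("gateway.example", 1)); // prints "worker.example"
    }
}
```

In this model the missing acknowledgment is only treated as an error once there is no further routing info to follow, which mirrors the behavior the PR argues for.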

Resolution
To resolve this issue, the JDBC driver must recognize when routingInfo is present in the server's response. This indicates a redirection is necessary and that the driver should:

  • Not terminate the current connection attempt.
  • Instead, initiate a second connection to the redirected server as specified by the routingInfo.

Upon connecting to the redirected server, the expected TDS_FEATURE_EXT_SESSIONRECOVERY extension is received, allowing the session recovery process to proceed correctly. Implementing this behavior ensures compatibility with AAD authentication and proper failover handling during reconnect scenarios involving redirected endpoints.
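
A minimal sketch of the decision described in this section, assuming the reconnect path has two inputs: whether the session recovery acknowledgment arrived and whether routing info was returned. The class, enum, and method names are hypothetical and do not reflect the driver's internals.

```java
/** Hypothetical decision helper for a reconnect attempt. */
final class RedirectDecisionSketch {

    enum Action { CONTINUE, FOLLOW_REDIRECT, TERMINATE }

    /**
     * Decides what to do after a login response during reconnect.
     * routedHost is null or empty when the server sent no routing info.
     */
    static Action decide(boolean sessionRecoveryAcked, String routedHost) {
        if (routedHost != null && !routedHost.isEmpty()) {
            // Routing info present: the ack is expected from the redirected server,
            // so the current attempt must not be terminated.
            return Action.FOLLOW_REDIRECT;
        }
        return sessionRecoveryAcked ? Action.CONTINUE : Action.TERMINATE;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, "worker.example")); // prints "FOLLOW_REDIRECT"
        System.out.println(decide(true, null));              // prints "CONTINUE"
        System.out.println(decide(false, null));             // prints "TERMINATE"
    }
}
```

Checking routing info first keeps the termination path for the case where the final server really does not support session recovery.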

Steps:

  1. Execute a SQL query.
  2. The JDBC driver first attempts to connect to divang.database.windows.net.
  3. The server's login response includes the TDS_FEATURE_EXT_AZURESQLDNSCACHING extension but not TDS_FEATURE_EXT_SESSIONRECOVERY.
  4. The JDBC driver then attempts a second connection using the redirected endpoint (e.g., db***.**.database.windows.net).
  5. Note: At this stage, the Session Recovery thread is not active, so the connection proceeds without issue.
  6. The server responds to this redirected connection with the TDS_FEATURE_EXT_SESSIONRECOVERY extension.
  7. The query executes successfully.
  8. The previous connection remains active.
  9. Now, kill the SPID associated with that session.
  10. The JDBC driver enters the Session Recovery flow and again attempts to connect to **.database.windows.net.
  11. At this point, the issue manifests:
  12. The server again responds with only TDS_FEATURE_EXT_AZURESQLDNSCACHING.
  13. The JDBC driver expects TDS_FEATURE_EXT_SESSIONRECOVERY, which is missing.
  14. As a result, the recovery flow fails. The Session Resiliency thread becomes active, but the driver continuously retries the initial connection (up to ConnectRetryCount), never redirecting to the appropriate endpoint.
  15. Resolution: When the driver receives TDS_FEATURE_EXT_AZURESQLDNSCACHING, it should extract the routing (redirect) information. If this info is present, it implies a redirect flow is required, and the driver should follow it instead of failing.
  16. Further validation is necessary to confirm this behavior across all authentication methods.
  17. With this fix, the driver retries the second connection to the redirect endpoint (db**.**.worker.database.windows.net).
  18. The server responds with the expected TDS_FEATURE_EXT_SESSIONRECOVERY extension.
  19. The driver successfully recovers and re-executes the SQL query.
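
The retry behavior in the steps above can be modeled roughly as follows. This is a simplified simulation, not driver code: it assumes that each reconnect attempt targets a single host, that the gateway never acknowledges session recovery, and that with the fix the routing info from one attempt is used as the target of the next. The host names and the `honorRoutingInfo` flag are hypothetical.

```java
/** Simplified model of the reconnect retry loop, before and after the fix. */
final class RecoveryRetrySketch {

    static final String GATEWAY = "gateway.example"; // hypothetical first hop
    static final String WORKER = "worker.example";   // hypothetical redirect target

    /**
     * Returns the 1-based attempt on which recovery succeeded,
     * or -1 if all connectRetryCount attempts were used up.
     */
    static int recover(boolean honorRoutingInfo, int connectRetryCount) {
        String target = GATEWAY;
        for (int attempt = 1; attempt <= connectRetryCount; attempt++) {
            // Only the redirected server sends the session recovery ack in this model.
            boolean sessionRecoveryAcked = target.equals(WORKER);
            if (sessionRecoveryAcked) {
                return attempt;
            }
            // The gateway replied with routing info but no ack.
            if (honorRoutingInfo) {
                target = WORKER; // next attempt goes to the redirected endpoint
            }
            // Without the fix the target never changes, so every retry fails the same way.
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(recover(false, 3)); // prints "-1"
        System.out.println(recover(true, 3));  // prints "2"
    }
}
```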

Testing
The following code simulates the scenario described above:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    import com.microsoft.sqlserver.jdbc.SQLServerDataSource;

    static void testActiveDirectoryInteractive() throws InterruptedException {
        SQLServerDataSource ds = new SQLServerDataSource();
        ds.setServerName("XXX.database.windows.net"); // Replace with your server name
        ds.setDatabaseName("divXXX"); // Replace with your database
        ds.setAuthentication("ActiveDirectoryInteractive");
        ds.setUser("dXXX@microsoft.com"); // Replace with your user name
        ds.setLoginTimeout(60 * 10); // seconds
        ds.setConnectRetryCount(3);
        ds.setConnectRetryInterval(10); // seconds

        try (Connection connection = ds.getConnection();
                Statement stmt = connection.createStatement()) {
            System.out.println("First iteration ...");
            ResultSet rs = stmt.executeQuery("SELECT * from t4");
            while (rs.next()) {
                System.out.println("row---->" + rs.getString(1));
            }

            // Print the session id so it can be killed from another session.
            ResultSet resultSet = stmt.executeQuery("SELECT @@SPID AS SPID");
            int spid = 0;
            if (resultSet.next()) {
                spid = resultSet.getInt("SPID");
                System.out.println("SPID: " + spid);
            }

            // During this countdown, run KILL <SPID> from a separate session.
            System.out.println("Waiting for 15 seconds...");
            for (int i = 15; i > 0; i--) {
                System.out.println("Time left: " + i + " seconds spid-" + spid);
                Thread.sleep(1000); // Wait for 1 second
            }

            // First statement after the kill: the driver has to recover the session here.
            resultSet = stmt.executeQuery("SELECT @@SPID AS SPID");
            spid = 0;
            if (resultSet.next()) {
                spid = resultSet.getInt("SPID");
                System.out.println("2 SPID: " + spid);
            }

            System.out.println("Second iteration ...");
            rs = stmt.executeQuery("SELECT * from t4");
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
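
The test above leaves the KILL step to be done by hand during the wait. One way to script it is sketched below, assuming a second connection whose login has permission to run KILL. The class and method names are hypothetical, and the PR itself does not include such a helper.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper for killing a session from a second connection. */
final class KillSessionSketch {

    /** Builds the T-SQL text; the statement is built from an int rather than parameterized. */
    static String killStatement(int spid) {
        if (spid <= 0) {
            throw new IllegalArgumentException("spid must be positive: " + spid);
        }
        return "KILL " + spid;
    }

    /** Runs KILL on adminConnection, which must be a different session than the one being killed. */
    static void kill(Connection adminConnection, int spid) throws SQLException {
        try (Statement stmt = adminConnection.createStatement()) {
            stmt.execute(killStatement(spid));
        }
    }

    public static void main(String[] args) {
        System.out.println(killStatement(53)); // prints "KILL 53"
    }
}
```

Calling `kill` from a background thread in place of the manual step would make the scenario repeatable, at the cost of needing a second set of credentials.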

…nection to resolve Fed auth connection resiliency issue
codecov bot commented May 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 51.81%. Comparing base (52303d8) to head (7534af6).
Report is 3 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2668      +/-   ##
============================================
+ Coverage     51.62%   51.81%   +0.18%     
- Complexity     4006     4022      +16     
============================================
  Files           147      147              
  Lines         33804    33802       -2     
  Branches       5652     5649       -3     
============================================
+ Hits          17453    17516      +63     
+ Misses        13888    13811      -77     
- Partials       2463     2475      +12     

Divang Sharma added 3 commits May 22, 2025 18:28
…connection. In next reconnect attempt, driver try with correct end point.
…hoice to take decision on connection termination. If endpoint is routed then no need to terminate the current connection.
divang (Contributor, Author) commented May 23, 2025

ADO pipeline run

….java

Co-authored-by: David Engel <davidengel@microsoft.com>
divang added this to the 12.11.0 milestone on May 27, 2025
divang changed the title from "In case of federated authentication request, do not terminate the con…" to "In Reconnect flow, if RoutingInfo present in sendlogno() execution, connection termination not required" on Jun 13, 2025
machavan self-requested a review on June 13, 2025 08:15
divang changed the title from "In Reconnect flow, if RoutingInfo present in sendlogno() execution, connection termination not required" to "Fixed session recovery for Azure SQL DB in redirect mode connected with AAD auth" on Jun 13, 2025
Ananya2 self-requested a review on June 13, 2025 08:40
divang merged commit 72eb3f1 into main on Jun 13, 2025
19 checks passed
divang added a commit that referenced this pull request Jun 13, 2025
…th AAD auth (#2668)

* In case of federated authentication request, do not terminate the connection to resolve Fed auth connection resiliency issue

* In place of serverSupportsDNSCaching, routingInfo value is a better choice to take decision on connection termination. If endpoint is routed then no need to terminate the current connection.

* Added mock test for connectionReconveryCheck() method
---------

Co-authored-by: Divang Sharma <divangsharma@microsoft.com>
Co-authored-by: David Engel <davidengel@microsoft.com>
divang added a commit that referenced this pull request Jun 13, 2025
…th AAD auth (#2668) (#2684)

* In case of federated authentication request, do not terminate the connection to resolve Fed auth connection resiliency issue

* In place of serverSupportsDNSCaching, routingInfo value is a better choice to take decision on connection termination. If endpoint is routed then no need to terminate the current connection.

* Added mock test for connectionReconveryCheck() method
---------

Co-authored-by: Divang Sharma <divangsharma@microsoft.com>
Co-authored-by: David Engel <davidengel@microsoft.com>
@divang divang deleted the user/divang/connection_resiliency_fed_auth_issue branch June 18, 2025 13:23
4 participants