Skip to content

Connection fails and ignores FailoverPartner port after Linux SQL Server BAG failover, defaults to port 1433 #3400

Open
@AbbasLB

Description

@AbbasLB

Description

When using Microsoft.Data.SqlClient with Linux-based Basic Availability Groups (2 synchronous replicas) and specifying a custom port (e.g., 1066) in the connection string, automatic failover to the secondary server fails.

After a failover, the client attempts to connect to port 1433, ignoring the explicitly configured FailoverPartner port and actual server port.

✅ This issue does not occur with Windows-based SQL Server using named instances, even with custom ports, since the client uses the named instance to connect ot the failover (server\instance like my-failover-partner\sql64std).

Observed Behavior After Failover for this connection string with Linux host

DataSource = "tcp:my-master-server.mycompany.com,1066",
FailoverPartner = "tcp:my-failover-partner.mycompany.com,1066",
  • ❌ If a failover is performed after successfully connecting to the primary instance, the client attempts to connect to my-failover-partner without FQDN or port, defaults to port 1433, and fails.
  • ✅ If the application is restarted, it correctly connects to the failover partner using the specified port.
  • ✅ If failback occurs, connections resume correctly.

Potential Cause

From source inspection, the FailoverPartner is being overridden by the environment change ENV_LOGSHIPNODE, however it seems that it doesn't contain the port of the sql instance:
🔗 https://github.com/search?q=repo%3Adotnet%2FSqlClient+ENV_LOGSHIPNODE&type=code

Exception Message

A network-related or instance-specific error occurred while establishing a connection to SQL Server.
The server was not found or was not accessible.
(provider: TCP Provider, error: 40 - Could not open a connection to SQL Server)

To Reproduce

  1. Deploy a Basic Availability Group on SQL Server on Linux (e.g., Ubuntu 20.04) with a custom port (e.g., 1066).
  2. Use this client code:
private static async Task SimpleFailoverTest()
{
    string connectionString = new SqlConnectionStringBuilder()
    {
        InitialCatalog = "DowntimeTestDB",
        DataSource = "tcp:my-master-server.mycompany.com,1066",
        FailoverPartner = "tcp:my-failover-partner.mycompany.com,1066",
        IntegratedSecurity = true,
        TrustServerCertificate = true,
    }.ConnectionString;

    while (true)
    {
        try
        {
            using var conn = new SqlConnection(connectionString);
            await conn.OpenAsync();
            var cmd = new SqlCommand("SELECT * from [DowntimeTestDB].[dbo].[OperationsLog]", conn);
            await cmd.ExecuteNonQueryAsync();
            Console.WriteLine($"Success");
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error: {ex.Message}");
        }
        await Task.Delay(1000);
    }
}
  1. Trigger a manual failover to the secondary server.
  2. Observe client fails to connect post-failover.

Expected Behavior

Client should:

  • Dynamically retrieve the actual custom port from the sql server instance (e.g., port 1066)
  • Or, provide a configurable option to prevent overriding the FailoverPartner in the connection string.
  • Or, provide a way to set the failover port explicitly somewhere without being overridden.

Additional Context

  • Wireshark confirmed: Client attempts to connect to port 1433 post-failover, ignoring custom port 1066.
  • ⚠️ Failover only works when using port 1433, which is not feasible for us.
  • I used the following snippet to setup the availability groups, I think that the instance names in this query are used by the client which is why the connection fails (my-failover-partner was used for the connection) and the availability group fails to setup with including the custom port like my-failover-partner,1066 and my-master-server,1066)
CREATE AVAILABILITY GROUP [ag_DowntimeTestDB]
WITH (
    BASIC,
    CLUSTER_TYPE = NONE,
    REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 0
)
FOR DATABASE [DowntimeTestDB]
REPLICA ON 
    N'my-master-server' WITH (
        ENDPOINT_URL = N'tcp://my-master-server.mycompany.com:5022',
        FAILOVER_MODE = MANUAL,
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)
    ),
    N'my-failover-partner' WITH (
        ENDPOINT_URL = N'tcp://my-failover-partner.mycompany.com:5022',
        FAILOVER_MODE = MANUAL,
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)
    )
GO

Logs / Traces from SqlClient

  • Working Failover Trace: Windows SQL Server with named instance failover succeeded on non-default port 1066 (it can be seen that the named instance was used for the connection)
    trace_wsl_targets_windows_failover_success.txt
  • Failing Failover Trace: Linux SQL Server failover, client doesn’t connect to custom port 1066 (we can see that the instance name with no port was used)
    trace_wsl_targets_linux_failover_failure.txt
    Note: Focus on sc.SqlInternalConnectionTds.LoginWithFailover to see the client updating the failover and ignoring the instance custom port.

Current Workarounds (Undesirable)

  1. External port redirection from 1433 → 1066
    ⛔ Not sure how feasable and complex it is

  2. Standardize to default port 1433 across all clients and servers
    ⛔ Not feasible due to massive deployment and client base

  3. Fork and patch SqlClient to use 1066 as the default
    ⛔ Maintenance overhead and unclear reliability

Technical Details

  • Microsoft.Data.SqlClient version: 5.2.2, 6.0.2
  • .NET Target Framework: .NET 8.0
  • SQL Server Version:
    • Linux: SQL Server 2022 CU19 with non-default port
    • Windows: SQL Server 2022 CU11 with non-default port
  • OS Configuration:
    • ❌ Failing: WSL Linux client / Windows client → Linux SQL Server
    • ✅ Working: WSL Linux client / Windows client → Windows SQL Server (named instances)

Please advise on a potential fix or workaround that would allow failover to connect to the custom SQL Server port or provide a way to prevent overriding FailoverPartner provided in the connection string.

Metadata

Metadata

Assignees

Labels

P2Use to label moderate priority issue - impacts atleast more than 1 customer.Triage Done ✔️Issues that are triaged by dev team and are in investigation.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions