Description
Description
When using Microsoft.Data.SqlClient
with Linux-based Basic Availability Groups (2 synchronous replicas) and specifying a custom port (e.g., 1066
) in the connection string, automatic failover to the secondary server fails.
After a failover, the client attempts to connect to port 1433, ignoring the explicitly configured FailoverPartner
port and actual server port.
✅ This issue does not occur with Windows-based SQL Server using named instances, even with custom ports, since the client uses the named instance to connect ot the failover (server\instance
like my-failover-partner\sql64std
).
Observed Behavior After Failover for this connection string with Linux host
DataSource = "tcp:my-master-server.mycompany.com,1066",
FailoverPartner = "tcp:my-failover-partner.mycompany.com,1066",
- ❌ If a failover is performed after successfully connecting to the primary instance, the client attempts to connect to
my-failover-partner
without FQDN or port, defaults to port 1433, and fails. - ✅ If the application is restarted, it correctly connects to the failover partner using the specified port.
- ✅ If failback occurs, connections resume correctly.
Potential Cause
From source inspection, the FailoverPartner
is being overridden by the environment change ENV_LOGSHIPNODE
, however it seems that it doesn't contain the port of the sql instance:
🔗 https://github.com/search?q=repo%3Adotnet%2FSqlClient+ENV_LOGSHIPNODE&type=code
Exception Message
A network-related or instance-specific error occurred while establishing a connection to SQL Server.
The server was not found or was not accessible.
(provider: TCP Provider, error: 40 - Could not open a connection to SQL Server)
To Reproduce
- Deploy a Basic Availability Group on SQL Server on Linux (e.g., Ubuntu 20.04) with a custom port (e.g., 1066).
- Use this client code:
private static async Task SimpleFailoverTest()
{
string connectionString = new SqlConnectionStringBuilder()
{
InitialCatalog = "DowntimeTestDB",
DataSource = "tcp:my-master-server.mycompany.com,1066",
FailoverPartner = "tcp:my-failover-partner.mycompany.com,1066",
IntegratedSecurity = true,
TrustServerCertificate = true,
}.ConnectionString;
while (true)
{
try
{
using var conn = new SqlConnection(connectionString);
await conn.OpenAsync();
var cmd = new SqlCommand("SELECT * from [DowntimeTestDB].[dbo].[OperationsLog]", conn);
await cmd.ExecuteNonQueryAsync();
Console.WriteLine($"Success");
}
catch (Exception ex)
{
Console.WriteLine($"Error: {ex.Message}");
}
await Task.Delay(1000);
}
}
- Trigger a manual failover to the secondary server.
- Observe client fails to connect post-failover.
Expected Behavior
Client should:
- Dynamically retrieve the actual custom port from the sql server instance (e.g., port 1066)
- Or, provide a configurable option to prevent overriding the
FailoverPartner
in the connection string. - Or, provide a way to set the failover port explicitly somewhere without being overridden.
Additional Context
- ✅ Wireshark confirmed: Client attempts to connect to port
1433
post-failover, ignoring custom port1066
. ⚠️ Failover only works when using port 1433, which is not feasible for us.- I used the following snippet to setup the availability groups, I think that the instance names in this query are used by the client which is why the connection fails (
my-failover-partner
was used for the connection) and the availability group fails to setup with including the custom port likemy-failover-partner,1066
andmy-master-server,1066
)
CREATE AVAILABILITY GROUP [ag_DowntimeTestDB]
WITH (
BASIC,
CLUSTER_TYPE = NONE,
REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 0
)
FOR DATABASE [DowntimeTestDB]
REPLICA ON
N'my-master-server' WITH (
ENDPOINT_URL = N'tcp://my-master-server.mycompany.com:5022',
FAILOVER_MODE = MANUAL,
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
SEEDING_MODE = AUTOMATIC,
SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)
),
N'my-failover-partner' WITH (
ENDPOINT_URL = N'tcp://my-failover-partner.mycompany.com:5022',
FAILOVER_MODE = MANUAL,
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
SEEDING_MODE = AUTOMATIC,
SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)
)
GO
Logs / Traces from SqlClient
- ✅ Working Failover Trace: Windows SQL Server with named instance failover succeeded on non-default port
1066
(it can be seen that the named instance was used for the connection)
trace_wsl_targets_windows_failover_success.txt - ❌ Failing Failover Trace: Linux SQL Server failover, client doesn’t connect to custom port
1066
(we can see that the instance name with no port was used)
trace_wsl_targets_linux_failover_failure.txt
Note: Focus onsc.SqlInternalConnectionTds.LoginWithFailover
to see the client updating the failover and ignoring the instance custom port.
Current Workarounds (Undesirable)
-
External port redirection from
1433 → 1066
⛔ Not sure how feasable and complex it is -
Standardize to default port
1433
across all clients and servers
⛔ Not feasible due to massive deployment and client base -
Fork and patch
SqlClient
to use1066
as the default
⛔ Maintenance overhead and unclear reliability
Technical Details
- Microsoft.Data.SqlClient version:
5.2.2
,6.0.2
- .NET Target Framework:
.NET 8.0
- SQL Server Version:
- Linux: SQL Server 2022 CU19 with non-default port
- Windows: SQL Server 2022 CU11 with non-default port
- OS Configuration:
- ❌ Failing: WSL Linux client / Windows client → Linux SQL Server
- ✅ Working: WSL Linux client / Windows client → Windows SQL Server (named instances)
Please advise on a potential fix or workaround that would allow failover to connect to the custom SQL Server port or provide a way to prevent overriding FailoverPartner provided in the connection string.