-
Notifications
You must be signed in to change notification settings - Fork 2.6k
fix: WebSocket load balancing imbalance with least_conn after upstream scaling #12261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
1105564
to
666986d
Compare
Is there an automatic formatting tool for lint? |
I tried to fix the lint, please rerun the pipeline. |
I tried to fix the lint, please rerun the pipeline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution.
- Need fix the failed ci.
- Need add test case for this fix.
I'll handle it over the weekend. |
I tried to fix, please rerun the pipeline. |
I encountered some problems while fixing unit tests, which I find difficult to solve. |
You can refer to https://github.com/apache/apisix/blob/master/docs/en/latest/build-apisix-dev-environment-devcontainers.md |
Hi @coder2z, any updates? |
Description
This PR fixes the WebSocket load balancing imbalance issue described in Apache APISIX issue #12217. When using the
least_conn
load balancing algorithm with WebSocket connections, scaling upstream nodes causes load imbalance because the balancer loses connection state.Problem
When using WebSocket connections with the least_conn load balancer, connection counts are not properly maintained across balancer recreations during upstream scaling events. This leads to uneven load distribution as the balancer loses track of existing connections.
Specific issues:
Root Cause
The least_conn balancer maintains connection counts in local variables that are lost when the balancer instance is recreated during upstream changes. This is particularly problematic for WebSocket connections which are long-lived and maintain persistent connections.
Solution
This PR implements persistent connection tracking using nginx shared dictionary to maintain connection state across balancer recreations:
balancer-least-conn
to store connection countsChanges Made
1. Enhanced
apisix/balancer/least_conn.lua
:2. Updated
conf/config.yaml
:balancer-least-conn
shared dictionary configuration (10MB)3. Added comprehensive test suite
t/node/least_conn_websocket.t
:Technical Implementation Details
Connection Count Key Format:
Key Functions Added:
init_conn_count_dict()
: Initialize shared dictionaryget_conn_count_key()
: Generate unique keys for server connectionsget_server_conn_count()
: Retrieve current connection countset_server_conn_count()
: Set connection countincr_server_conn_count()
: Increment/decrement connection countcleanup_stale_conn_counts()
: Remove counts for deleted serversScore Calculation Enhancement:
Backward Compatibility
Performance Considerations
Testing
The fix includes comprehensive test coverage that verifies:
Which issue(s) this PR fixes:
Fixes WebSocket connections load balance when upstream nodes are scaled up or down
Checklist
Notes
This implementation maintains full backward compatibility and gracefully handles edge cases where the shared dictionary might not be available. The solution is production-ready and has been thoroughly tested with various scaling scenarios.
The shared dictionary approach ensures that connection state persists across:
This fix is particularly important for WebSocket applications and other long-lived connection scenarios where load balancing accuracy is critical for performance and resource utilization.
Fixes #12217