Skip to content

fix: assign new_nodes of service discovery to upstream every time #12358

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 27, 2025

Conversation

nic-6443
Copy link
Member

@nic-6443 nic-6443 commented Jun 20, 2025

Description

I introduced a performance optimization in #12258, and after testing, we found that the kubernetes performance optimization would invalid after some time.
The reason is that after watch progress is done in kubernetes service discovery for some time, it will re-list all data. At this point, addresses of all tables that store service discovery are changed, but the content of nodes does not change. This causes up_conf.nodes to never reference the new_nodes table at

apisix/apisix/upstream.lua

Lines 305 to 317 in 9e5163c

local same = upstream_util.compare_upstream_node(up_conf, new_nodes)
if not same then
if not up_conf._nodes_ver then
up_conf._nodes_ver = 0
end
up_conf._nodes_ver = up_conf._nodes_ver + 1
local pass, err = core.schema.check(core.schema.discovery_nodes, new_nodes)
if not pass then
return HTTP_CODE_UPSTREAM_UNAVAILABLE, "invalid nodes format: " .. err
end
up_conf.nodes = new_nodes
because their addresses are different and it goes to the branch where equality is determined by value in every downstream requests.
This branch has significant performance overhead:
core.table.sort(old_t, sort_by_key_host)
core.table.sort(new_t, sort_by_key_host)
for i = 1, #new_t do
local new_node = new_t[i]
local old_node = old_t[i]
for _, name in ipairs({"host", "port", "weight", "priority", "metadata"}) do
if new_node[name] ~= old_node[name] then
return false
end
end
end

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. bug Something isn't working performance generate flamegraph for the current PR labels Jun 20, 2025
membphis
membphis previously approved these changes Jun 20, 2025
f
Signed-off-by: Nic <qianyong@api7.ai>
nic-chen
nic-chen previously approved these changes Jun 23, 2025
@nic-6443 nic-6443 requested a review from membphis June 23, 2025 01:14
membphis
membphis previously approved these changes Jun 23, 2025
AlinsRan
AlinsRan previously approved these changes Jun 23, 2025
Signed-off-by: Nic <qianyong@api7.ai>
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Jun 23, 2025
F
Signed-off-by: Nic <qianyong@api7.ai>
membphis
membphis previously approved these changes Jun 27, 2025
ronething
ronething previously approved these changes Jun 27, 2025
Copy link
Contributor

@ronething ronething left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

AlinsRan
AlinsRan previously approved these changes Jun 27, 2025
Comment on lines 49 to 51
core.log.debug("compare upstream nodes by value, ",
"old: ", tostring(old_t) , " ", core.json.delay_encode(old_t, true),
", new: ", tostring(new_t) , " ", core.json.delay_encode(new_t, true))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call delay_encode once for each log record, otherwise the content of the log records will be garbled.

Suggested change
core.log.debug("compare upstream nodes by value, ",
"old: ", tostring(old_t) , " ", core.json.delay_encode(old_t, true),
", new: ", tostring(new_t) , " ", core.json.delay_encode(new_t, true))
core.log.debug("compare upstream nodes by value, ",
"old: ", tostring(old_t) , " ", core.json.delay_encode(old_t, true))
core.log.debug(" new: ", tostring(new_t) , " ", core.json.delay_encode(new_t, true))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

f
Signed-off-by: Nic <qianyong@api7.ai>
@nic-6443 nic-6443 dismissed stale reviews from AlinsRan, ronething, and membphis via 7b21236 June 27, 2025 03:34
@nic-6443 nic-6443 requested review from nic-chen, ronething, membphis and AlinsRan and removed request for AlinsRan June 27, 2025 03:35
@nic-6443 nic-6443 merged commit 5e0970e into apache:master Jun 27, 2025
29 checks passed
@nic-6443 nic-6443 deleted the nic/compare-nodes branch June 27, 2025 05:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working performance generate flamegraph for the current PR size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants